Report

Operant Reward Learning in Aplysia: Neuronal Correlates and Mechanisms

Science  31 May 2002:
Vol. 296, Issue 5573, pp. 1706-1709
DOI: 10.1126/science.1069434

Abstract

Operant conditioning is a form of associative learning through which an animal learns about the consequences of its behavior. Here, we report an appetitive operant conditioning procedure in Aplysia that induces long-term memory. Biophysical changes that accompanied the memory were found in an identified neuron (cell B51) that is considered critical for the expression of behavior that was rewarded. Similar cellular changes in B51 were produced by contingent reinforcement of B51 with dopamine in a single-cell analog of the operant procedure. These findings allow for the detailed analysis of the cellular and molecular processes underlying operant conditioning.

Learning about relations between stimuli [i.e., classical conditioning (1)] and learning about the consequences of one's own behavior [i.e., operant conditioning (2)] constitute the major part of our predictive understanding of the world. Although the neuronal mechanisms underlying appetitive and aversive classical conditioning are well studied (e.g., 3–8), a comparable understanding of operant conditioning is still lacking. Published reports include invertebrate aversive conditioning (e.g., 9–12) and vertebrate operant reward learning (e.g., 13). In several forms of learning, dopamine appears to be a key neurotransmitter involved in reward (e.g., 14). Previous research on dopamine-mediated operant reward learning in Aplysia was limited to in vitro analogs (15–18). In this report, we overcome this limitation by developing both in vivo and single-cell operant procedures and describe biophysical correlates of the operant memory.

The in vivo operant reward learning paradigm was developed using the consummatory phase (i.e., biting) of feeding behavior in Aplysia. This model system has several features that we hoped to exploit. The behavior occurs in an all-or-nothing manner and is thus easily quantified (see supplemental video). The circuitry of the underlying central pattern generator (CPG) in the buccal ganglia is well characterized (19). The anterior branch of the esophageal nerve (En2) (Fig. 1A) is both necessary and sufficient for effective reinforcement during in vivo classical conditioning and in vitro analogs of classical and operant conditioning (15–18, 20–23). Presumably, En2 conveys information about the presence of food during ingestive behavior. Consequently, we investigated the role of En2 in the reinforcement pathway by recording from it in freely behaving Aplysia via chronically implanted extracellular hook-electrodes (24) (see supplemental methods) (Fig. 1A). Little nerve activity was observed during spontaneous biting in the absence of food (Fig. 1, B1), whereas bouts (duration: ∼3 s) of high-frequency (∼30 Hz) activity in En2 were recorded during the ingestion of food (Fig. 1, B2). Specifically, this activity was observed in conjunction with ingestion movements of the odontophore/radula (a tonguelike organ). Electrical stimulation of En2 might thus be used to substitute for food reinforcement in an operant conditioning paradigm. Therefore, in vivo stimulation of En2 at approximately the frequency and duration observed during feeding was made contingent upon each spontaneous bite in freely behaving animals (see supplemental methods). Such a preparation is unique among studies of learning in invertebrates and analogous to commonly used self-stimulation procedures in rats (e.g., 13).

Figure 1

In vivo recordings and behavioral results. (A) Schematic representation of electrode placement. (B1) Activity in En2 during spontaneous bites in the absence of food. Depicted are three bites (arrows). (B2) Activity in En2 during biting and swallowing behavior in the presence of food. Seven bite-swallows are shown (arrows). (C and D) Behavioral results. (C) Spontaneous bite rate in the final unreinforced test phase immediately after training. There was a significant difference among the three groups (Kruskal-Wallis ANOVA, H2 = 9.678, p < 0.008). A post-hoc analysis revealed that the number of bites in the contingently reinforced group was significantly higher than in both the control and yoked groups (Mann-Whitney U tests, U = 16.5, p < 0.007, and U = 24.0, p < 0.05, respectively). The two control groups did not differ significantly (Mann-Whitney U test, U = 29.0, p = 0.07). (D) Spontaneous bite rate in the unreinforced test phase 24 hours after the beginning of the experiment. There was a significant difference among the three groups (Kruskal-Wallis ANOVA, H2 = 11.9, p < 0.003). The number of bites taken by the contingent reinforcement group was higher than in the two control groups (Mann-Whitney U tests, U = 1.5, p < 0.009, control; and U = 0.0, p < 0.004, yoke). The two control groups were not significantly different (Mann-Whitney U test, U = 9.5, p = 0.17). In this and subsequent illustrations, bar graphs display means ± s.e.m.
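For readers who want to see how the rank statistics quoted in this caption are formed, the following is a minimal sketch in plain Python. It is not the authors' analysis code: the function names are hypothetical, the Kruskal-Wallis statistic is computed without a tie correction, and no p-values are derived.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return the smaller of the two Mann-Whitney U statistics."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


def kruskal_wallis_h(*groups):
    """Return the Kruskal-Wallis H statistic (assumption: no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)
```

With completely separated samples, for example `mann_whitney_u([1, 2, 3], [4, 5, 6])`, the smaller U is 0, the lowest value the statistic can take.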

One day after implanting the electrodes, animals were assigned to one of three groups: (i) a control group without any stimulation, (ii) a contingent reinforcement group for which each bite during training was followed by En2 stimulation, or (iii) a yoked control group that received the same sequence of stimulations as the contingent group, but the sequence was uncorrelated with their behavior (25). Animals that had been contingently reinforced showed significantly more spontaneous bites during a 5-min test period than did both control groups, regardless of whether they were tested immediately after training (Fig. 1C) or 24 hours later (Fig. 1D). These results indicate that during 10 min of contingent stimulation, the animals acquired an operant memory that lasted for at least 24 hours.
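The logic of the yoked control can be illustrated with a short Python sketch. It assumes that bites and stimulations are represented as lists of onset times in seconds; the function names, the example times, and the 1-s scoring window are hypothetical choices made for illustration and are not taken from the methods.

```python
def contingent_schedule(bite_times):
    """Contingent animal: one En2 stimulation is triggered by each bite."""
    return list(bite_times)


def yoked_schedule(partner_stim_times):
    """Yoked animal: replay the partner's stimulation times unchanged,
    regardless of what the yoked animal itself is doing."""
    return list(partner_stim_times)


def fraction_following_bite(stim_times, bite_times, window_s=1.0):
    """Fraction of stimulations that begin within window_s after one of the
    animal's own bites (window_s is an illustrative assumption)."""
    if not stim_times:
        return 0.0
    hits = sum(
        any(0.0 <= s - b <= window_s for b in bite_times)
        for s in stim_times
    )
    return hits / len(stim_times)


# Hypothetical bite onset times (s) for a matched pair of animals.
contingent_bites = [12.0, 95.0, 180.0, 300.0]
yoked_bites = [40.0, 150.0, 410.0]

contingent_stims = contingent_schedule(contingent_bites)
yoked_stims = yoked_schedule(contingent_stims)

# Both animals receive the same stimulation sequence; only the relation
# between stimulation and the animal's own behavior differs.
contingent_score = fraction_following_bite(contingent_stims, contingent_bites)
yoked_score = fraction_following_bite(yoked_stims, yoked_bites)
```

Because the two schedules are identical, any difference in later bite rate between the groups can be attributed to the contingency between bite and stimulation rather than to the amount of stimulation received.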

We next sought to identify changes in the nervous system that were associated with the behavioral modification. The neural activity that underlies the radula movements during feeding is generated by the buccal CPG. This neural network consists of sensory, inter-, and motor neurons that continue to produce buccal motor patterns (BMPs), even when the ganglia are removed from the animal (15). In the intact animal, ingestion-like BMPs correspond to radula movements transporting food through the buccal mass into the foregut, as opposed to rejection-like BMPs that correspond to radula movements that remove inedible objects from the foregut (24). Buccal neuron B51 is pivotal for the selection of BMPs. Specifically, B51 exhibits a characteristic, sustained, all-or-nothing level of activity (plateau potential) during ingestion-like BMPs. Moreover, B51 can gate transitions between BMPs. Direct depolarization of B51 leads to the production of ingestion-like BMPs, whereas hyperpolarization inhibits ingestion-like BMPs (18). We thus examined whether the observed increase in number of bites was associated with an increase in excitability of B51.

To test the hypothesis that B51 was a site of memory storage for operant conditioning, another set of animals was conditioned (26). Immediately after the last training period, the animals were anaesthetized and dissected, and the buccal ganglia were prepared for intracellular recording (see supplemental methods). Resting membrane potential, input resistance, and burst threshold were measured in B51. Burst threshold was defined as the amount of depolarizing current needed to elicit a plateau potential [see also (16, 18)]. Cells from the contingent group exhibited a significant decrease in burst threshold (Fig. 2A) and a significant increase in input resistance (Fig. 2B), as compared to cells from the yoked control. The resting membrane potential did not differ among the groups (27). The decrease in burst threshold and increased input resistance both increase the probability of B51 becoming active and thus increase the probability that a BMP will become ingestion-like. Our data validate an in vitro analog of operant conditioning in isolated buccal ganglia (16) and extend the research to include operant conditioning in freely moving Aplysia.
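The two measurements named above can be written down compactly. The sketch below is a hedged illustration in Python, not the recording protocol itself: it assumes that burst threshold is read off as the smallest tested current step that elicits a plateau potential and that input resistance follows from Ohm's law applied to the steady-state voltage deflection. The function names, the step sizes, and the toy cell models are hypothetical.

```python
def burst_threshold(current_steps_na, elicits_plateau):
    """Return the smallest tested depolarizing current (nA) for which
    elicits_plateau(current) is True, or None if no step succeeds."""
    for current in sorted(current_steps_na):
        if elicits_plateau(current):
            return current
    return None


def input_resistance_mohm(delta_v_mv, current_na):
    """Ohm's law, R = dV / I. With dV in mV and I in nA the result is in
    megaohms, because 1 mV / 1 nA = 1 MOhm."""
    if current_na == 0:
        raise ValueError("current must be nonzero")
    return delta_v_mv / current_na


# Hypothetical series of 2-nA steps applied to two toy cell models whose
# plateau thresholds match the example values quoted in Fig. 2A.
steps_na = range(2, 21, 2)
threshold_contingent = burst_threshold(steps_na, lambda i_na: i_na >= 6)
threshold_yoked = burst_threshold(steps_na, lambda i_na: i_na >= 14)

# Hypothetical numbers: a -10 mV deflection to a -5 nA pulse gives 2 MOhm.
example_resistance = input_resistance_mohm(-10.0, -5.0)
```

For a fixed hyperpolarizing pulse, a larger voltage deflection corresponds to a higher input resistance, which is how the traces in Fig. 2B are read.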

Figure 2

Changes in burst threshold and input resistance in B51 after operant training. (A) Burst threshold. (A1) and (A2) Intracellular recordings from B51 cells from a matched pair of contingently reinforced and yoked control animals. Depolarizing current pulses were injected into each B51 until the cell generated a plateau potential. In this example, a 6-nA current pulse was sufficient to generate a plateau potential in B51 from a contingently reinforced animal (A1), whereas 14 nA were required to generate a plateau potential in B51 from the corresponding yoked-control animal (A2). (A3) Summary data. B51 cells from the contingent reinforcement group required significantly less current to elicit the plateau potential (Mann-Whitney U test, U = 59.5, p < 0.03). (B) Input resistance. (B1) and (B2) Intracellular recordings from B51 cells from both contingently reinforced and yoked control animals. Hyperpolarizing current pulses were injected into B51 and the cells' input resistance was measured. In this example, the membrane potential of B51 from a contingently trained animal (B1) deflected more in response to the current pulse than the potential of B51 from a yoked control animal (B2). (B3) Summary data. B51 input resistance was significantly increased in contingently reinforced animals (Mann-Whitney U test, U = 37.0, p < 0.002).

Although the expression of intrinsic changes in the membrane properties of B51 was associated with operant conditioning, the maintenance of these changes could be due to extrinsic factors such as a tonic change in modulatory input to B51. If so, the locus of the associative neuronal mechanism may be upstream of B51. Moreover, as B51 is active during ingestion-like BMPs, the changes in B51 could be the effect of repeated activation, rather than a cause of operantly conditioned animals taking more bites than do the yoked control animals. To resolve this question, we isolated the neuron in primary cell culture and developed a single-cell analog of the operant procedure. B51 neurons were removed from naïve Aplysia and cultured (see supplemental methods). Dopamine mediates reinforcement in an in vitro analog of operant conditioning (17), and En2 is rich in dopamine-containing processes (28). Therefore, reinforcement was mimicked by a brief (6 s) iontophoretic “puff” of dopamine onto the neuron. Because B51 exhibits a plateau potential during each ingestion-like BMP, this reinforcement was made contingent upon a plateau potential elicited by injection of a brief depolarizing current pulse. Contingent reinforcement of such B51 activity in the ganglion with En2 stimulation is sufficient for in vitro operant conditioning (18). Two experimental groups were examined. Building on the experience with in vitro operant conditioning (18), we administered seven supra-threshold current pulses in a 10-min period to a contingent reinforcement group. Dopamine was iontophoresed immediately after cessation of the plateau potential. An unpaired group received the same number of depolarizations and puffs of dopamine, but dopamine iontophoresis was delayed by 40 s after the plateau potential. Contingent application of dopamine produced a significant decrease in burst threshold (Fig. 3A) and a significant increase in input resistance (Fig. 3B). Apparently, processes intrinsic to B51 are responsible for the induction and maintenance of the biophysical changes associated with operant reward learning.
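The timing difference between the two single-cell groups can be laid out as a simple event schedule. The Python sketch below takes the seven pulses, the 10-min period, the 6-s puff, and the 40-s delay from the text; the even spacing of pulses and the 5-s plateau duration are assumptions made only so that the schedule can be computed, and the function name is hypothetical.

```python
def training_schedule(delay_s, n_pulses=7, period_s=600.0,
                      plateau_s=5.0, puff_s=6.0):
    """Return one dict of event times (s) per training trial.

    delay_s   : time from plateau cessation to dopamine onset
                (0 for the contingent group, 40 for the unpaired group)
    plateau_s : assumed plateau duration (not given in the text)
    Pulses are assumed to be evenly spaced across the training period.
    """
    interval = period_s / n_pulses
    trials = []
    for k in range(n_pulses):
        pulse_on = k * interval
        plateau_end = pulse_on + plateau_s
        puff_on = plateau_end + delay_s
        trials.append({
            "pulse_on": pulse_on,
            "plateau_end": plateau_end,
            "puff_on": puff_on,
            "puff_off": puff_on + puff_s,
        })
    return trials


contingent_trials = training_schedule(delay_s=0.0)
unpaired_trials = training_schedule(delay_s=40.0)
```

Both groups receive the same number of depolarizations and dopamine puffs; only the interval between plateau cessation and dopamine onset differs, which is the variable the comparison isolates.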

Figure 3

Contingent-dependent changes in burst threshold and input resistance in cultured B51. (A) Burst threshold. (A1) and (A2) Intracellular recordings from a pair of contingently reinforced and unpaired neurons. Depolarizing current pulses were injected into B51 before (pre-test) and after (post-test) training. In this example, contingent reinforcement led to a decrease in burst threshold from 0.8 to 0.5 nA (A1), whereas it remained at 0.7 nA in the corresponding unpaired cell (A2). (A3) Summary data. The contingently reinforced cells had significantly decreased burst thresholds (Mann-Whitney U test, U = 0.0, p < 0.004). (B) Input resistance. (B1) and (B2) Intracellular recordings from a pair of contingently reinforced and unpaired control neurons. Hyperpolarizing current pulses were injected into B51 before (pre-test) and after (post-test) training. In this example, contingent reinforcement led to an increased deflection of the B51 membrane potential in response to the current pulse (B1), whereas the deflection remained constant in the corresponding unpaired cell (B2). (B3) Summary data. The contingently reinforced cells had significantly increased input resistances (Mann-Whitney U test, U = 3.5, p < 0.03).

The combination of rewarding a simple behavior with physiologically realistic, in vivo stimulation uncovered neuron B51 as one site where operant behavior and reward converge (see supplemental discussion). The results presented here suggest that intrinsic cell-wide plasticity contributes to operant reward learning. Such cell-wide plasticity is also associated with operant conditioning in insects (10). Although B51 is a key element in the neural circuit for feeding, the quantitative contribution of the changes in B51 to the expression of the behavioral changes needs to be elucidated. Given the number of neurons in the feeding CPG (19), it is likely that B51 will not be the only site of plasticity during operant conditioning (nor will cell-wide plasticity likely be the only mechanism). However, the persistent involvement of contingent-dependent cell-wide plasticity in B51 in different levels of successively reduced preparations suggests an important role for this mechanism.

Research on Aplysia has provided key insights into mechanisms of aversive conditioning that are evolutionarily conserved. The utility of this model system for learning and memory has now been extended to dopamine-mediated reward learning on the behavioral, network, and cellular levels. Our study expands a growing body of literature that shows that dopamine is an evolutionarily conserved transmitter used in reward systems. Future research on Aplysia will likely provide insights into the subcellular effects of dopamine reward, an area currently under intense investigation in vertebrates (8, 13).

  • * These authors contributed equally to this work.

  • To whom correspondence should be addressed. E-mail: john.h.byrne{at}uth.tmc.edu

REFERENCES AND NOTES
