Report

Genetically Determined Differences in Learning from Errors

Science  07 Dec 2007:
Vol. 318, Issue 5856, pp. 1642-1645
DOI: 10.1126/science.1145044

Abstract

The role of dopamine in monitoring negative action outcomes and feedback-based learning was tested in a neuroimaging study in humans grouped according to the dopamine D2 receptor gene polymorphism DRD2-TAQ-IA. In a probabilistic learning task, A1-allele carriers with reduced dopamine D2 receptor densities learned to avoid actions with negative consequences less efficiently. Their posterior medial frontal cortex (pMFC), involved in feedback monitoring, responded less to negative feedback than others' did. Dynamically changing interactions between pMFC and hippocampus found to underlie feedback-based learning were reduced in A1-allele carriers. This demonstrates that learning from errors requires dopaminergic signaling. Dopamine D2 receptor reduction seems to decrease sensitivity to negative action consequences, which may explain an increased risk of developing addictive behaviors in A1-allele carriers.

“You learn from your mistakes,” people say. We usually learn from both positive and negative action outcomes, which induce reinforcement of successful and avoidance of erroneous behavior, respectively (1). The relative amount of learning from successes and errors varies across individuals as a result of disease or pharmacological intervention (2). Can even our genetic makeup influence the way we learn from errors? An important factor in the use of negative and positive feedback for learning seems to be the neurotransmitter dopamine (3–5). A human genetic polymorphism (DRD2-TAQ-IA) is known to modulate dopamine D2 receptor density. The A1 allele is associated with a reduction in D2 receptor density by up to 30% (6–8). This reduction has been linked to multiple addictive and compulsive behaviors (9, 10), which suggests some insensitivity to negative consequences of self-destructive behavior. This might be linked to a general deficit in learning from errors. Here, we report patterns of brain activity underlying a reduced ability to use negative feedback for avoidance learning in carriers of the A1 allele. Our findings suggest a genetically driven change in the dynamic interaction of performance monitoring and long-term memory formation.

When action outcomes call for adaptations, a performance-monitoring system in the posterior medial frontal cortex (pMFC) signals the need for adjustments (11, 12). The rostral cingulate zone (RCZ), located in the pMFC, has been suggested to be involved in learning from errors (13, 14). A neurobiological theory holds that this region receives dopaminergic teaching signals from the midbrain coding whether an event is better or worse than predicted (14). In close interaction with the performance-monitoring system, the basal ganglia, in particular the nucleus accumbens (NAC), play a major role in reward-based learning (12, 15–17). Moreover, the performance-monitoring system needs to interact with the hippocampal formation to enable learning of stimulus-reward associations.

To investigate neural activity related to error-based learning, we recorded functional magnetic resonance imaging (fMRI) data from 26 healthy male subjects grouped by genotype [A1-allele carrier, A1+ group, n = 12; non–A1-allele carrier, A1– group, n = 14 (18)]. We used a probabilistic learning task sensitive to dopaminergic manipulations (2). Participants had to learn to choose the more-often rewarded symbols from pairs of stimuli presented in random order. After each choice, probabilistic feedback was provided (Fig. 1, top). After learning, participants were confronted in a behavioral posttest with the same symbols, now paired with symbols other than the one from the learning phase [supporting online material (SOM), table S1]. This allowed us to disentangle preference for the most-often rewarded symbol “A” and avoidance of the least-often rewarded symbol “B.”

Fig. 1.

Probabilistic learning task, behavioral and computational results. (Top) Stimuli, reward probabilities (percent positive feedback), and schematic trial sequence of the probabilistic learning task (2). (Bottom, left) Result of the behavioral posttest: Choosing the good symbol (A) and avoiding the bad one (B) differ between the two genetic groups [group × selection interaction: F(1,24) = 8.1, P = 0.009]. (Bottom, right) Certainty of the given response resulting from the computational model, averaged in bins of 20 trials each and shown separately for the two genetic groups.

The groups defined by the presence or the absence of the A1 allele did not differ in the average frequency of selecting favorable symbols or in the rate of negative feedback; however, we found a remarkable group difference in avoidance learning (Fig. 1, bottom left) (SOM text). In the posttest, the A1+ group avoided the negative symbol B significantly less than they chose the positive symbol A (P = 0.03). Moreover, their avoidance of B was reduced compared with the A1– group (P = 0.03), who did not show a significant difference between selecting A and avoiding symbol B (P = 0.17). Consistent with this behavior, the A1+ group also showed a reduced negative feedback–related fMRI signal in the RCZ (x = 4, y = 24, z = 33; z score = 3.5, 324 mm³) compared with the A1– group (Fig. 2A and table S2). In a Bayesian analysis (18, 19), we observed a posterior probability of 95.8% for a group difference in RCZ activity induced by negative feedback. Moreover, only members of the A1– group showed positive correlations between negative feedback–related RCZ activity and both preference for the A symbol (r = 0.53, P = 0.05) and avoidance of the B symbol (r = 0.55, P = 0.04). A further strong signal increase on negative feedback in the right middorsal prefrontal cortex [x = 40, y = 21, z = 27; z score = 4.3, middle frontal gyrus (MFG)] was found only in the A1– group (posterior probability of group difference: 97.1%).

Fig. 2.

Genetic influences on the fMRI results. Only clusters with at least 81 mm³ activated at z ≥ 3.09 are shown. For visualization, the map thresholds are set at z = 2.33 (unless stated otherwise). (A) (Left) The contrast between negative and positive feedback for the two genetic groups is shown projected onto a coronal slice (y = 24) and two sagittal slices (x = 4 and x = 16); red, negative feedback > positive feedback; blue, positive feedback > negative feedback. (Right) Percent signal change for positive and negative feedback taken from RCZ (x = 4, y = 24, z = 33). (B) Parametric within-subject fMRI analysis using the certainty of the given response as a regressor, projected onto a coronal (y = –39) and a sagittal (x = 22) slice. HIP, hippocampus. (C) Psychophysiological interaction analysis between RCZ (x = 4, y = 24, z = 33) and other brain areas, projected onto a coronal (y = –42) and two sagittal (x = –26 and x = 16) slices. Red, stronger interaction in the first third than in the last third of the experiment; blue, stronger interaction in the last than in the first third.

To study learning over the course of the probabilistic learning task, we modeled subjects' behavior using a modified Rescorla-Wagner reinforcement learning model (20) (fig. S1). In this computational model, the difference in activity of the output neurons provides a trial-by-trial estimate of certainty of the given response. The A1– group reached a significantly higher response certainty than the A1+ group in the last third of the experiment (t = 2.2, P = 0.04). The development of the certainty over the course of the experiment is shown in Fig. 1 (bottom right). In both groups, the curves resemble a logarithmic learning curve with a steep increase in the first third and an asymptotic course at the end of the experiment. After an initial period of about 200 trials, the A1– group developed a higher response certainty than the A1+ group. For both genetic groups, response certainty negatively correlated with pMFC activity (fig. S2). Note that, in the A1– group, the time course of certainty, which reflected learning progress, showed a positive correlation with activity in the posterior hippocampus bilaterally (x = 22, y = –39, z = 6; z score = 3.9, 216 mm³ and x = –23, y = –39, z = 3; z score = 3.5, 81 mm³), whereas no such correlation was found in A1+ participants (Bayesian posterior probability of group difference: right, 94.9%; left, 96.2%; Fig. 2B and table S3).
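
The trial-by-trial certainty estimate can be illustrated with a minimal Python sketch of a plain Rescorla-Wagner learner. This is not the modified model of (20) and fig. S1: the softmax choice rule, the learning rate `alpha`, the inverse temperature `beta`, the reward probabilities in `PAIRS`, the number of trials, and the definition of certainty as the value difference between the chosen and the unchosen symbol are all assumptions made for illustration.

```python
import math
import random

# Hypothetical symbol pairs: (better symbol, worse symbol, P(positive feedback | better chosen)).
# These probabilities are assumed for illustration and are not taken from the text.
PAIRS = [("A", "B", 0.8), ("C", "D", 0.7), ("E", "F", 0.6)]


def rw_update(value, reward, alpha):
    """One Rescorla-Wagner step: shift the value by alpha times the prediction error."""
    return value + alpha * (reward - value)


def simulate(pairs, n_trials, alpha=0.1, beta=5.0, seed=0):
    """Simulate learning; return final symbol values and a per-trial certainty series."""
    rng = random.Random(seed)
    values = {}
    for good, bad, _ in pairs:
        values[good] = 0.5  # neutral starting expectation (assumption)
        values[bad] = 0.5
    certainty = []
    for _ in range(n_trials):
        good, bad, p_good = rng.choice(pairs)
        # Softmax (logistic) choice between the two symbols of the presented pair.
        p_choose_good = 1.0 / (1.0 + math.exp(-beta * (values[good] - values[bad])))
        chose_good = rng.random() < p_choose_good
        chosen, other = (good, bad) if chose_good else (bad, good)
        # Assumed certainty measure: value of chosen minus value of unchosen symbol.
        certainty.append(values[chosen] - values[other])
        # Probabilistic feedback: the worse symbol is rewarded with probability 1 - p_good.
        p_reward = p_good if chose_good else 1.0 - p_good
        reward = 1.0 if rng.random() < p_reward else 0.0
        values[chosen] = rw_update(values[chosen], reward, alpha)
    return values, certainty


def bin_means(series, bin_size=20):
    """Average a series in consecutive bins (the last bin may be shorter)."""
    means = []
    for start in range(0, len(series), bin_size):
        chunk = series[start:start + bin_size]
        means.append(sum(chunk) / len(chunk))
    return means


if __name__ == "__main__":
    final_values, certainty = simulate(PAIRS, n_trials=600)
    print(bin_means(certainty, bin_size=20))
```

Averaging the certainty series in bins of 20 trials mirrors the presentation in Fig. 1 (bottom right); in a sketch like this, a lower `alpha` or `beta` would be one simple way to produce a flatter certainty curve.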

How does feedback monitoring in the RCZ interact with forming memories in the hippocampus? Anatomically, these areas are connected via the cingulate bundle. To investigate learning-related changes in functional interactions of the RCZ and other brain areas over time, we performed a psychophysiological interaction (PPI) analysis (21). The experiment was divided into three parts of equal length. We then contrasted the functional connectivity of the RCZ observed in the first third with the connectivity observed in the last third of the learning experiment, thereby capturing the difference between steep rule acquisition in the beginning and more stable rule exploitation at the end. Again, in the A1– group, we observed a significant change over time: In the first third of the experiment, the functional coupling between the RCZ activity and the bilateral hippocampus was substantially stronger than in the last third (Fig. 2C). The A1+ group showed no such correlation (Bayesian posterior probability of group difference: left hippocampus, 99.98%; right hippocampus, 99.91%). Furthermore, only the A1– group showed a similar change in functional coupling between NAC and RCZ over the time course of the experiment (Bayesian posterior probability: 99.54%). The NAC, another major target of dopaminergic projections, has also been implicated in feedback-based decision-making (12, 22–24). The fMRI signal in the NAC on both sides was increased by positive feedback as compared with negative feedback (Fig. 2A). This reward-related activity increase was reduced in the A1+ group in the right NAC (x = 16, y = 9, z = –6; z score = –3.96; Bayesian posterior probability of group difference: 94.8%; on the left side, posterior probability reached only 74.1%).
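
The first-third versus last-third connectivity contrast can be sketched as an interaction regressor: the mean-centered seed (RCZ) time course multiplied by a phase code. The following Python sketch only illustrates that idea under simplifying assumptions. The function names are hypothetical, the +1/0/–1 coding of the three parts is an assumed convention, and steps a real PPI analysis (21) may involve, such as hemodynamic modeling and fitting the regressor alongside the main effects in a general linear model, are omitted.

```python
def phase_contrast(n_scans):
    """Code the first third as +1, the middle as 0, and the last third as -1 (assumed coding)."""
    third = n_scans // 3
    return [1.0] * third + [0.0] * (n_scans - 2 * third) + [-1.0] * third


def ppi_regressor(seed_signal):
    """Multiply the mean-centered seed time course by the phase contrast.

    A positive weight on this regressor for a target region would indicate
    stronger coupling with the seed in the first third than in the last third.
    """
    mean = sum(seed_signal) / len(seed_signal)
    contrast = phase_contrast(len(seed_signal))
    return [(s - mean) * c for s, c in zip(seed_signal, contrast)]


if __name__ == "__main__":
    # Toy seed time course with nine scans (made-up numbers).
    print(ppi_regressor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]))
```

In a full analysis, this regressor would enter a model for each target voxel together with the seed time course and the phase code themselves, so that the interaction term captures only the change in coupling.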

Taken together, our results confirm that dopamine plays a major role in performance monitoring and behavioral modification for reaching optimal performance levels: Alterations in dopaminergic transmission lead to corresponding alterations in negative feedback processing and, related to this, to differences in learning from negative feedback. It appears that reduced dopamine D2 receptor density is associated with reduced capacity to learn negative characteristics of a stimulus from negative feedback. High receptor density in the A1– group is associated with clear avoidance of the most negative stimulus, whereas a reduced receptor density in A1+ subjects is not. Corresponding to this, subjects with a reduced receptor density show a weaker blood oxygen level–dependent (BOLD) response to negative feedback in the performance-monitoring network consisting of the pMFC and basal ganglia. In the pMFC, this difference was specific to negative feedback; its response to positive feedback and negative correlation with certainty (12) did not differ between groups (Fig. 2A). Negative feedback–related pMFC activity predicted posttest performance in the A1– group, which suggests that they used negative feedback for avoidance learning as well as preference learning. Interestingly, anterior insular activity, thought to be involved in autonomic responses to errors (12), was present in both groups, which suggests that the genotype effect is specific to learning from errors. The differential activity in the MFG, a brain region commonly found in working memory tasks (25), may suggest that A1– participants used a monitoring-within-memory strategy of keeping track of selection outcome history. This speculation is supported by the role of prefrontal D2 receptors in working memory functions (26, 27).

Hence, the genetically driven differences in avoidance learning seem to result from a weaker neuronal response to negative feedback. Reduced monitoring signals are less likely to influence the memory system. This is supported by the finding of a reduced interaction between performance monitoring in the pMFC and memory formation in the hippocampus.

It is noteworthy that the fMRI signal reduction in the A1+ group is specific to performance monitoring–related processes. It does not generalize to other task-specific activity (SOM text and fig. S4).

At first sight, our finding that subjects with lower D2 receptor densities show reduced avoidance learning may appear to conflict with results indicating that patients with Parkinson's disease on medication, i.e., with enhanced dopaminergic transmission, have problems in learning the negative value of stimuli (2). This apparent discrepancy can be resolved by a recent study, which revealed a higher rate of dopamine synthesis in the striatum for subjects with the A1+ configuration compared with A1– subjects (28). A reduction in D2 receptors could also affect D2 autoreceptors, which in turn would lead to a higher synthesis rate of dopamine. Accordingly, transmission via the unaffected D1 receptors should be strengthened, whereas modulation of phasic postsynaptic D2 activity should be relatively reduced. This should lead to a relative decrease in avoidance learning and a shift to learning mainly from positive reinforcement (2, 5). Parkinson's disease is often treated with tonically acting direct D2 agonists, which also reduce phasic modulations at postsynaptic D2 receptors. A phasic decrease in dopamine, as is suggested to occur after negative feedback (4, 14), may thus be less effective in both studies. This dulled D2-mediated dopaminergic signal in turn would finally lead to a weaker hemodynamic response in the RCZ.

Many studies have found relations between a reduced dopamine D2 receptor density and addiction, obesity, or compulsive gambling (9, 10, 29). It may be speculated that the insensitivity to negative consequences of an action, as described above, is one feature of a low D2 receptor configuration and promotes behavior that could threaten health or social interactions.

Supporting Online Material

www.sciencemag.org/cgi/content/full/318/5856/1642/DC1

Materials and Methods

SOM Text

Figs. S1 to S4

Tables S1 to S3
