Research Article

Local D2- to D1-neuron transmodulation updates goal-directed learning in the striatum


Science  31 Jan 2020:
Vol. 367, Issue 6477, pp. 549-555
DOI: 10.1126/science.aaz5751

A learning mechanism in the striatum

An intriguing characteristic of the striatum is the random spatial distribution and high degree of intermingling between expression of dopamine receptor types 1 (D1) and 2 (D2) within striatal projection neurons (SPNs). The resulting highly entropic mosaic extends through a homogeneous space and is mostly devoid of histological boundaries. The rules established locally by D1- and D2-expressing SPNs (D1-SPNs and D2-SPNs) are thus likely critical in defining how functional territories develop throughout the striatum. Matamales et al. found that activated D2-SPNs access and modify developing behavioral programs encoded by regionally defined ensembles of transcriptionally active D1-SPNs. This process is slow because it depends on the molecular integration of additive neuromodulatory signals. However, with time, it creates the regional functional boundaries that are necessary to identify and shape specific learning in the striatum.

Science, this issue p. 549


Extinction learning allows animals to withhold voluntary actions that are no longer related to reward and so provides a major source of behavioral control. Although such learning is thought to depend on dopamine signals in the striatum, the way the circuits that mediate goal-directed control are reorganized during new learning remains unknown. Here, by mapping a dopamine-dependent transcriptional activation marker in large ensembles of spiny projection neurons (SPNs) expressing dopamine receptor type 1 (D1-SPNs) or 2 (D2-SPNs) in mice, we demonstrate an extensive and dynamic D2- to D1-SPN transmodulation across the striatum that is necessary for updating previous goal-directed learning. Our findings suggest that D2-SPNs suppress the influence of outdated D1-SPN plasticity within functionally relevant striatal territories to reshape volitional action.

In changing environments, it is adaptive for humans and other animals to adjust their actions flexibly to maximize reward. Extinction learning allows individuals to withhold instrumental actions when their consequences change. Rather than erasing such actions from one’s repertoire, current views propose that extinction generates new inhibitory learning that, when incorporated into previously acquired behavior, acts selectively to reduce instrumental performance (1).

Associative learning theory identifies the negative prediction errors produced by the absence of an anticipated reward as the source of the inhibitory learning underlying instrumental extinction (2). Such signals are thought to involve pauses in dopamine (DA) activity, and this pattern is well suited to alter plasticity in the posterior dorsomedial striatum (DMS), a key structure encoding the action-outcome associations necessary for goal-directed learning (3). Nevertheless, the way complex DA signals (4) alter postsynaptic circuits in the DMS to shape goal-directed learning remains unknown.

Within the DMS, the plasticity associated with goal-directed learning involves glutamate release timed to local DA activity to alter intracellular cyclic adenosine monophosphate (cAMP)–dependent pathways in postsynaptic neurons, a function that involves slow temporal scales (5) and that leads to gene transcription necessary for learning (6). This activity is distributed across two major subpopulations of spiny projection neurons (SPNs)—the principal targets of DA (7). These are completely intermixed within the striatum and express distinct DA receptor subtypes that respond to DA in an opposing manner: Half express type 1 receptors and trigger powerful cAMP signaling in DA-rich states (D1-SPNs), whereas the other half express type 2 receptors and show robust signaling in DA-lean states (D2-SPNs) (8). Given that positive and negative prediction errors during appetitive learning are known to influence DA release (4, 9), we hypothesized that prediction errors during reward and extinction learning generate distinctive molecular activation patterns in D1- and D2-SPNs across the striatum to provide a molecular signature identifying those regions most relevant for plasticity.

Nucleosomal response in SPNs captures goal-directed learning

We first established whether intracellular signaling in SPNs undergoes functional reorganization across the striatum during goal-directed learning. We trained mice to acquire rewarded instrumental actions, where a lever press (action) was either briefly or more extensively associated with the delivery of food (outcome) (Fig. 1, A and B). In group Novice, initial acquisition was marked by a spontaneous increase in lever press frequency during the first session of training, which was used to flag the approximate time at which the action-outcome contingency was first experienced (fig. S1, A and B). By contrast, mice in group Expert received 19 days of additional training (Fig. 1B), clearly increasing lever pressing across days (fig. S1C and table S1).

Fig. 1 Nucleosomal response mapping reveals learning-related territories in the striatum.

(A) Mice were trained to associate an action (lever press) with an outcome (food pellet). (B) Four groups of eight mice received different levels of instrumental conditioning. 1stC, first Contingency. (C) Immunodetection of phosphorylated nucleosomes (phospho-Ser10-histone H3; P-H3) identifies transcriptionally active (ta) neurons in the posterior striatum. P-H3 immunoreactivity was specifically detected in the nuclei (DAPI+) of SPNs (DARPP-32+). (D) Digitized reconstruction of taSPNs throughout the striatum in “Novice,” “Expert,” and their control groups (4362 SPNs mapped). Right panels: return maps of inter-action-intervals (IAIs) for lever presses (blue) and magazine checks (orange). Each data point represents the time delay to its preceding (x) and succeeding (y) behavioral element. (E) taSPN density (cells/mm²) and overall action rate in the different training groups (eight mice per group, both striata). (F) Identification of taSPNs in the striatum of a trained drd2-eGFP mouse. Arrows: taD2-SPNs (downward) and taD1-SPNs (upward). (G) P-H3+ nuclei densities of each neuronal type in the striatum of the different training groups. Note the different scale on the y axes. *, simple effects (table S1).
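The return maps in panel (D) plot each behavioral element by its delay from the preceding element (x) and its delay to the succeeding element (y). A minimal sketch of that construction follows, assuming event timestamps in seconds; the function name `return_map` and the example timestamps are illustrative and are not taken from the study's analysis code.

```python
def return_map(event_times):
    """Return (delay_from_previous, delay_to_next) for each interior event.

    The first and last events are skipped because they lack a preceding
    or a succeeding element, respectively.
    """
    times = sorted(event_times)
    # Inter-action-intervals (IAIs) between consecutive events.
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    # Pair each interval with the one that follows it.
    return list(zip(intervals[:-1], intervals[1:]))


# Hypothetical lever-press timestamps (seconds), for illustration only.
presses = [0.0, 1.0, 3.0, 6.0]
print(return_map(presses))
```

In such a plot, points clustered near the origin would correspond to rapid bouts of the same action, whereas points far along either axis would mark pauses before or after an action.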

We next assessed whether the different levels of training were represented in the signaling patterns in striatal SPNs. We used immunodetection of phosphorylated histone H3 on serine 10 (P-H3), a ubiquitous transcriptional activation marker that is rapidly induced in SPNs in response to different DA states (6, 8). We found a robust P-H3 signal in the nucleus of striatal neurons that colabeled with DARPP-32, a marker of SPNs (Fig. 1C), suggesting that projection neurons—relative to other types of striatal neurons—were transcriptionally active under these conditions. Wide-field, high-resolution mapping identified different levels of transcriptionally active SPNs (taSPNs) across groups, with clear territorial differences in their distribution (Fig. 1D). Compared to Non Contingent controls—exposed to the lever and receiving as many rewards but noncontingently—Novice mice showed a high density of taSPNs concentrated in the DMS, consistent with the role of this region in action-outcome encoding (3). By contrast, when compared to their Yoked controls, group Expert showed an increase in taSPN density that distributed laterally, in support of the functional lateralization expected from extensively trained actions (Fig. 1D) (10). Critically, we found a clear dissociation between taSPN density and the extent of overall performance (i.e., lever presses and magazine checks) (Fig. 1, D and E, and table S1). This allowed us to link goal-directed learning with the induction of DA-promoted transcriptional activity in SPNs. The nuclear P-H3 signal was detected in D1- as well as D2-SPN subtypes (Fig. 1F), indicating that both neuronal systems were sensitive to the DA states underpinning goal-directed learning. D1 neurons were more transcriptionally active than D2 neurons in all training groups (Fig. 1G), and taD1/taD2-SPN ratios remained constant (fig. S1D and table S1).

Regional overlap of activated taSPN subpopulations predicts extinction learning

To compare the activation patterns of D2- and D1-neurons in the striatum during instrumental and extinction learning, we mapped and classified large numbers of taSPNs in whole striatal sections of drd2-eGFP (enhanced green fluorescent protein) mice (fig. S2). We trained two groups of mice on an increasing fixed ratio (FR) reinforcement schedule where access to each food outcome relied on a predictable instrumental effort (Fig. 2A). The groups showed indistinguishable performance with very similar increases in lever press rate across training (fig. S3A and table S2). On day 16, group Extinction underwent an altered training session in which lever pressing activated the food dispenser, but no outcomes were delivered. This manipulation generated vigorous responding for “no-reward” (Ø) that was comparable to that of nonextinguished mice (Instrumental controls) for almost half of the session, at which point their cumulative performance decayed (Fig. 2B, fig. S3B, and table S2).

Fig. 2 Functional confluence of projection systems in the striatum promotes extinction of learned actions.

(A) Mice (eight per group) were trained on increasing fixed ratio (FR) reinforcement schedules prior to extinction (day 16). (B) Cumulative and average lever press performance during instrumental (control) and extinction sessions (day 16). (C) Distribution of taD2- and taD1-SPNs in the posterior striatum of mice in (B). Plots show up to three density clusters from aligned hemisections in each group (4044 SPNs mapped). (D) Reconstruction of taD2- and taD1-SPN overlapping territories in the striatum. (E and F) Extent of taD2- and taD1-SPN territory exclusion. Data are percentages of taD2-SPN territories segregated from taD1-SPNs (E) and vice versa (F). Insets: overall D2/D1 (E) and D1/D2 (F) P-H3+ SPN ratios. (G) Genetic lesion of D2-SPNs in the DMS through AAV-FLEx-taCasp3-TEVp system. DARPP-32+-eGFP- SPNs remained intact (fig. S4A). (H) Sham and Lesioned mice were trained as in (A) but underwent additional extinction testing on day 17. (I) Lever press performance in both groups across instrumental training. Right: return maps of collective IAIs on day 15. (J and K) Cumulative presses per minute on days 16 (J) and 17 (K). Insets: average press performance (top) and linear regression slope (Slp) analysis (bottom). (L) Raster plots and frequency histograms of pooled lever press data preceding the delivery of each pseudoreward (red). (M) Digitized reconstruction of taD2- and taD1-SPNs after test. Insets: taD2- and taD1-SPN overlapping territories. (N) Extent of taD2- and taD1-SPN territory overlap (% of D1). (O) P-H3+ nuclei density in D2-SPNs (left axis, purple) and D1-SPNs (right axis, orange) in the DMS. Left: regions quantified in each group. [(E), (F), (N), and (O)]: Two hemisections per mouse, eight mice per group; *, overall/simple effect (black) and interaction (red). n.s., not significant (table S2).

Mapping of taSPNs in entire striatal sections revealed that overall densities of taD2- and taD1-SPNs were similar in Instrumental and Extinction groups (fig. S3, C and D, and table S2). However, density-based cluster analysis showed that taD2- and taD1-SPNs followed characteristic spatial distributions across the posterior striatum in each group (Fig. 2C). In mice undergoing a rewarded session, each system tended to occupy nonoverlapping areas in the DMS, with taD1-SPNs segregated to lateral territories (Fig. 2C, top). In animals undergoing extinction, we found a high level of convergence in DMS areas, with very few neurons detected laterally (Fig. 2C, bottom). Extinction mice exhibited a marked increase in the proportion of taD2-SPN territories that overlapped with functional D1-SPN areas specifically in the DMS (Fig. 2, D and E), as well as a higher proportion of taD1-SPN clusters sharing space with taD2-SPNs in this same region (Fig. 2, D and F, and table S2).
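The idea of quantifying how much of one population's territory is shared with the other can be illustrated with a simplified sketch. It assumes that cell positions are given as (x, y) coordinates and that a "territory" is the set of square grid bins containing at least a minimum number of cells. This grid binning is a stand-in chosen for brevity and is not the density-based cluster analysis used in the study; the bin size, the threshold, and all function names are hypothetical.

```python
from collections import Counter


def territory(points, bin_size=100.0, min_cells=2):
    """Return the set of grid bins holding at least `min_cells` points."""
    counts = Counter(
        (int(x // bin_size), int(y // bin_size)) for x, y in points
    )
    return {grid_bin for grid_bin, n in counts.items() if n >= min_cells}


def percent_overlap(territory_a, territory_b):
    """Percentage of territory A's bins that also belong to territory B."""
    if not territory_a:
        return 0.0
    return 100.0 * len(territory_a & territory_b) / len(territory_a)


# Made-up coordinates (arbitrary units), for illustration only.
ta_d2 = [(10, 10), (20, 30), (150, 20), (160, 40)]
ta_d1 = [(15, 25), (40, 60), (310, 10), (320, 30)]

d2_territory = territory(ta_d2)
d1_territory = territory(ta_d1)
shared = percent_overlap(d2_territory, d1_territory)
print(f"taD2 territory overlapping taD1: {shared:.1f}%")
print(f"taD2 territory segregated from taD1: {100.0 - shared:.1f}%")
```

Note that the measure is asymmetric: the share of taD2 territory overlapping taD1 need not equal the share of taD1 territory overlapping taD2, which is why the two directions are reported separately.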

D2-SPNs in the DMS are required to encode extinction learning

We hypothesized that recruiting activated D2-SPNs in the DMS is directly related to inhibitory learning during extinction. We selectively removed D2-SPNs from the DMS where we had observed high taD2- and taD1-SPN confluence (Fig. 2D) through genetic ablation in adult adora2a-Cre::drd2-eGFP hybrid mice (Fig. 2G and fig. S4A). After instrumental training, Lesioned and Sham mice were given a 10-min extinction learning session (day 16) followed by an extinction test 24 hours later (Fig. 2H). This protocol was aimed at detecting differences on test due to deficient integration of extinction learning 24 hours earlier. D2-SPN ablation had no effect on the initial acquisition of instrumental contingencies; both groups showed very similar levels of performance across sessions (Fig. 2I, left) and an indistinguishable response structure on day 15 (Fig. 2I, right, and table S2). Likewise, both groups similarly reduced lever press performance during the 10-min extinction learning session (Fig. 2J and table S2), although performance differed 24 hours later: Lesioned mice accumulated a higher number of presses across the session and showed a higher average level of pressing and a steeper linear regression slope (Fig. 2K and table S2). This increase in performance was not due to an overall increase in lever press rates but rather to recurrent and persistent performance during the extinction session (Fig. 2L; fig. S4, B to D; and table S2).
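The slope measure mentioned above can be sketched as an ordinary least-squares fit of cumulative presses against session time. The example below assumes per-minute press counts as input; the helper names and the numbers are hypothetical, and the study's exact fitting procedure may differ.

```python
def cumulative(counts):
    """Running total of per-minute press counts."""
    total, running = 0, []
    for count in counts:
        total += count
        running.append(total)
    return running


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up presses per minute: one animal keeps pressing, one stops early.
persistent = [4, 4, 4, 4]
extinguished = [4, 2, 0, 0]
minutes = [1, 2, 3, 4]

print(ols_slope(minutes, cumulative(persistent)))
print(ols_slope(minutes, cumulative(extinguished)))
```

A steeper slope indicates that presses keep accumulating across the session, whereas a slope that flattens toward zero reflects responding that has ceased.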

We then analyzed the distribution of taD2- and taD1-SPNs that accumulated in the posterior striatum during the 20-min test on day 17. We again found an increased confluence of taD2- and taD1-SPNs in the DMS of Sham mice, which still displayed substantial overlap despite being assessed on the second day of extinction (Fig. 2M and table S2). Conversely, the absence of D2-SPNs in Lesioned mice was associated with a high density of taD1-SPNs in the DMS (Fig. 2, M and N, and table S2). Density analysis in the DMS revealed that taD2- and taD1-SPNs followed opposing density patterns in Sham and Lesioned mice: Higher densities of taD2-SPNs predicted low densities of taD1-SPNs and vice versa (Fig. 2O and table S2).

D2-SPNs spatially rearrange D1-SPN plasticity

We sought to establish whether D2- and D1-SPNs functionally interact within convergent striatal territories using pharmacological compounds known to induce robust and widespread intracellular signaling in each system (8). Systemic injection of raclopride (RAC; a D2-receptor antagonist) to drd2-eGFP mice induced a strong nucleosomal response mostly in D2-SPNs that extended throughout the posterior striatum, whereas GBR12783 (GBR; a DA-transporter inhibitor) induced an even stronger activation with similar distribution but mostly in D1-SPNs (Fig. 3A). We injected four groups of drd2-eGFP mice with different combinations of vehicle, RAC (t = 0), and GBR (t = 15) and recorded their ambulatory locomotor activity in an open field arena prior to perfusion (t = 30) (Fig. 3B). GBR injection strongly increased locomotion in both groups that received it, irrespective of the RAC injection (Fig. 3, C to G, and table S3). In RAC-treated mice, taD2-SPNs dominated most of the striatal space (Fig. 3, H and I), whereas a reverse pattern was observed after GBR treatment (Fig. 3, H and J). The combination of both drugs, however, resulted in a high density of taD2-SPNs and a low density of taD1-SPNs throughout the posterior striatum, indicating that prior injection of RAC prevented the effects of GBR on D1-SPN nuclear activity (Fig. 3K). This large-scale D2-to-D1 suppression was reproduced when RAC was combined with a full D1 agonist (fig. S5). Moreover, the distribution of taD2- and taD1-SPN peak density areas tended to occupy spatially distinct regions of the striatum in all groups, regardless of the pharmacological cocktail received (Fig. 3, H to K, lower panels). This opposition was supported by a strong overall treatment × neuron interaction (Fig. 3L and table S3). Paradoxically, animals with D2-dominated (and D1-inhibited) SPN signaling (Fig. 3K) showed unaltered GBR-induced hyperlocomotion (Fig. 3G), again suggesting that nuclear activity in SPNs can be dissociated from behavioral performance (compare Fig. 1E).

Fig. 3 Overstimulated SPN systems compete for space in the striatum.

(A) Confocal micrographs showing the effects of D2R blockade (Raclopride, 0.3 mg/kg) and dopamine transporter inhibition (GBR12783, 15 mg/kg) on taSPNs. (B) Two-injection experimental design applied to each group prior to perfusion (eight mice per group). Ambulation was measured in an open field. (C) Ambulation (beam breaks) recorded after the second injection (min 15 to 30). (D to G) Ambulatory activity per minute. Right: ambulatory trajectory (start: blue; finish: red) in one example mouse after the second injection. (H to K) Top: maps of taD2- and taD1-SPNs in the striatum of an example mouse. Bottom: distribution contour plots delimiting regions of increasing P-H3+ nuclear density in D2- (left) and D1- (right) SPN systems separately. Isodensity curves are pseudocolored from low (blue) to high (red) relative densities (31,542 taD1-SPNs and 29,100 taD2-SPNs mapped). (L) Quantification of P-H3+ nuclei (counts × 10³) distributed in D2- and D1-SPNs in each group (9 to 12 sections per group). *, significant overall/simple effect (black) and interaction (red). n.s., not significant (table S3).

Broad connectivity of the local striatal network supports an extensive D2- to D1-SPN transmodulation

We reasoned that the local network in the striatum may reflect large-scale connectivity biases consistent with the magnitude of the D2-to-D1 modulation reported above. To investigate this, we used a quantitative network-level approach to measure connectivity biases using unilateral injections of the herpes simplex virus 1 (HSV1) H129tdTomato in the posterior striatum of drd2-eGFP mice (Fig. 4A). Because this virus moves along synaptically connected neurons in the anterograde direction (11), we mapped transduced tdTomato+ SPNs across large territories to reveal the broad connectivity patterns established within the striatum (Fig. 4B and fig. S6A). Only ~20% of the transduced SPNs were D2-SPNs, whereas up to ~80% were D1-SPNs (Fig. 4, C to E, and table S4). These numbers were similar across all areas of infection, regardless of whether they were ipsi- or contralateral to the initial injection (fig. S6, B to D, and table S4). Injection of the virus in the prelimbic cortex (i.e., one upstream synapse) provided very similar weights of transduced D1- (~80%) and D2- (~20%) SPNs (fig. S6, E to G, and table S4).

Fig. 4 SPN subtypes show large-scale unidirectional transconnectivity.

(A) Anterograde transsynaptic HSV1-H129-tdTomato virus was unilaterally injected in the posterior striatum of drd2-eGFP mice. (B) In a hypothetical striatum with symmetrical SPN-SPN contacts, the “founder” cells first integrating the virus are expected to infect similar proportions of surrounding SPNs (D2 and D1). (C) Infected (tdTomato+) particles were classified as D2- and D1-SPNs according to their eGFP and DARPP-32 content (fig. S6A). (D) Digitized reconstruction of infected D2- and D1-SPNs in an entire striatal hemisection. (E) Percentage of particles classified as D2- and D1-SPNs (fig. S6, B to D) (10 hemisections, five mice). (F) HSV1-H129Floxed virus (permanently switches from eGFP to tdTomato if it encounters Cre) was bilaterally injected in the striatum of adora2a-Cre and drd1a-Cre mice. (G) Infection in cis involves infection through isolated lineages (green and red). Infection in trans involves infection across lineages (yellow; see materials and methods). (H) Viral spread and digitized reconstruction of red-labeled (tdTom) and yellow-labeled (tdTom + eGFP) SPNs in entire striatal hemisections of each Cre line. (I) Percentage of red and yellow SPNs in each Cre line (fig. S7) (13 to 16 hemisections in four mice per line). (J) Unilateral genetic lesion of D2-SPNs in the DMS (as in Fig. 2G). (K) Quantification of D2-SPN density (left axis) and taD1-SPN density (right axis) in control and lesioned sides after RAC + GBR pharmacological treatment (Fig. 3B) (five mice). (L) Mediolateral continuum from lesioned to intact striatal territories after drug treatment. Bottom: green (eGFP) and red (P-H3) fluorescence plot profile across the mediolateral continuum above. m.g.v., mean gray value. *, significant overall/simple effect (black) and interaction (red). n.s., not significant (table S4).

To confirm that the enhanced connectivity in D1-SPNs was influenced by a D2-to-D1 drive, we infected the striatum with HSV1 H129Floxed virus, an anterograde tracing approach in which transsynaptic labeling switches from green to red in the presence of Cre (12) (Fig. 4F and fig. S7, A to C). This method allowed us to quantify cross-system connectivity by assessing the proportion of (non–Cre-expressing) neurons that received dual green and red infection within a critical temporal window (Fig. 4G). Intrastriatal injection in adora2a-Cre mice revealed equal proportions of single (tdTom)– and double (tdTom + eGFP)–labeled SPNs (Fig. 4, H and I, and table S4). By contrast, infection in drd1a-Cre mice produced a much lower proportion of double-labeled SPNs (Fig. 4, H and I, and table S4). The percentages of double- and single-labeled neurons in drd1a-Cre mice matched those previously observed with the nonswitchable virus in drd2-eGFP (i.e., 20 and 80%; compare Fig. 4, E and I). Again, these same proportions were consistently found across all infected striatal areas analyzed, irrespective of the area of spread (fig. S7, C to E). The two major projection systems in the striatum establish very different local connectivity, with D2-SPNs making substantial connections with D1-SPNs but not vice versa. The magnitude of this asymmetry provides full neuroanatomical support for the large-scale D2- to D1-SPN transmodulation observed in this study.

To address the functional relevance of this connectivity bias, we evaluated the effects of RAC and GBR cocktail on striata where D2-SPNs had been genetically ablated in the DMS (fig. S8, A and B). After recovery, mice with unilateral D2-SPN depletions (Fig. 4J) were treated with RAC and GBR prior to perfusion (as in Fig. 3B), and patterns of transcriptional activation were contrasted between control and lesioned sides. The density of taD1-SPNs was inversely proportional to the density of D2-SPNs in control and lesioned sides (Fig. 4K and table S4). Analysis of P-H3+ nuclei across contiguous striatal territories spanning lesioned and intact areas revealed that the identity of the transcriptionally active neurons transitioned from D2-SPNs (in intact territories) to D1-SPNs (in D2-ablated territories) (Fig. 4L). Similarly, D2-SPN removal in the DMS opened windows of uncontrolled D1-SPN activation (fig. S8C).

D2-SPNs shape other sources of flexibility in goal-directed learning

Thus far, our results suggest that D2-SPNs shape the changes in striatal plasticity necessary for flexible encoding of goal-directed learning. In the case of inhibitory learning during extinction, this process is compatible with the role ascribed to DA in negative prediction error scenarios: A sustained pause of phasic DA in defined striatal territories may lead to recruitment of D2-SPNs and, with it, the D2-to-D1 transmodulation reported here. We next sought to evaluate the role of D2-SPNs in flexible learning that should not overtly depend on a negative reward prediction error. We induced subtle changes in the identity predictions of preexisting action-outcome relationships by reversing the outcome congruence between pairs of action-outcome associations. We trained mice with bilateral D2-SPN ablations in the DMS (fig. S8, D and E) and their sham controls on two action-outcome associations (A1-O1 and A2-O2), which generated identical performance in both groups (Fig. 5A and table S5). We then verified whether both A-O contingencies had been correctly encoded by giving the mice an outcome-specific devaluation test, which evaluated the effect of sensory-specific satiety on one or the other outcome on choice between the two trained actions (13) (Fig. 5B). Both groups correctly encoded the initial contingencies (i.e., A1-O1 and A2-O2); satiety on one of the outcomes (e.g., O1) reduced performance of the action associated with that outcome in training (A1; devalued) relative to the other, still valued, action (A2; valued) (Fig. 5B and table S5). We then explored whether these mice could incorporate new information by training them with the outcome identities reversed for 5 days (i.e., A1-O2 and A2-O1) (Fig. 5C and table S5) prior to a second outcome-specific devaluation test (Fig. 5D). Whereas Sham mice were able to show flexible encoding and could adjust their choice according to the new A-O associations, mice with D2-SPN ablation failed to do so (Fig. 5D and table S5).

Fig. 5 D2-SPNs control the updating of learning.

Bilateral genetic lesions of D2-SPNs were performed in the DMS of adora2a-Cre::drd2-eGFP hybrid mice (fig. S8, D and E) (eight mice per group). Initial learning: (A) Sham and Lesioned mice were trained to two action–outcome (A-O) contingencies, resulting in increased performance (press/min) across days. (B) Initial devaluation test: a choice (A1 versus A2) was presented after having sated the mice on one or the other outcome (O1/O2) over consecutive days. Graph shows performance on the valued (blue: provides nonsated O) and devalued (gray: provides sated O) levers. Additional learning: (C) Mice were then trained to the reversed A-O contingencies, which rapidly increased press/min performance. (D) A new round of devaluation and choice tests was presented [as in (C)]. *, significant overall effect (black) and interaction (red). n.s., not significant (table S5).


One of the most intriguing characteristics of the striatum is the random spatial distribution and high degree of intermingling between its D1 (direct) and D2 (indirect) projection systems, a feature that is actively promoted developmentally (14) and that has been retained throughout evolution (15). The result is a highly entropic binary mosaic that extends through an expansive and homogeneous space and that is mostly devoid of histological boundaries (16). Such organization is unusual in the brain and can be seen as an adaptation to provide an optimal postsynaptic scaffold for the integration of regionally meaningful neuromodulatory signals (17). In such a plain, borderless environment, the rules established locally by D1 and D2-SPNs are likely to be critical in defining functional territories throughout the striatum, and this, we propose, is the key process shaping striatal-dependent learning.

Our study suggests that the striatum takes full advantage of this “one-to-one” binary mosaic structure, in which activated D2-SPNs access and modify developing behavioral programs encoded by regionally defined ensembles of transcriptionally active D1-SPNs (what we call D2-to-D1 transmodulation). We propose that this process is slow, as it depends on the molecular integration of additive neuromodulatory signals (5), but could, with time, create the functional boundaries necessary to identify and shape specific learning in the striatum. A good example of this sort of dynamic, persistent neuromodulation is the recently described “wave-like” motion of DA signals throughout the mediolateral axis of the striatum (17). Beyond offering a broad solution to the credit assignment problem, recurrent waves of neuromodulatory activity in defined striatal areas could provide the kind of unbiased signal that, in the context of the molecular dichotomies established by D1 and D2 receptors (8), shape the striatal mosaic into meaningful transcriptional motifs. In the case of extinction learning, as observed here, noisy alternations between DA-rich and DA-lean states within the DMS appear to generate a mixed population of activated SPNs comprising both D1 and D2 systems. This regional overlap lays the groundwork for the local one-to-one modulation that shapes and integrates new learning, limiting outdated D1-SPN function in the case of extinction learning, and segregating new and existing territories of plasticity in the case of action-outcome identity reversal.

Supplementary Materials

Materials and Methods

Figs. S1 to S8

Tables S1 to S5

References (18–25)

References and Notes

Acknowledgments: We thank Z. Skrbis for technical assistance. Funding: This work was supported by the Australian Research Council (Grants DE160101275 to J.B.-G., DP19010251 to J.B.-G. and M.M. and DP150104878 to B.W.B.) and by a Fellowship from the NHMRC of Australia to B.W.B. (GNT1079561). Author contributions: M.M., B.W.B. and J.B.-G. conceived the study and designed the experiments. M.M. and J.B.-G. performed behavioral experiments. J.B.-G. performed surgeries and viral manipulations. A.E.M. and S.B.M. designed and produced the HSV viruses. A.E.M. and J.D.M. performed the experiments with HSV viruses. M.M. performed quantitative imaging analyses. M.M. and J.B.-G. performed statistical analyses. M.M., B.W.B., and J.B.-G. wrote the paper. Competing interests: The authors declare no competing interests. Data and materials availability: All source data supporting the conclusions of the study are published in Figshare data repository ( AAV vectors and transgenic mouse lines were obtained under material transfer agreements with Addgene, the University of North Carolina, Jackson Laboratories and the Rockefeller University.
