Report

Network Resets in Medial Prefrontal Cortex Mark the Onset of Behavioral Uncertainty


Science 05 Oct 2012:
Vol. 338, Issue 6103, pp. 135-139
DOI: 10.1126/science.1226518

Changing Your Belief

The ability to display behavioral flexibility depends on an internal representation of the environment—a framework of beliefs that can be adjusted on the basis of experience. Recording from multiple electrodes in the rat medial prefrontal cortex, Karlsson et al. (p. 135) investigated how ensembles of neurons changed their activity during the performance of a task in which the animal had to update its knowledge of reward contingencies. The results suggest that changes in perceived action-outcome contingencies were associated with abrupt switches in neuronal representations in the rat medial prefrontal cortex.

Abstract

Regions within the prefrontal cortex are thought to process beliefs about the world, but little is known about the circuit dynamics underlying the formation and modification of these beliefs. Using a task that permits dissociation between the activity encoding an animal’s internal state and that encoding aspects of behavior, we found that transient increases in the volatility of activity in the rat medial prefrontal cortex accompany periods when an animal’s belief is modified after an environmental change. Activity across the majority of sampled neurons underwent marked, abrupt, and coordinated changes when prior belief was abandoned in favor of exploration of alternative strategies. These dynamics reflect network switches to a state of instability, which diminishes over the period of exploration as new stable representations are formed.

The ability of animals to display behavioral flexibility depends on an internal representation of the environment: a framework of beliefs shaped by experience (1–3). How beliefs are encoded remains unclear. Some models posit gradual updates to the internal representation (4, 5); others invoke abrupt jumps, or resets, in representation (6–9). Such resets may occur after a marked environmental change has been detected, causing prior information to be discarded. A new internal representation is then constructed by resampling the environment. Indeed, sudden behavioral transitions from the pursuit of a single behavioral option to exploration have been described (10, 11); such findings suggest that prior beliefs can be abandoned in favor of a state of “knowing nothing.”

The medial prefrontal cortex (mPFC) has been implicated in the estimation of relevant environmental statistics that guide the selection of an appropriate behavioral strategy. In primates and rodents, single mPFC neuron activity correlates with the outcomes of previous decisions over diverse time scales (12–16) and can change abruptly in response to global changes of task rules (17, 18). In primates, the mPFC is required for rapid behavioral adaptation to changing action-outcome contingencies (19); in humans, its activity correlates with the volatility of the environment, a parameter known to set the rate of such adaptation (5). Activity patterns in the mPFC are consistent with a representation of an animal’s belief about the environment’s governing rules: In essence, the mPFC encodes multiple parameters associated with the structure of a task (20–22), including those that can only be inferred from the animal’s internal representation (23). It remains unclear, however, how ensemble activity in the mPFC changes during a period of uncertainty, when animals have sufficient evidence to reject an existing set of beliefs.

We examined mPFC neural ensemble dynamics in a rat behavioral task designed not only to provide a readout of the transition to exploration but also to enable us to distinguish activity encoding internal beliefs from activity encoding sensation, perception, or behavioral output. In this task, an uncued change in outcome probability causes a state of uncertainty that manifests itself as a decreased resolve to pursue a single behavioral option and an increased exploration of previously rejected choices (Fig. 1). Trial rejection rates often declined transiently after block transitions, prior to a switch in choice preference (Fig. 1B, boxed region in raw data, and Fig. 1D), although some transitions in choice preference happened without transient resampling of both trial types (fig. S1B). Reaction times for accepted trials increased during these periods of decreased rejection rate, as expected for exploratory bouts (Wilcoxon rank sum test, P < 0.01 for 10 trials after versus before acceptance change points).
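Several comparisons in this report (for example, reaction times in the 10 trials after versus before an acceptance change point) use the Wilcoxon rank-sum test. Below is a minimal sketch of that test, assuming the large-sample normal approximation with average ranks for ties and no continuity correction. The exact variant used in the original analysis is not stated here, and the name `rank_sum_test` is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test (normal approximation, two-sided).

    Tied values receive the average of the ranks they span, and the variance
    of the rank sum is corrected for ties. Returns (z, p).
    """
    # Pool both samples, remembering which group (0 = x, 1 = y) each came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    values = [v for v, _ in pooled]
    n = len(values)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        return 0.0, 1.0  # every value is tied; no evidence of a shift
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p
```

For the reaction-time comparison, `x` would hold reaction times from the trials after a change point and `y` those from the trials before it. In practice a statistics library would be used (for example, SciPy's `mannwhitneyu`, the equivalent Mann-Whitney U form); the sketch only makes the ranking and the normal approximation explicit.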

Fig. 1

Rejection rate as a measure of confidence. (A) Sequential, tone-cued presentation of two behavioral options, with distinct reward probabilities (between 0.15 and 0.8, changed every 150 to 300 trials). The revealed option can be rejected on each presentation. (B) Rejection behavior around a block transition. Note the transient decrease in rejection (boxed region) followed by a commitment to reject the other trial type. (C) Standard scores for rejection rates (±SEM) across blocks of trials. Preferential (left) and total (right) rejection rates are higher for high reward probability contrast (high contrast, >0.5; low, ≤0.5). ***P < 0.005, Wilcoxon rank sum test. (D) Mean normalized rejection rate (±SEM) during transitions in acceptance preference. Note the transient decline in trial rejection around transitions in acceptance. Data in (C) and (D) are from 137 sessions, 12 animals.
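Panels B to D of Fig. 1 rely on a running rejection rate and its standard score. Below is a minimal sketch that assumes a trailing window of 10 presentations, z-scoring within a session, and pooling of both trial types (the total rather than the preferential rejection rate). The window length and the function names are illustrative assumptions, not the published parameters.

```python
import statistics


def rejection_rate(rejected, window=10):
    """Running rejection rate.

    `rejected` lists one bool per presentation, in order (True = rejected).
    Each output value is the fraction rejected in the trailing `window`
    presentations ending at that trial (shorter at the session start).
    """
    rates = []
    for t in range(len(rejected)):
        segment = rejected[max(0, t - window + 1): t + 1]
        rates.append(sum(segment) / len(segment))
    return rates


def zscore(values):
    """Standard scores within a session; a constant series maps to zeros."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]
```

A transient dip like the boxed region in Fig. 1B would appear as a run of negative standard scores in `zscore(rejection_rate(rejected))`.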

Unlike in deterministic tasks, in which a rule change can be inferred from a single reward omission, detecting a change in this task requires gradual evidence accumulation. This separates the moment of abrupt environmental change from the moment the animal becomes aware of it. Furthermore, the stochastic nature of the action-outcome association invariably leads animals to sample both sides. Thus, even after the animal has detected a change in the environment, analysis of changes in neural activity can be restricted to trials with the same motor output. This is important because changes in motor output can themselves lead to large changes in mPFC activity (24). In addition, at switches in behavioral preference, the local outcome probability remains constant for trials to the same side. This task design therefore allows us to determine whether abrupt dynamics in neural activity are linked to changes in the internal state of the animal.

Within this context, we asked whether representation resets occur in the mPFC when the animal’s confidence in its representation of task parameters declines. As in other prefrontal areas, the mPFC representation of individual task parameters is distributed across large neuronal networks (21, 25). A decision to abandon old beliefs should be accompanied by a sudden reset in the representation of most task-related parameters. Such a reset would likely appear as a change in activity that is not only rapid but also coordinated and widespread across the neuronal ensemble.

We first searched for neurophysiological evidence of abrupt, coordinated, and widespread changes in neural dynamics while blind to the behavioral state of the animal. Retrospectively, we determined whether changes in neural dynamics correlated with behavioral state changes. During each recording session, we found occasions when unexpectedly large changes in neural activity occurred between sequential trials for many of the simultaneously recorded cells (see Fig. 2, A to C, for example and metric description). We refer to the 87 abrupt, coordinated, and widespread changes in population activity, detected across 29 recording sessions, as network transitions. These transitions could not be accounted for by instability in neural recordings (fig. S4), by variance in the animal’s spatial trajectory (fig. S5), or by large gaps in the time between trials (fig. S6).

Fig. 2

Network transitions in the mPFC. (A) Coordinated activity change for 5 of 15 cells active during left-bound acceptance trials. Time is aligned to the three analysis points (blue, green, and red vertical lines at top, corresponding to colors in cartoon below) when spatial trajectories were stereotyped. Arrow denotes network transition. (B) Firing rate changes at each analysis point. Center: Dashed black lines illustrate the slopes of the firing rate; gray rectangles are used for reference of x-axis scaling. Right: Absolute slopes versus trial. (C) Detection of network transitions. Top: Absolute firing-rate slope medians across the population of recorded cells [for (A)]. Bottom: Number of times one would expect to see this level of firing rate change per session by chance. ***P < 0.005. (D) Left: Two scenarios of network state evolution. Upper left: Pre- and post-transition locations form tight separate clusters (red crosses denote centroids of clusters). Lower left: Gradual progression. Right: Relative ensemble distances to pre- and post-transition centroids. (E) Network transitions with a given fraction of ensemble displaying firing rate changes of ≥0.75 (top) or ≥1.0 (bottom) of mean firing rate. (F) Mean population rate before and after network transition (5 trials on either side). Points cluster near the identity line. In (D) to (F), data are shown for all 87 transitions.
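The detection step summarized in Fig. 2C can be sketched as follows. This is a simplified illustration, not the published procedure. It assumes one firing rate per cell per trial (rather than slopes at three analysis points) and normalizes each cell's trial-to-trial change by its session mean rate before taking the median across cells. It then estimates how often a change of that size would be expected per session by chance, using a hypothetical null that circularly shifts each cell's trial sequence by an independent random offset. The threshold `max_expected`, the number of shuffles, and all function names are assumptions.

```python
import random
import statistics


def median_abs_change(rates):
    """rates[t][c] is the firing rate of cell c on trial t.

    For each consecutive trial pair (t, t + 1), return the median across cells
    of |rate change| divided by that cell's mean rate over the session.
    """
    n_cells = len(rates[0])
    means = [statistics.fmean(trial[c] for trial in rates) for c in range(n_cells)]
    active = [c for c in range(n_cells) if means[c] > 0]  # skip silent cells
    return [
        statistics.median(abs(nxt[c] - prev[c]) / means[c] for c in active)
        for prev, nxt in zip(rates, rates[1:])
    ]


def circular_shift_null(rates, n_shuffles=200, seed=0):
    """Hypothetical null: shift each cell's trial sequence by an independent
    random offset, keeping single-cell dynamics but breaking cross-cell
    alignment. Returns one list of median changes per shuffle."""
    rng = random.Random(seed)
    n_trials, n_cells = len(rates), len(rates[0])
    columns = [[trial[c] for trial in rates] for c in range(n_cells)]
    nulls = []
    for _ in range(n_shuffles):
        shifted = []
        for col in columns:
            s = rng.randrange(n_trials)
            shifted.append(col[s:] + col[:s])
        nulls.append(median_abs_change([list(row) for row in zip(*shifted)]))
    return nulls


def detect_transitions(rates, max_expected=0.05, **null_kwargs):
    """Return indices t where the change from trial t to t + 1 is expected to
    occur fewer than `max_expected` times per session by chance."""
    observed = median_abs_change(rates)
    nulls = circular_shift_null(rates, **null_kwargs)
    hits = []
    for t, level in enumerate(observed):
        # Average count, per shuffled session, of trial pairs at least this large.
        expected = sum(sum(m >= level for m in null) for null in nulls) / len(nulls)
        if expected < max_expected:
            hits.append(t)
    return hits
```

The circular-shift null preserves each cell's own slow drifts and single-cell jumps but breaks their alignment across cells, so only changes that are coordinated across most of the ensemble survive. A null that fully shuffled trial order would instead create many large jumps for any cell with a bimodal rate distribution and would miss exactly the coordinated steps of interest.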

Do these network transitions exhibit the type of dynamics expected when old beliefs are abandoned and task representations are rapidly reset? To determine the abruptness of the activity change across the network, we used a high-dimensional representation of the activity. With the state of the network on each trial represented as a point in this high-dimensional space, we asked whether points for the 10 trials covering the network transition fell into separable clusters, with the network suddenly jumping from one cluster to another within a single trial (Fig. 2D, left panel). Abrupt dynamics were evident for all transitions, which suggests that the associated network-wide activity changes occurred rapidly (Fig. 2D, right panel, and supplementary materials). We next determined the fraction of neurons that displayed firing rate changes above set thresholds. Around the time of a network transition, 72% of all cells displayed abrupt changes of at least 0.75 of their mean firing rate, and 58% showed changes at least equal to their mean firing rate (Fig. 2E), whereas the average firing rate of the ensemble remained unchanged (Fig. 2F; Wilcoxon rank sum, Z = 1.16, not significant).
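The two measures in this paragraph can be sketched for a single 10-trial window around a transition. The sketch below assumes that the first 5 trials precede the transition. It also assumes that a cell's change is judged against the average of its pre- and post-transition means within the window; the published normalization may differ. Function names are hypothetical. A relative distance d_pre / (d_pre + d_post) near 0 or 1 on every trial indicates a jump between two clusters, whereas intermediate values indicate gradual progression (Fig. 2D).

```python
import math
import statistics


def centroid(points):
    """Mean ensemble vector of a list of trials."""
    return [statistics.fmean(col) for col in zip(*points)]


def relative_position(window, n_pre=5):
    """window[t] is the ensemble rate vector on trial t of a peri-transition
    window. Returns d_pre / (d_pre + d_post) for each trial, where d_pre and
    d_post are Euclidean distances to the pre- and post-transition centroids."""
    c_pre, c_post = centroid(window[:n_pre]), centroid(window[n_pre:])
    out = []
    for point in window:
        d_pre, d_post = math.dist(point, c_pre), math.dist(point, c_post)
        total = d_pre + d_post
        out.append(d_pre / total if total > 0 else 0.5)
    return out


def fraction_changed(window, threshold=0.75, n_pre=5):
    """Fraction of cells whose pre- to post-transition rate change is at least
    `threshold` times the average of the cell's pre- and post-transition means."""
    n_cells = len(window[0])
    changed = 0
    for c in range(n_cells):
        pre = statistics.fmean(point[c] for point in window[:n_pre])
        post = statistics.fmean(point[c] for point in window[n_pre:])
        mean_rate = (pre + post) / 2
        if mean_rate > 0 and abs(post - pre) >= threshold * mean_rate:
            changed += 1
    return changed / n_cells
```

Calling `fraction_changed` with `threshold=1.0` gives the stricter criterion in the lower panel of Fig. 2E.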

We next evaluated the temporal relationship between network transitions and rejection behavior. At network transitions, the rejection rate typically declined suddenly (Fig. 3, A and B; Wilcoxon rank sum, Z = 6.99, P < 10⁻¹¹ for rejection rates in the first versus the second 5 trials of the 10-trial window around the network transitions; see also figs. S7 and S8). Conversely, behavioral segments with a negative change in rejection rate were preferentially associated with network transitions for all ranges of acceptance-preference change, reaching significance for medium and high ranges (Fig. 3C). Thus, network transitions do not merely mark changes in behavioral preference or in task context. Similarly, a decline in the rejection rate predicted network transitions much more accurately than did local changes in reward rate (fig. S9). Nor could the observed abrupt changes in neural activity be explained by a sudden change in behavioral output, caused by the onset of exploratory behavior, between the two trials flanking the network transition. Because of the task design, a decision to resample the previously nonpreferred option could become manifest only after a variable number of trials, once the computer randomly presented that option. In fact, for a sizable fraction of all network transitions (25 of 87; fig. S10), there was no change in local history, because the two flanking trials were also consecutive in time. A change in the animal’s internal state therefore remained the only likely explanation.

Fig. 3

Network transitions and the onset of exploration. (A) Two network transitions. Top: Activity dynamics of seven cells around the first transition. Dashed line indicates transition; dashed ovals at analysis point 1 mark the transient variable state for each cell. Note synchronized onset of the state change, but variable length of the intermediate state. Middle: Expected frequency for observed median change in firing rate (Fig. 2) during 80 right-bound acceptance trials. Bottom: Trial rejection rates; red circles indicate the 10 peritransition trials. (B) Normalized rejection rates (means ± SEM) for the 10 peritransition trials for all transitions. Network transitions occur between shaded regions. ***P < 0.005. (C) Network transition frequency for nine behavior types (acceptance slope boundaries for equal partitioning: 0, 0.0069, 0.0252, and 0.1019; rejection slope: 0.0728, 0.0108, –0.0114, and –0.0783). Note strong bias for network transitions during decreases in rejection rate for periods with medium (Z-test for proportions, Z = 2.50, P < 0.013) and high acceptance change (Z = 2.43, P < 0.016, Wilcoxon rank sum test). *P < 0.05.
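Fig. 3C compares transition frequencies between behavior types with a Z-test for proportions. Below is a minimal sketch of the pooled two-proportion test, which returns a two-sided P value. The per-category counts are not given here, so any counts passed to it are hypothetical, and the function name is an assumption.

```python
import math


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-test.

    k1 of n1 segments in one category and k2 of n2 in another contain a
    network transition. Returns (z, two-sided p).
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # both proportions are 0 or both are 1
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))
```

As a consistency check on the two-sided form, `math.erfc(2.50 / math.sqrt(2))` is about 0.0124 and `math.erfc(2.43 / math.sqrt(2))` is about 0.0151. Both agree with the reported P < 0.013 and P < 0.016.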

We also examined how network dynamics evolved around the time of abrupt transitions. Local field potentials (LFPs) carried a signature of an upcoming network transition: a transient increase in high-gamma (65 to 140 Hz) power during the feedback period (analysis time point 3) of the trial preceding the transition (Fig. 4A). No significant change in LFPs was observed at other analysis time points or in the theta (5 to 10 Hz) or low-gamma (25 to 55 Hz) bands. In contrast to the LFP change, which was highly localized in time, the dynamics of mPFC spiking activity around network transitions were considerably more complex. Average trial-to-trial variability in activity began to increase gradually a few trials after the change in reward probabilities, before the network transition (Fig. 4B; median interval between the change and the transition, 17 trials). After the network transition, trial-to-trial variability remained elevated throughout the exploratory bout (Fig. 4B; 12 trials). To test whether the network returned to the same (or a similar) state after an exploratory bout, we computed a trial-to-trial similarity matrix for the ensemble state (Fig. 4C). Blocks of relative network stability flanked the transient period of increased trial-to-trial variability surrounding the detected network transitions (Fig. 4B). In a minority of cases, the two states were not statistically different, and in a few cases the network likely returned to the same state after the exploratory period (Fig. 4C, left). For most network transitions, however, the subsequent stable network state was significantly different from the preceding state (Fig. 4C, right; 65% of all transitions, Wilcoxon rank sum test comparing pairwise distances within and between states).
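The state comparison at the end of this paragraph can be sketched as follows. The sketch assumes that each trial's ensemble state is a vector of per-cell firing rates and that a single boundary trial separates the two candidate stable states. In practice, the transient exploratory trials would be removed before the split. Smaller Euclidean distances correspond to greater similarity in the matrix. The within-state and between-state distance lists are the two samples a Wilcoxon rank-sum test would compare. Function names are hypothetical.

```python
import math


def distance_matrix(states):
    """states[t] is the ensemble rate vector on trial t; entry [i][j] is the
    Euclidean distance between the ensemble states on trials i and j."""
    return [[math.dist(a, b) for b in states] for a in states]


def within_between(dmat, boundary):
    """Split trials at `boundary` (trials < boundary form state A, the rest
    state B) and collect pairwise distances within a state and between states."""
    n = len(dmat)
    within, between = [], []
    for i in range(n):
        for j in range(i + 1, n):
            same_state = (i < boundary) == (j < boundary)
            (within if same_state else between).append(dmat[i][j])
    return within, between
```

If the network returned to its earlier state, between-state distances would resemble within-state distances. A significant excess of between-state distances indicates a new stable state, as found for 65% of transitions.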

Fig. 4

Network dynamics around transitions. (A) High-gamma band (65 to 140 Hz) power of the local field potential at analysis point 3 for 10 peritransition trials. Trials 0 and 1 flank the transition (arrow). (B) Trial-to-trial rate variability for 10 trial windows centered on different trials around each detected network transition. Values are average percentiles across all detected network transitions for individual relative window positions (±SEM). The dashed box indicates trials (–5 to 5) for which variability is overestimated because of the large network transition. The orange period denotes median time between imposed change and network transition; the green period is the approximate period of exploration based on trial rejection dynamics after network transition. Top: Average ensemble firing rate in the same 10-trial sliding window. (C) Similarity matrices for two network transitions from different behavioral sessions derived using Euclidean distance in network state space. The time of network transition and start of new stable activity are indicated along the diagonal.
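High-gamma power (Fig. 4A) is the LFP power restricted to 65 to 140 Hz. Below is a minimal sketch that computes mean periodogram power within a frequency band from a mean-subtracted LFP segment, using a naive discrete Fourier transform. The spectral estimator used in the original analysis is not specified here. A real analysis would use an FFT or multitaper method on segments aligned to the analysis point. The function name and defaults are assumptions.

```python
import cmath
import math


def band_power(signal, fs, f_lo=65.0, f_hi=140.0):
    """Mean periodogram power of `signal` (sampled at fs Hz) over the DFT
    frequency bins that fall within [f_lo, f_hi] Hz."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the DC offset
    powers = []
    for k in range(n // 2 + 1):
        f = k * fs / n
        if f_lo <= f <= f_hi:
            coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            powers.append(abs(coef) ** 2 / n)
    return sum(powers) / len(powers) if powers else 0.0
```

Calling the same function with `f_lo=5, f_hi=10` or `f_lo=25, f_hi=55` gives theta or low-gamma power for the band comparison described in the text. Frequency resolution is `fs / len(signal)` Hz per bin, so short segments contain few bins in each band.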

Our findings reveal abrupt, coordinated, and widespread changes in neural ensemble activity in the mPFC as animals detect a change in their environment. These observed dynamics in the mPFC are linked to moments when an animal’s confidence in its estimates or beliefs about the task environment declines. A previous study suggested that abrupt changes in neural activity represent moments of sudden insight (18). Our analysis did not identify a strong (if any) correlation between network transitions and moments when trial rejection returns to a stable, high level. There are two plausible explanations for this apparent discrepancy: (i) The transitions seen by the earlier study reflect abrupt behavioral changes, or (ii) in a setting of stochastic action-outcome associations, the transition to confidence in the newly established beliefs (and the underlying activity change) is more gradual.

The detection of a highly selective increase in high-gamma power in the mPFC during the feedback period on the trial preceding the network transition suggests that the decision to explore is made at that time. A recent study of mPFC processing in monkeys, in a task that required them to search through a space of possible hypotheses to find a rewarded target (17), described an increase in high-gamma power during the feedback period throughout the search process. The authors interpreted this increase as a signature of the exploratory period. An alternative interpretation that explains both data sets, however, is that high-gamma power in the mPFC increases whenever the current hypothesis about the world is about to be discarded.

The trigger for abrupt changes in mPFC network dynamics remains unclear. The coordinated nature of the change in mPFC activity raises the possibility that abrupt network reorganization is the result of the action of a neuromodulator, potentially noradrenaline (26, 27). Phasic noradrenaline release in target areas such as the mPFC may trigger network resets whenever rapid behavioral adaptation is warranted in light of unexpected uncertainty (6, 26–28).

After changes in reward contingencies, the temporal link between network dynamics and behavior spans three distinct phases. The slow increase in neural ensemble variability during the first phase is associated with a period when evidence of environmental change is likely to be accumulated (15). The subsequent abrupt network transition correlates with the onset of exploration. The final gradual return of the ensemble variability to baseline is associated with the period of exploratory sampling when new beliefs about task parameters are formed. Although the possibility that the internal representation of the environment’s governing rules might also be stored in other brain areas cannot be excluded, the tight temporal link between network dynamics and behavior, taken together with existing lesion and physiological evidence (5, 15, 19, 22, 23), suggests that the mPFC monitors or even directs the evolution of this representation.

Supplementary Materials

www.sciencemag.org/cgi/content/full/338/6103/135/DC1

Materials and Methods

Figs. S1 to S10

References (29–32)

References and Notes

Acknowledgments: We thank N. Ozel and B. Shields for technical help; K. Vicari for task illustration; and T. Jessell, W. Denk, V. Jayaraman, A. Leonardo, J. Dudman, E. Pastalkova, G. Murphy, K. Svoboda, R. Desimone, S. Druckmann, S. Eddy, and N. Spruston for comments on the manuscript. Supported by the Howard Hughes Medical Institute.