Research Article

Spontaneous behaviors drive multidimensional, brainwide activity

See allHide authors and affiliations

Science  19 Apr 2019:
Vol. 364, Issue 6437, eaav7893
DOI: 10.1126/science.aav7893

Neuron activity across the brain

How is it that groups of neurons dispersed through the brain interact to generate complex behaviors? Three papers in this issue present brain-scale studies of neuronal activity and dynamics (see the Perspective by Huk and Hart). Allen et al. found that in thirsty mice, there is widespread neural activity related to stimuli that elicit licking and drinking. Individual neurons encoded task-specific responses, but every brain area contained neurons with different types of response. Optogenetic stimulation of thirst-sensing neurons in one area of the brain reinstated drinking and neuronal activity across the brain that previously signaled thirst. Gründemann et al. investigated the activity of mouse basal amygdala neurons in relation to behavior during different tasks. Two ensembles of neurons showed orthogonal activity during exploratory and nonexploratory behaviors, possibly reflecting different levels of anxiety experienced in these areas. Stringer et al. analyzed spontaneous neuronal firing, finding that neurons in the primary visual cortex encoded both visual information and motor activity related to facial movements. The variability of neuronal responses to visual stimuli in the primary visual area is mainly related to arousal and reflects the encoding of latent behavioral states.

Science, this issue p. eaav3932, p. eaav8736, p. eaav7893; see also p. 236

Structured Abstract


In the absence of sensory inputs, the brain produces structured patterns of activity, which can be as large as or larger than sensory-driven activity. Ongoing activity exists even in primary sensory cortices and has been hypothesized to reflect recapitulation of previous sensory experiences, or expectations of possible sensory events. Alternatively, ongoing activity could be related to behavioral and cognitive states.


Much previous work has linked spontaneous neural activity to behavior through one-dimensional measures like running speed and pupil diameter. However, mice perform diverse behaviors consisting of whisking, licking, sniffing, and other facial movements. We hypothesized that there exists a multidimensional representation of behavior in visual cortex and that previously reported “noise” during stimulus presentations may in fact be behaviorally driven. To investigate this, we recorded the activity of ~10,000 neurons in visual cortex of awake mice using two-photon calcium imaging, while simultaneously monitoring the facial movements using an infrared camera. In a second set of experiments, we recorded the activity of thousands of neurons across the brain using eight simultaneous Neuropixels probes, again videographically monitoring facial behavior.


First, we found that ongoing activity in visual cortex is high dimensional: More than a hundred latent dimensions could be reliably extracted from the population activity. We found that a third of this activity could be predicted by a multidimensional model of the mouse’s behavior, extracted from the face video. This behaviorally related activity was not limited to visual cortex. We observed multidimensional representations of behavior in electrophysiological recordings from frontal, sensorimotor, and retrosplenial cortex; hippocampus; striatum; thalamus; and midbrain. Even though both behavior and neural activity contained fast–time scale fluctuations on the order of 200 ms, they were only related to each other at a time scale of about 1 s. We next investigated how this spontaneous, behavior-related signal interacts with stimulus responses. The representation of sensory stimuli and behavioral variables was mixed in the same neurons: The fractions of each neuron’s variance explained by stimuli and by behavior were only slightly negatively correlated, and neurons with similar stimulus responses did not have more similar behavioral correlates. Nevertheless, at a population level, the neural dimensions encoding motor variables overlapped with those encoding visual stimuli along only one dimension, which coherently increased or decreased the activity of the entire population. Activity in all other behaviorally driven dimensions continued unperturbed regardless of sensory stimulation.


The brainwide representation of behavioral variables suggests that information encoded nearly anywhere in the forebrain is combined with behavioral state variables into a mixed representation. We found that these multidimensional signals are present both during ongoing activity and during passive viewing of a stimulus. This suggests that previously reported noise during stimulus presentations may consist primarily of behavioral-state information. What benefit could this ubiquitous mixing of sensory and motor information provide? The most appropriate behavior for an animal to perform at any moment depends on the combination of available sensory data, ongoing motor actions, and purely internal variables such as motivational drives. Integration of sensory inputs with motor actions must therefore occur somewhere in the nervous system. Our data indicate that it happens as early as primary sensory cortex.

Large-scale neural population recordings can be predicted from behavior.

We used new recording technologies to simultaneously monitor the activity of ~10,000 neurons in a single brain area and ~3000 neurons from across the brain (top left). These neurons showed slow–time scale patterns of coactivation restricted to subsets of neurons which were distributed across the brain (top right). The patterns of neural activity appeared to be driven by specific spontaneous behaviors that the animals engaged in during the experiment. We tracked these spontaneous behaviors by projecting a video recording of the mouse face onto a set of canonical “eigenfaces” (bottom left) and used these projections to predict a large fraction of the neural activity (bottom right). t, time; PC, principal component.


Neuronal populations in sensory cortex produce variable responses to sensory stimuli and exhibit intricate spontaneous activity even without external sensory input. Cortical variability and spontaneous activity have been variously proposed to represent random noise, recall of prior experience, or encoding of ongoing behavioral and cognitive variables. Recording more than 10,000 neurons in mouse visual cortex, we observed that spontaneous activity reliably encoded a high-dimensional latent state, which was partially related to the mouse’s ongoing behavior and was represented not just in visual cortex but also across the forebrain. Sensory inputs did not interrupt this ongoing signal but added onto it a representation of external stimuli in orthogonal dimensions. Thus, visual cortical population activity, despite its apparently noisy structure, reliably encodes an orthogonal fusion of sensory and multidimensional behavioral information.

In the absence of sensory inputs, the brain produces structured patterns of activity, which can be as large as or larger than sensory-driven activity (1). Ongoing activity exists even in primary sensory cortices and has been hypothesized to reflect recapitulation or expectation of sensory experience. This hypothesis is supported by studies that found similarities between sensory-driven and spontaneous firing events (27). Alternatively, ongoing activity could be related to behavioral and cognitive states. The firing of sensory cortical and sensory thalamic neurons correlates with behavioral variables such as locomotion, pupil diameter, and whisking (823). Continued encoding of nonvisual variables when visual stimuli are present could at least in part explain the trial-to-trial variability in cortical responses to repeated presentation of identical sensory stimuli (24).

The influence of trial-to-trial variability on stimulus encoding depends on its population-level structure. Variability that is independent across cells—such as the Poisson-like variability produced in balanced recurrent networks (25)—presents little impediment to information coding, because reliable signals can still be extracted as weighted sums over a large enough population. By contrast, correlated variability has consequences that depend on the form of the correlations. If correlated variability mimics differences in responses to different stimuli, it can compromise stimulus encoding (26, 27). Conversely, correlated variability in dimensions orthogonal to those encoding stimuli has no adverse impact on coding (28) and might instead reflect encoding of signals other than visual inputs.

Spontaneous cortical activity reliably encodes a high-dimensional latent signal

To distinguish between these possibilities, we characterized the structure of neural activity and sensory variability in mouse visual cortex. We simultaneously recorded from 11,262 ± 2282 (mean ± SD) excitatory neurons, over nine sessions in six mice using two-photon imaging of a fluorescent calcium indicator in an 11-plane configuration (29) (Fig. 1, A and B, and Movie 1). These neurons’ responses to classical grating stimuli revealed robust orientation tuning as expected in visual cortex (fig. S1).

Fig. 1 Structured ongoing population activity in visual cortex.

(A and B) Two-photon calcium imaging of ~10,000 neurons in visual cortex using multiplane resonance scanning of 11 planes spaced 35 μm apart. Detected neurons are randomly colored for visualization. (C) Distribution of pairwise correlations in ongoing activity, computed in 1.2-s time bins (yellow). Gray indicates distribution of correlations after randomly time-shifting each cell’s activity. (D) Distribution of pairwise correlations for each recording (showing 5th and 95th percentiles). (E) First PC (normalized) versus running speed in 1.2-s time bins. a.u., arbitrary units. (F) Example time course of running speed (green), pupil area (gray), whisking (light green), and first PC of population activity (magenta dashed). (G) Neuronal activity, with neurons sorted vertically by first PC weighting and the same time segment as in (F). (H) Same neurons as in (G), sorted by a manifold embedding algorithm. (I) SVCA method for estimating reliable variance. (J) Example time courses of SVCs from each cell set in the test epoch (1.2-s bins). (K) Same as (J), plotted as a scatter plot. The r value is the Pearson correlation between cell sets, which is an estimate of that dimension’s reliable variance. (L) Percentage of reliable variance for successive dimensions. (M) Reliable variance spectrum, power law decay of exponent 1.14. (N) Percentage of each SVC’s total variance that can be reliably predicted from arousal variables [colors as in (F)]. (O) Percentage of total variance in first 128 dimensions explainable by arousal variables.

Movie 1. Spontaneous activity of more than 10,000 neurons in visual cortex of awake mice.

Two-photon calcium imaging of 11 planes spaced 35 μm apart. Movie speed is 10× real time.

We began by analyzing spontaneous activity in mice free to run on an air-floating ball. Six of nine recordings were performed in darkness, but we did not observe differences between these recordings (shown in red in Figs. 1 and 2) and recordings with gray screen (yellow in Figs. 1 and 2). Mice spontaneously performed behaviors such as running, whisking, sniffing, and other facial movements, which we monitored with an infrared camera.

Ongoing population activity in visual cortex was highly structured (Fig. 1, C to H). Correlations between neuron pairs were reliable (fig. S2), and their spread was larger than would be expected by chance (Fig. 1, C and D), suggesting structured activity (30). Fluctuations in the first principal component (PC) occurred over a time scale of many seconds (fig. S3) and were coupled to running, whisking, and pupil diameter. These arousal-related variables correlated with each other (fig. S4, A and B) and together accounted for about 50% of the variance of the first neural PC (Fig. 1E and fig. S4C). Correlation with the first PC was positive or negative in approximately similar numbers of neurons (57 ± 6.7% SE positive), indicating that two large subpopulations of neurons alternate their activity (Fig. 1, F and G). The slowness of these fluctuations suggests a different underlying phenomenon to previously studied “up and down phases” (5, 3134), which alternate at a much faster time scale (~100 ms instead of multiple seconds) and correlate with most neurons positively. Indeed, up-down phases could not even have been detected in our recordings, which scanned the cortex every 400 ms.

Spontaneous activity had a high-dimensional structure, more complex than would be predicted by a single factor such as arousal. We sorted the raster diagram so that nearby neurons showed strong correlations (Fig. 1H and fig. S5). Position on this continuum bore little relation to actual distances in the imaged tissue (fig. S6), suggesting that this activity was not organized topographically.

Despite its noisy appearance, spontaneous population activity reliably encoded a high-dimensional latent signal (Fig. 1, I to K). We devised a method to identify dimensions of neural variance that are reliably determined by common underlying signals, termed shared variance component analysis (SVCA). We divided the recorded neurons into two spatially segregated sets and divided the recorded time points into two equal halves (training and test; Fig. 1I). The training time points were used to find the dimensions in each cell set’s activity that maximally covary with each other. These dimensions are termed shared variance components (SVCs). Activity in the test time points was then projected onto each SVC (Fig. 1J), and the correlation between projections from the two cell sets (Fig. 1K) provided an estimate of the reliable variance in that SVC (see methods and appendix). The fraction of reliable variance in the first SVC was 97% (Fig. 1, K and L), implying that only 3% of the variance along this dimension reflected independent noise. The reliable variance fraction of successive SVCs decreased slowly, with the 50th SVC at ~50% reliable variance and the 512th at ~9% (Fig. 1L).

The magnitude of reliable spontaneous variance was distributed across dimensions according to a power law of exponent 1.14 (Fig. 1M). This value is larger than the power law exponents close to 1.0 seen for stimulus responses (35) but still indicates a high-dimensional signal. The first 128 SVCs together accounted for 86 ± 1% SE of the complete population’s reliable variance, and 67 ± 3% SE of the total variance in these 128 dimensions was reliable. Arousal variables accounted for 11 ± 1% SE of the total variance in these 128 components (16% of their reliable variance) and primarily correlated with the top SVCs (Fig. 1, N and O). Thousands of neurons were required to reliably characterize activity in hundreds of dimensions, and the estimated reliability of higher SVCs increased with the number of neurons analyzed (fig. S7, A to E), suggesting that recordings of larger populations would identify yet more dimensions.

Ongoing neural activity encodes multidimensional behavioral information

Although arousal measures only accounted for a small fraction of the reliable variance of spontaneous population activity, it is possible that higher-dimensional measures of ongoing behavior could explain a larger fraction (Fig. 2, A to C, and Movie 2). We extracted a 1000-dimensional summary of the motor actions visible on the mouse’s face by applying principal components analysis to the spatial distribution of facial motion at each moment in time (36). The first PC captured motion anywhere on the mouse’s face and was strongly correlated with explicit arousal measures (fig. S4B), whereas higher PCs distinguished different types of facial motion. We predicted neuronal population activity from this behavioral signal using reduced rank regression: For any N, we found the N dimensions of the video signal predicting the largest fraction of the reliable spontaneous variance (Fig. 2D).

Fig. 2 Multidimensional behavior predicts neural activity.

(A) Frames from a video recording of a mouse’s face. t and t+1 indicate two consecutive frames. (B) Motion energy, computed as the absolute value of the difference of consecutive frames. (C) Spatial masks corresponding to the top three PCs of the motion energy movie. The scale on the right indicates low-motion energy (blue) to high-motion energy (red). (D) Schematic of reduced rank regression technique used to predict neural activity from motion energy PCs. (E) Cross-validated fraction of successive neural SVCs predictable from face motion (blue), together with fraction of variance predictable from running, pupil, and whisking (green), and fraction of reliable variance (the maximum explainable, gray; compare with Fig. 1L). (F) Prediction of population activity from behavior. Raster representation (top) of ongoing neural activity in an example experiment, with neurons arranged vertically as in Fig. 1H so correlated cells are close together. Prediction of this activity (bottom) from facial videography (predicted using separate training time points). (G) Percentage of the first 128 SVCs’ total variance that can be predicted from facial information, as a function of number of facial dimensions used. (H) Prediction quality from multidimensional facial information, compared with the amount of reliable variance, both as a percentage of variance explained. (I) Adding explicit running, pupil, and whisker information to facial features provides little improvement in neural prediction quality, measured as percentage of variance explained. (J) Prediction quality as a function of time lag used to predict neural activity from behavioral traces.

Movie 2. Multidimensional spontaneous behaviors.

Movie speed is 5× real time.

This multidimensional behavior measure predicted approximately twice as much variance as the three arousal variables (Fig. 2, D to J, and Movie 3). To visualize how multidimensional behavior predicts ongoing population activity, we compared a raster representation of raw activity (vertically sorted as in Fig. 1H) with the prediction based on multidimensional videography (Fig. 2F; see fig. S5 for all recordings). To quantify the quality of prediction, and the dimensionality of the behavioral signal encoded in visual cortex, we focused on the first 128 SVCs (accounting for 86% of the population’s reliable variance). The best one-dimensional predictor extracted from the facial motion movie captured the same amount of variance as the best one-dimensional combination of whisking, running, and pupil (Fig. 2G). Prediction quality continued to increase with up to 16 dimensions of videographic information (and beyond, in some recordings), suggesting that visual cortex encodes at least 16 dimensions of motor information. These dimensions together accounted for 21 ± 1% SE of the total population variance (31 ± 3% of the reliable variance; Fig. 2H), substantially more than the three-dimensional model of neural activity using running, pupil area, and whisking (11 ± 1% SE of the total variance, 17 ± 1% SE of the reliable variance). Moreover, adding these three explicit predictors to the video signal increased the explained variance by less than 1% (Fig. 2I), even though the running signal provided information not derived from video. A neuron’s predictability from behavior was not related to its cortical location (fig. S8). The time scale with which neural activity could be predicted from facial behavior was ~1 s (Fig. 2J and fig. S7H), reflecting the slow nature of these behavioral fluctuations.

Movie 3. Spontaneous behaviors are correlated with spontaneous neural activity.

Video of mouse face recorded simultaneously with neural activity. Movie speed is 10× real time.

Behaviorally related activity is spread across the brain

Patterns of spontaneous V1 activity were a reflection of activity patterns spread across the brain (Fig. 3, A to E). To show this, we used eight Neuropixels probes (37) to simultaneously record from frontal, sensorimotor, visual, and retrosplenial cortex, hippocampus, striatum, thalamus, and midbrain (Fig. 3, A and B). The mice were awake and free to rotate a wheel with their front paws. From recordings in three mice, we extracted 2296, 2668, and 1462 units stable across ~1 hour of ongoing activity and binned neural activity into 1.2-s bins, as for the imaging data.

Fig. 3 Behaviorally related activity across the forebrain in simultaneous recordings with eight Neuropixels probes.

(A) Reconstructed probe locations of recordings in three mice. (B) Example histology slice showing orthogonal penetrations of eight electrode tracks (DiI fluorescent dye, red) through a calbindin-counterstained (green) horizontal section. (C) Comparison of mean correlation between cell pairs in a single area, with mean correlation between pairs with one cell in that area and the other elsewhere. Each dot represents the mean over all cell pairs from all recordings, color coded as in (D). (D) Mean correlation of cells in each brain region with first PC of facial motion. Error bars represent standard deviation. (E) Prediction of population activity from behavior. Raster representation (top left) of ongoing population activity for an example experiment, sorted vertically so nearby neurons have correlated ongoing activity. Prediction of this activity (bottom left) from facial videography. Anatomical location of neurons along this vertical continuum (top right), with each point representing a cell, colored by brain area as in (C) and (D), and the x-axis showing the neuron’s depth from the brain surface. (F) Percentage of population activity explainable from orofacial behaviors as a function of dimensions of reduced rank regression. Each curve shows average prediction quality for neurons in a particular brain area, using the same color coding as in (D). (G) Explained variance as a function of time lag between neural activity and behavioral traces. Each curve shows the average for a particular brain area, using the same color coding as in (D). (H) Same as (G) in 200-ms bins.

Neurons correlated most strongly with others in the same area but also correlated with neurons in other areas, suggesting nonlocalized patterns of neural activity (Fig. 3C). All areas contained neurons positively and negatively correlated with the arousal-related top facial motion PC, although neurons in thalamus showed predominantly positive correlations (Fig. 3D, P < 10−8, two-sided Wilcoxon sign-rank test). Sorting the neurons by correlation revealed a rich activity structure (Fig. 3E). All brain areas contained a sampling of neurons from the entire continuum (Fig. 3E, right), suggesting that a multidimensional structure of ongoing activity is distributed throughout the brain. This spontaneous activity spanned at least 128 dimensions, with 35% of the variance of individual neurons reliably predictable from population activity (fig. S9).

Similar to visual cortical activity, the activity of brainwide populations was partially predictable from facial videography (Fig. 3, F to H). Predictability of brainwide activity again saturated around 16 behavioral dimensions, which predicted on average across areas 21.9% of the total variance (40% of the estimated maximum possible) (Fig. 3F). The amount of behavioral modulation differed between brain regions, with neurons in thalamus predicted best (35.7% of total variance, 59% of estimated maximum). The time scale of videographic prediction was again broad: Neural activity was best predicted from instantaneous behavior, decaying slowly over time lags of multiple seconds (Fig. 3, G and H, and fig. S10), with a full width at half-max of 2.5 ± 0.4 s (mean ± SE). Neural population activity showed coherent structure at time scales faster than this behavioral correlation (280 ± 43 ms, mean ± SE) (fig. S10). The fast–time scale structure modulated nearly all neurons in the same direction, leading to rapid fluctuations in the total population rate (up and down phases); by contrast, the structure seen at lower time scales was dominated by alternation in the activity of different neuronal populations, and steadier total activity [figs. S10 to S12 and (38)].

Stimulus-evoked and ongoing activity overlap along one dimension

We next asked how ongoing activity and behavioral information relates to sensory responses (Fig. 4, A and B). We thus interspersed blocks of visual stimulation (flashed natural images, presented one per second on average) with extended periods of spontaneous activity (gray screen), while imaging visual cortical population activity (Fig. 4A). During stimulus presentation, the mice continued to exhibit the same behaviors as in darkness, resulting in a similar distribution of facial motion components (Fig. 4B).

Fig. 4 Neural subspaces encoding stimuli and spontaneous and behavioral variables overlap along one dimension.

(A) PCs of facial motion energy (top) and firing of 10 example V1 neurons (bottom). (B) Comparison of normalized face motion energy variance for each PC during stimulus presentation and spontaneous periods. Color represents recording identity. (C) The percentage of stimulus-related variance in each dimension of the shared subspace between stimulus- and behavior-driven activity. (D) Distribution of cells’ weights (a.u.) on the single dimension of overlap between stimulus and behavior subspaces. (E) Schematic of population activity geometry. Stimulus- and behavior-driven subspaces are orthogonal, while a single dimension [gray; characterized in panels (C) and (D)] is shared. (F) Stimulus decoding analysis for 32 natural image stimuli from 32 dimensions of activity in the stimulus-only, behavior-only, and spontaneous-only subspaces, together with randomly chosen 32-dimensional subspaces. The y-axis shows the fraction of stimuli that were identified incorrectly. (G) Example of neural population activity projected onto the subspaces defined in (E). (H) Amount of variance of each of the projections illustrated in (E) and (G), during stimulus presentation and spontaneous periods. Each point represents summed variances of the dimensions in the subspace corresponding to the symbol color for a single experiment. (I) Projection of neural responses to two example stimuli into two dimensions of the stimulus-only subspace. Each dot is a different stimulus response. Red is the fit of each stimulus response using the multiplicative gain model. (J) Same as (I) for the behavior-only subspace. (K) Fraction of variance in the stimulus-only subspace explained by constant response on each trial of the same stimulus (avg. model), multiplicative gain that varies across trials (mult. model), and a model with both multiplicative and additive terms (affine model). (L) The multiplicative gain on each trial (red) and its prediction from the face motion PCs (blue).

There were not separate sets of neurons encoding stimuli and behavioral variables; instead, representations of sensory and behavioral information were mixed together in the same cell population. The fractions of each neuron’s variance explained by stimuli and by behavior were only slightly negatively correlated [fig. S13; correlation coefficient (r) = −0.18, P < 0.01, Spearman’s rank correlation], and neurons with similar stimulus responses did not have more similar behavioral correlates (fig. S13; r = −0.005, P > 0.05).

The subspaces encoding sensory and behavior information overlapped in only one dimension (Fig. 4, C to E). The space that encoded behavioral variables contained 11% of the total stimulus-related variance, 96% of which was contained in a single dimension (Fig. 4C) with largely positive weights onto all neurons (85% positive weights, Fig. 4D). Similarly, the space of ongoing activity, defined by the top 128 principal components of spontaneous firing, contained 23% of the total stimulus-related variance, 86% of which was contained in one dimension (85% positive weights). Thus, overlap in the spaces encoding sensory and behavioral variables arises primarily because both can change the mean firing rate of the population: The precise patterns of increases and decreases about this change in mean were essentially orthogonal (Fig. 4E). Analysis of electrophysiological recordings confirmed that the relationship between stimulus-driven and spontaneous activity was dominated by a single shared dimension: The correlation between spontaneous and signal correlations was greatly reduced after projecting out this one-dimensional activity (fig. S14).

Stimulus decoding analysis further confirmed that information about sensory stimuli was concentrated in the stimulus-only subspace. To show this, we trained a linear classifier to identify which stimulus had been presented, from activity in different 32-dimensional neural subspaces. Decoding from the stimulus space yielded a cross-validated error rate of 10.1 ± 4.0%; activity in the spontaneous- or behavior-only spaces yielded errors of 53.1 ± 6.4 and 56.8 ± 6.7%, no better than randomly chosen dimensions (Fig. 4F).

To visualize how the V1 population integrated sensory and behavior-related activity, we examined the projection of this activity onto three orthogonal subspaces: a multidimensional subspace encoding only sensory information (stimulus only), a multidimensional subspace encoding only behavioral information (behavior only), and the one-dimensional subspace encoding both (stimulus-behavior shared dimension) (Fig. 4G and fig. S15). During gray-screen periods, there was no activity in the stimulus-only subspace, but when the stimuli appeared, this subspace became very active. By contrast, activity in the behavior-only subspace was present before stimulus presentation and continued unchanged when the stimulus appeared. The one-dimensional shared subspace showed an intermediate pattern: Activity in this subspace was weak before stimulus onset and increased when stimuli were presented. Similar results were seen for the spontaneous-only and stimulus-spontaneous spaces (Fig. 4G, lower traces). Across all experiments, variance in the stimulus-only subspace was 119 ± 81 SE times larger during stimulus presentation than during spontaneous epochs (Fig. 4H), whereas activity in the shared subspace was 19 ± 12 SE times larger; activity in the face-only and spontaneous-only subspaces was only modestly increased by sensory stimulation (1.4 ± 0.13 SE and 1.7 ± 0.2 SE times larger, respectively).

Trial-to-trial variability in population responses to repeated stimulus presentations reflected a combination of multiplicative modulation in the stimulus space and additive modulation in orthogonal dimensions. To visualize how stimuli affected activity in these subspaces, we plotted population responses to multiple repeats of two example stimuli (Fig. 4, I and J). When projected into the stimulus-only space, the resulting clouds were tightly defined with no overlap (Fig. 4I), but in the behavior-only space, responses to the two stimuli were directly superimposed (Fig. 4J). Variability within the stimulus subspace consisted of changes in the length of the projected activity vectors between trials, resulting in narrowly elongated clouds of points (Fig. 4I), consistent with previous reports of multiplicative variability in stimulus responses (3942). A model in which stimulus responses are multiplied by a trial-dependent factor accurately captured the data, accounting for 89 ± 0.1% SE of the variance in the stimulus subspace (Fig. 4K). Furthermore, the multiplicative gain on each trial could be predicted from facial motion energy (r = 0.61 ± 0.02 SE, cross-validated) and closely matched activity in the shared subspace (r = 0.73 ± 0.06 SE, cross-validated; Fig. 4L). Although ongoing activity in the behavior-only subspace and visual responses in the stimulus-only subspace added independently, we did not observe additive variability within the stimulus-only space itself: An “affine” model also including an additive term did not significantly increase explained variance over the multiplicative model (P > 0.05, Wilcoxon rank-sum test). Similar results were obtained when analyzing responses to grating stimuli rather than natural images (fig. S16).


Ongoing population activity in visual cortex reliably encoded a latent signal of at least 100 linear dimensions, and possibly many more. The largest dimension correlated with arousal and modulated about half of the neurons positively and half negatively. At least 16 further dimensions were related to behaviors visible by facial videography, which were also encoded across the forebrain. The dimensions encoding motor variables overlapped with those encoding visual stimuli along only one dimension, which coherently increased or decreased the activity of the entire population. Activity in all other behavior-related dimensions continued unperturbed regardless of sensory stimulation. Trial-to-trial variability of sensory responses comprised additive ongoing activity in the behavior subspace, and multiplicative modulation in the stimulus subspace, resolving apparently conflicting findings concerning the additive or multiplicative nature of cortical variability (3942).

Our data are consistent with previous reports describing low-dimensional correlates of locomotion and arousal in visual cortex (8, 1016, 33) but suggest that these results were glimpses of a much larger set of nonvisual variables encoded by ongoing activity patterns. Sixteen dimensions of facial motor activity can predict 31% of the reliable spontaneous variance. The remaining dimensions and variance might in part reflect motor activity not visible on the face or only decodable by more advanced methods (4348), or they might reflect internal cognitive variables such as motivational drives.

Many studies have reported similarities between spontaneous activity and sensory responses (27). We also observed a similarity but found it arose nearly exclusively from one dimension of neural activity. This dimension summarized the mean activity of all cells in the population, and variations along it reflected both spontaneous alternation of up and down phases and differences in mean population response between stimuli. These results therefore demonstrate that the statistical similarity of firing patterns during stimulation and ongoing activity need not imply recapitulation of previous sensory experiences but merely that cortex exhibits mean rate fluctuations with or without sensory inputs. Although our results do not exclude that genuine recapitulation could occur in other behavioral circumstances, they reinforce the need for careful statistical analysis before drawing this conclusion: Even a single dimension of common rate fluctuation is sufficient for some previously applied statistical methods to report similar population activity (49).

The brainwide representation of behavioral variables suggests that information encoded nearly anywhere in the forebrain is combined with behavioral-state variables into a mixed representation. We found that these multidimensional signals are present both during ongoing activity and during passive viewing of a stimulus. Recent evidence indicates that they may also be present during a decision-making task (50). What benefit could this ubiquitous mixing of sensory and motor information provide? The most appropriate behavior for an animal to perform at any moment depends on the combination of available sensory data, ongoing motor actions, and purely internal variables such as motivational drives. Integration of sensory inputs with motor actions must therefore occur somewhere in the nervous system. Our data indicate that it happens as early as primary sensory cortex. This is consistent with neuroanatomy: Primary sensory cortex receives innervation both from neuromodulatory systems carrying state information and from higher-order cortices that can encode fine-grained behavioral variables (9). This and other examples of pervasive whole-brain connectivity (5154) may coordinate the brainwide encoding of behavioral variables we have reported here.

Materials and methods

All experimental procedures were conducted according to the UK Animals Scientific Procedures Act (1986). Experiments were performed at University College London under personal and project licenses released by the Home Office following appropriate ethics review.

Preparation for two-photon calcium imaging in visual cortex

The imaging methods were similar to those described elsewhere (14). Briefly, surgeries were performed in seven adult mice (P35 to P125) in a stereotaxic frame and under isoflurane anesthesia (5% for induction, 0.5 to 1% during the surgery). We used mice bred to express GCaMP6s in excitatory neurons (1 EMX-CRE x Ai94 GCaMP6s mouse, 3 CamKII x tetO GCaMP6s mice, and 1 Rasgrf-CRE x Ai94 GCaMP6s mouse) or mice bred to express tdTomato in GAD+ inhibitory neurons (2 GAD-Cre x tdTomato mice), allowing inhibitory neurons to be identified and excluded from further analysis. We did not observe epileptiform activity in any of these mice (55).

Before surgery, Rimadyl was administered as a systemic analgesic, and lidocaine was administered locally at the surgery site. During the surgery we implanted a headplate for later head-fixation, made a craniotomy of 3 to 4 mm in diameter with a cranial window implant for optical access, and, in Gad-Cre x tdTomato transgenics, performed virus injections with a beveled micropipette using a Nanoject II injector (Drummond Scientific Company, Broomall, PA 1) attached to a stereotaxic micromanipulator. We used AAV2/1-hSyn-GCaMP6s, acquired from University of Pennsylvania Viral Vector Core. Injections of 50 to 200 nl virus (1 × 1012 to 3 × 1012 GC/ml) were targeted to monocular V1, 2.1 to 3.3 mm laterally and 3.5 to 4.0 mm posteriorly from Bregma. To obtain large fields of view for imaging, we typically performed 4 to 8 injections at nearby locations, at multiple depths (~500 and ~200 μm). Rimadyl was then used as a postoperative analgesic for 3 days, delivered to the mice via their drinking water.

Data acquisition

We optically recorded neural activity in head-fixed awake mice implanted with 3- to 4-mm cranial windows centered over visual cortex, obtaining ~10,000 neurons in all recordings. The recordings were performed using multiplane acquisition controlled by a resonance scanner, with planes spaced 30 to 35 μm apart in depth. In nine recordings, 10 or 12 planes were acquired simultaneously at a scan rate of 3 or 2.5 Hz. For three further recordings, we used single-plane configuration, with a scan rate of 30 Hz. The mice were free to run on an air-floating ball and were surrounded by three computer monitors. Spontaneous activity was recorded in darkness (monitors off), or with a gray background or with presented visual stimuli on these monitors arranged at 90° angles to the left, front, and right of the animal, so that the animal’s head was approximately in the geometric center of the setup.

For each mouse imaged, we typically spent the first imaging day finding a suitable recording location, where the following three conditions held: (i) the GCaMP signal was strong, in the sense that clear transients could be observed in large numbers of cells; (ii) a large enough field of view could be obtained for 10,000 neuron recordings; and (iii) the receptive fields of the neuropil were clearly spatially localized on our three monitors.

In animals for which there was a choice over multiple valid recording locations, we chose either: (i) a horizontally and vertically central retinotopic location or (ii) a lateral retinotopic location, at 90° from the center but still centered vertically. We did not observe differences related to retinotopic location (central or lateral) and thus pooled data across recording locations. We also did not observe significant differences between recordings obtained from GCaMP transgenic animals and from virus injections nor between recordings made in complete darkness or with a gray screen. Thus, we pooled data over all conditions.

Recording protocol and visual stimuli

Spontaneous multiple-plane recordings were performed with either constant gray-screen background (three recordings) or monitors switched off (six recordings). The nine multiplane recordings were of lengths 162, 155, 120, 117, 105, 90, 70, 70, and 70 min. We also performed three gray-screen recordings in single-plane mode (30-Hz frame rate) of lengths 93, 87, and 49 min.

Natural image responses were recorded by presenting 90 to 114 repetitions of 32 images manually selected from the ImageNet database (56), from ethologically relevant categories: birds, cat, flowers, hamster, holes, insects, mice, mushrooms, nests, pellets, snakes, and wildcat. We chose images in which less than 50% of the image was a uniform background and which contained a balance of low and high spatial frequencies. Images were flashed on all three screens for 0.5 s, with a randomized gray-screen interstimulus interval between 0.3 and 1.1 s. The images were shown in a random order during each repeat block. In three sessions, a 30-s period of gray-screen period occurred in between each repeat block; each of these sessions lasted at least 110 min in total. In one session, we first recorded a gray-screen period (10 min), then interleaved longer blocks of stimuli (25 min) and gray screen (15 min) three times, for a total of 130 min of recording.

In four recordings, we presented full-field drifting gratings on all three monitors at 1 of 32 directions evenly spaced at 11°, with a spatial frequency of 0.05 cycles per degree and temporal frequency of 2 Hz. Each direction was presented 70 to 128 times with a duration of 0.5 s and a gray-screen interstimulus interval randomly distributed between 0.3 and 1.1 s. Gratings were presented in random order in blocks of 32, and in between blocks, there was an additional gray-screen period of length 30 s (four recordings) or 0.5 s (one recording). Each recording session was at least 90 min long.

In six electrophysiological recordings (six mice), we targeted visual cortex with Neuropixels probes while presenting natural images from the ImageNet database. Stimuli were presented for 400 ms, with a random gray-screen interstimulus interval of 300 to 700 ms. We presented 700 different stimuli, two times each.

Calcium imaging processing

The preprocessing of all raw calcium movie data was done using a toolbox we developed called Suite2p, using the default settings (29). The software is available at

Briefly, Suite2p first aligns all frames of a calcium movie using two-dimensional rigid registration based on regularized phase correlation, subpixel interpolation, and kriging. For all recordings, we validated the inferred X and Y offset traces to monitor any potential outlier frames that may have been incorrectly aligned. In a few recordings, a very small percentage (<0.01%) of frames that had registration artifacts were removed, and the extracted traces were replaced with interpolated values at those frames. In all recordings, the registered movie appeared well aligned by visual inspection. Next, Suite2p performs automated cell detection and neuropil correction. To detect cells, Suite2p computes a low-dimensional decomposition of the data, which is used to run a clustering algorithm that finds regions of interest (ROIs) based on the correlation of the pixels inside them. The extraction of ROIs stops when the pixel correlations of new potential ROIs drops below a threshold parameter, which is set as a fraction of the correlation in the high SNR ROIs; thus, it does not require the number of clusters to be set a priori. A further step in the Suite2p GUI classifies ROIs as somatic or not. This classifier learns from user input, reaching 95% performance on this data (29), thus allowing us to skip the manual step altogether for most recordings. We note that the 5% errors might be attributable to human labeling error or to dendritic signals from backpropagating APs, reflecting the spiking of deeper cells. Thus, there is little risk of ROIs measuring signals other than neuronal action potentials.

We took great care to compensate cellular fluorescence traces for the surrounding neuropil signal (57). This contamination is typically removed by subtracting out from the ROI signal a scaled-down version of the neuropil signal around the ROI; the scaling factor was set to 0.7 for all neurons. Importantly, for computing the neuropil signal, we excluded all pixels that Suite2p attributed to an ROI, whether somatic or dendritic. After neuropil subtraction, we subtracted a running baseline of the calcium traces with a sliding window of 60 s to remove long time scale drift in baseline, then applied nonnegative spike deconvolution using the OASIS algorithm with a fixed time scale of calcium indicator decay of 2 s (58, 59). To further ensure that out-of-focus fluorescence could not contribute to our results, we excluded neurons whose signal might span two planes by excluding neurons in sequential planes that had a greater than a 0.6 correlation (in 1.2-s bins) with each other, and whose centers were within 5 μm of each other in XY.

In addition, we ensured that the cell sets used for reliable variance estimation (Fig. 1I) were spatially nonoverlapping: We segregated the field of view into 16 strips in XY (encompassing all Z) of width 60 μm and put cells from the odd strips in one group and the cells in the even strips in the other group. This ensured that no cells from different groups were at the same XY position but at a different depth. For peer prediction analyses (fig. S9), we excluded all peer cells within 70 μm of the target (Euclidean distance in three-dimensional space).

Facial videography

Infrared LEDs (850 nm) were pointed at the face of the mouse to enable infrared video acquisition in darkness. The videos were acquired at 30 Hz using a camera with a zoom lens and an infrared filter (850 nm, 50 nm cutoff). The wavelength of 850 nm was chosen to avoid the 970 nm wavelength of the two-photon laser, while remaining outside the visual detection range of the mice.

Running speed was not monitored videographically but rather by optical mice placed orthogonally to the air floating ball on which the mouse stood.

Automated extraction of orofacial behaviors of mice

We developed a toolbox with a GUI for videographic processing of orofacial movements of mice. The software is termed FaceMap and is available at The processing time taken by the software scales linearly with the number of frames and runs 4× faster than real-time on 30-Hz videos.

Motion processing of regions of interest

To extract defined behavioral variables (e.g., pupil diameter and whisking), we used a graphical user interface that allows manual selection of face areas. The user can choose any region of the frame in which to compute the total absolute motion energy, the SVDs of the absolute motion energy, or the SVDs of the raw frames.

The absolute motion energy at each time T is computed as the absolute value of the difference between consecutive frames, resulting in a matrix M of size Npixels × Ntime points. The whisker signal used in the current study (Fig. 1F) was defined to be the total motion energy summed over all pixels in a manually defined region covering the whisker pad.

SVD computation for large matrices

To extract a high-dimensional representation of the facial signal, the toolbox applies singular value decomposition (SVD) to the raw movie, the motion energy movie, or both. The computation is identical in both cases.

The movie matrices are too large to decompose in their raw form. To compute their SVD, we first split the movie M into temporal segments Mi of length ~1 min and compute the SVD of each segment individually. Because the number of pixels is very large (>1 million), we compute the SVD of each movie segment by computing the top 200 eigenvectors Vi of its time by time covariance matrix. We then compute the spatial projections of the segment onto these components, Ui=MiVi. Each matrix Ui consists of the left singular vectors of Mi, scaled by the singular values and is thus a 200-dimensional summary of the segment Mi, related via an orthogonal projection. To estimate the SVD of the entire movie, we concatenate the Ui for all segments of the movie and recompute the SVD: [UiUn]=USV. The matrix U represents the spatial components of the full movie, and we project the movie onto the top 1000 components of it, to obtain their temporal profiles: Wmotion=UM.

Pupil processing

To compute pupil area, the user first defines a region of interest using the FaceMap interface. The minimum value in this region is subtracted from all pixels for robustness across illumination changes. The darkest pixels in this region, identified by a user-selected threshold, correspond to the pupil. We estimate the pupil center as the center of mass of these dark pixels: x¯=ΣxxR(x)/ΣxR(x), where x is the two-dimensional pixel location, R(x) is that pixel’s darkness level relative to the threshold, and the sum runs over all pixels x darker than the threshold. We compute the covariance of a two-dimensional Gaussian fit to the region of interest: Σx(xx¯)(xx¯)/Nx, where the sum runs over all pixels darker than the threshold and Nx is the number of such pixels. For robustness, this process is iterated four times after reselecting only pixels that are two standard deviations away from the center and recomputing the Gaussian covariance fit. The final result is an outline of the pupil defined by an ellipse two standard deviations from the center of mass.

Neuropixels recordings

Neuropixels electrode arrays (37) were used to record extracellularly from neurons in seven mice. In three of these mice, eight Neuropixels probes were used to target multiple brain areas. The three mice were (i) 73 days old, male, wild type (mouse 1); (ii) 113 days old, female, TetO-GCaMP6s;Camk2a-tTa (mouse 2); and (iii) 99 days old, male, Ai32;Pvalb-Cre (mouse 3). In four of these mice, one to four Neuropixels probes were used to target visual cortex. These mice were between 8 and 24 weeks old at the time of recording and were of either gender. The genotypes of these mice were Slc17a7-Cre;Ai95, Snap25-GCaMP6s, Ai32;Pvalb-Cre, or Emx1-Cre;CaMKIIa-tTA;Ai94.

In all cases, a brief (<1 hour) surgery to implant a steel headplate and 3D-printed plastic recording chamber (~12 mm diameter) was first performed. Following recovery, mice were acclimated to head-fixation in the recording setup. During head-fixation, mice were seated on a plastic apparatus with forepaws on a rotating rubber wheel. Three computer screens were positioned around the mouse at right angles. On the day of recording, mice were again briefly anesthetized with isoflurane while two to eight small craniotomies were made with a dental drill. After several hours of recovery, mice were head-fixed in the setup. Probes had a silver wire soldered onto the reference pad and shorted to ground; these reference wires were connected to a Ag/AgCl wire positioned on the skull. The craniotomies as well as the wire were covered with saline-based agar, which was covered with silicone oil to prevent drying. Each probe was mounted on a rod held by an electronically positionable micromanipulator (uMP-4, Sensapex Inc.) and was advanced through the agar and through the dura. Once electrodes punctured dura, they were advanced slowly (~10 μm/s) to their final depth (4 or 5 mm deep). Electrodes were allowed to settle for approximately 15 min before starting recording. Recordings were made in external reference mode with LFP gain = 250 and AP gain = 500, using SpikeGLX software. The mice were in a light-isolated enclosure and, during the spontaneous part of the recording, the computer screens were black. Data were preprocessed by rereferencing to the common median across all channels (60).

Spike sorting the Neuropixels data

We spike sorted the data using a modification of Kilosort (61), termed Kilosort2, that tracks drifting clusters (fig. S17). Code is available at Without the modifications, the original Kilosort and similar algorithms can split clusters according to drift of the electrode, which would confound our behavioral-related analyses. Kilosort2 tracks neurons across drift levels and for longer periods of time (~1 hour in our case). In addition, Kilosort2 performed automated splits and merges similar to what a human curator would do on the basis of spike waveform similarity, the bimodality of the distribution of waveform features, and the spike auto- and cross-correlograms.

The final single units used were from several cortical areas (visual: 709, sensorimotor: 371, frontal: 842, and retrosplenial: 122), hippocampal formation (753), striatum (310), thalamus (2743), and midbrain (570).


Pairwise correlations were computed after binning activity at 1.2 to 1.3 s (three or four frames respectively for 12 and 10 plane recordings; 1.2-s bins for Neuropixels recordings). To compute shuffled correlations (Fig. 1C), we circularly shifted each neuron’s activity in time by a random number of bins (at least ±1000) and correlated all the shifted traces with all the original traces.

Arranging rasters by correlation

To visualize high-dimensional structure in raw data, raster plots were sorted vertically along a one-dimensional continuum so that nearby neurons were most correlated. To do this, the binned activity of each neuron was first z scored, and electrode data was high-pass filtered (100-s Gaussian kernel; this was not necessary for 2p data because traces had already been high-passed in preprocessing). Neurons were sorted using a generalization of scaled k-means clustering, where the clusters are ordered along a one-dimensional axis to have similar means to their nearby clusters. Neurons were initially ordered on the basis of their weights onto the first principal component of population activity and divided into 30 equal-sized clusters along this ordering. On each iteration, we computed the mean activity of each cluster, smoothed it across clusters with a Gaussian window, then reassigned each neuron to the cluster whose smoothed activity it was most correlated with. This process was repeated for 75 iterations. The width of the Gaussian smoothing window was held at three clusters for the first 25 iterations, then annealed to one over the following 50 iterations. On the final pass, we up-sampled the neurons’ correlations with each cluster by a factor of 100 via kriging interpolation with a smoothing constant of one cluster. This allowed us to determine subinteger assignments of neurons to clusters, resulting in a continuous distribution of neurons along a one-dimensional axis. The algorithm is available, implemented in Python and MATLAB at We ran the MATLAB version on the data here.

Although the electrode data was high-pass filtered to compute sorting, we display the original raw activity in Fig. 3E.

Shared variance component analysis

The SVCA method gives an asymptotically unbiased lower-bound estimate for the amount of a neural population’s variance reliably encoding a latent signal. A mathematical proof of this is given in the appendix; here, we describe how the algorithm was implemented for the current study.

We first split the population into two spatially segregated populations. To do so, we divided the XY plane into 16 nonoverlapping strips of width 60 μm and assigned the neurons in the even strips to one group and the neurons in the odd strips to the other group, regardless of the neuron’s depth. Thus, there did not exist neuron pairs in the two sets that had the same XY position but a different depth, avoiding a potential confound that a neuron could be predicted from its own out-of-focus fluorescence.

Neural population activity was binned at 1.2- to 1.3-s resolution (see above), and each neuron’s mean activity was subtracted from its firing trace. We divided the recording into training and test time points (alternating periods of 72 s each), thereby obtaining four neural activity matrices: Ftrain, Ftest, Gtrain, and Gtest of size Nneurons×Ntime points, where F and G represent activity of the two cell sets. We compute the covariance matrix between the two cell sets on the training time points asS^k=uk FtestGtestvk/Ntime pointsTo obtain the fraction of reliable variance (Fig. 1L), we normalize this reliable variance by the arithmetic mean of the variances of the test set data for each cell set on the corresponding projections, Sk, tot=(ukFtestFtestuk+vkGtestGtestvk)/2.

Predicting neural activity from behavioral variables

To estimate the fraction of neural variance that could be predicted from explicitly computed arousal variables (Fig. 1, N and O), we resampled their traces into the same 1.2- to 1.3-s bins as the neural data. The arousal variables (either single traces of running, whisking, pupil area, or all three together) defined predictor matrices Xtrain and Xtest for the training and test sets. We predicted the SVCs of neural activity UFtrain and VGtrain from the training-set behavior traces by unregularized multivariate linear regression, obtaining weight matrices A and B that minimized the squared errors UFtrainAXtrain2 and VGtrainBXtrain2. We then used these weight matrices to predict activity in the test set and computed the covariance matrix of the residual error of each SVC:Sk,res=(ukFtestakXtest)(vkGtestbkXtest)/NsamplesSk, res represents the amount of variance along SVC k that cannot be predicted by the behavioral traces, and (S^kSk,res)/S^k represents the fraction of reliable variance that can be so predicted. To compute the fraction of total variance explainable by behavioral traces (Figs. 1, N and O, and 2, E and G to J), we normalize instead by the total test-set variance: (S^kSk,res)/Sk, tot.

To predict the fraction of neural variance that could be predicted from unsupervised videographic analysis (Fig. 2), we took a similar approach but computed the weight matrices A and B by reduced-rank regression. Reduced-rank regression is a form of regularized linear regression, with the prediction weights matrix restricted to a specific rank (62), reducing the number of parameters and making it more robust to overfitting. Figure 2E shows the fraction of total variance in successive dimensions that can be predicted by rank-16 prediction, whereas Fig. 2G shows how the predicted fraction of variance in the first 128 dimensions depends on the rank of the predictor.

Peer prediction analysis

The shared variance component analysis described above—like a related algorithm for estimating reliable stimulus coding (35)—provides unbiased estimates but requires thousands of simultaneously recorded neurons per brain area. Because this many neurons were not available in our Neuropixels recordings, we turned to another method to estimate the reliable variance in these data. This method is an adaptation of the previously described “peer prediction” method (63, 64). Peer prediction analysis attempts to predict each neuron individually from the other simultaneously recorded cells (the neuron’s “peers”). By contrast, SVCA finds the dimensions of activity in a large population that can be most reliably predicted from a held-out set of neurons. Because a substantial fraction of a single neuron’s variance arises from independent noise, which is averaged out when projecting onto the SVCA dimensions, peer prediction gives systematically lower values of variance explained than SVCA.

To apply peer prediction to our data, we again binned neural activity with 1.2- to 1.3-s resolution and divided these time points into a training set and a test set, consisting of alternating blocks of duration 72 s. Each neuron took a turn as target for prediction from the activity of simultaneously recorded peer cells, defined to be any cells on all other probes and cells on the same probe greater than five sites away (40 μm) for Neuropixels recordings; for 2p recordings, we used all neurons greater than 70 μm from the cell in three-dimensional distance, in order to avoid potential optical contamination from the target neuron. We denote peer cell activity in the training and test sets by Ncells×Ntime points matrices Ftrain and Ftest, respectively, and target cell activity in the training and sets as 1×Ntime points vectors gtrain and gtest. We first computed the singular value decomposition of peer cell activity on the training set: Ftrain=USV. We then predicted the target neuron activity by ridge regression from n singular value components of peer cell activity, where n took values n=1, 2, 4, 8, 16, ..., 512, 1024 . The prediction weights were thuswn=[(gtrainVnSn)((VnSn)(VnSn)+λI)1]Unwhere Un,Vn,Sn are matrices containing the top n singular vectors. Then the prediction of the single-neuron activity on the test set was gtestn=wnFtest, and the fraction of variance explained was 1gtestgtestn2/gtest2. We chose λ to be 10 by hand.

Subspaces of stimulus and behavioral activity

The stimulus subspace (Fig. 4) was defined as the space spanned by the trial-averaged responses of each of the 32 stimuli presented. We computed the Nneurons × Nstimuli matrix of trial-averaged responses R from one-third of the stimulus responses, saving the other two-thirds of the stimulus responses for variance estimation.

The behavior subspace was defined via the reduced-rank regression prediction method described above. We performed the regression on one-half of the spontaneous activity, leaving the other half of the spontaneous activity for variance estimation. This method produces a weight matrix of size Nneurons × NfacePCs, that factorizes as a product of two matrices of sizes Nneurons × r and r × NfacePCs, where r is the rank of prediction. We defined the behavior space as the space spanned by the first 32 columns of the former matrix, and we define a Nneurons×32 matrix EB whose columns contain an orthonormal basis for this space.

To determine dimensions inside the behavioral subspace that contain stimulus information, we found the sequence of orthogonal directions ei maximizing the sum of squared projections (the power) of the trial-averaged stimulus responses R:


such that






The solution to this maximization problem is given by the left singular vectors of EBR. To determine the amount of stimulus variance in shared dimension i (Fig. 4C), we projected test data onto ei and quantified the stimulus-related variance in this projection using the unbiased method of (35) on the remaining two-thirds of stimulus responses. We call ei the “shared stimulus-behavior” dimension, because it contains significant stimulus variance, as opposed to e2, e3,, which contain very little (Fig. 4C). A histogram of the weights of e1 on all neurons is plotted in Fig. 4D.

The behavior-only subspace was defined by projecting out the shared dimension e1 from all columns of EB. The stimulus-only subspace was defined by projecting out from all rows of R the top right singular vector of the matrix EBR. Time courses of neural activity projected into these subspaces are plotted in Fig. 4F. To quantify the amount of variance in each subspace (stimulus-only, behavior-only, and stimulus-behavior) (Fig. 4G), we computed the total projected variance as the sum of squared projection lengths along each axis of an orthonormal basis.

An identical analysis was used to define the stimulus-spontaneous shared dimension and spontaneous-only subspace by replacing the subspace of the top 32 behavioral components with the subspace of the top 128 principal components of activity computed in one-half of the spontaneous period.

To account for trial-to-trial variability in the stimulus subspace, we fit a multiplicative gain model. A gain parameter gt was fit for each trial t, and the activity of dimension n in response to the stimulus σt shown on this trial was modeled as f¯n,σt(1+gtαn). Here, f¯n,σ represents the mean activity of dimension n to stimulus σ, and αn represents the susceptibility of this dimension to gain fluctuations. Note that the mean of gt across trials from the same stimulus is 0 by definition. We used an alternating optimization method to obtain the best fit g given α, then α given g, repeating for 100 iterations. We also evaluated an affine model, allowing both the gain and offset of each neuron’s responses to change on a trial-by-trial basis: f¯n,σt(1+gtαn)+atβn. Here, at is an additive offset on trial t, with each neuron scaling this offset by a factor βn. The vector β can therefore describe directions of additive variability inside the stimulus subspace.

Stimulus decoding analysis

To predict stimulus identity from population responses, we fit a linear multiclass support vector machine (SVM) model, using MATLAB’s “fitcecoc” function. This function fits K*(K − 1)/2 binary linear SVM models, where K is the number of different stimuli. Stimulus repeats were divided into equal thirds: one for estimating the stimulus subspace, one for training the classifier, and one for evaluating performance. For the 32 natural image stimuli, decoding error was defined as the fraction of responses not assigned to the correct stimulus. For drifting grating responses, we fit two decoders: one to direction (32 possibilities) and one to orientation (16 possibilities). We defined the error as the number of degrees the prediction was off, averaged across all the test set responses.

Population activity was always decoded from a 32-dimensional subspace. For the spontaneous-only and behavior-only subspaces, we used the top 32 dimensions. Random 32-dimensional subspaces were found by generating a matrix of standard Gaussian variates of size neurons × 32 and normalizing each column so the sum of its squares was 1. For each experiment we computed 20 different random subspaces, computed the decoding error and SNR in these subspaces, and averaged the results.

Supplementary Materials

Supplementary Text

Figs. S1 to S17

References (68)

References and Notes

Acknowledgments: We thank M. Krumin for assistance with the two-photon microscopes and A. Peters for comments on the manuscript. Funding: This research was funded by the Wellcome Trust (095668, 095669, 108726, 205093, and 204915), the European Research Council (694401), and the Simons Foundation (SCGB 325512). C.S. was funded by a 4-year Gatsby Foundation Ph.D. studentship. M.C. holds the GlaxoSmithKline/Fight for Sight Chair in Visual Neuroscience. C.S. and M.P. are now funded by HHMI Janelia. N.S. was supported by postdoctoral fellowships from the Human Frontier Sciences Program and the Marie Curie Action of the EU (656528). Author contributions: Conceptualization: C.S., M.P., N.S., M.C., and K.D.H.; methodology: C.S. and M.P.; software: C.S. and M.P.; investigation: C.S., M.P., N.S., and C.B.R.; writing: C.S., M.P., N.S., M.C., and K.D.H; Resources: M.C. and K.D.H; and funding acquisition: M.C. and K.D.H. Competing interests: The authors declare no competing interests. Data and materials availability: All of the processed deconvolved calcium traces are available on figshare (65), together with the behavioral traces (face motion components, whisking, running and pupil). All the spike-sorting electrophysiology and the corresponding behavioral traces are also on figshare (66). The code for the analysis and the figures is available on github ( and at Zenodo (67).
View Abstract

Navigate This Article