Song Replay During Sleep and Computational Rules for Sensorimotor Vocal Learning


Science  27 Oct 2000:
Vol. 290, Issue 5492, pp. 812-816
DOI: 10.1126/science.290.5492.812


Songbirds learn a correspondence between vocal-motor output and auditory feedback during development. For neurons in a motor cortex analog of adult zebra finches, we show that the timing and structure of activity elicited by the playback of song during sleep matches activity during daytime singing. The motor activity leads syllables, and the matching sensory response depends on a sequence of typically up to three of the preceding syllables. Thus, sensorimotor correspondence is reflected in temporally precise activity patterns of single neurons that use long sensory memories to predict syllable sequences. Additionally, “spontaneous” activity of these neurons during sleep matches their sensorimotor activity, a form of song “replay.” These data suggest a model whereby sensorimotor correspondences are stored during singing but do not modify behavior, and off-line comparison (e.g., during sleep) of rehearsed motor output and predicted sensory feedback is used to adaptively shape motor output.

In reinforcement learning, systems learn through interaction with the environment by trying to optimize some measure of performance. Biological systems may experience a substantial delay between premotor activity and assessment of performance through sensory feedback (1). This delay poses the problem of how to reward or punish a premotor circuit when that circuit is participating in a different task by the time the reward or punishment is computed. Reinforcement learning is further complicated in systems such as vocal learning, where the mapping of sensory feedback (fundamentally represented as frequency versus time) onto motor output (muscle dynamics) is of high dimensionality (a many-to-many dynamic mapping). Methods developed in the field of machine learning solve the problem of reinforcement learning with delayed reward (2), and a variety of biological solutions have been proposed to the problem of learning sequences of actions (3). Here, we report on neuronal data that represent a solution to the problem of sensorimotor mapping in the bird vocal-motor (“song”) system. The physiological properties observed during sleep also suggest an algorithmic implementation for reinforcement learning of song.

Zebra finch songs are organized hierarchically, with one or more notes composing a syllable, and sequences of syllables forming a motif, which is repeated to form song. We investigated neurons in the forebrain nucleus robustus archistriatalis (RA), whose descending projections represent the output of the forebrain song system. During singing, RA neurons exhibit short bursts of activity, whose identity varies with the note that immediately follows the burst (4). In awake birds, outside the context of vocalizations, RA neurons fire regularly. RA neurons also prominently burst “spontaneously” and respond to sounds, but only during sleep (5). With the goal of comparing motor, auditory, and ongoing bursting activity, we recorded single neurons in the RA of singing male zebra finches, permitted the animals to fall asleep by turning off the lights, and then tested the same neurons' sensory and ongoing discharge properties (6, 7).

The spiking patterns of RA neurons in singing birds consisted of phasic patterns of premotor excitation superimposed over a background of profound inhibition (4) (Fig. 1, B and C). This premotor activity was virtually invariant for multiple occurrences of the same sound. After the lights were turned off, RA auditory responses were initially weak but gained strength with time, reflecting the gradual transition into sleep (5). Responses to playback of the bird's own song (BOS) also consisted of phasic patterns of excitation separated by inhibition that were similar for multiple occurrences of the same sound, differing mainly in the strength of response rather than pattern (8).

Figure 1

(A) Schematic of the song system. Auditory and premotor activity converge onto the HVc. The HVc projects directly to the RA, which projects to brainstem motor centers. The HVc also projects to area X, which projects to the DLM. The DLM projects to the lMAN, which projects to the RA. Feedback loops arise from the RA and lMAN. (B and C) Activity of RA single neurons during singing is premotor (i.e., neuronal activity leads syllables). Spectrographs of the sound that the bird produced are shown in a color scale (frequency, 0 to 10 kHz, is on the ordinate; time is on the abscissa). Corresponding raw traces of the neural activity (amplitude versus time) are shown below the spectrographs. Data from one neuron for two similar examples of singing are shown in (B). The neuron's activity patterns are the same for the two examples of singing, except where the vocalizations differ, and the difference in neuronal discharge (at arrow) precedes the difference in the vocalizations. Both sequences of vocalizations occurred frequently; the neuronal pattern associated with each sequence was stereotyped. In (C) [different neuron, same bird as in (B)], the bird produced syllable “C” (marked by arrows) twice (each syllable is identified by a letter), with the song ending prematurely after the second occurrence. The neuronal discharge following the second C was affected. Activity during calling (not shown) also clearly demonstrates that RA activity leads vocalizations [see also (4)]. (D) Cross-correlation of auditory and motor activity for the first neuron shown in (E) (positive time shifts imply that auditory response lags premotor activity). (E) Examples of the match between auditory and motor activity from one neuron in each of two birds. The spectrographs show the BOS used as stimuli during playback experiments, and each syllable is identified by a letter. The rasters marked “Aud.” represent the neuron's auditory response during playback while the birds were asleep. The rasters marked “Mot.” were constructed from neuronal activity of the same neurons during singing (4). The correspondence between the two patterns of activity is visually striking. In contrast to singing, however, during song playback, neurons exhibited ongoing discharge, not inhibition, for some syllables. (F) Example of raw traces showing the match between activity during playback and singing [same neuron as in (B)].

The timing of auditory responses to the BOS was closely aligned to the timing of premotor activity (Fig. 1F). The only exceptions were instances of silence following the end of a motif or the end of song, where the auditory response could include an additional burst that corresponded with the syllable that would have followed if the song had continued without pause. To compare motor and auditory activity, we analyzed the singing-related activity surrounding each syllable of song (4, 9). The spike patterns from the response to the BOS playback were then compared with the spike patterns from premotor activity derived from the corresponding syllables, showing that the timing of excitation and inhibition during auditory stimulation was well aligned to such timing during singing (Fig. 1, D and E). A cross-correlation procedure revealed a strong, significant (P < 0.02) correlation (10) between premotor and sensory spike patterns in all 17 neurons (from three birds) (mean normalized peak correlation = 0.49 ± 0.13 SD). Thus, sensorimotor transformations in the song system result in a correspondence between temporally precise sensory and motor activity observed at the level of individual cells.
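
As an illustration of the kind of comparison described above, the following minimal sketch computes a normalized cross-correlation between two binned spike trains and reports the lag of its peak. It is not the procedure of note (10), which is not reproduced here; the function names (`bin_spikes`, `normalized_xcorr`, `peak_lag`), the 1-ms bin width, and the synthetic spike times are all assumptions chosen for illustration. The sign convention follows Fig. 1D: a positive lag means the auditory response lags the premotor activity.

```python
import math

def bin_spikes(spike_times_ms, duration_ms, bin_ms=1.0):
    """Convert spike times (ms) into a list of per-bin spike counts."""
    n_bins = int(math.ceil(duration_ms / bin_ms))
    counts = [0] * n_bins
    for t in spike_times_ms:
        idx = int(t // bin_ms)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

def normalized_xcorr(motor, auditory, max_lag):
    """Return {lag: r}, correlating motor[i] with auditory[i + lag].

    Both inputs are mean-subtracted and the result is divided by the
    product of their norms, so r lies between -1 and 1.
    """
    n = len(motor)
    if len(auditory) != n:
        raise ValueError("spike trains must have the same number of bins")
    mean_m = sum(motor) / n
    mean_a = sum(auditory) / n
    dm = [x - mean_m for x in motor]
    da = [x - mean_a for x in auditory]
    norm = math.sqrt(sum(x * x for x in dm) * sum(x * x for x in da))
    if norm == 0:
        raise ValueError("a constant spike train has no defined correlation")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += dm[i] * da[j]
        result[lag] = total / norm
    return result

def peak_lag(xcorr):
    """Lag (in bins) at which the correlation is largest."""
    return max(xcorr, key=xcorr.get)

# Hypothetical data: the "auditory" train is the "motor" train shifted by 8 ms.
motor_counts = bin_spikes([10, 11, 12, 50, 51], duration_ms=100)
auditory_counts = bin_spikes([18, 19, 20, 58, 59], duration_ms=100)
xc = normalized_xcorr(motor_counts, auditory_counts, max_lag=20)
best = peak_lag(xc)
```

With these synthetic trains the peak falls at the lag equal to the imposed shift. Assessing significance, as done in the study, would require an additional step (for example, comparison against shuffled spike trains), which this sketch omits.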

The auditory activity was only slightly delayed in relation to motor activity (by 8 ± 2 ms; range, 4 to 13 ms). Because premotor activity in RA can lead the onset of syllables by up to ∼40 ms (4), this was surprising and suggested that the sensory patterns representing subsequent syllables were generated by responses to previous syllables. To characterize the extent of temporal integration in the auditory responses, we presented stimuli in which a syllable chosen at random from the final motif of the BOS was replaced with background noise of equal duration, and we assessed the effect on the neuronal activity during the same or subsequent syllables (11). The deletion of a syllable substantially reduced the neuronal activity occurring one to three syllables later (Fig. 2A), up to ∼250 ms (8). This property was ubiquitous for all RA neurons that were auditory (14 neurons from three birds) (12) (Fig. 2B). These response properties are suggestive of temporal combination sensitivity, in which a sensory neuron's response is nonlinearly dependent on the temporal sequence of preceding syllables. Such responses have also been described for neurons in the nucleus HVc, which projects to the RA (13). Thus, in the RA as well as in the HVc (4, 13), the integration time of individual neurons appears to be considerably greater when in the sensory (auditory) state than during singing. Given the alignment of auditory and motor activity in the RA, one way of interpreting these results is that auditory responses to song syllables represent a prediction of subsequent premotor activity.

Figure 2

Deletion experiments. (A) The bird was presented songs during sleep. Below the spectrograph of the last motif of the bird's song are nine histograms of the response of one neuron, representing 30 repetitions each of nine different stimuli. The BOS histogram is the response to the unaltered motif. For each of the eight other stimuli, one of the syllables from A to H was replaced with background noise. For syllable F, for example, the neuron responded with two bursts, with both bursts occurring during syllable F. The first burst (but not the second) is statistically significantly reduced (8) by the elimination of syllables C, D, E, or F. The second burst is affected by the elimination of syllable F. The burst at syllable H is affected by eliminating syllables F, G, and H. (B) In one bird with the most complex song, 10 neurons were tested with deletion stimuli. Each cell of the matrix gives the number of neurons in which the deletion of a syllable (specified by the column) significantly altered the response during the target syllable (specified by the row). The last syllable of the song, H, was excluded because appropriate control data were unavailable (an earlier syllable had always been deleted). The matrix diagonal represents the effect of deleting a syllable on the neuronal response during that same syllable. The numbers to the right of the diagonal are the number of neurons for which there was a statistically significant response during the target syllable. It can be seen that the deletion of a syllable commonly affected the neuronal response several syllables later. For example, of eight neurons responding to syllable E, the response was suppressed for one, six, seven, and six neurons when syllables B, C, D, and E, respectively, were deleted.

We also searched for similarities between ongoing bursting activity during sleep (5) and the sensorimotor patterns of RA neurons. For each cell, a visual inspection of samples of activity from long stretches (15 to 60 min) of undisturbed sleep identified repeated examples of one or more complex burst patterns, suggestive of the patterns that we had observed in the cell's premotor activity. To quantify this match, we developed a procedure to automate burst detection (14), considering only bursts of eight spikes or more to ease the computational burden and to allow for statistical analysis. By this procedure, 7.1 ± 5.3% of all spikes (14 neurons from three birds) occurred in bursts, an average of 175.4 ± 144.6 (range, 38 to 581) bursts per cell. For each cell, a measure of similarity between each burst and the single longest bout of the cell's premotor activity (4 to 8 s, consisting of several motifs or songs) was computed and tested for significance (15, 16). The results showed that 15.3 ± 6.5% of bursts (range, 2.6 to 26.8%) significantly matched premotor activity. Only the cell with 2.6% matching bursts failed to exceed the 5% level expected by chance (16). Examples of matches between longer sequences of complex ongoing bursts and premotor activity were particularly compelling (Fig. 3A). In an exceptional case when two RA neurons were recorded simultaneously from different electrodes during sleep, both neurons commonly exhibited simultaneous bursting, with the different burst patterns for each neuron corresponding to the same sequences of syllables (Fig. 3B). This suggests that populations of RA neurons burst in a coordinated fashion during sleep. Bursts (and matching bursts) preferentially occurred during periods when the rate of ongoing discharge was lower and more variable (Fig. 4). Such modulation may correspond to specific phases of the sleep cycle.
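
The automated burst-detection step can be pictured with a simple sketch. The actual procedure is described in note (14) and is not reproduced here; the version below assumes, purely for illustration, that a burst is a run of consecutive spikes whose interspike intervals all fall below a threshold (`max_isi_ms`, a hypothetical parameter), and it applies the eight-spike minimum stated in the text. The function names and the example spike times are likewise assumptions.

```python
def find_bursts(spike_times_ms, max_isi_ms=10.0, min_spikes=8):
    """Group spikes into bursts.

    A burst is a run of consecutive spikes separated by at most
    max_isi_ms (assumed criterion); runs with fewer than min_spikes
    spikes are discarded.
    """
    bursts = []
    current = []
    for t in sorted(spike_times_ms):
        # A long gap closes the current run.
        if current and t - current[-1] > max_isi_ms:
            if len(current) >= min_spikes:
                bursts.append(current)
            current = []
        current.append(t)
    # Close the final run.
    if len(current) >= min_spikes:
        bursts.append(current)
    return bursts

def fraction_of_spikes_in_bursts(spike_times_ms, bursts):
    """Fraction of all spikes that belong to a detected burst."""
    if not spike_times_ms:
        return 0.0
    return sum(len(b) for b in bursts) / len(spike_times_ms)

# Hypothetical spike train: one 8-spike burst, three isolated spikes,
# and one 5-spike run that is too short to count as a burst.
spikes = [0, 2, 4, 6, 8, 10, 12, 14,
          100, 200, 300,
          400, 403, 406, 409, 412]
bursts = find_bursts(spikes)
fraction = fraction_of_spikes_in_bursts(spikes, bursts)
```

In this toy example only the first run qualifies, so half of the 16 spikes are counted as burst spikes. Matching each detected burst against premotor activity, as in notes (15) and (16), would be a separate step.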

Figure 3

Neuronal replay during undisturbed sleep. (A) Raw traces of neuronal activity (900 ms) during sleep (“Spon”) in two different neurons for one bird. For each sample, a representative corresponding sample of premotor activity (“Mot.”) and a color spectrograph of the song that the bird sang are shown. (B) Raw traces (1400 ms) of simultaneous recordings from two neurons (∼400 μm apart) in another bird. (The second neuron's activity is visible in the background of the first neuron's signal, an artifact of the pairing of signals used to achieve differential recordings resistant to movement-induced artifacts.) Both neurons simultaneously burst during sleep, with complex burst structures that match premotor activity. Apparent temporal expansion (first motif: A, B, and C) and compression (second motif: A, B, and C) is highlighted by the blue lines. This phenomenon has also been reported in population activity of hippocampal neurons (23).

Figure 4

(Top) The firing rate during recordings of RA ongoing activity over almost 1 hour of sleep, estimated from a 100-point moving average of the interspike intervals (there was a gap in the data collection of ∼3.25 min). (Bottom) A histogram (30-s bins) of the number of bursts identified by a burst-finding procedure. The number of bursts that significantly matched the premotor activity is shown in blue.
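
The firing-rate estimate in the top panel can be sketched as follows: the rate at each point is taken as the reciprocal of the mean of the preceding 100 interspike intervals. This is one plausible reading of "100-point moving average of the interspike intervals", not a confirmed description of the authors' computation; the function name `moving_average_rate` and the synthetic spike train are assumptions.

```python
def moving_average_rate(spike_times_s, window=100):
    """Estimate firing rate (Hz) from a moving average of interspike intervals.

    Returns one rate per full window of `window` consecutive intervals;
    returns an empty list if there are fewer intervals than `window`.
    Assumes strictly increasing spike times, so no interval is zero.
    """
    isis = [b - a for a, b in zip(spike_times_s, spike_times_s[1:])]
    rates = []
    running = sum(isis[:window])
    for end in range(window, len(isis) + 1):
        mean_isi = running / window
        rates.append(1.0 / mean_isi)
        # Slide the window forward by one interval.
        if end < len(isis):
            running += isis[end] - isis[end - window]
    return rates

# Hypothetical regular spike train: one spike every 50 ms (20 Hz).
spike_times = [i * 0.05 for i in range(201)]
rates = moving_average_rate(spike_times)
```

For a perfectly regular train every window yields the same rate; in real recordings the estimate would fall and fluctuate during the low-rate, high-variability periods in which bursts were most frequent.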

In the sensorimotor phase of vocal learning, the mapping between auditory feedback and vocal output is the fundamental computational problem to be solved (17). A solution to this problem is reflected in the sensorimotor activity patterns of RA neurons. Precision of spike timing has been observed in a number of systems and provides evidence for temporally based neural codes in sensory processing, although only in a few cases has the behavioral relevance been directly demonstrated (18). The observed correspondence between auditory activity and vocal output demonstrates that, in the RA, sensorimotor mapping is based on a temporal code. This correspondence is likely to arise from auditory input recruiting similar components of the RA pattern-generating circuits as those recruited during singing. In the hierarchical organization of the song system (4, 19), the sensorimotor correspondence may first emerge at the single-cell level within the RA. The data suggest that, during vocal development, the song system learns to generate premotor commands by association with a prediction of future commands based on the timing of auditory feedback from preceding syllables. This can be interpreted as learning the match between the auditory response to a sequence of syllables with the premotor pattern for a subsequent syllable or as learning the match between the prediction of a sensory representation of a syllable with the premotor representation of the same syllable.

In the birdsong system, RA receives input from the HVc and from an anterior forebrain pathway (AFP) (Fig. 1A). Sensorimotor song learning could result in part from “online” mechanisms, whereby during singing, HVc activity in response to auditory feedback from sequences of syllables is delayed through the AFP to produce a prediction of activity in the RA of a subsequent syllable. The data collected during sleep, however, also suggest “off-line” models for learning that address the problems of feedback delay and sequence generation. Such models share some similarities with temporal-difference models of reinforcement learning and sequence generation that are prominent in mammalian work on basal ganglia and the cerebellum, in that they reward or modify the system on the basis of its overall performance, not on the basis of the performance of individual components or movements (3). The AFP has been likened to a mammalian cortico-basal ganglia-thalamocortical loop (20, 21).
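
For readers unfamiliar with temporal-difference methods, the following generic TD(0) sketch shows how credit for a reward delivered only at the end of a sequence propagates back to earlier steps over repeated passes. It is a textbook-style illustration, not the song-learning model proposed here; the chain of five states standing in for a sequence of syllables, the function name `td0_chain`, and the parameter values are all assumptions.

```python
def td0_chain(n_states=5, episodes=200, alpha=0.1, gamma=1.0):
    """Run TD(0) value learning on a deterministic chain of states.

    The agent steps through states 0 .. n_states-1 in order and receives
    a reward of 1.0 only on leaving the final state. Returns the learned
    value of each non-terminal state.
    """
    values = [0.0] * (n_states + 1)  # last entry is the terminal state (stays 0)
    for _ in range(episodes):
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            # TD error: reward plus discounted next-state value, minus current estimate.
            td_error = reward + gamma * values[s + 1] - values[s]
            values[s] += alpha * td_error
    return values[:n_states]

after_one_pass = td0_chain(episodes=1)
after_many_passes = td0_chain(episodes=200)
```

After a single pass only the final state has acquired value, because the reward arrives after the earlier states have already been visited. With repeated passes the value spreads backward one state at a time until every state predicts the eventual reward, which is the sense in which such methods cope with delayed evaluation.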

In the vocal learning model motivated by the present data, signals that arise in the RA during singing train the AFP to generate a prediction of auditory feedback; during sleep rehearsal, the AFP's predicted feedback provides reinforcement to RA neurons. During singing, sensorimotor efference copy signals (premotor output and expected auditory feedback) traverse the AFP, and via the lateral subdivision of the magnocellular nucleus of the anterior neostriatum (lMAN) projection onto area X, are compared with real auditory feedback arriving in area X from the HVc (Fig. 1A). Efference copy is brought into temporal register with auditory feedback using the long (∼50 ms) synaptic delays observed in the medial subdivision of the dorsolateral nucleus of the thalamus (DLM) (21). This stimulates area X neurons that are sensitive to temporally coincident input. The output of the lMAN onto the RA has a reduced effect because the lMAN is not in temporal register with driving input from the HVc. During sleep, replay of song premotor patterns via ongoing bursting generates coherent activity throughout the song system that is similar to singing in the absence of actual sound production and perception. The output of the lMAN represents a prediction of the real auditory feedback that would have resulted from the burst-generated motor command, is in near coincidence with HVc bursts driving the RA, and is used to modify RA neurons that are sensitive to temporally coincident input.

The proposed algorithm for birdsong learning depends on circadian modulation of neuronal activity patterns (22). Our observation of neuronal replay of sensorimotor patterns during sleep is consistent with data from hippocampal studies suggesting that sleep is important for the consolidation of neuronal temporal codes for spatial memory (23, 24). The fundamental prediction of our model is that birdsong learning depends on sleep or other off-line computations.

  • * To whom correspondence should be addressed. E-mail: dan{at}

