Optimizing Sound Features for Cortical Neurons


Science  29 May 1998:
Vol. 280, Issue 5368, pp. 1439-1444
DOI: 10.1126/science.280.5368.1439


The brain's cerebral cortex decomposes visual images into information about oriented edges, direction and velocity information, and color. How does the cortex decompose perceived sounds? A reverse correlation technique demonstrates that neurons in the primary auditory cortex of the awake primate have complex patterns of sound-feature selectivity that indicate sensitivity to stimulus edges in frequency or in time, stimulus transitions in frequency or intensity, and feature conjunctions. This allows the creation of classes of stimuli matched to the processing characteristics of auditory cortical neurons. Stimuli designed for a particular neuron's preferred feature pattern can drive that neuron with higher sustained firing rates than have typically been recorded with simple stimuli. These data suggest that the cortex decomposes an auditory scene into component parts using a feature-processing system reminiscent of that used for the cortical decomposition of visual images.

Feature processing by neurons in the primary visual cortex has been studied extensively since the discovery by Hubel and Wiesel that the “right” kinds of stimuli for visual cortical neurons are moving, oriented bars within a spatial receptive field (1)—a finding later confirmed and extended in detail by reverse-correlation methods (2). The fundamental feature-processing characteristics within the spectral receptive fields of neurons in the awake primary auditory cortex are less completely resolved, except in species with particularly well understood auditory behavior, such as bats (3), owls (4), and songbirds (5), where stimuli have been selected on the basis of ethological principles (6). Our understanding of auditory cortex physiology in other mammalian species derives largely from studies of anesthetized animals (7), which have demonstrated that auditory neurons are tuned for a number of independent feature parameters of simple stimuli, including frequency (8), intensity (9), amplitude modulation (10), frequency modulation (11), and binaural structure (12). However, auditory responses to multiple stimuli can also enhance or suppress one another in a time-dependent manner (13), and auditory cortical neurons can be highly selective for species-specific vocalizations (14), indicating complex acoustic processing by these cells. It is not yet known whether or how these many independent selectivities of auditory cortical neurons reflect an underlying pattern of feature decomposition, as has been suggested (15). Further, because sustained firing-rate responses in the auditory cortex to tonal stimuli are typically much lower than visual responses to drifting bars (16), it has been suggested that the preferred type of auditory stimulus may still not be known (17).
For these reasons, we investigated whether a reverse-correlation method similar to that used in the visual cortex would discern the full spectral and temporal feature-response profile of auditory cortical neurons in the awake animal.

Reverse correlation stimuli used in visual cortex experiments have consisted of rapidly presented two-dimensional checkerboard spatial patterns (Fig. 1A). The auditory stimuli used here consisted of rapidly presented random chords or asynchronous random tone progressions that evenly spanned a portion of the one-dimensional receptor surface of the cochlea (Fig. 1, B and C). Rather than measuring tuning for a preselected parameter, this unbiased method constructs the average full auditory stimulus preceding spikes from a neuron, whatever form it may take (18). Alternatively, this can be viewed as the average response of the neuron driven by each separate stimulus component, which is numerically identical (but is reversed in time, and is expressed in units of mean density of spikes instead of mean density of stimulus components). Neuronal reverse-correlation techniques were originally developed for characterizing auditory neurons in the periphery (19), and although these methods have since been applied to other peripheral structures and to visual cortical cells, researchers have only recently begun to succeed in applying them to auditory cortical neurons (20). Data presented here are taken from extensive characterizations of 206 isolated single neurons in the primary auditory cortex (AI) of two awake owl monkeys (21).
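As a rough illustration of the computation (not the authors' analysis code), the spike-triggered average of a binned tone-density spectrogram can be sketched in a few lines. The function name and the toy simulated neuron below are assumptions for demonstration only:

```python
import numpy as np

def spectrotemporal_sta(stimulus, spike_bins, max_lag):
    """Spike-triggered average of a binned tone-density spectrogram.

    stimulus   : array (n_freq, n_time), density of tone components per bin
    spike_bins : time-bin indices at which the neuron spiked
    max_lag    : bins of stimulus history to average over

    Returns an (n_freq, max_lag) array: the mean stimulus preceding a
    spike, a linear STRF estimate (the lag axis runs backward in time).
    """
    sta = np.zeros((stimulus.shape[0], max_lag))
    count = 0
    for t in spike_bins:
        if t >= max_lag:                     # only spikes with a full history window
            sta += stimulus[:, t - max_lag:t]
            count += 1
    return sta / max(count, 1)

# Toy check: a simulated neuron that fires 2 bins after energy in band 3.
rng = np.random.default_rng(0)
stim = (rng.random((8, 5000)) < 0.1).astype(float)
spikes = np.where(stim[3, :-2] == 1)[0] + 2
sta = spectrotemporal_sta(stim, spikes, max_lag=5)
# The STA peaks in band 3 at the lag corresponding to 2 bins before the spike.
```

The same array, read as a stimulus-triggered spike-rate average, differs only by a proportionality constant and the units, which is the equivalence noted above.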

Figure 1

Stimuli used for reverse correlation in (A) the visual and (B and C) the auditory cortex. In the visual system, reverse-correlation stimuli have consisted of flickering checkerboards. Stimuli used here for the auditory domain consisted of chords created by randomly selecting frequencies from 84 possible values spanning 7 octaves from 110 to 14,080 Hz in even 1/12th-octave steps. The density of tones in each stimulus was one tone per octave on average, and seven tones per chord, but the stimuli were stochastic, so that a given chord could be composed of a variable number of tones of randomly selected frequencies. Rates of 10 to 100 chords/s have been used, as well as stimuli with random, nonaligned onset times in each frequency band. Data presented here used 50 chords/s or the equivalent number of nonaligned tones. The complete stimulus set lasted for 10 min, thereby including 30,000 individual chords. The stimuli were 20-ms-long pure tones with 5-ms cosine onset and offset ramps, and were presented at 70 or 50 dB, or both. Fewer frequencies are shown in the diagram than were actually used.
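The stochastic chord construction described above can be sketched as follows; this is a minimal illustration assuming a 44.1-kHz sample rate and raised-cosine ramps, with `random_chord` and its defaults being hypothetical names rather than the authors' stimulus code:

```python
import numpy as np

FS = 44100                                   # playback rate used in the study
# 84 tone frequencies in 1/12th-octave steps starting at 110 Hz (7 octaves)
FREQS = 110.0 * 2.0 ** (np.arange(84) / 12.0)

def random_chord(p=1/12, dur=0.020, ramp=0.005, fs=FS, rng=None):
    """One stochastic chord: each of the 84 frequencies is included
    independently with probability p, giving 7 tones per chord and one
    tone per octave on average. Tones are 20 ms with 5-ms cosine ramps."""
    if rng is None:
        rng = np.random.default_rng()
    t = np.arange(int(dur * fs)) / fs
    env = np.ones_like(t)
    n_ramp = int(ramp * fs)
    half_cos = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    env[:n_ramp] *= half_cos                 # cosine onset ramp
    env[-n_ramp:] *= half_cos[::-1]          # cosine offset ramp
    chosen = FREQS[rng.random(FREQS.size) < p]
    chord = sum((np.sin(2 * np.pi * f * t) for f in chosen), np.zeros_like(t))
    return chord * env, chosen

sig, tones = random_chord(rng=np.random.default_rng(0))
```

Concatenating 30,000 such chords at 50 chords/s would reproduce the 10-min stimulus schedule described above.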

Neurons in the auditory cortex have traditionally been analyzed by counting spikes evoked by individual pure-tone stimuli to compute frequency tuning at differing intensity levels (22). Figure 2A shows the standard frequency response area of a single AI neuron at a range of intensities for comparison. The remainder of Fig. 2 shows plots of the full linear average pattern of feature selectivity in spectral content and in time for seven representative AI neurons in the form of spectrotemporal receptive fields, measured at a 70-dB sound pressure level (SPL), including this same neuron from (A) presented in (B) and (C). The spectrotemporal receptive field (STRF) is effectively a sonogram of the linear estimate of the optimal stimulus and is computed during the same amount of recording time as a traditional tuning curve, 10 min. Figure 2, B to I, shows the average firing rate of each neuron driven by each individual component of the complex stimulus as the pixel's color (spikes per second). The reverse-correlogram is thus presented as a stimulus-triggered spike rate average, analogous to a standard peristimulus time histogram but reversed in time, and is identical to the estimated optimal stimulus for the cell (a spike-triggered stimulus average in units of mean stimulus density). To demonstrate the typical consistency with which the method reproduces the complex details of receptive fields, Fig. 2C presents a second estimate of the optimal stimulus for the neuron in (B) using an entirely different stimulus set and asynchronous rather than synchronous tone onsets. Considerable structure in the regions of both increased and decreased firing rate is apparent in the STRF that is not evident in the traditional tuning curve.

Figure 2

The spectrotemporal receptive fields of neurons in the primary auditory cortex of the awake primate show the pattern of sound features selected for by particular neurons. The traditional tuning curve (A) was computed by counting the number of spikes elicited from a single auditory cortical neuron by 100-ms pure-tone pips presented at 84 frequencies and eight intensities (scale bar in spikes per stimulus, intensities in decibel SPL, data convolved with a 2-pixel-wide Gaussian). The spectrotemporal receptive field (B) shows the full time-frequency structure of the same neuron's sound-feature selectivity (note the different axes), which is proportional to its estimated optimal stimulus, including multiple excitatory and inhibitory regions, in units of mean spikes per second. Spectrotemporal receptive fields (B to I) were computed as described in the text. The spectrotemporal receptive field in (C) was recomputed for the same neuron with a completely different stimulus set. In (B) the stimuli were made up of random chords with synchronous onset times for each component in the chord, and in (C) all individual components were presented at completely random and asynchronous times. Receptive fields are all from well-isolated single-neuron recordings. Receptive-field structures correspond to the average rate of spikes from the neuron at time zero driven by each stimulus component frequency at the lag time shown; this is the standard peristimulus spike rate triggered on each stimulus component but reversed in time. Equivalently, up to a constant of proportionality, the spectrotemporal receptive field is also the average stimulus preceding each neuronal spike at time zero, an average of the stimulus components triggered on the spike occurrence. Therefore, the scale bars shown (in spikes per second) are directly proportional to the probability of occurrence of each stimulus component (in stimuli per spike). Spectrotemporal receptive fields were convolved with a 1-pixel-wide Gaussian to reduce noise.

Only a minority of neurons in AI (5% of total, n = 206 neurons) had spectrotemporal receptive fields with just a single region of increased rate and no inhibition, which corresponds to simple selectivity for a particular frequency region with no additional feature-processing structure. We found that cells of this type (23) are less common than cells with complex multipartite receptive fields that include regions of both increased and decreased firing rate as well as temporal structure. We will refer to these regions as excitatory and inhibitory, although they are measures of deviations from the mean ongoing rate during characterization stimuli and are not necessarily diagnostic of an underlying synaptic mechanism.

Neurons with multipartite excitatory and inhibitory receptive fields can serve as detectors of stimulus edges in both sound frequency and time. The neuron shown in Fig. 2D had a receptive field structure within its frequency-responsive area showing a long, narrow region of excitation flanked by inhibition, suggesting that this cell would extract information about sound components containing a continuous-frequency edge at a precise tonal location, or about very spectrally narrow sounds at this frequency (observed selectivity is shown in Fig. 3). The neuron shown in Fig. 2E showed a brief region of excitation, with its dominant feature being symmetrical lateral inhibitory components above and below, also suggestive of detecting stimulus edges and strong tuning for sound-component width (24). The neuron shown in Fig. 2F had an STRF that predicted selectivity for stimuli with little sound energy in the receptive field, followed by strong energy—the edge of a stimulus in time rather than in frequency. This type of neuron would be predicted to respond to successive stimulus transients separated by times greater than a characteristic minimum time (25).

Figure 3

Detectors of stationary and moving spectral stimulus edges. The six color panels in (A) are sonograms of stationary-edge low-pass noise stimuli with increasing frequencies of the upper cutoff. The sonograms shown span 250 ms and 7 octaves, with increasing sound intensity progressing from blue to red. The first stimulus has a cutoff frequency of 1.369 kHz, and each successive stimulus has a cutoff frequency a fixed fraction of an octave higher. (Top) The average response to 20 presentations of each stimulus is presented as a peristimulus time histogram, in spikes per second, with 25-ms bins on a time scale in milliseconds, with the duration of the stimulus indicated by the red bar and with records of individual trials presented above. Of 270 total high-pass, low-pass, and band-pass noise stimuli presented, the neuron in (A) was selective for stimuli containing a spectral edge within the excitatory region of its STRF, preferring low-pass edges as shown (best stimulus cutoff, 4.3 octaves above 110 Hz), and including narrow-band stimuli with both upper and lower edges within this region. The sonograms in (B) show moving auditory grating stimuli consisting of pure tones spaced at two-octave intervals that sweep upward or downward in frequency at differing rates and are replaced when reaching the frequency range boundary (±4, 8, or 12 octaves/s shown, taken from a set of 24 total variants). These sonograms are 1 s long and span 7 octaves; pure tones appear broader at lower frequencies because of the resolution of the sonograms at low frequency. Peristimulus time histograms are shown in spikes per second with 100-ms bins on a time scale in seconds, with the duration of the stimulus indicated by the red bar and with records of individual trials presented above. The neuron in (B) was selective for descending FM stimuli at a rate of −8 octaves/s. All stimuli were generated digitally and presented at 44.1 kHz from audio compact disc.

Neurons were also observed that responded with increased rates to one frequency range at one time and to a shifted frequency range at a later time, with regions of inhibition showing a similar shift. The neuron shown in Fig. 2G is an example of this type. This pattern of response within the receptive field is strongly analogous to that of motion energy detectors (25), which detect a moving stimulus edge, and this cell was indeed selective for the direction and rate of tones sweeping in frequency. The red line on the panel illustrates the slope of the sweeping tonal stimulus that drove the strongest response from this cell (observed selectivity is shown in Fig. 3). The neurons shown in Fig. 2, B, H, and I, had complex multimodal receptive-field structures, with multiple excitatory or inhibitory subregions. Response areas of this type are indicative of selectivity for stimulus-feature conjunctions, sometimes quite complex, and perhaps related to sounds that the animal must process within its learned environment. In addition, the neuron in (I) illustrates an STRF that was weaker and more diffuse, a feature often found in spatially nonlinear V1 neurons characterized as complex cells (26).

To conservatively estimate the statistical significance of each response component, we divided spectrotemporal receptive field data for each neuron into 10 segments of 1 min each and smoothed the data with a 1-pixel-wide Gaussian. Response subregions were deemed significant only if they were at least three standard errors above or below the overall mean value and were greater than one pixel in size. Of 206 single units recorded, 13.1% did not yield a statistically significant STRF in 10 min of recording time, suggesting that these cells were not responsive to the noise stimuli or were strongly nonlinear. Of cells with a significant STRF, 46.4% had multiple excitatory subregions, and 48.6% had multiple inhibitory subregions. The percentages of cells with different numbers of response subregions are presented in Table 1. Response areas were highly localized in temporal and spectral extent, suggesting decomposition of sounds into small-component features in the primary sensory cortex. The median bandwidth of total response area was 1.8 octaves (27), and the median temporal duration of response was 64 ms (28).
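The segment-and-threshold significance criterion described above might be sketched as follows, assuming SciPy's `gaussian_filter` and `label` as stand-ins for the smoothing and connected-region steps; the function name and the synthetic data at the end are illustrative assumptions, not the published analysis:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label

def significant_subregions(segment_strfs, n_se=3.0):
    """Boolean masks of excitatory and inhibitory STRF subregions.

    segment_strfs : array (n_segments, n_freq, n_lag), one STRF estimate
    per data segment. A pixel is significant if the across-segment mean
    lies at least n_se standard errors above or below the overall mean;
    single-pixel regions are discarded."""
    smoothed = gaussian_filter(segment_strfs, sigma=(0, 1, 1))   # 1-pixel Gaussian
    mean_strf = smoothed.mean(axis=0)
    se = smoothed.std(axis=0, ddof=1) / np.sqrt(smoothed.shape[0])
    grand_mean = mean_strf.mean()
    exc = mean_strf > grand_mean + n_se * se
    inh = mean_strf < grand_mean - n_se * se

    def prune(mask):
        labeled, n = label(mask)             # connected components
        keep = np.zeros_like(mask)
        for i in range(1, n + 1):
            region = labeled == i
            if region.sum() > 1:             # drop single-pixel regions
                keep |= region
        return keep

    return prune(exc), prune(inh)

# Synthetic demonstration: 10 one-minute segments with one elevated and
# one suppressed 2x2 block buried in noise.
rng = np.random.default_rng(2)
segs = rng.normal(0.0, 0.01, size=(10, 8, 10))
segs[:, 2:4, 3:5] += 1.0
segs[:, 5:7, 6:8] -= 1.0
exc, inh = significant_subregions(segs)
```

Counting the connected components in each mask would then give the subregion tallies of the kind reported in Table 1.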

Table 1

Percentage of single neurons with different numbers of statistically significant excitatory and inhibitory subregions. Abbreviations: Ex, number of excitatory subregions of response area; Inh, number of inhibitory subregions of response area. Of the total, 13.1% did not have a significant STRF by our criteria and are not included in these percentages.


This method produces an estimate of the feature processing by cortical neurons in frequency and time, but this estimate does not take into account the full range of properties of cortical neurons, particularly their nonlinearities and their processing of stimulus-component amplitude. Consequently, as with other characterization methods, the STRF reflects properties of these cells, but should not be taken to quantitatively predict neuronal responses to arbitrary stimuli. However, the feature-processing pattern observed for an auditory neuron's STRF can guide the generation of stimuli used to characterize the cell. Based on the finding of selectivity for stationary or moving spectral stimulus edges, stimuli were designed containing these features. These auditory stimuli are direct analogs of spatial stationary edges and moving gratings.

Individual stimuli are shown in Fig. 3 as color spectrograms where time is plotted along the horizontal axis, frequency content on an octave scale is along the vertical axis, and color corresponds to the intensity of the signal (from blue to red). The static-edge stimuli shown in the bottom of Fig. 3A are 250-ms continuous-noise bursts that contained all frequencies up to a discrete cutoff, with the position of the frequency edge increasing along the series. The neuron presented was highly selective for stationary frequency edges that fell exactly upon its long excitatory region (see the STRF in Fig. 2D), illustrating the behavior of a static-edge detector. Figure 3B shows drifting grating stimuli that contained pure-tone frequencies separated by two octaves that drifted continuously upward or downward in time, being replaced when the end of the chosen sound-frequency range was reached. The neuron in this example (STRF shown in Fig. 2G; the red line corresponds to −8 octaves/s) illustrates selectivity for the direction and the rate of the repeated moving-frequency edges of an auditory grating stimulus (29).
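A drifting auditory grating of the kind described above can be approximated as a set of log-frequency tone sweeps that wrap at the range boundary. This sketch assumes continuous phase-accumulated sweeps and a peak-normalized output, not the authors' exact synthesis:

```python
import numpy as np

FS = 44100   # sample rate; stimuli in the study were presented at 44.1 kHz

def auditory_grating(rate_oct_s, dur=1.0, f_low=110.0, n_oct=7,
                     spacing=2.0, fs=FS):
    """Drifting auditory grating: pure tones spaced `spacing` octaves
    apart sweep at `rate_oct_s` octaves/s and wrap when they reach the
    edge of the `n_oct`-octave range. Phase is accumulated sample by
    sample so each tone glides smoothly on a log-frequency axis."""
    n = int(dur * fs)
    t = np.arange(n) / fs
    sig = np.zeros(n)
    for start in np.arange(0.0, n_oct, spacing):     # initial bar positions
        oct_pos = (start + rate_oct_s * t) % n_oct   # wrap at range boundary
        freq = f_low * 2.0 ** oct_pos
        phase = 2.0 * np.pi * np.cumsum(freq) / fs
        sig += np.sin(phase)
    return sig / np.max(np.abs(sig))                 # normalize peak to 1

down_8 = auditory_grating(-8.0)   # a descending grating, as preferred in Fig. 3B
```

Varying `rate_oct_s` over positive and negative values generates the family of upward and downward sweep rates used to probe direction and rate selectivity.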

The STRF method can also be used to create estimated optimized stimuli for neurons with arbitrarily complex patterns of feature selectivity. Complex sound stimuli were designed to match the linear estimate of an optimized stimulus for a neuron and then played back to the same neuron to measure responsiveness. Figure 4A shows the stable, isolated action potential waveforms from an auditory cortical neuron at the time when we collected initial characterization data, and 3 hours later when we presented a set of optimized stimuli. Figure 4D shows the strong response of this neuron to its estimated optimal stimulus, consisting of 10 repetitions of the linear average stimulus derived from the STRF (lasting a total of 1 s) (30), with individual trial responses shown above. This neuron was driven to a sustained rate of greater than 40 spikes/s by its predicted optimal stimulus. In Fig. 4, B to F, variant stimuli were generated that had the same average spectral content and total power as the predicted optimal stimulus, but were contracted or expanded in time to measure tuning in the local stimulus space, corresponding to a stretching or shrinking of the spectrogram's horizontal axis. The shorter to longer stimulus variants were also repeated 10 times in 1 s. The neuron was strongly selective for the predicted optimal stimulus, compared with stimuli made from components that were stretched by ¼, ½, 2, or 4×. In Fig. 4, H to L, stimuli were generated that had the same total sound energy in each time bin and the same center frequency, but frequency content corresponding to a stretching or shrinking of the spectrogram's vertical frequency axis, making the stimuli broader or narrower in spectral content. The neuron preferred the predicted optimal stimulus over four morphed versions, as well as over a pure tone at the center frequency (shown in Fig. 4G). The STRF for this cell was presented in Fig. 2H.
This outstanding example illustrates that auditory cortical neurons can be capable of very high sustained rates when presented with optimized stimuli, even when conventional stimuli are ineffective. When optimized stimuli are used with higher repetition rates than those presented here, the most active primary auditory cortical neurons can achieve sustained rates as high as 60 spikes/s, more comparable to rates found in the primary visual and somatosensory cortices.
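One plausible way to render a predicted optimal stimulus from an STRF, following the description above (excitatory part, time-reversed, repeated), is sketched below. The bin duration, tone-bank synthesis, and peak normalization are assumptions for illustration, not the published procedure:

```python
import numpy as np

def optimal_stimulus(strf, freqs, bin_dur, n_repeats=10, fs=44100):
    """Render the excitatory part of a linear STRF estimate as sound.

    strf      : (n_freq, n_lag) spike-triggered average; the lag axis
                runs backward in time, so it is flipped before playback.
    freqs     : tone frequency (Hz) of each STRF row.
    bin_dur   : duration (s) of one STRF time bin.
    The positive part is time-reversed, tiled n_repeats times, and each
    time-frequency bin is synthesized as a tone scaled by its weight."""
    template = np.clip(strf, 0.0, None)[:, ::-1]     # excitatory part, forward time
    tiled = np.tile(template, n_repeats)
    n_bin = int(bin_dur * fs)
    t = np.arange(n_bin) / fs
    out = np.zeros(tiled.shape[1] * n_bin)
    for i_f, f in enumerate(freqs):
        for i_t in range(tiled.shape[1]):
            w = tiled[i_f, i_t]
            if w > 0:
                out[i_t * n_bin:(i_t + 1) * n_bin] += w * np.sin(2 * np.pi * f * t)
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out

# Toy STRF with a single excitatory bin: 2 frequency rows, 5 lag bins.
toy_strf = np.zeros((2, 5))
toy_strf[1, 2] = 1.0
wav = optimal_stimulus(toy_strf, freqs=[440.0, 880.0], bin_dur=0.01)
```

The temporal and spectral variants of Fig. 4 would then correspond to stretching or shrinking the template along its lag or frequency axis before synthesis.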

Figure 4

Responses of neurons in the primary auditory cortex of the awake primate to predicted optimal stimuli and to variants. (A) Voltage waveforms randomly selected throughout recording from one isolated single neuron (red) and other background spikes that crossed a high fixed threshold (blue) are presented at the time of initial characterization and during presentation of our optimized stimulus set 3 hours later. This neuron had a signal-to-noise ratio of >11:1 compared with background noise standard deviation. Horizontal bar, 1 ms; vertical bar, −100 μV. (B to L) Responses to the predicted optimal stimulus for the neuron in (A) and to variants. In (B) to (F), temporal profiles were expanded or contracted to ¼, ½, 1, 2, 4× the predicted optimal temporal profile. In (G) a pure tone was presented at the predicted best frequency, and in (H) to (L) spectral profiles were expanded or contracted to ¼, ½, 1, 2, 4× the predicted optimal profile. (M) The mean firing rate during the predicted optimal stimulus for 40 neurons in the awake primary auditory cortex plotted versus the integral of the positive component of the STRF after thresholding at 2 SDs and dividing by the total time, to yield units of spikes per second. (N) The selectivity of this population for the time expansion series and the average response to a set of stimuli optimized for other cells; (O) the selectivity of the population for the spectral expansion series and for pure tones. For each response, the increase over background rate was divided by the background rate for that neuron, responses were averaged, and means ± SE are shown. Median responses showed a similar pattern, indicating that the mean values do not merely reflect the contribution of the most pronounced examples.

From the infinite space of possible sounds, the predicted feature conjunctions for a neuron generated by reverse correlation produced highly effective stimuli for about half of auditory cortical cells. The full distribution of sustained rate responses for all 40 cells that were carried through this long protocol is shown in Fig. 4M. The continuous firing-rate responses to optimized stimuli are plotted along the vertical axis, against the integral of the excitatory area of the neuron's STRF along the horizontal axis. Of the cells that responded poorly to optimized stimuli, all had STRFs showing only weak excitation. Figure 4 (N and O) shows population averages for the tuning of neurons to their predicted optimal stimuli compared with variants in temporal and spectral structure. The feature conjunctions predicted by reverse correlation on average produced the strongest responses from these cells, with monotonically decreasing responses to perturbations from this stimulus (P < 0.007, sign test, significance of centrally peaked, monotonic distribution), and with much weaker average responses to best-frequency pure tones or to stimuli designed for other neurons.

These data collectively demonstrate that it is possible to generate linear estimates of the time and frequency feature processing of auditory cortical neurons, although they do not produce a complete quantitative model of selectivity to all stimuli. The method allows the design of stimuli that effectively probe cortical selectivity and feature decomposition. In decomposing visual forms or auditory scenes, the cortex uses detectors with similar characteristics for finding the position of stimulus edges along the sensory receptor surface, finding stimulus edges in time, finding stimulus movements, and finding feature conjunctions. Auditory cortical neurons have responses with similar extents of excitation and inhibition, a response time-course restricted to 20 to 100 ms, and selectivity to direction and velocity of stimulus movement within a temporal range of 2 to 40 cycles/s, which are similar, although not identical, to stimulus decomposition parameters in the visual cortex (31). In addition, these data indicate that by using spectrally and temporally optimized stimuli it is possible to drive many auditory cortical cells to high sustained rates of firing.

* To whom correspondence should be addressed. E-mail: merz{at}

