## Abstract

We present a method for learning nonlinear systems, echo state networks (ESNs). ESNs employ artificial recurrent neural networks in a way that has recently been proposed independently as a learning mechanism in biological brains. The learning method is computationally efficient and easy to use. On a benchmark task of predicting a chaotic time series, accuracy is improved by a factor of 2400 over previous techniques. The potential for engineering applications is illustrated by equalizing a communication channel, where the signal error rate is improved by two orders of magnitude.

Nonlinear dynamical systems abound in the sciences and in engineering. If one wishes to simulate, predict, filter, classify, or control such a system, one needs an executable system model. However, it is often infeasible to obtain analytical models. In such cases, one has to resort to black-box models, which ignore the internal physical mechanisms and instead reproduce only the outwardly observable input-output behavior of the target system.

If the target system is linear, efficient methods for black-box modeling are available. Most technical systems, however, become nonlinear if operated at higher operational points (that is, closer to saturation). Although operating closer to saturation might lead to cheaper and more energy-efficient designs, engineers avoid it because the resulting nonlinearities cannot be harnessed. Many biomechanical systems, by contrast, use their full dynamic range (up to saturation) and thereby become lightweight, energy efficient, and thoroughly nonlinear.

Here, we present an approach to learning black-box models of nonlinear systems, echo state networks (ESNs). An ESN is an artificial recurrent neural network (RNN). RNNs are characterized by feedback (“recurrent”) loops in their synaptic connection pathways. They can maintain an ongoing activation even in the absence of input and thus exhibit dynamic memory. Biological neural networks are typically recurrent. Like biological neural networks, an artificial RNN can learn to mimic a target system—in principle, with arbitrary accuracy (*1*). Several learning algorithms are known (*2*–*4*) that incrementally adapt the synaptic weights of an RNN in order to tune it toward the target system. These algorithms have not been widely employed in technical applications because of slow convergence and suboptimal solutions (*5*, *6*). The ESN approach differs from these methods in that a large RNN is used (on the order of 50 to 1000 neurons; previous techniques typically use 5 to 30 neurons) and in that only the synaptic connections from the RNN to the output readout neurons are modified by learning; previous techniques tune all synaptic connections (Fig. 1). Because there are no cyclic dependencies between the trained readout connections, training an ESN becomes a simple linear regression task.

We illustrate the ESN approach on a task of chaotic time series prediction (Fig. 2) (*7*). The Mackey-Glass system (MGS) (*8*) is a standard benchmark system for time series prediction studies. It generates a subtly irregular time series (Fig. 2A). The prediction task has two steps: (i) using an initial teacher sequence generated by the original MGS to learn a black-box model *M* of the generating system, and (ii) using *M* to predict the value of the sequence some steps ahead.

First, we created a random RNN with 1000 neurons (called the “reservoir”) and one output neuron. The output neuron was equipped with random connections that project back into the reservoir (Fig. 2B). A 3000-step teacher sequence *d*(1),..., *d*(3000) was generated from the MGS equation and fed into the output neuron. This excited the internal neurons through the output feedback connections. After an initial transient period, they started to exhibit systematic individual variations of the teacher sequence (Fig. 2B).
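As an illustration only (not part of the original study), the teacher-forced run can be sketched in Python, assuming tanh neurons, small sketch sizes, and a sine wave standing in for the Mackey-Glass teacher sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 500   # sketch sizes; the paper uses 1000 neurons and 3000 steps

# Sparse random reservoir weights. Scaling so that the update map is a
# contraction is one simple way to obtain stable echo dynamics (the paper's
# precise conditions are in its supporting online text).
W = rng.uniform(-0.5, 0.5, (N, N)) * (rng.random((N, N)) < 0.02)
W *= 0.9 / np.abs(W).sum(axis=1).max()

w_fb = rng.uniform(-1.0, 1.0, N)   # random output-to-reservoir feedback weights

# Stand-in teacher sequence (the paper uses the Mackey-Glass system).
d = 0.5 * np.sin(2 * np.pi * np.arange(T) / 25)

# Teacher forcing: d(n) is written into the output neuron, and the feedback
# connections excite the internal neurons.
x = np.zeros(N)
states = np.zeros((T, N))
for n in range(T):
    x = np.tanh(W @ x + w_fb * d[n])
    states[n] = x
```

After the initial transient, the rows of `states` play the role of the systematic individual variations of the teacher sequence described above.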

The fact that the internal neurons display systematic variants of the exciting external signal is constitutional for ESNs: The internal neurons must work as “echo functions” for the driving signal. Not every randomly generated RNN has this property, but it can effectively be built into a reservoir (supporting online text).

It is important that the echo signals be richly varied. This was ensured by a sparse interconnectivity of 1% within the reservoir. This condition lets the reservoir decompose into many loosely coupled subsystems, establishing a richly structured reservoir of excitable dynamics.
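The "echo" behavior can be checked numerically in a small sketch, under the assumption that a contractive state-update map suffices (the paper defers its precise conditions to the supporting online text): two copies of the same sparse reservoir, driven identically from different initial states, converge to the same trajectory.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200

# Sparse reservoir (2% connectivity here for a small sketch; the paper uses
# 1% at 1000 neurons), scaled so the update map is contractive.
W = rng.uniform(-0.5, 0.5, (N, N)) * (rng.random((N, N)) < 0.02)
W *= 0.9 / np.abs(W).sum(axis=1).max()
w_fb = rng.uniform(-1.0, 1.0, N)

drive = rng.uniform(-0.5, 0.5, 300)   # an arbitrary driving signal

# Run the identically driven reservoir from two different random states.
xa = rng.uniform(-1, 1, N)
xb = rng.uniform(-1, 1, N)
for n in range(300):
    xa = np.tanh(W @ xa + w_fb * drive[n])
    xb = np.tanh(W @ xb + w_fb * drive[n])

# With the echo property, the initial conditions wash out: both state
# trajectories become "echo functions" of the driving signal alone.
gap = float(np.max(np.abs(xa - xb)))
```

Because tanh is 1-Lipschitz and the scaled weight matrix has induced infinity-norm 0.9, the gap shrinks by at least that factor per step, so the two trajectories are numerically indistinguishable after a few hundred steps.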

After time *n* = 3000, output connection weights $w_i$ ($i = 1,\ldots,1000$) were computed (dashed arrows in Fig. 2B) from the last 2000 steps $n = 1001,\ldots,3000$ of the training run such that the training error $\mathrm{MSE}_{\text{train}} = \frac{1}{2000}\sum_{n=1001}^{3000}\left(d(n) - \sum_{i=1}^{1000} w_i\,x_i(n)\right)^{2}$ was minimized [$x_i(n)$, activation of the *i*th internal neuron at time *n*]. This is a simple linear regression.
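In code, this training step reduces to a single least-squares solve over the collected internal states. A sketch under illustrative assumptions (small tanh reservoir, sine stand-in for the teacher):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, washout = 200, 1000, 200   # sketch sizes; the paper trains on steps 1001-3000

# Contractive sparse reservoir and random feedback weights.
W = rng.uniform(-0.5, 0.5, (N, N)) * (rng.random((N, N)) < 0.02)
W *= 0.9 / np.abs(W).sum(axis=1).max()
w_fb = rng.uniform(-1.0, 1.0, N)

d = 0.5 * np.sin(2 * np.pi * np.arange(T) / 25)   # stand-in teacher signal

# Teacher-forced run: collect the internal states x_i(n).
x = np.zeros(N)
X = np.zeros((T - 1, N))
for n in range(T - 1):
    x = np.tanh(W @ x + w_fb * d[n])
    X[n] = x

# Ordinary least squares on the collected states (after discarding the
# initial transient): min_w sum_n (d(n+1) - w . x(n))^2.
w_out, *_ = np.linalg.lstsq(X[washout:], d[1 + washout:], rcond=None)

train_nrmse = float(np.sqrt(
    np.mean((X[washout:] @ w_out - d[1 + washout:]) ** 2) / np.var(d)
))
```

The only trained parameters are the entries of `w_out`; all reservoir and feedback weights stay fixed at their random initial values.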

With the new $w_i$ in place, the ESN was disconnected from the teacher after step 3000 and left running freely. A bidirectional dynamical interplay of the network-generated output signal with the internal signals $x_i(n)$ unfolded. The output signal $y(n)$ was created from the internal neuron activation signals $x_i(n)$ through the trained connections $w_i$ by $y(n) = \sum_{i=1}^{1000} w_i\,x_i(n)$. Conversely, the internal signals were echoed from that output signal through the fixed output feedback connections (supporting online text).
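The free-running phase can be sketched as a loop in which the network's own output replaces the teacher in the feedback path (illustrative assumptions as before: small tanh reservoir, sine stand-in for the teacher):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, washout = 200, 1000, 200

# Contractive sparse reservoir, random feedback, stand-in teacher.
W = rng.uniform(-0.5, 0.5, (N, N)) * (rng.random((N, N)) < 0.02)
W *= 0.9 / np.abs(W).sum(axis=1).max()
w_fb = rng.uniform(-1.0, 1.0, N)
d = 0.5 * np.sin(2 * np.pi * np.arange(T) / 25)

# Train the readout under teacher forcing.
x = np.zeros(N)
X = np.zeros((T - 1, N))
for n in range(T - 1):
    x = np.tanh(W @ x + w_fb * d[n])
    X[n] = x
w_out, *_ = np.linalg.lstsq(X[washout:], d[1 + washout:], rcond=None)

# Free run: disconnect the teacher and let the network's own output y(n)
# drive the reservoir through the fixed feedback connections.
y = d[-1]              # last teacher value seeds the feedback loop
free = np.zeros(100)
for n in range(100):
    x = np.tanh(W @ x + w_fb * y)
    y = float(w_out @ x)
    free[n] = y
```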

For testing, an 84-step continuation $d(3001),\ldots,d(3084)$ of the original signal was computed for reference. The network output $y(3084)$ was compared with the correct continuation $d(3084)$. Averaged over 100 independent trials, a normalized root mean square error $\mathrm{NRMSE} = \left(\sum_{j=1}^{100}\left(d_j(3084) - y_j(3084)\right)^{2} / 100\,\sigma^{2}\right)^{1/2}$ was obtained [$d_j$ and $y_j$, teacher and network output in trial *j*; $\sigma^2$, variance of the MGS signal], improving on the best previous techniques (*9*–*15*), which used training sequences of length 500 to 10,000, by a factor of 700. If the prediction run was continued, deviations typically became visible after about 1300 steps (Fig. 2A). With a refined variant of the learning method (*7*), the improvement factor rises to 2400. Models of similar accuracy were also obtained for other chaotic systems (supporting online text).
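The NRMSE defined above computes directly from the per-trial values; the numbers below are illustrative stand-ins, not the paper's results:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 100 independent trials, each yielding a teacher value
# d_j(3084) and a network output y_j(3084).
trials = 100
d_j = rng.normal(0.0, 0.3, trials)          # stand-in teacher values
y_j = d_j + rng.normal(0.0, 0.01, trials)   # stand-in network outputs
sigma2 = 0.09                                # stand-in variance of the signal

# Root mean square error across trials, normalized by the signal variance.
nrmse = float(np.sqrt(np.sum((d_j - y_j) ** 2) / (trials * sigma2)))
```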

The main reason for the jump in modeling accuracy is that ESNs capitalize on a massive short-term memory. We showed analytically (*16*) that under certain conditions an ESN of size *N* may be able to “remember” a number of previous inputs that is of the same order of magnitude as *N*. This information is more massive than the information used in other techniques (supporting online text).

We now illustrate the approach in a task of practical relevance, namely, the equalization of a wireless communication channel (*7*). The essentials of equalization are as follows: A sender wants to communicate a symbol sequence *s*(*n*). This sequence is first transformed into an analog envelope signal *d*(*n*), then modulated on a high-frequency carrier signal and transmitted, then received and demodulated into an analog signal *u*(*n*), which is a corrupted version of *d*(*n*). Major sources of corruption are noise (thermal or due to interfering signals), multipath propagation, which leads to a superposition of adjacent symbols (intersymbol interference), and nonlinear distortion induced by operating the sender's power amplifier in the high-gain region. To avoid the latter, the actual power amplification is run well below the maximum amplification possible, thereby incurring a substantial loss in energy efficiency, which is clearly undesirable in cell-phone and satellite communications. The corrupted signal *u*(*n*) is then passed through an equalizing filter whose output *y*(*n*) should restore *u*(*n*) as closely as possible to *d*(*n*). Finally, the equalized signal *y*(*n*) is converted back into a symbol sequence. The quality measure for the entire process is the fraction of incorrect symbols finally obtained (symbol error rate).
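A toy version of this corruption chain can be written in a few lines; all coefficients here are illustrative assumptions, not the channel model from (*17*):

```python
import numpy as np

rng = np.random.default_rng(4)

# Symbols -> linear intersymbol interference -> memoryless nonlinear
# distortion -> additive noise, mirroring the stages described in the text.
s = rng.choice([-3.0, -1.0, 1.0, 3.0], size=1000)   # 4-level symbol sequence
h = np.array([1.0, 0.5, 0.2])                        # assumed multipath taps
q = np.convolve(s, h)[: len(s)]                      # intersymbol interference
u = q + 0.05 * q**2 - 0.01 * q**3                    # amplifier nonlinearity
u = u + rng.normal(0.0, 0.05, len(s))                # thermal/interference noise
```

An equalizer's job is then to map `u` back to estimates of `s` with as few symbol errors as possible.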

To compare the performance of an ESN equalizer with standard techniques, we took a channel model for a nonlinear wireless transmission system from a study (*17*) that compared three customary nonlinear equalization methods: a linear decision feedback equalizer (DFE), which is actually a nonlinear method; a Volterra DFE; and a bilinear DFE. The model equation featured intersymbol interference across 10 consecutive symbols, a second-order and a third-order nonlinear distortion, and additive white Gaussian noise. All methods investigated in that study had 47 adjustable parameters and used sequences of 5000 symbols for training. To make the ESN equalizer comparable with the equalizers studied in (*17*), we took ESNs with a reservoir of 46 neurons (which is small for the ESN approach), which yielded 47 adjustable parameters. (The 47th comes from a direct connection from the input to the output neuron.)
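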

We carried out numerous learning trials (*7*) to obtain ESN equalizers, using an online learning method (a version of the recursive least squares algorithm known from linear adaptive filters) to train the output weights on 5000-step training sequences. We chose an online adaptation scheme here because the methods in (*17*) were online adaptive, too, and because wireless communication channels are mostly time-varying, such that an equalizer must adapt to changing system characteristics. The entire learning-testing procedure was repeated for signal-to-noise ratios ranging from 12 to 32 dB. Figure 3 compares the resulting average symbol error rates with those reported in (*17*), showing an improvement of two orders of magnitude for high signal-to-noise ratios.
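For concreteness, the recursive least squares recursion runs as follows on a toy linear-identification problem; the dimensions and data are illustrative, and in the ESN equalizer the regressor would be the reservoir state:

```python
import numpy as np

rng = np.random.default_rng(3)

# Standard RLS recursion: track unknown weights w_true from a stream of
# (regressor, desired-output) pairs, updating after every sample.
dim = 5
w_true = rng.normal(size=dim)   # unknown weights to be identified
lam = 0.999                     # forgetting factor, <1 to track slow drift
w = np.zeros(dim)
P = np.eye(dim) * 1e3           # inverse-correlation estimate, large at start

for _ in range(2000):
    x = rng.normal(size=dim)            # regressor (reservoir state, for an ESN)
    d_n = w_true @ x                    # desired output (noiseless toy case)
    k = P @ x / (lam + x @ P @ x)       # gain vector
    w = w + k * (d_n - w @ x)           # update with the a-priori error
    P = (P - np.outer(k, x @ P)) / lam  # update inverse correlation matrix

err = float(np.linalg.norm(w - w_true))
```

Each update costs only a few matrix-vector products, which is what makes per-symbol online adaptation feasible in a time-varying channel.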

For tasks with multichannel input and/or output, the ESN approach can be accommodated simply by adding more input or output neurons (*16*, *18*).

ESNs can be applied to all basic tasks of signal processing and control, including time series prediction, inverse modeling, pattern generation, event detection and classification, modeling distributions of stochastic processes, filtering, and nonlinear control (*16*, *18*, *19*, *20*). Because a single learning run takes only a few seconds (or minutes, for very large data sets and networks), engineers can test out variants at a high turnover rate, a crucial factor for practical usability.

ESNs have been developed from a mathematical and engineering perspective, but exhibit typical features of biological RNNs: a large number of neurons, recurrent pathways, sparse random connectivity, and local modification of synaptic weights. The idea of using randomly connected RNNs to represent and memorize dynamic input in network states has frequently been explored in specific contexts, for instance, in artificial intelligence models of associative memory (*21*), models of prefrontal cortex function in sensory-motor sequencing tasks (*22*), models of birdsong (*23*), models of the cerebellum (*24*), and general computational models of neural oscillators (*25*). Many different learning mechanisms were considered, mostly within the RNN itself. The contribution of the ESN is to elucidate the mathematical properties of large RNNs such that they can be used with a linear, trainable readout mechanism for general black-box modeling. An approach essentially equivalent to ESNs, liquid state networks (*26*, *27*), has been developed independently to model computations in cortical microcircuits. Recent findings in neurophysiology suggest that the basic ESN/liquid state network principle seems not uncommon in biological networks (*28*–*30*) and could eventually be exploited to control prosthetic devices by signals collected from a collective of neurons (*31*).

**Supporting Online Material**

www.sciencemag.org/cgi/content/full/304/5667/78/DC1

Materials and Methods

SOM Text

Figs. S1 to S4

References