Report

How Low Can You Go? Physical Production Mechanism of Elephant Infrasonic Vocalizations


Science  03 Aug 2012:
Vol. 337, Issue 6094, pp. 595-599
DOI: 10.1126/science.1219712

The Song of the Elephant

In mammals, vocal sound production generally occurs in one of two ways: through muscular control, as when a cat purrs, or, more commonly, by air passing through the vocal folds, which occurs in humans and facilitates production of extremely high-frequency bat calls. Over the past 20 years, it has been recognized that elephants can communicate through extremely low-frequency infrasonic sounds. Taking advantage of a natural death of an elephant in a zoo, Herbst et al. (p. 595) examined the biomechanics of elephant sound production in an excised elephant larynx. Self-sustained vocal-fold vibrations, in the absence of any neural control, produced infrasonic elephant sounds by the same mechanism as singing in humans and echolocation in bats.

Abstract

Elephants can communicate using sounds below the range of human hearing (“infrasounds” below 20 hertz). It is commonly speculated that these vocalizations are produced in the larynx, either by neurally controlled muscle twitching (as in cat purring) or by flow-induced self-sustained vibrations of the vocal folds (as in human speech and song). We used direct high-speed video observations of an excised elephant larynx to demonstrate flow-induced self-sustained vocal fold vibration in the absence of any neural signals, thus excluding the need for any “purring” mechanism. The observed physical principles of voice production apply to a wide variety of mammals, extending across a remarkably large range of fundamental frequencies and body sizes, spanning more than five orders of magnitude.

Mammal vocalizations from different species span a frequency range of nearly five orders of magnitude, from 9 Hz in some whales (1) to above 110,000 Hz in some bats (2). This is a remarkably wide operating range for tissue vibrations in the same organ, the larynx. The source of most mammal vocalizations is vibrations of the vocal folds, located within the larynx (3, 4). There are at least two well-documented mechanisms of sound production that involve vocal fold vibration. In the active muscular contraction (AMC) or “purring” mode, best documented in various cat species, the highest frequencies producible are limited by muscle contraction speeds, which even with superfast muscles cannot get much higher than 200 Hz (5). However, this mode allows arbitrarily low fundamental frequencies. In contrast, frequencies in the myoelastic-aerodynamic (MEAD) (6, 7) or “flow-driven” mode are tightly limited by the physical size of the oscillators (3), because the fundamental frequency range for a species is determined by the size of the vibrating tissue (i.e., the length of the vocal folds or cords). Consequently, there is a direct interspecific relationship between body mass, vocal fold size, and fundamental frequency for MEAD-induced vocalizations, but no such relationship for purring (Fig. 1A). For individual animals intraspecifically, the fundamental frequency range is also influenced by age and the consequent change in the vocal fold dimensions [see, for example, (8, 9)]. In MEAD vocalizations there seems to be no sharp upper limit, and frequencies above 110,000 Hz have been documented in bats. Additionally, the MEAD mechanism is probably more energetically efficient, because it requires no active time-varying neural firing or muscular contraction (10). Thus, the two known vocalization mechanisms have distinct advantages for different vocal ranges.

Fig. 1

Two mechanisms for vocal sound production in mammals. (A) Log-log plot of body mass versus average fundamental frequency for various mammals. The data for the AMC mode stem from data on purring cats (26); the data for vocalizations governed by the MEAD theory were taken from (2, 3, 10, 27–30). (B) Mechanics of vocal fold closure in both MEAD [after (31)] and AMC theory. (C) Idealized representation of intrinsic laryngeal muscle activity during two glottal cycles for both MEAD and AMC theory [after (13)]. The asterisks indicate events of glottal closure.

Low vibration frequencies employing the MEAD mechanism are well known in human speech and singing and have previously been demonstrated in tigers [typically between 40 and 100 Hz (10)]. In this mechanism, the primary sound source is generated by flow-induced self-sustaining oscillations of the vocal folds, driven by air coming from the lungs (Fig. 1B). In contrast, AMC phonation is caused by a centrally driven periodic muscular modulation of respiratory flow (11). It results from the intermittent activation of intrinsic laryngeal muscles caused by a very regular, stereotyped pattern of EMG bursts occurring 20 to 30 times per second. Each of these muscle discharge bursts causes glottal closure (12) and the development of a transglottal pressure that generates sound when dissipated by glottal opening (13) (Fig. 1, B and C). Low vibration frequencies using the AMC mechanism are well studied in cats (typically around 30 Hz) and toadfish (20 to 280 Hz). Despite some early claims (14), there is no evidence of sound production based on AMC in humans (6).

The two models shown in Fig. 1A separately account for most mammalian vocalizations. However, at the lower end of the frequency range, the infrasound vocalizations of the African savannah elephant (Loxodonta africana) are ambiguous in this context. These low-frequency vocalizations (15–22) are called “infrasonic,” because their fundamental frequency is below 20 Hz. Elephant infrasound vocalizations are thought to be produced in the larynx (19), but whether the production mechanism relies on the MEAD or AMC mechanism remains disputed (18, 23, 24).

To answer this question, we experimentally induced infrasound vocalizations in an excised larynx of an African elephant in the laboratory (Fig. 2). In such a setup, the vocal folds (labeled 5 in Fig. 2, B to D) are brought together by adducting the arytenoid cartilages (labeled 4 in Fig. 2, B to D), thus sealing the glottal air space and increasing the tracheal pressure during exhalation. This pressure is then dissipated as a pulsating air flow generated by the vibrating vocal folds. In an excised larynx experiment, the AMC mode of production is impossible, because the laryngeal muscles are disconnected from the nervous system and receive no periodic nervous input. Thus, if in vivo elephant infrasound vocalizations relied on the AMC mode, they could not be duplicated in our setup. On the other hand, successful generation of species-typical infrasounds in an excised larynx setup would demonstrate that the MEAD mechanism can account for infrasound production, and AMC is not necessary. This would strongly suggest that the MEAD theory can fully account for in vivo vocalization (though in vivo AMC cannot be definitively ruled out by our experiments).

Fig. 2

Elephant larynx anatomy and experiment setup. (A) Vocal anatomy of the elephant: red, larynx and hyoid apparatus; pink, oral vocal tract; blue, nasal vocal tract (see the supplementary materials for the role of the vocal tract in vocalization). (B) Midsagittal cut through the larynx, with the left half displayed. (C) Computed tomography (CT) scan of the excised larynx, made 2 hours before the experiment. (D) Schematic drawing of the laryngeal anatomy, traced from the CT scan. Labels for (B) to (D) are as follows: 1, trachea; 2, cricoid cartilage; 3, thyroid cartilage; 4, arytenoid cartilage; 5, vocal fold; 6, ventricular fold [not seen in midsagittal sections in (C) and (D)]; 7, epiglottis. (E) Schematic illustration of the experimental excised larynx setup. EGG, electroglottographic.

We observed flow-induced vocal fold vibration, starting at air pressures of 17 mbar [in comparison, the phonation threshold pressure in humans is about 3 to 4 mbar (3)]. The sounds produced by the excised larynx were closely comparable to in vivo low-frequency vocalizations of African elephants (see Fig. 3A for an example) and to the output of a MEAD-based computational simulation of vibrating elephant vocal folds (fig. S1).

Fig. 3

Infrasound vocalization of the African elephant. (A) (Top) Time-domain signal (left) and spectrogram (right) of an in vivo low-frequency vocalization of a female African elephant (18 years old, weighing 3200 kg). (Bottom) Time-domain signal (left) and spectrogram (right) of a phonation generated by the excised larynx. (B) Measurement and correlates of vocal fold vibration in the excised larynx: acoustic signal; electroglottographic signal, measuring relative vocal fold contact area; and time-varying glottal area (number of pixels). (C) Images extracted from high-speed video recordings, illustrating one cycle of vocal fold vibration: 1, just before vocal fold opening; 2, start of vocal fold separation, caused by air pressure buildup below the vocal folds; 3, maximal area of the glottis during the vibratory cycle; 4, maximal separation of the superior vocal fold margins (note the inferior/superior phase difference in vocal fold vibration); 5, full vocal fold closure [see also (B)].

Our data show a close relationship between acoustic, electroglottographic, and high-speed video signals (Fig. 3B). Acoustic energy was created during both opening and closure of the glottis. The vocal fold contact area [as encoded in the electroglottographic signal (see the supplementary materials)] reached a maximum immediately after glottal closure, suggesting that the inferior edges of the vocal folds started to open shortly after the closing event, resulting in an inferior/superior phase difference of vocal fold vibration during the closed phase. The same inferior/superior phase difference was observed in the open phase (image 4 in Fig. 3C), effectively shortening the open phase and resulting in a very small open quotient (Fig. 4D). Such a phase difference in vocal fold vibration is necessary for energy transfer from the air stream into the tissue, and for creating pressure gradients (caused by the divergent vocal fold shape) that enable closure of the glottis. The observed vibratory behavior is fully consistent with the MEAD theory.

Fig. 4

Vocal fold oscillation regimes observed in the excised larynx. (A) Extracted glottal area (GA), specified in pixels (top row); and EGG signal (bottom row) for periodic oscillation (385 ms displayed), period doubling (140 ms displayed), and deterministic chaos (200 ms displayed). (B) Phase portraits created from the signals shown in (A). (C) Glottovibrograms (GVGs) for the three observed vocal fold vibration regimes: periodic; period doubling (the vocal folds vibrated twice as fast in the dorsal part of the glottis, as compared to the ventral part); and deterministic chaos. (D) One cycle of periodic vocal fold vibration, as seen in the GVG. The open quotient is defined as the relation of the duration of the open phase to the period.
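
The open quotient defined in (D) can be computed directly from a glottal area trace. The following is a minimal sketch, assuming the trace covers exactly one period at a constant frame rate and that the glottis counts as open whenever the area exceeds a cutoff; the function name, the cutoff, and the sample values are illustrative assumptions, not measured data.

```python
def open_quotient(glottal_area, closed_threshold=0):
    """Fraction of one vibratory cycle during which the glottis is open.

    glottal_area: area samples (e.g. pixel counts) covering exactly one
    period, sampled at a constant frame rate. A sample counts as open when
    its area exceeds closed_threshold (an assumed cutoff, 0 by default).
    """
    if not glottal_area:
        raise ValueError("need at least one sample")
    open_samples = sum(1 for area in glottal_area if area > closed_threshold)
    return open_samples / len(glottal_area)


# Synthetic one-cycle example (made-up pixel counts, not measured data):
# the glottis is open for 3 of 10 frames.
cycle = [0, 0, 0, 40, 90, 35, 0, 0, 0, 0]
print(open_quotient(cycle))  # 0.3
```

A small value such as this one corresponds to a cycle dominated by the closed phase.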

Another important feature seen in our elephant larynx setup, predicted by MEAD theory, was the frequent occurrence of nonlinear phenomena (25). Apart from periodic oscillations, our analysis revealed a variety of these nonlinear phenomena: period doubling, tripling, and quadrupling; biphonation; chaos; and bifurcations between various phonatory regimes (Fig. 4 and movie S1). Three stereotypical patterns of vocal fold vibration (periodic, subharmonic, and chaotic) have been analyzed with three-dimensional phase portraits [a graphical representation of a dynamical system’s possible states, illustrating the evolution of its vibratory characteristics over time (Fig. 4B)] and with glottovibrograms [a visualization of the time-varying glottal width along the entire glottal axis (Fig. 4C and movie S2 for the measurement of the time-varying glottal area)]. The frequent occurrence of nonlinear phenomena might be partly related to the large dimensions of the elephant vocal folds, facilitating higher-order modes of vibration.
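
One common way to build a three-dimensional phase portrait from a single measured signal is time-delay embedding. Whether the portraits in Fig. 4B were constructed this way is not stated here, so the sketch below is only an illustration of the general idea; the function name, the delay, and the synthetic sine signal are all assumptions.

```python
import math


def delay_embed(signal, delay, dim=3):
    """Return points (x[t], x[t + delay], ..., x[t + (dim - 1) * delay])."""
    if delay < 1 or dim < 1:
        raise ValueError("delay and dim must be positive")
    n_points = len(signal) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this delay and dimension")
    return [tuple(signal[t + k * delay] for k in range(dim))
            for t in range(n_points)]


# Synthetic periodic signal (not measured data): a 16 Hz sine sampled at
# 1600 frames per second, so one period spans 100 samples.
frame_rate = 1600
f0 = 16
x = [math.sin(2 * math.pi * f0 * n / frame_rate) for n in range(400)]

# A delay of a quarter period (25 samples) is an assumed, illustrative choice.
points = delay_embed(x, delay=25)
print(len(points), points[0])
```

For a strictly periodic signal the embedded points trace a closed loop and return to the same location after one period. Subharmonic regimes would appear as loops that close only after two or more periods, and chaotic regimes would not close at all.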

The observed fundamental frequencies (F0) of periodic vocal fold vibration were in the range of 5 to 60 Hz, with an average of 16.38 Hz (analyzing all periodic phonations produced in the course of the experiment). These values agree well with the fundamental frequency produced by a computational simulation (supplementary materials), with our recordings of the elephants, and with values from live elephants published in the literature (20, 21). As a first approximation, the fundamental frequency of vocal fold vibration can be explained by a piano-string model (3), in which a change in vocal fold resting length would be inversely and linearly related to F0. Applying this model to the measured elephant vocal fold length of 10.4 cm results in a predicted F0 of 18.43 Hz (supplementary materials), which is remarkably close to the mean F0 measured in our experiments. It is thus apparent that the frequency of the infrasonic vocalizations is directly related to the length of the large elephant vocal folds, again as predicted by the MEAD theory.
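
The inverse relation between vocal fold length and F0 can be illustrated with the ideal-string formula, F0 = sqrt(stress / density) / (2 L). This is a sketch only: the authors' actual calculation is in the supplementary materials and is not reproduced here, and the tissue density of 1040 kg/m^3 and the function names are assumptions made for illustration.

```python
import math

TISSUE_DENSITY = 1040.0  # kg/m^3, assumed value for soft tissue


def string_f0(length_m, stress_pa, density=TISSUE_DENSITY):
    """Ideal-string fundamental: F0 = sqrt(stress / density) / (2 * L)."""
    return math.sqrt(stress_pa / density) / (2.0 * length_m)


def implied_stress(length_m, f0_hz, density=TISSUE_DENSITY):
    """Tissue stress (Pa) the ideal-string formula needs to yield f0_hz."""
    wave_speed = 2.0 * length_m * f0_hz
    return density * wave_speed ** 2


# Stress implied by the reported length (10.4 cm) and predicted F0 (18.43 Hz).
stress = implied_stress(0.104, 18.43)

# At fixed stress, doubling the vibrating length halves F0.
f0_reported = string_f0(0.104, stress)
f0_doubled = string_f0(0.208, stress)
print(f"{f0_reported:.2f} Hz at 10.4 cm, {f0_doubled:.2f} Hz at 20.8 cm")
```

The point of the sketch is the scaling rather than the absolute numbers: under this model, length enters only through the 1/(2 L) factor.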

In the in vivo situation, interactions between the vibrating vocal folds and the vocal tract may slightly raise or lower the fundamental frequency, as well as introduce nonlinear phenomena. To evaluate these possibilities, we calculated the predicted formant frequencies for the elephant vocal tract, assuming a uniform cross-sectional area (supplementary materials). Given an estimated oral vocal tract length of 75 cm and a nasal tract length of 2.5 m, the lowest formants would be 117 and 35 Hz, respectively. Because these estimated formant frequencies are at least two times higher than the observed fundamental frequencies, the vocal tract should not play a crucial role in the creation of nonlinear phenomena, nor substantially modify F0, of elephant infrasound vocalizations. This indicates that the nonlinearities seen in our excised larynx experiments are, in the absence of a vocal tract, a consequence of vocal fold dynamics and not source/tract interaction (for details, see the supplementary materials).
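
The formant estimates above are consistent with treating each tract as a uniform tube that is closed at the glottis and open at the far end, whose resonances are odd multiples of c / (4 L). The following is a minimal sketch under that assumption; the speed of sound (350 m/s) and the function name are illustrative assumptions, not values taken from the supplementary materials.

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Assumption: quarter-wavelength resonator, F_n = (2n - 1) * c / (4 * L).
    speed_of_sound = 350 m/s is an assumed value for warm, humid air.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


oral = tube_formants(0.75)   # estimated oral vocal tract length: 75 cm
nasal = tube_formants(2.5)   # estimated nasal vocal tract length: 2.5 m
print(round(oral[0]), round(nasal[0]))  # 117 35
```

Under these assumptions the lowest resonances come out at about 117 Hz and 35 Hz, matching the values quoted above.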

This study is the first to directly observe the sound production mechanism of elephant infrasound vocalizations. We have shown that low-frequency phonation can be created by flow-induced self-sustaining oscillations of the vocal folds, in accordance with the MEAD theory of voice production. The behavior of the vibrating tissue is governed by biomechanical properties, powered by tracheal air pressure.

Although we can clearly rule out a role for active muscle twitching in our excised larynx preparation, we obviously cannot eliminate the possibility of such “purring” in a living elephant. However, our study demonstrates that there is no need for such twitching to produce loud low-frequency vocalizations such as elephant rumbles. The low fundamental frequency of the produced sounds is directly related to the dimensions and tension of the vibrating tissue, based on well-understood physical principles. The elephant larynx constitutes a vibrating system that behaves in a fashion similar to that known in humans and other mammals, showing that flow-induced vocal fold vibration offers a physiologically and evolutionarily efficient means to produce the very intense low-frequency sounds used in elephant long-distance communication (19).

Supplementary Materials

www.sciencemag.org/cgi/content/full/337/6094/595/DC1

Materials and Methods

Supplementary Text

Figs. S1 and S2

References (32–45)

Movies S1 and S2

References and Notes

  1. The glottis (rima glottidis) is the elongated opening between the vocal folds.
  2. Acknowledgments: This research was supported by European Research Council Advanced Grant SOMACCA and a startup grant from the University of Vienna (W.T.F.); an Austrian Science Fund (FWF) grant, P 23099 (A.S.); grant no. LO1413/2 by the Deutsche Forschungsgemeinschaft (J.L.); and grant R01 DC 008612 (A Simulator for Sound Production in Airways) from the National Institute for Deafness and Other Communication Disorders (I.R.T.). Our sincere thanks go to B. Blaszkiewitz (Direktor, Tierpark Berlin Friedrichsfelde) for supplying us with the elephant larynx. We thank R. Hofer for contributing to the setup of the excised larynx experiment, P. Pesak for assisting in the computed tomography scan of the larynx specimen, and N. Kavcik for creating the artwork in Fig. 1 and the figure of the excised larynx setup in Fig. 2. The data reported in this manuscript are available in the supplementary materials.
