Research Article

Universality and diversity in human song


Science  22 Nov 2019:
Vol. 366, Issue 6468, eaax0868
DOI: 10.1126/science.aax0868
  • Studying world music systematically.

    We used primary ethnographic text and field recordings of song performances to build two richly annotated cross-cultural datasets: NHS Ethnography and NHS Discography. The original material in each dataset was annotated by humans (both amateur and expert) and by automated algorithms.

  • Fig. 1 Design of the NHS Ethnography.

    The illustration depicts the sequence from acts of singing to the ethnography corpus. (A) People produce songs in conjunction with other behavior, which scholars observe and describe in text. These ethnographies are published in books, reports, and journal articles and then compiled, translated, cataloged, and digitized by the Human Relations Area Files organization. (B) We conduct searches of the online eHRAF corpus for all descriptions of songs in the 60 societies of the Probability Sample File and annotate them with a variety of behavioral features. The raw text, annotations, and metadata together form the NHS Ethnography. Codebooks listing all available data are in tables S1 to S6; a listing of societies and locations from which texts were gathered is in table S12.

  • Fig. 2 Patterns of variation in the NHS Ethnography.

    (A to E) Projection of a subset of the NHS Ethnography onto three principal components. Each point represents the posterior mean location of an excerpt, with points colored by which of four types (identified by a broad search for matching keywords and annotations) it falls into: dance (blue), lullaby (green), healing (red), or love (yellow). The geometric centroids of each song type are represented by the diamonds. Excerpts that do not match any single search are not plotted but can be viewed, along with all text and metadata, in the interactive version of this figure. Selected examples of each song type are presented here [highlighted circles and (B) to (E)]. (F to H) Density plots show the differences between song types on each dimension. Criteria for classifying song types from the raw text and annotations are shown in table S17.

  • Fig. 3 Society-wise variation in musical behavior.

    Density plots for each society show the distributions of musical performances on each of the three principal components (Formality, Arousal, Religiosity). Distributions are based on posterior samples aggregated from corresponding ethnographic observations. Societies are ordered by the number of available documents in the NHS Ethnography (the number of documents per society is displayed in parentheses). Distributions are color-coded according to their mean distance from the global mean (in z-scores; redder distributions are farther from 0). Although some societies’ means differ significantly from the global mean, the mean of each society’s distribution is within 1.96 standard deviations of the global mean of 0. One society (Tzeltal) is not plotted because it has insufficient observations for a density plot. Asterisks denote society-level mean differences from the global mean. *P < 0.05, **P < 0.01, ***P < 0.001.

  • Fig. 4 Design of the NHS Discography.

    (A) Illustration depicting the sequence from acts of singing to the audio discography. People produce songs, which scholars record. We aggregate and analyze the recordings via four methods: automatic music information retrieval, annotations from expert listeners, annotations from naïve listeners, and staff notation transcriptions (from which annotations are automatically generated). The raw audio, four types of annotations, transcriptions, and metadata together form the NHS Discography. (B) Plot of the locations of the 86 societies represented, with points colored by the song type in each recording (blue, dance; red, healing; yellow, love; green, lullaby). Codebooks listing all available data are in tables S1 and S7 to S11; a listing of societies and locations from which recordings were gathered is in table S22.

  • Fig. 5 Form and function in song.

    (A) In a massive online experiment (N = 29,357), listeners categorized dance songs, lullabies, healing songs, and love songs at rates higher than chance level of 25%, but their responses to love songs were by far the most ambiguous (the heat map shows average percent correct, color-coded from lowest magnitude, in blue, to highest magnitude, in red). Note that the marginals (below the heat map) are not evenly distributed across behavioral contexts: Listeners guessed “healing” most often and “love” least often despite the equal number of each in the materials. The d-prime scores estimate listeners’ sensitivity to the song-type signal independent of this response bias. (B) Categorical classification of the behavioral contexts of songs, using each of the four representations in the NHS Discography, is substantially above the chance performance level of 25% (dotted red line) and is indistinguishable from the performance of human listeners, 42.4% (dotted blue line). The classifier that combines expert annotations with transcription features (the two representations that best ignore background sounds and other context) performs at 50.8% correct, above the level of human listeners. (C) Binary classifiers that use the expert annotation + transcription feature representations to distinguish pairs of behavioral contexts [e.g., dance from love songs, as opposed to the four-way classification in (B)] perform above the chance level of 50% (dotted red line). Error bars represent 95% confidence intervals from corrected resampled t tests (94).
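
The d-prime scores reported in (A) come from standard signal detection theory: sensitivity is the difference between the z-transformed hit rate and the z-transformed false-alarm rate, which separates how well listeners discriminate song types from how often they favor a given response. A minimal sketch of the standard equal-variance computation (illustrative only, not the paper's exact analysis pipeline):

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate).
    Separates true sensitivity from response bias."""
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# A listener who answers "healing" for 80% of actual healing songs (hits)
# but also for 20% of other songs (false alarms):
sensitivity = d_prime(0.80, 0.20)
```

A listener who guesses "healing" indiscriminately would raise both rates together and leave d-prime near zero, which is why the caption distinguishes sensitivity from response bias.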

  • Fig. 6 Signatures of tonality in the NHS Discography.

    (A) Histograms representing 30 expert listeners’ ratings of tonal centers in all 118 songs (each song corresponding to a different color) show two main findings: (i) Most songs’ distributions are unimodal, such that most listeners agreed on a single tonal center (represented by the value 0). (ii) When listeners disagreed, the distributions are multimodal, with the most popular second mode (in absolute distance) five semitones away from the overall mode, a perfect fourth. The music notation is provided as a hypothetical example only, with C as a reference tonal center; the rated tonal centers could be at any pitch level. (B) The scatterplot shows the correspondence between the modal ratings of expert listeners and the first-rank predictions of the Krumhansl-Schmuckler key-finding algorithm. Points are jittered to avoid overlap. Note that pitch classes are circular (i.e., C is one semitone away from both C# and B) but the plot is not; distances on the axes of (B) should be interpreted accordingly.
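
The Krumhansl-Schmuckler algorithm referenced in (B) works by correlating a song's pitch-class duration profile against 24 candidate keys: the Krumhansl-Kessler probe-tone profiles rotated to each of 12 tonics, in major and minor. A sketch of that standard formulation (profile values as commonly cited from Krumhansl's probe-tone data; the paper's exact implementation may differ):

```python
from statistics import mean

# Krumhansl-Kessler probe-tone profiles, indexed by pitch class
# relative to the candidate tonic (values as commonly cited).
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def find_key(durations):
    """durations: total note duration per pitch class 0..11 (0 = C).
    Correlate against all 24 rotated profiles; return the best match."""
    best = None
    for tonic in range(12):
        # align the candidate tonic with profile index 0
        rotated = durations[tonic:] + durations[:tonic]
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            r = pearson(rotated, profile)
            if best is None or r > best[0]:
                best = (r, tonic, mode)
    return best[1], best[2]  # (tonic pitch class, mode)
```

A duration profile that simply mirrors the major profile starting on G (pitch class 7) is recovered as G major, which is the first-rank prediction plotted on the axes of (B).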

  • Fig. 7 Dimensions of musical variation in the NHS Discography.

    (A) A Bayesian principal components analysis reduction of expert annotations and transcription features (the representations least contaminated by contextual features) shows that these measurements fall along two dimensions that may be interpreted as rhythmic complexity and melodic complexity. (B and C) Histograms for each dimension show the differences—or lack thereof—between behavioral contexts. (D to G) Excerpts of transcriptions from songs at the extremes of each of the four quadrants, to validate the dimension reduction visually. The two songs in the high–rhythmic complexity quadrants are dance songs (in blue); the two songs in the low–rhythmic complexity quadrants are lullabies (in green). Healing songs are depicted in red and love songs in yellow. Excerpts from all songs in the corpus, along with an interactive version of this plot, are available online.
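
The paper's reduction is Bayesian, but the underlying idea of principal components analysis is ordinary eigendecomposition: the first component is the dominant eigenvector of the sample covariance matrix, which power iteration finds by repeated matrix-vector multiplication. A plain, non-Bayesian two-dimensional sketch on toy data (not the NHS feature matrix):

```python
def first_principal_component(points, iters=200):
    """Dominant eigenvector of the 2x2 sample covariance matrix,
    found by power iteration on mean-centered 2-D data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    xs = [p[0] - mx for p in points]
    ys = [p[1] - my for p in points]
    cxx = sum(x * x for x in xs) / (n - 1)
    cyy = sum(y * y for y in ys) / (n - 1)
    cxy = sum(x * y for x, y in zip(xs, ys)) / (n - 1)
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    # fix the sign so the component is reported consistently
    return v if v[0] >= 0 else (-v[0], -v[1])

# Toy data scattered along the diagonal y = x:
pts = [(1, 1.1), (2, 1.9), (3, 3.05), (-1, -0.95), (-2, -2.1), (-3, -3.0)]
pc1 = first_principal_component(pts)  # roughly (0.71, 0.71)
```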

  • Fig. 8 The distributions of melodic and rhythmic patterns in the NHS Discography follow power laws.

    (A and B) We computed relative melodic (A) and rhythmic (B) bigrams and examined their distributions in the corpus. Both distributions followed a power law; the parameter estimates in the inset correspond to those from the generalized Zipf-Mandelbrot law, where s refers to the exponent of the power law and β refers to the Mandelbrot offset. Note that in both plots, the axes are on logarithmic scales. The full lists of bigrams are in tables S28 and S29.
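
The generalized Zipf-Mandelbrot law predicts that the frequency of the rank-r bigram is proportional to 1/(r + β)^s. A crude illustration of the functional form and of recovering (s, β) by grid search in log space (the paper's parameter estimates were presumably obtained by proper likelihood-based fitting, not this sketch):

```python
import math

def zipf_mandelbrot(rank, s, beta, scale=1.0):
    """Expected frequency of the rank-th most common bigram:
    f(rank) = scale / (rank + beta) ** s."""
    return scale / (rank + beta) ** s

def fit_by_grid(freqs, s_grid, beta_grid):
    """Pick (s, beta) minimizing squared error in log space,
    with the scale set so total predicted frequency matches the data."""
    best = None
    ranks = range(1, len(freqs) + 1)
    for s in s_grid:
        for beta in beta_grid:
            pred = [zipf_mandelbrot(r, s, beta) for r in ranks]
            scale = sum(freqs) / sum(pred)
            err = sum((math.log(f) - math.log(scale * p)) ** 2
                      for f, p in zip(freqs, pred))
            if best is None or err < best[0]:
                best = (err, s, beta)
    return best[1], best[2]

# Synthetic rank-frequency data drawn exactly from the law:
observed = [zipf_mandelbrot(r, 1.2, 2.7, 100.0) for r in range(1, 51)]
s_hat, beta_hat = fit_by_grid(observed,
                              [1.0, 1.1, 1.2, 1.3],
                              [2.5, 2.6, 2.7, 2.8])
```

On a log-log plot such data fall on a nearly straight line for large ranks, with β bending the curve at the low ranks, which is the signature shown in (A) and (B).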

  • Table 1 Cross-cultural associations between song and other behaviors.

    We tested 20 hypothesized associations between song and other behaviors by comparing the frequency of a behavior in song-related passages to that in comparably sized samples of text from the same sources that are not about song. Behavior was identified with two methods: topic annotations from the Outline of Cultural Materials (“OCM identifiers”) and automatic detection of related keywords (“WordNet seed words”; see table S19). Significance tests compared the frequencies in the passages in the full Probability Sample File containing song-related keywords (“Song freq.”) with the frequencies in a simulated null distribution of passages randomly selected from the same documents (“Null freq.”). ***P < 0.001, **P < 0.01, *P < 0.05, using adjusted P values (88); 95% intervals for the null distribution are in parentheses.

    (Cells marked — are missing from the source text.)

    Hypothesis | OCM identifier(s) | Song freq. | Null freq. (95% interval) | WordNet seed word(s) | Song freq. | Null freq. (95% interval)
    — | — | — | (397, 467) | — | — | (3105, 3468)
    Infancy | INFANT CARE | 63* | 44 (33, 57) | infant, baby, cradle, lullaby | 688** | 561 (491, 631)
    — | — | — | (1004, 1123) | heal, shaman, sick, cure | 3983*** | 2466 (2317, 2619)
    — | — | — | (2130, 2295) | religious, spiritual, ritual | 8644*** | 5521 (5307, 5741)
    — | — | — | (250, 304) | play, game, child, toy | 4130*** | 2732 (2577, 2890)
    Procession | SPECTACLES; NUPTIALS | 371*** | 213 (188, 240) | wedding, parade, march, procession, funeral, coronation | 2648*** | 1495 (1409, 1583)
    — | — | — | (476, 557) | mourn, death, funeral | 3784*** | 2511 (2373, 2655)
    — | — | — | (81, 117) | ritual, ceremony | 8520** | 5138 (4941, 5343)
    — | — | — | (12, 29) | entertain, spectacle | 744*** | 290 (256, 327)
    Children | CHILDHOOD ACTIVITIES | 178*** | 108 (90, 126) | — | — | (3304, 3647)
    Mood/emotions | DRIVES AND EMOTIONS | 219*** | 138 (118, 159) | mood, emotion, emotive | 796*** | 669 (607, 731)
    Work | LABOR AND LEISURE | 137*** | 60 (47, 75) | work, labor | 3500** | 3223 (3071, 3378)
    Storytelling | VERBAL ARTS; LITERATURE | 736*** | 537 (506, 567) | story, history, myth | 2792*** | 2115 (1994, 2239)
    Greeting visitors | VISITING AND HOSPITALITY | 360*** | 172 (148, 196) | visit, greet, welcome | 1611*** | 1084 (1008, 1162)
    — | — | — | (253, 311) | war, battle, raid | 3154*** | 2254 (2122, 2389)
    — | — | — | (322, 388) | praise, admire, acclaim | 481*** | 302 (267, 339)
    — | — | — | (119, 162) | love, courtship | 1625*** | 804 (734, 876)
    — | — | — | (141, 187) | bond, cohesion | 1582*** | 1424 (1344, 1508)
    — | — | — | (169, 218) | marriage, wedding | 2011 | 2256 (2108, 2410)
    Art/creation | N/A | n/a | n/a | art, creation | 905*** | 694 (630, 757)
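
The logic of the significance test can be sketched as a resampling procedure: draw many random samples of passages of the same size as the song-related sample, sum their keyword counts to form a null distribution, and ask how often the null matches or exceeds the observed count. A simplified illustration (the counts and sample sizes below are hypothetical, and the paper's exact procedure, including the P-value adjustment, is described in its supplementary materials):

```python
import random

def song_vs_null(song_count, passage_counts, n_passages,
                 n_sims=2000, seed=7):
    """Compare an observed keyword count in song-related passages with a
    null distribution built from random draws of equally many passages.
    Returns (one-sided p-value, 95% interval of the null)."""
    rng = random.Random(seed)
    null_sums = sorted(
        sum(rng.sample(passage_counts, n_passages)) for _ in range(n_sims)
    )
    p = sum(1 for s in null_sums if s >= song_count) / n_sims
    lo = null_sums[int(0.025 * n_sims)]
    hi = null_sums[int(0.975 * n_sims)]
    return p, (lo, hi)

# Hypothetical corpus: 400 passages, each containing 0 or 1 keyword hit.
corpus = [0, 1] * 200
p_value, interval = song_vs_null(song_count=18, passage_counts=corpus,
                                 n_passages=20)
```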
  • Table 2 Features of songs that distinguish between behavioral contexts.

    The table reports the predictive influence of musical features in the NHS Discography in distinguishing song types across cultures, ordered by their overall influence across all behavioral contexts. The classifiers used the average rating for each feature across 30 annotators. The coefficients are from a penalized logistic regression with standardized features, selected for inclusion using a LASSO. For brevity, we present only the subset of features with notable influence on at least one pairwise comparison (coefficient magnitudes greater than 0.1). Changes in the values of the coefficients produce changes in the predicted log-odds ratio, so the values in the table can be interpreted as in a logistic regression.

    Coefficients are shown for each pairwise comparison; — indicates that the LASSO did not select the feature for that comparison.

    Musical feature | Definition | Dance (–) vs. Lullaby (+) | Dance (–) vs. Love (+) | Healing (–) vs. Lullaby (+) | Love (–) vs. Lullaby (+) | Dance (–) vs. Healing (+) | Healing (–) vs. Love (+)
    Accent | The differentiation of musical pulses, usually by volume or emphasis of articulation. A fluid, gentle song will have few accents and a correspondingly low value. | –0.64 | –0.24 | –0.85 | –0.41 | — | –0.34
    Tempo | The rate of salient rhythmic pulses, measured in beats per minute; the perceived speed of the music. A fast song will have a high value. | –0.65 | –0.51 | — | — | –0.76 | —
    Quality of pitch collection | Major versus minor key. In Western music, a key usually has a “minor” quality if its third note is three semitones from the tonic. This variable was derived from annotators’ qualitative categorization of the pitch collection, which we then dichotomized into Major (0) or Minor (1). | — | 0.26 | 0.44 | — | –0.37 | 0.35
    Consistency of macrometer | Meter refers to salient repetitive patterns of accent within a stream of pulses. A micrometer refers to the low-level pattern of accents; a macrometer refers to repetitive patterns of micrometer groups. This variable refers to the consistency of the macrometer, on an ordinal scale from “No macrometer” (1) to “Totally clear macrometer” (6). A song with a highly variable macrometer will have a low value. | –0.44 | –0.49 | — | — | –0.46 | —
    Number of common intervals | Variability in interval sizes, measured by the number of different melodic interval sizes that constitute more than 9% of the song’s intervals. A song with a large number of different melodic interval sizes will have a high value. | — | 0.58 | — | — | — | 0.62
    Pitch range | The musical distance between the extremes of pitch in a melody, measured in semitones. A song that includes very high and very low pitches will have a high value. | — | — | — | –0.49 | — | —
    Stepwise motion | Stepwise motion refers to melodic strings of consecutive notes (1 or 2 semitones apart), without skips or leaps. This variable consists of the fraction of all intervals in a song that are 1 or 2 semitones in size. A song with many melodic leaps will have a low value. | — | — | — | — | 0.61 | –0.20
    Tension/release | The degree to which the passage is perceived to build and release tension via changes in melodic contour, harmonic progression, rhythm, motivic development, accent, or instrumentation. If so, the song is annotated with a value of 1. | — | 0.27 | — | — | — | 0.27
    Average melodic interval size | The average of all interval sizes between successive melodic pitches, measured in semitones on a 12-tone equal temperament scale, rather than in absolute frequencies. A melody with many wide leaps between pitches will have a high value. | — | –0.46 | — | — | — | —
    Average note duration | The mean of all note durations; a song predominated by short notes will have a low value. | — | — | — | — | — | –0.49
    Triple micrometer | A low-level pattern of accents that groups together pulses in threes. | — | — | — | — | –0.23 | —
    Predominance of most common pitch class | Variety versus monotony of the melody, measured by the ratio of the proportion of occurrences of the second most common pitch (collapsing across octaves) to the proportion of occurrences of the most common pitch; monotonous melodies will have low values. | — | — | — | — | –0.48 | —
    Rhythmic variation | Variety versus monotony of the rhythm, judged subjectively and dichotomously. Repetitive songs have a low value. | — | — | — | — | 0.42 | —
    Tempo variation | Changes in tempo: A song that is perceived to speed up or slow down is annotated with a value of 1. | — | — | — | — | — | –0.27
    Ornamentation | Complex melodic variation or “decoration” of a perceived underlying musical structure. A song perceived as having ornamentation is annotated with a value of 1. | — | 0.25 | — | — | — | —
    Pitch class variation | A pitch class is the group of pitches that sound equivalent at different octaves, such as all the Cs, not just middle C. This variable, another indicator of melodic variety, counts the number of pitch classes that appear at least once in the song. | — | — | –0.25 | — | — | —
    Triple macrometer | If a melody arranges micrometer groups into larger phrases of three, like a waltz, it is annotated with a value of 1. | — | — | 0.14 | — | — | —
    Predominance of most common interval | Variability among pitch intervals, measured as the fraction of all intervals that are the most common interval size. A song with little variability in interval sizes will have a high value. | — | — | — | — | 0.12 | —
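
A LASSO zeroes out uninformative coefficients by adding an L1 penalty to the regression loss, which is why many cells in the table are empty. A minimal two-feature sketch using proximal gradient descent with soft-thresholding, on a toy dataset rather than the NHS Discography features (the paper's classifiers were fitted with standard penalized-regression tooling, not this code):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lasso_logistic(xs, ys, lam=0.1, lr=0.1, iters=3000):
    """Two-feature logistic regression with an L1 penalty, fitted by
    proximal gradient descent; the intercept is left unpenalized."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(xs)
    for _ in range(iters):
        residuals = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
                     for x, y in zip(xs, ys)]
        for j in range(2):
            g = sum(r * x[j] for r, x in zip(residuals, xs)) / n
            wj = w[j] - lr * g
            # soft-threshold: the L1 proximal step shrinks small weights to 0
            w[j] = math.copysign(max(abs(wj) - lr * lam, 0.0), wj)
        b -= lr * sum(residuals) / n
    return w, b

# Toy data: feature 0 predicts the label, feature 1 is noise.
xs = [(-2.0, 0.5), (-1.5, -0.3), (-1.0, 0.8), (-0.5, -0.6),
      (0.5, 0.7), (1.0, -0.4), (1.5, 0.2), (2.0, -0.8)]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = lasso_logistic(xs, ys)  # w[1] is shrunk toward zero
```

As in the table, a surviving coefficient shifts the predicted log-odds of one context versus the other per standardized unit of the feature; features the penalty eliminates contribute nothing.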

Supplementary Materials

  • Universality and diversity in human song

    Samuel A. Mehr, Manvir Singh, Dean Knox, Daniel M. Ketter, Daniel Pickens-Jones, S. Atwood, Christopher Lucas, Nori Jacoby, Alena A. Egner, Erin J. Hopkins, Rhea M. Howard, Joshua K. Hartshorne, Mariela V. Jennings, Jan Simson, Constance M. Bainbridge, Steven Pinker, Timothy J. O’Donnell, Max M. Krasnow, Luke Glowacki

    Materials/Methods, Supplementary Text, Tables, Figures, and/or References

    Download Supplement
    • Supplementary Text
    • Figs. S1 to S15
    • Tables S1 to S37 
    • References and Notes 
