Emotion semantics show both cultural variation and universal structure


Science  20 Dec 2019:
Vol. 366, Issue 6472, pp. 1517-1522
DOI: 10.1126/science.aaw8160

The diverse way that languages convey emotion

It is unclear whether emotion terms have the same meaning across cultures. Jackson et al. examined nearly 2500 languages to determine the degree of similarity in linguistic networks of 24 emotion terms across cultures (see the Perspective by Majid). There were low levels of similarity, and thus high variability, in the meaning of emotion terms across cultures. Similarity of emotion terms could be predicted on the basis of the geographic proximity of the languages they originate from, their hedonic valence, and the physiological arousal they evoke.

Science, this issue p. 1517; see also p. 1444


Many human languages have words for emotions such as “anger” and “fear,” yet it is not clear whether these emotions have similar meanings across languages, or why their meanings might vary. We estimate emotion semantics across a sample of 2474 spoken languages using “colexification”—a phenomenon in which languages name semantically related concepts with the same word. Analyses show significant variation in networks of emotion concept colexification, which is predicted by the geographic proximity of language families. We also find evidence of universal structure in emotion colexification networks, with all families differentiating emotions primarily on the basis of hedonic valence and physiological activation. Our findings contribute to debates about universality and diversity in how humans understand and experience emotion.

Many human languages have rich vocabularies devoted to communicating emotions. Although not all emotion words are common—the German word Sehnsucht refers to a strong desire for an alternative life and has no direct translation in English—there are many words that appear to name similar emotional states across the world’s spoken languages. Translation dictionaries, for example, suggest that the English word love can be equated with the Turkish word sevgi and the Hungarian word szerelem. But does this mean that the concept of “love” is the same in English, Turkish, and Hungarian? Here, we explore this question by examining the meaning of emotion concepts in a sample of 2474 languages from 20 major language families. Using a new method from comparative linguistics, we examine sources of variation and structure in emotion semantics across this global sample of languages.

Early theories of emotion, drawing from Darwin (1), suggested that there are a discrete number of universal emotions from which all other emotions are derived (2-4). Many of these theories claimed that, just as there are primary colors (e.g., yellow, red), there may be primary emotions (e.g., anger, sadness) that evolved in mammalian brains (4). In turn, many languages may develop words for primary emotion concepts such as “anger” and “sadness” because these concepts name experiences derived from universal biological structures that are shared by all humans (2-4). These theories do allow for cultural and linguistic variation in emotion, but tend not to model or predict this variation.

There is a growing recognition, however, that emotions can vary systematically in their meaning and experience across culture and language (5-7). Constructionist models of emotion in particular claim that concepts such as “anger” and “sadness” do not derive from dedicated brain structures (8), but occur when humans make socially learned inferences about the meaning of basic physiological processes linked to maintaining the body’s homeostasis (9, 10). The meaning of emotion concepts (i.e., “emotion semantics”) should thus draw from both culturally evolved conceptualizations as well as biologically evolved physiology.

If cultural evolutionary processes shape the meaning of emotion concepts, the historical relationships between language groups should predict which languages have the most similar emotion semantics. Language groups in closer geographic proximity are the most likely to engage in borrowing (the sharing of concepts, norms, etc.) and also tend to share more recent common ancestors than geographically distant groups (11). We thus hypothesize that emotion semantics are associated with a language group’s geographic location: Language groups in close geographic proximity may have more similar emotion semantics than distant groups. Although cultural variation in emotion is plausible under many models of emotion, a link between geographic distance and emotion semantics would support constructionism’s claim that emotions are conceptualized using social learning.

Biologically evolved physiology should provide universal structure to emotion semantics, but the exact sources of this structure are not clear. Constructionist models of emotion emphasize the roles of two properties: valence, the hedonic pleasantness versus unpleasantness of emotions, and activation, the physiological arousal associated with experiencing emotions (8-10). According to these models, valence and activation reflect basic neurophysiological processes that signal when the body shifts away from homeostasis (9), and the universal importance of these processes may lead all languages to differentiate emotions primarily on the basis of their degree of valence and activation. Other accounts, however, suggest that factors such as dominance, certainty, sociality, and approach-avoidance may also represent universal dimensions of variance in emotion semantics (12-15).

Predictions about the influence of culture and biology on emotion have long been examined and debated, yet findings from past studies are mixed. An early study found that human subjects from remote Papua New Guinea matched posed facial expressions to emotional situations at similar rates to North Americans (16), whereas recent field studies among other small-scale societies have found considerably more cultural variability in people’s conceptualization of emotion (17). These mixed results may be due to methodological limitations of past research. Owing to logistical challenges, the vast majority of cross-cultural studies have been two-group comparisons (17), and the few multigroup studies on emotion have sampled predominantly from industrial and globalized nations (18, 19). Moreover, human subject–based studies seldom present emotions as they naturally occur, instead using posed facial expressions, fictional vignettes, and exaggerated vocalizations as test stimuli. Finally, human subject–based studies may be susceptible to demand characteristics and researcher bias: Studies with imposed training phases and forced choice paradigms have found evidence for universal recognition of emotion (16), whereas studies with fewer constraints have found more cultural variability (17).

As an alternative to human subjects–based research, analyses of naturally occurring language can have high ecological validity and do not rely on human subject recruitment. Language may be an imprecise metric of experience, but analyzing how people use words can reveal how they experience emotions as similar or different. Several linguistic studies have conducted these analyses qualitatively, comparing the meaning of emotion words by searching for semantic primitives that have similar meanings across many languages (20). Yet few studies have quantitatively compared the meaning of emotion words because the field lacks metrics that quantify the semantic distance between words such as the English love and the Turkish sevgi (21).

To overcome this challenge, we take a new quantitative approach to estimate variability and structure in emotion semantics. Our approach examines cases of colexification, instances in which multiple concepts are coexpressed by the same word form within a language. Colexifications are useful for addressing questions about semantic structure because they often arise when two concepts are perceived as conceptually similar (22, 23) (see fig. S5). Persian, for instance, uses the word-form ænduh to express both the concepts of “grief” and “regret,” whereas the Sirkhi dialect of Dargwa uses the word-form dard to express both the concepts of “grief” and “anxiety.” Persian speakers may therefore understand “grief” as an emotion more similar to “regret,” whereas Dargwa speakers may understand “grief” as more similar to “anxiety.”
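As a rough illustration of the idea, colexifications can be read off a word list by grouping concepts that share a word form within one language. The sketch below is not the CLICS pipeline; the function name, the data layout, and the placeholder row are assumptions made for this example, while the Persian and Sirkhi Dargwa rows restate the examples above.

```python
from collections import defaultdict
from itertools import combinations

# Toy word list of (language, concept, word form) rows. The first four rows
# restate the Persian and Sirkhi Dargwa examples from the text; the last row
# uses a placeholder form and is purely illustrative.
WORD_LIST = [
    ("Persian", "grief", "ænduh"),
    ("Persian", "regret", "ænduh"),
    ("Sirkhi Dargwa", "grief", "dard"),
    ("Sirkhi Dargwa", "anxiety", "dard"),
    ("Persian", "anxiety", "<placeholder form>"),
]


def find_colexifications(entries):
    """Map each colexified concept pair to the set of languages showing it."""
    concepts_by_form = defaultdict(set)
    for language, concept, form in entries:
        concepts_by_form[(language, form)].add(concept)

    colexified = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        # Every pair of concepts sharing one form in one language counts once.
        for pair in combinations(sorted(concepts), 2):
            colexified[pair].add(language)
    return dict(colexified)


colexifications = find_colexifications(WORD_LIST)
for pair, languages in sorted(colexifications.items()):
    print(pair, sorted(languages))
```

Concept pairs are stored in alphabetical order so that the same pair found in different languages ends up under one key.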

Past research has used colexification patterns across languages to examine the semantic structure of non-emotion concepts. Youn and colleagues coded dictionaries from 81 languages to show that concepts such as “sun,” “river,” “mountain,” and “hill” had universal patterns of colexification that reflected concepts’ material and functional properties (21). For instance, languages were more likely to colexify concepts such as “water” and “sea,” than concepts such as “sun” and “water,” implying that speakers of these languages viewed “water” and “sea” as semantically similar concepts and “sun” and “water” as distinct. We use a similar approach to estimate the variation and structure of emotion semantics across language families.

To gather a high-powered sample, we computationally aggregated colexifications into a database of cross-linguistic colexifications (CLICS) featuring 2474 languages and 2439 distinct concepts, including 24 emotion concepts. We then used a random walk probability procedure to generate colexification networks (24). In these networks, nodes represented emotion concepts, and edges represented colexifications between these concepts, weighted by the number of languages that possessed a particular colexification. We used this procedure to construct a network for all languages in our database, and then for 20 individual language families whose colexification networks had a significant level of modularity (ps < 0.001). Although nodes in each language family network were labeled with the same emotion concepts (e.g., “anger”), comparing patterns of colexification across language families allowed us to test whether these nodes actually showed universal semantic equivalence or whether their patterns of association reflected semantic variation (see supplementary text for more details).
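The edge weighting described here can be sketched in a few lines. This example covers only the weighting step (the number of languages attesting each colexification) and the resulting weighted degree of each node; it does not reproduce the random walk probability procedure. The language labels and records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: one (language, concept_a, concept_b) row per
# colexification observed in a language.
RECORDS = [
    ("lang_1", "grief", "regret"),
    ("lang_2", "regret", "grief"),   # same pair, opposite order
    ("lang_2", "anxiety", "grief"),
    ("lang_3", "anxiety", "fear"),
]


def build_weighted_network(records):
    """Return {(concept_a, concept_b): number of languages with that edge}."""
    languages_per_edge = defaultdict(set)
    for language, concept_a, concept_b in records:
        edge = tuple(sorted((concept_a, concept_b)))  # undirected edge key
        languages_per_edge[edge].add(language)
    return {edge: len(langs) for edge, langs in languages_per_edge.items()}


def weighted_degree(edge_weights):
    """Sum the weights of all edges touching each concept."""
    degree = Counter()
    for (concept_a, concept_b), weight in edge_weights.items():
        degree[concept_a] += weight
        degree[concept_b] += weight
    return dict(degree)


edge_weights = build_weighted_network(RECORDS)
degrees = weighted_degree(edge_weights)
print(edge_weights)
print(degrees)
```

Using a set of languages per edge means a colexification listed twice for one language still contributes a weight of one.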

A key step in these network comparisons involved identifying communities: clusters of emotion concepts that are more tightly colexified with one another than with emotion concepts outside of the community. For each network, we computed community structure using the Cluster Optimal algorithm (25). Figure 1 displays the global colexification network and the five largest language family–specific networks, and fig. S1 displays the remaining language families. Family-specific colexification networks allowed us to estimate global variability in emotion semantics and to predict variation and structure in emotion semantics across language families.
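Community detection of this kind is typically framed as finding the partition that maximizes modularity, the share of edge weight falling inside communities minus the share expected if edges were placed at random. The sketch below only scores a given partition; it does not search for the optimal one and is not the Cluster Optimal implementation. The edge weights and community labels are hypothetical.

```python
from collections import defaultdict


def modularity(edge_weights, community_of):
    """Newman modularity Q of a partition of a weighted, undirected network.

    edge_weights: {(node_a, node_b): weight}, each undirected edge listed once.
    community_of: {node: community label}.
    """
    total_weight = sum(edge_weights.values())
    if total_weight == 0:
        raise ValueError("network has no edges")

    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for (node_a, node_b), weight in edge_weights.items():
        degree[community_of[node_a]] += weight
        degree[community_of[node_b]] += weight
        if community_of[node_a] == community_of[node_b]:
            internal[community_of[node_a]] += weight

    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Hypothetical toy network and two candidate partitions.
EDGES = {("anger", "hate"): 3, ("anxiety", "fear"): 2, ("fear", "hate"): 1}
SPLIT = {"anger": 0, "hate": 0, "fear": 1, "anxiety": 1}
SINGLE = {node: 0 for node in SPLIT}

q_split = modularity(EDGES, SPLIT)
q_single = modularity(EDGES, SINGLE)
print(q_split > q_single)
```

In this toy case the two-community split scores higher than lumping every concept together, which is the kind of comparison a modularity-maximizing algorithm makes across candidate partitions.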

Fig. 1 Colexification of emotion concepts across all languages (top left) and the largest language families.

Nodes are emotion concepts, and node size represents the number of colexifications involving the concept. Edges represent colexifications, and edge thickness represents the number of colexifications between two emotion concepts. Node color designates community.

We estimated global variation in emotion semantics by comparing the community structures of language family networks. We quantified agreement in community structure using adjusted Rand indices (ARIs), which indicate the similarity of two networks’ community structures (26). Negative ARI values indicate that two networks’ community partitions vary more than would be expected by chance, ARI values of 0 indicate that two networks’ community partitions vary at a level that would be expected at chance, and ARI values approaching 1 reflect high agreement in community structure between two networks. The distribution of raw ARIs indicated high variability in community structure across language families, with a mean ARI of 0.09 (SD = 0.11). Because ARIs can be artificially low in networks with few edges owing to isolated nodes, we also examined the ARI values for a thresholded set of community comparisons. Through a series of permutation tests, we identified pairs of communities that were more similar than would be expected by chance and then thresholded our sample to only include these permutation-robust community comparisons. With this more conservative set of comparisons, the mean ARI was 0.22 (SD = 0.09), still reflecting high variability in emotion semantics across language families.
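For readers unfamiliar with the ARI, a minimal pure-Python version of the standard pair-counting formula is sketched below. It compares two community assignments over the same list of concepts; the concept list and both partitions are hypothetical, and the thresholding and permutation tests described above are not included.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions agree trivially

    # Pairs of items grouped together in both, in A only, and in B only.
    together_both = sum(comb(n, 2) for n in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(n, 2) for n in Counter(labels_a).values())
    together_b = sum(comb(n, 2) for n in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # chance agreement
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial and identical
    return (together_both - expected) / (maximum - expected)


# Hypothetical community labels for six concepts in two language families.
CONCEPTS = ["anger", "hate", "envy", "fear", "anxiety", "grief"]
FAMILY_A = [0, 0, 0, 1, 1, 1]
FAMILY_B = [0, 0, 1, 1, 2, 2]

ari = adjusted_rand_index(FAMILY_A, FAMILY_B)
print(round(ari, 3))
```

The index depends only on which concepts are grouped together, so renaming community labels leaves it unchanged.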

To test whether variation in emotion colexification patterns merely arose from methodological factors, such as the way that concepts were glossed in our database, we next compared the ARI values from our emotion concept comparisons to ARI values for colexification networks involving color concepts. Color concepts have also been studied cross-linguistically (27) and are frequently compared to emotion concepts (4), making them an appropriate sample of comparison concepts. In the full sample of comparisons, color concepts had a mean ARI of 0.35 (SD = 0.17), significantly higher than the full sample of emotion concept comparisons, t(390) = 18.51, p < 0.001. In the permutation-robust sample of comparisons, color concepts had a mean ARI of 0.41 (SD = 0.15), again showing more universality than the permutation-robust sample of emotion concept comparisons, t(158) = 11.44, p < 0.001 (Fig. 2). This difference also replicated when equating the number of color and emotion concepts, t(334) = 15.52, p < 0.001 (see materials and methods for more details). Emotion semantics thus vary widely across language families, and their variation is significantly greater than variation in color semantics.

Fig. 2 The distributions of all pairwise language family ARI values for emotion concepts (in orange) and color concepts (in light blue), and the distributions of permutation-robust ARI values for emotion concepts (in red) and color concepts (in dark blue).

Emotion concepts had significantly lower ARI values than color concepts, showing more semantic variability.

Our next analysis investigated whether geographic proximity predicted the pattern of variation in emotion semantics across language families. We tested this hypothesis by correlating the geographic proximity of language families (via the latitude and longitude coordinates of their languages) with their pairwise ARI values. As predicted, language families with higher pairwise ARI values were in closer geographic proximity, both in the full sample of our ARI comparisons, r(188) = −0.26, p < 0.001, and in the smaller permutation-robust sample, r(55) = −0.29, p = 0.03 (Fig. 3). These associations suggest that emotion semantics do not vary randomly; their variation is tied to the cultural evolutionary relationship between language families.
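The shape of this test can be sketched as follows: compute a distance between two coordinate pairs, then correlate distances with pairwise ARI values. The text does not state which distance measure was used, so the great-circle (haversine) distance here is an assumption, and all coordinates, distances, and ARI values in the example are invented for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    half_dphi = (phi2 - phi1) / 2
    half_dlambda = radians(lon2 - lon1) / 2
    a = sin(half_dphi) ** 2 + cos(phi1) * cos(phi2) * sin(half_dlambda) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented family-pair data: distance between family centroids and their ARI.
distances_km = [500.0, 2000.0, 6000.0, 12000.0]
pairwise_ari = [0.30, 0.20, 0.10, 0.05]

r = pearson_r(distances_km, pairwise_ari)
print(r < 0)  # closer families have more similar community structure
```

A negative coefficient is the pattern reported above: as distance grows, agreement in community structure falls.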

Fig. 3 The relationship between geographic proximity and pairwise ARI values.

Point size illustrates the number of languages in a comparison. In the key, the nodes denoting point size are not colored because they apply to both red and orange points. The red points display the permutation-robust ARI values (r = −0.29), and the orange points display the remaining ARI values (r = −0.26). The regression line is fitted to all cases, and the shading represents standard error.

Finally, we tested whether any psychophysiological dimensions could predict the semantic structure of emotion across language families. We examined the explanatory power of six dimensions (valence, activation, dominance, certainty, approach-avoidance, and sociality) by testing whether they predicted the community membership of emotion concepts across colexification networks. Using ratings of 200 online participants (90 female, 110 male; mean age = 34.11, SD = 10.52), we first classified our emotion concepts on these dimensions using a 1-10 Likert-type scale. We also classified a set of five “neutral” concepts (ordinary, nondescript, indifferent, neutral, and impartial). Using a multilevel structural equation model in which participants’ ratings of emotion concepts on these dimensions predicted the community membership of emotion concepts, we were then able to test how well each dimension differentiated emotion communities from our set of neutral words. If a dimension was highly predictive, the model’s Akaike information criterion (AIC) fit would show a large decrement when the dimension was removed from the model. By contrast, removing nonpredictive dimensions would have less of an impact on the model’s AIC fit. We ran this analysis for all language families except the Nuclear Macro-Je, for which models did not converge because only a single community contained multiple emotion concepts.

The results of this leave-one-out analysis revealed higher predictive power for valence and activation than for other dimensions (Fig. 4). Valence was the most predictive dimension, with the highest AIC fit decrements (mean AIC decrement = 323.50) for the all-family network and for 13 of the 19 language families in our analysis. Activation was the most predictive dimension for the remaining six language families (mean AIC decrement = 208.76). Approach-avoidance (mean AIC decrement = 35.82), certainty (mean AIC decrement = 30.26), dominance (mean AIC decrement = 26.18), and sociality (mean AIC decrement = 7.41) had far less predictive power than valence and activation. Comparing the distributions of fit decrements across language families revealed that both valence (ps < 0.001) and activation (ps < 0.001) had significantly higher decrements (i.e., explained more variance) than these other dimensions, and that valence had a higher average fit decrement than activation, t(19) = 2.70, p = 0.01. These findings suggest that languages around the world primarily differentiate emotions on the basis of valence and activation (see materials and methods for further analyses and discussion).
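The logic of the leave-one-out comparison can be sketched with the standard definition AIC = 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The example treats a dimension's fit decrement as the AIC of the model without that dimension minus the AIC of the full model; that reading, the function names, and every log-likelihood and parameter count below are assumptions for illustration, not values from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


def leave_one_out_decrements(full_fit, reduced_fits):
    """Return {dimension: AIC(model without it) - AIC(full model)}.

    full_fit: (log_likelihood, n_params) for the model with all dimensions.
    reduced_fits: {dimension: (log_likelihood, n_params)} with it removed.
    """
    full_aic = aic(*full_fit)
    return {
        dimension: aic(*fit) - full_aic
        for dimension, fit in reduced_fits.items()
    }


# Invented fits: dropping a predictive dimension lowers the log-likelihood a lot.
FULL_FIT = (-100.0, 7)
REDUCED_FITS = {
    "valence": (-250.0, 6),
    "activation": (-190.0, 6),
    "sociality": (-102.0, 6),
}

decrements = leave_one_out_decrements(FULL_FIT, REDUCED_FITS)
ranked = sorted(decrements, key=decrements.get, reverse=True)
print(ranked)
```

Ranking dimensions by decrement gives the kind of ordering summarized in the top panel of Fig. 4, with larger values marking dimensions whose removal hurts the model most.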

Fig. 4 Results from a leave-one-out analysis examining relative decrements in model fit following the removal of each dimension.

The top panel represents the AIC fit decrements associated with removing dimensions from a predictive model of emotion community membership. Higher decrements indicate that the dimension was more predictive. The bottom panel shows the distribution of AIC fit decrements for each dimension. Valence and activation had significantly higher average decrements than other dimensions.

Our findings reveal wide variation in emotion semantics across 20 of the world’s language families. Emotion concepts had different patterns of association in different language families. For example, “anxiety” was closely related to “fear” among Tai-Kadai languages, but was more related to “grief” and “regret” amongst Austroasiatic languages. By contrast, “anger” was related to “envy” among Nakh-Daghestanian languages, but was more related to “hate,” “bad,” and “proud” among Austronesian languages. We interpret these findings to mean that emotion words vary in meaning across languages, even if they are often equated in translation dictionaries. The supplementary materials contain an extended discussion of why other technical and sampling artifacts are unlikely to account for the variation that we observed in emotion semantics.

Geography partly explained variation in emotion semantics, such that geographically closer language families tended to colexify emotion concepts in more similar ways than distant language families. Geographically proximal societies often have more opportunities for contact through trade, conquest, and migration and share more recent common ancestry than distant groups (11). This suggests that historical patterns of contact and common ancestry may have shaped cross-cultural variation in how people conceptualize emotions. We encourage future research to examine the specific vertical and horizontal transmission processes that give rise to geographic variation in emotion semantics.

Despite this variation, we find evidence for a common underlying structure in the meaning of emotion concepts across languages. Valence and physiological activation—which are linked to neurophysiological systems that maintain homeostasis (9)—served as universal constraints to variability in emotion semantics. Positively and negatively valenced emotions seldom belonged to the same colexification communities, although there were notable exceptions to this pattern. For example, some Austronesian languages colexified the concepts of “pity” and “love,” which implies that these languages may conceptualize “pity” as a more positive (or “love” as a more negative) concept than other languages. The ability of valence and activation to consistently predict structure in emotion semantics across language families suggests that these are common psychophysiological dimensions shared by all humans.

Questions about the meaning of human emotions are age-old, and debate about the nature of emotion persists in scientific literature. The colexification approach that we take here provides a new method and a set of metrics to answer these questions by creating vast networks of how people use words to name experiences. Analyzing these networks sheds light on the cultural and biological evolutionary mechanisms underlying how emotions are ascribed meaning in languages around the world. Although debates about the relationship between language and conscious experience are notoriously difficult to resolve (28), our findings also raise the intriguing possibility that emotion experiences vary systematically across cultural groups. More broadly, our study shows the value of combining large comparative linguistic databases with quantitative network methods. Analyzing the diverse ways that people use language promises to yield insights into human cognition on an unprecedented scale.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S5

Tables S1 to S6

References (29-62)

References and Notes

Acknowledgments: We acknowledge the many linguists who provided the word lists necessary to detect and analyze colexifications across languages. We also acknowledge the feedback of our editor and six anonymous reviewers; K. Gray, K. Payne, E. McCormick, J. Leshin, and N. Caluori; and the research assistance of R. Drabble, I. Khismatova, and A. Veeragandham. Funding: This study was supported by a National Science Foundation Graduate Research Fellowship and a Thomas S. and Caroline H. Royster Fellowship to J.C.J. The compilation of the CLICS data and software used in this study was funded by the Max Planck Society (as part of the CLLD project), the Max Planck Institute for the Science of Human History and the Royal Society of New Zealand (Marsden Fund grant 13-UOA-121 and GlottoBank project), the DFG research fellowship grant 261553824 and the ERC Starting Grant 715618 (both awarded to J.M.L.), and the ARC’s Discovery Project DE 120101954 and the ARC Center of Excellence CE140100041 (both awarded to S.J.G.). J.W. is supported by funds from the Templeton Religious Trust (TRT0153). No funding agency was involved in the conceptualization, design, data collection, analysis, decision to publish, or preparation of this manuscript, and the views expressed in this manuscript do not necessarily reflect the views of our funding agencies. Author contributions: J.C.J., J.W., and K.L. conceptualized and designed the study. J.C.J., J.W., T.H., J.M.L., and P.J.M. acquired and analyzed the data. J.M.L., R.F., and S.J.G. contributed software and data used in our analyses. J.C.J., J.W., T.H., P.J.M., and K.L. interpreted the analysis. J.C.J., J.W., K.L., J.M.L., and R.D.G. wrote the manuscript. All authors approved the submitted manuscript. Competing interests: The authors have no competing interests to declare. Data and materials availability: All data, scripts, and materials are available at
