Research Article

Markets, Religion, Community Size, and the Evolution of Fairness and Punishment


Science 19 Mar 2010:
Vol. 327, Issue 5972, pp. 1480–1484
DOI: 10.1126/science.1182238

This article has a correction.


Large-scale societies in which strangers regularly engage in mutually beneficial transactions are puzzling. The evolutionary mechanisms associated with kinship and reciprocity, which underpin much of primate sociality, do not readily extend to large unrelated groups. Theory suggests that the evolution of such societies may have required norms and institutions that sustain fairness in ephemeral exchanges. If that is true, then engagement in larger-scale institutions, such as markets and world religions, should be associated with greater fairness, and larger communities should punish unfairness more. Using three behavioral experiments administered across 15 diverse populations, we show that market integration (measured as the percentage of purchased calories) positively covaries with fairness while community size positively covaries with punishment. Participation in a world religion is associated with fairness, although not across all measures. These results suggest that modern prosociality is not solely the product of an innate psychology, but also reflects norms and institutions that have emerged over the course of human history.

At the onset of the Holocene, the stabilization of global climates created possibilities for the emergence of larger-scale sedentary human societies (1). Over the next 10 millennia, the scale of some human communities dramatically expanded from kin-based foraging bands to complex, intensely cooperative societies in which strangers frequently engage in mutually beneficial transactions (2). Consistent with life in these large-scale societies, behavioral experiments performed with people from these populations reveal fair, trusting, and cooperative behavior among strangers, even in one-shot encounters (3).

Two major theoretical approaches have sought to explain both the relatively rapid expansion of human societal scales and the puzzlingly prosocial behavior observed in experiments. The first approach proposes that humans possess an innate social psychology calibrated to life in the small-scale societies of our Paleolithic ancestors. As societies expanded with the emergence of agriculture, the heuristics of this psychology were mistakenly extended to nonkin and ephemeral interactants. These heuristics are rooted in the evolutionary logic of kinship and reciprocity. On this view, the prosocial behavior observed in experiments directly reflects the operation of these ancient heuristics (4, 5).

An alternative approach proposes that a crucial ingredient in the rise of more-complex societies was the development of new social norms and informal institutions that are capable of domesticating our innate psychology for life in ever-expanding populations (6). Larger and more-complex societies prospered and spread to the degree that their norms and institutions effectively sustained successful interaction in ever-widening socioeconomic spheres, well beyond individuals’ local networks of kin and long-term relationships (7). It is these particular norms and their gradual internalization as proximate motivations (8) that recalibrate our innate psychology, which evolved for life in small-scale societies, in a manner that permits successful cooperation and exchange in vast communities.

Much research suggests that norms arise because humans use evolved learning mechanisms to calibrate their behavior, motivations, and beliefs to variable circumstances (7, 9). Modeling work shows that when these learning mechanisms are applied to different kinds of social interactions, such as large-scale cooperation or ephemeral exchange, individually costly behaviors can be sustained by punishment, signaling, and reputational mechanisms (10–13). By sustaining such behaviors, norms can facilitate trust, fairness, and cooperation in a diverse array of interactions, thereby allowing the most productive use of unevenly distributed skills, knowledge, and resources, as well as increasing cooperation in exchange, public goods, and warfare. More-effective norms and institutions can spread among societies by a variety of theoretically and empirically grounded mechanisms, including conquest and assimilation, preferential imitation of more-successful societies, or forward-looking decision making by leaders or high-status coalitions (14, 15).

Norms that enhance fairness among strangers are likely causally interconnected with the diffusion of several kinds of institutions. Here we focus on two: (i) the expansion of both the breadth and intensity of market exchange (16, 17), and (ii) the spread of world religions (18). The efficiency of market exchange involving infrequent or anonymous interactions improves with an increasingly shared set of motivations and expectations related to trust, fairness, and cooperation. This lowers transaction costs, raises the frequency of successful transactions, and increases long-term rewards (16). Although frequent and efficient exchanges among strangers are now commonplace, studies of nonhuman primates and small-scale societies suggest that during most of our evolutionary history, transactions beyond the local group, and certainly beyond the ethnolinguistic unit, were fraught with danger, mistrust, and exploitation (2, 16, 19). Thus, we propose that such “market norms” may have evolved as part of an overall process of societal evolution to sustain mutually beneficial exchanges in contexts where established social relationships (for example, kin, reciprocity, and status) were insufficient. If our theory is correct, then measures of fairness in situations lacking relationship information (for example, anonymous others) should positively covary with market integration.

Recent work has also tentatively proposed that certain religious institutions, beliefs, and rituals may have coevolved with the norms that support large-scale societies and broad exchange (18, 20, 21). Intersocietal competition may have favored those religious systems that galvanize prosocial behavior in broader communities, perhaps using both supernatural incentives (for example, hell) and recurrent rituals that intensify group solidarity (20, 22). Consistent with this view, analyses of ethnographic data show that the emergence of moralizing religions increases with greater societal size and complexity (18, 23). Archaeologically, regular rituals and the construction of monumental religious architecture co-emerge with societal size and complexity (24). In experiments, unconsciously priming the faithful with religious concepts favors greater fairness toward anonymous others (21). This suggests that, in contrast to the religions that likely dominated our evolutionary history, modern world religions such as Christianity and Islam may be unusual in ways that buttress the norms and institutions that sustain larger-scale interaction [supporting online material (SOM) text]. If this theory is correct, then greater fairness toward anonymous others should be associated with adherence to a world religion.

Evolutionary approaches to norms also afford a prediction about the willingness of individuals across populations to engage in the costly punishment of norm violations. Theoretical modeling reveals at least two different kinds of norm-stabilizing mechanisms: One involves reputational effects in which norm violators are sanctioned in another interaction by, for example, not receiving aid in a dyadic helping situation (10), and a second involves the use of diffuse (25) costly punishment (12). Because the effectiveness of reputational systems in sustaining norms degrades rapidly as communities expand (26, 27), fairness in larger communities must increasingly be maintained by diffuse punishment; that is, larger communities should punish more.

Experiments. The evidence presented below derives from a second round of cross-cultural experiments that were designed to illuminate findings from our first project (28, 29). This round replaces 10 populations from our earlier effort with 10 new ones (swapping several researchers as well), while resampling from five of the same populations used previously (Table 1). This analysis complements prior work from our team, which focused on the distributional patterns of punishment (30), by seeking to explain the variation in our experimental measures of both fairness and punishment. Our analysis converges with recent work comparing diverse industrialized populations (31).

Table 1

Summary information for the populations studied. The column Economic Base classifies the production systems (for example, horticulturalists rely primarily on slash-and-burn agriculture; pastoralists rely on herding). Residence classifies the nature and frequency of each population’s movements. Mean MI is the average market integration (MI): the percentage of total household calories that are purchased in the market. Mean WR gives the percentage of the sample that reported adhering to a world religion (WR). Mean CS gives the average community size (CS) for the populations studied. These CSs are for villages, except among the Hadza (who live in camps) and in Accra. Com Sam (DG/UG/TPG) gives the number of different communities from which participants were drawn for the Dictator (DG), Ultimatum (UG), and Third-Party Punishment (TPG) Games. N gives the number of pairs or trios for each experiment (table S4).


If markets and world religions are linked to the norms that sustain exchange in large-scale societies, we expect that experimental measures of fairness in anonymous interactions will positively covary with measures of involvement in these two institutions. To test this, we used three experiments that were designed to measure individuals’ propensities for fairness and their willingness to punish unfairness across 15 populations that vary in their degree of market integration and their participation in world religions. Our three experiments are the Dictator, Ultimatum, and Third-Party Punishment Games (32).

In the Dictator Game (DG), two anonymous players are allotted a sum of money (the stake) in a one-shot interaction (3). Player 1 must decide how to divide this sum between himself or herself and Player 2. Player 2 receives the allocation (offer), and the game ends. Player 1’s offer to Player 2 provides a measure of Player 1’s behavioral fairness in this context.

In the Ultimatum Game (UG), two anonymous players are again allotted a sum in a one-shot interaction (3). Player 1 can offer a portion of this to Player 2. Player 2, before hearing the actual offer from Player 1, must decide whether to accept or reject each of the possible offers (in 10% increments). Decisions are binding. If Player 2 specified that he or she would accept the amount of the actual offer, then Player 2 receives the offered amount and Player 1 gets the remainder. If Player 2 specified that he or she would reject the offered amount, then both players receive zero. If people are motivated purely by money maximization, Player 2s will always accept any positive offer; realizing this, Player 1 should offer the smallest nonzero amount. Because this is a one-shot anonymous interaction, Player 2’s willingness to reject provides a measure of punishment. Player 1’s offer measures a combination of social motivations and an assessment of the likelihood of rejection; this gives us a second behavioral measure of fairness.
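The UG payoff rule can be made concrete with a minimal Python sketch of a single round under the strategy method described above. The function and variable names are illustrative assumptions and are not part of the study's protocol. Player 2's binding responses are stored as a mapping from each possible offer (in 10% increments) to accept or reject.

```python
# Possible offers, as percentages of the stake, in 10% increments.
OFFERS = range(0, 101, 10)


def ultimatum_payoffs(stake, offer_pct, accepts):
    """Return (player1, player2) take-home amounts for one UG round.

    accepts maps every possible offer percentage to Player 2's binding
    decision (True = accept, False = reject), made before the actual
    offer is revealed.
    """
    offer = stake * offer_pct / 100
    if accepts[offer_pct]:
        return stake - offer, offer  # Player 2 gets the offer, Player 1 the remainder
    return 0, 0                      # rejection: both players receive zero


# Hypothetical Player 2 who rejects any offer below 30% of the stake.
demanding = {pct: pct >= 30 for pct in OFFERS}
print(ultimatum_payoffs(100, 20, demanding))  # (0, 0)
print(ultimatum_payoffs(100, 40, demanding))  # (60.0, 40.0)

# A purely money-maximizing Player 2 accepts every positive offer, so a
# money-maximizing Player 1 offers the smallest nonzero amount.
maximizer = {pct: pct > 0 for pct in OFFERS}
best = max(OFFERS, key=lambda pct: ultimatum_payoffs(100, pct, maximizer)[0])
print(best)  # 10
```

The last lines reproduce the money-maximizing benchmark from the text. Against a Player 2 who accepts any positive offer, Player 1's own payoff is highest at the smallest nonzero offer.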

In the Third-Party Punishment Game (TPG), two players are again allotted a stake, but now a third player also receives the equivalent of one-half of the stake (33). Player 1 must decide how much to allocate to Player 2, who has no choices. Player 3, before hearing the actual amount that Player 1 allocated to Player 2, has to decide whether to pay 20% of his or her allocation to punish Player 1 for each of the possible offers (in 10% increments). If punished, Player 1 loses triple the amount paid by Player 3. Suppose the stake is $100; if Player 1 offers $10 to Player 2 (keeping $90), and Player 3 wants to punish this offer amount, then Player 1 takes home $60 ($90 – $30), Player 2 gets $10, and Player 3 gets $40 ($50 – $10). If Player 3 had instead decided not to punish offers of $10, then the take-home amounts would be $90, $10, and $50, respectively. Because a money-maximizing Player 3 would never pay to punish, a similarly motivated Player 1 should always give zero to Player 2. Thus, Player 3’s willingness-to-pay provides another measure of punishment. Player 1’s offer measures a mixture of his or her social motivations in this context and an assessment of the punishment threat; this provides a third behavioral measure of fairness.
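The TPG arithmetic in the $100 example can be checked with a short Python sketch. It assumes that Player 3's strategy-method responses are stored as a mapping from offer percentage to a punish decision. The names are illustrative, and the 30% punishing threshold is hypothetical.

```python
# Possible offers, as percentages of the stake, in 10% increments.
OFFERS = range(0, 101, 10)


def tpg_payoffs(stake, offer_pct, punishes):
    """Return (player1, player2, player3) take-home amounts for one TPG.

    Player 3 is endowed with half the stake. punishes maps every possible
    offer percentage to Player 3's binding decision (True = pay to punish).
    Punishing costs Player 3 20% of that endowment and removes three times
    that cost from Player 1.
    """
    offer = stake * offer_pct / 100
    endowment3 = stake / 2
    p1, p2, p3 = stake - offer, offer, endowment3
    if punishes[offer_pct]:
        cost = endowment3 / 5  # 20% of Player 3's endowment
        p1 -= 3 * cost
        p3 -= cost
    return p1, p2, p3


# The worked example from the text: a $100 stake and a $10 (10%) offer.
punish_low = {pct: pct < 30 for pct in OFFERS}  # hypothetical Player 3
never = {pct: False for pct in OFFERS}          # money-maximizing Player 3
print(tpg_payoffs(100, 10, punish_low))  # (60.0, 10.0, 40.0)
print(tpg_payoffs(100, 10, never))       # (90.0, 10.0, 50.0)
```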

These experiments, with their salient contextual cues of cash and anonymity, seem well suited to tapping the particular norms that support cooperation and exchange among ephemeral interactants in market transactions. Cash is closely associated with market transactions (34) and often signals a desire to avoid a longer-term nonmarket relationship (35). Anonymity in our games means that players lack the cues or information necessary to apply the expectations and motivations associated with other kinds of relationships, such as those based on kinship, reciprocity, or status differences, forcing players to default to local norms for dealing with people outside durable relationships. However, an important concern in interpreting such experiments involves the degree to which participants accept the experimental situation, believe the anonymity, and worry about the experimenter’s judgments of them (SOM text).

Our standardized protocols and methods included the following: (i) random adult samples from our communities (with little attrition in most sites), (ii) stakes set at 1 day’s local wage, (iii) a show-up fee of 20% of the stake, (iv) back-translation of scripts, (v) one-on-one instruction and pregame testing for comprehension, (vi) steps to preclude collusion and contamination, and (vii) no deception (32). Our pregame tests, combined with analyses of the relationships between experimental decisions and measures of comprehension, indicate no measurable impact of differences in comprehension on behavior (30).

We performed these experiments with 2148 individuals across populations from Africa, North and South America, Oceania, New Guinea, and Asia that included small-scale societies of hunter-gatherers, marine foragers, pastoralists, horticulturalists, and wage laborers. Table 1 provides the location (mapped in fig. S3), environment, economic base, residence pattern, and sampling information for each population, as well as averages for three key variables.

Results. The theory outlined above predicts a positive relationship between our three measures of fairness—offers in each game—with market integration and adherence to a world religion. Individuals’ offers are measured as a percentage of the total stake (1 day’s local wage). Market Integration (MI) is measured at the household level by calculating the percentage of a household’s total calories that were purchased from the market, as opposed to home-grown, hunted, or fished, and then averaged to obtain a community-level measure. We use the community average for MI both to remain consistent with our definition of norms (as local equilibria) and to remove day-to-day stochastic variation (32). Table 1 shows that the population means for MI range from 0 to 100%, with a mean of 57.3%. World Religion (WR) was assessed by asking participants what religion they practiced, and coding these as a binary variable, with “1” indicating Islam or Christianity, and “0” marking the practice of a tribal religion or “no religion.” Table 1 provides the percentage of each population that adheres to a world religion. The mean value of WR is 89%.
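The two key predictors can be computed as in the minimal Python sketch below. It assumes that household calorie data arrive as (purchased, total) pairs and that reported religions arrive as plain labels. The households, labels, and sample are hypothetical. The study's actual coding also had to map specific denominations (for example, Catholic or Protestant) onto these categories.

```python
from statistics import mean


def household_mi(purchased_kcal, total_kcal):
    """Percentage of a household's total calories purchased in the market."""
    return 100 * purchased_kcal / total_kcal


def community_mi(households):
    """Community-level MI: the mean of household MI values.

    households is a list of (purchased_kcal, total_kcal) pairs.
    """
    return mean(household_mi(p, t) for p, t in households)


# Hypothetical labels; the study coded Christianity or Islam as 1 and a
# tribal religion or no religion as 0.
WORLD_RELIGIONS = {"Christianity", "Islam"}


def world_religion(reported):
    """Binary WR code for one participant's reported religion."""
    return 1 if reported in WORLD_RELIGIONS else 0


# Hypothetical community of three households.
village = [(0, 2000), (500, 2000), (1000, 2000)]
print(community_mi(village))  # 25.0

# Share of a hypothetical sample adhering to a world religion, in percent.
sample = ["Christianity", "Islam", "tribal", "Christianity"]
print(100 * mean(world_religion(r) for r in sample))  # 75.0
```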

To analyze these data, we regressed offers on MI and WR, as well as seven control variables: age, sex, education, income, wealth, household size, and community size (CS). Wealth and household size are measured at the household level and CS at the community level; the remaining four controls are individual-level measures. We used 30 (DG), 26 (UG), and 16 (TPG) community-level means for MI to predict 416 (DG), 398 (UG), and 272 (TPG) individual offers with a minimal set of controls, or 336 (DG), 319 (UG), and 265 (TPG) individual offers with the full set of controls.

Table 2 shows four regression models using MI and WR to predict all offers together and offers from the DG, UG, and TPG, separately. Independent of the other sociodemographic variables, the coefficients on MI are large and significant at conventional levels across all four models. A 20–percentage point increase in MI is associated with an increase in percentage offered ranging from roughly 2 to 3.4. The same qualitative results are obtained for MI whether one uses household measures, community averages, or population averages.

Table 2

Linear regression models for offers. These ordinary least-squares models include four additional control variables (sex, age, community size, and education). Coefficients are followed by standard errors, indicated with ±; P values are given in parentheses.


The coefficients on WR are also large across all offers and in both the UG and DG, though not well estimated in the DG (P < 0.10). However, WR’s coefficients are significant (P < 0.05) across all other specifications (tables S5, S8, and S11). Participating in a world religion is associated with an increase in percentage offered of between about 6 and 10 (36).

Taken together, these data indicate that going from a fully subsistence-based society (MI = 0) with a local religion to a fully market-incorporated society (MI = 100%) with a world religion predicts an increase in percentage offered of roughly 23, 20, and 11 in the DG, UG, and TPG, respectively. This spans most of the range of variation across our populations: DG means range from 26 to 47%, UG from 25 to 51%, and TPG from 20 to 43%.
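These magnitudes follow from simple linear-model arithmetic. The predicted change in percentage offered is the MI coefficient times the change in MI, plus the WR coefficient times the change in WR. The sketch below is hypothetical. The per-point MI slopes are back-calculated from the 2 to 3.4 range quoted for a 20-point change. Pairing the upper slope with a WR coefficient of 6 is an illustrative choice within the quoted 6 to 10 range, not an estimate taken from Table 2.

```python
def predicted_offer_change(b_mi, b_wr, delta_mi, delta_wr):
    """Change in percentage offered implied by a linear model.

    b_mi     -- coefficient on MI (offer points per 1-point change in MI)
    b_wr     -- coefficient on the 0/1 world-religion indicator
    delta_mi -- change in MI, in percentage points
    delta_wr -- change in WR (0 or 1)
    """
    return b_mi * delta_mi + b_wr * delta_wr


# Per-point MI slopes implied by "roughly 2 to 3.4 per 20-point increase".
b_mi_low, b_mi_high = 2 / 20, 3.4 / 20

# A 20-point MI increase at the low slope.
print(round(predicted_offer_change(b_mi_low, 0, 20, 0), 1))    # 2.0

# From MI = 0 with a local religion to MI = 100 with a world religion,
# using illustrative (not estimated) coefficients.
print(round(predicted_offer_change(b_mi_high, 6, 100, 1), 1))  # 23.0
```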

For the seven control variables, none of the coefficients is significant (P < 0.05) across all offers or in the DG (tables S5 and S8). For UG offers, the coefficient on age is positive and significant (P < 0.05, table S11). For TPG offers, Table 2 shows that the coefficients on income, wealth, and household size are significant (P < 0.05, table S14).

Figure 1 plots mean offers versus mean MI values for our 15 populations. Population mean MI values account for 52% of the variation in mean DG offers [correlation ρ = 0.72, 95% bootstrap confidence interval (CI) = 0.4 to 1.0, P < 0.01, n = 15 populations]. In designing this second round of experiments, we sought out an additional New Guinea population (Sursurunga), because in the first round, the Au of New Guinea revealed highly unusual behavior, including relatively high offers with little market integration. The Au pattern replicated, and now extends to the Sursurunga. However, because we targeted a second population that skews our world sample unrepresentatively toward New Guinea, we also examined this relationship with either the Au or the Sursurunga dropped. Dropping the Sursurunga leads to mean MI accounting for 58% of the variation (ρ = 0.76, 95% bootstrap CI = 0.44 to 0.95, P < 0.001). Dropping the Au instead leads to mean MI capturing 71% of the variation (ρ = 0.84, 95% bootstrap CI = 0.59 to 0.96, P < 0.001).
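Two pieces of arithmetic underlie this paragraph. First, the share of variation explained is the squared correlation (0.72² ≈ 0.52, 0.76² ≈ 0.58, 0.84² ≈ 0.71). Second, the confidence intervals come from resampling populations. The Python sketch below assumes a simple percentile bootstrap, which may differ from the procedure actually used; the Fig. 1 error bars, for instance, use the bias-corrected and accelerated method, which is not implemented here. The data are synthetic placeholders, not the study's population means.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None if either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the correlation, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples with zero variance
            stats.append(r)
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Variance explained is the squared correlation.
for rho in (0.72, 0.76, 0.84):
    print(rho, round(100 * rho ** 2))  # 52, 58, and 71 percent

# Synthetic stand-ins for 15 population means (MI, mean DG offer).
gen = random.Random(1)
mi = [gen.uniform(0, 100) for _ in range(15)]
dg = [25 + 0.15 * m + gen.gauss(0, 4) for m in mi]
print(round(pearson(mi, dg), 2), bootstrap_ci(mi, dg))
```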

Fig. 1

Mean DG offers for each population plotted against mean value of MI. Error bars are bootstrapped standard errors (bias corrected and accelerated) on the population mean.

On the punishment side, we looked for a relationship between our individual-level measures of punishment, from the UG and TPG, and our theoretically important variable, CS, while controlling for our other eight variables. For both the UG and TPG, we reduced the vector of “punish or not punish” responses to a single number called a minimum acceptable offer (MAO). This is the lowest offer (≤50%) for which an individual no longer punishes. Thus, if a player accepts all offers, his or her MAO is zero. If the player punishes offers of 0 and 10% but accepts all higher offers, his or her MAO is set at 20. MAO measures an individual’s willingness to punish low offers (32).
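Below is a minimal Python sketch of the MAO reduction. It assumes that responses are recorded for offers of 0 to 50% in 10% steps. The text does not say how non-monotone response patterns were coded. This sketch therefore assumes that the MAO is the lowest offer at and above which the player punishes nothing up to 50%. It also returns None for a player who punishes a 50% offer. Both choices are assumptions.

```python
OFFER_LEVELS = (0, 10, 20, 30, 40, 50)


def minimum_acceptable_offer(punishes):
    """Reduce strategy-method responses to a single MAO.

    punishes maps each offer level to True if the player would reject it
    (UG) or pay to punish it (TPG). Assumption: the MAO is the lowest level
    from which every offer up to 50 goes unpunished; None if 50 is punished.
    """
    mao = None
    for level in reversed(OFFER_LEVELS):  # walk down from 50
        if punishes[level]:
            break
        mao = level
    return mao


accepts_all = {level: False for level in OFFER_LEVELS}
punishes_0_and_10 = {level: level < 20 for level in OFFER_LEVELS}
print(minimum_acceptable_offer(accepts_all))        # 0
print(minimum_acceptable_offer(punishes_0_and_10))  # 20
```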

To analyze MAOs, we used an ordered logistic regression (OLR) because MAOs are both discrete and bimodally distributed. This model assumes that the dependent variable is discrete and rank ordered, but that the distance between ranks is not meaningful. Because the output of an OLR is nonintuitive (tables S20A and S23A), we have captured the effects of the highly significant (P < 0.001) coefficients for CS in Fig. 2, A and B. Holding the other eight variables at their mean values, the figure shows how the distributions of MAOs for the TPG and UG shift with increases in CS. Small communities, the size of foraging bands, are the least willing to punish. As CS increases from 50 to 5000, the modal MAO shifts dramatically from 0 to 50. Our communities’ sizes range from 20 to 4600 people.
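A plot like Fig. 2 can be built as follows. In an ordered logit, the probability that an MAO falls at or below level j is the logistic function of a cutpoint minus the linear predictor. Category probabilities are differences of these cumulative values. When CS rises while the other variables are held at their means, only the linear predictor changes, and probability mass moves toward higher MAOs. The cutpoints and linear-predictor values in this Python sketch are hypothetical. The study's fitted values are reported in the SOM (tables S20A and S23A) and are not reproduced here.

```python
from math import exp

MAO_LEVELS = (0, 10, 20, 30, 40, 50)


def logistic(z):
    return 1 / (1 + exp(-z))


def ordered_logit_probs(eta, cutpoints):
    """P(MAO = level) for each level under an ordered logit model.

    eta       -- linear predictor (sum of coefficients times covariate values)
    cutpoints -- increasing thresholds, one fewer than the number of levels
    Uses P(MAO <= level_j) = logistic(cutpoint_j - eta).
    """
    cumulative = [logistic(k - eta) for k in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return dict(zip(MAO_LEVELS, probs))


# Hypothetical cutpoints and linear predictors for a small and a large community.
cuts = (0.0, 0.5, 1.0, 1.5, 2.0)
for label, eta in (("small", -1.0), ("large", 3.0)):
    dist = ordered_logit_probs(eta, cuts)
    modal = max(dist, key=dist.get)
    print(label, modal, {k: round(v, 2) for k, v in dist.items()})
```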

Fig. 2

Effect of CS on MAOs. Each set of bars shows the distribution of MAOs, ranging discretely from 0 to 50, for different CSs. The coefficients used to create the plot are for the variable CS in an ordered logistic regression containing all eight of our other variables. All variables except CS are held constant at their mean values. The coefficients on CS represented graphically here are 0.052 ± 0.0085 (robust standard error, P < 0.001, n = 227) for the TPG and 0.087 ± 0.011 (robust standard error, P < 0.001, n = 297) for the UG. (A) TPG (see table S20A for full regression). (B) UG (see table S23A).

Theoretical arguments suggest that punishment (MAO) should be related more directly to the natural logarithm of CS, because the effectiveness of reputational systems decays in rough proportion to this variable (26, 37). Consistent with this, the natural logarithm of CS is a better predictor of MAO in both experiments than CS itself (tables S20B to S25B), and this amplifies the effects illustrated in Fig. 2 (fig. S4).

Our theoretical approach makes no predictions about the relationship between punishment and MI, because, for small communities, there are numerous reputational mechanisms (not involving costly punishment) that can sustain equality norms. Similarly, because religion should galvanize the existing mechanisms for norm stabilization, whether participants in a world religion punish more depends entirely on the local stabilization mechanisms, which themselves depend on CS (38). Nevertheless, for MAOs in the TPG, we find that WR is associated with significantly (P < 0.05) more punishment, although MI reveals no such relationship. None of the other predictors is significant (P < 0.05, table S20A). For MAO in the UG (table S23A), income, wealth, and MI all significantly predict lower MAOs (P < 0.05). The effect for MI, however, is contingent on having CS in the equation. If CS is dropped, the effect of MI becomes nonsignificant. Thus, unlike CS, none of these other effects is consistent for both MAO measures or across alternative specifications.

Our analyses for both offers and MAOs are robust to a variety of checks, including alternative model specifications and adjustments to our wealth and income variables to account for local differences in purchasing power (tables S6, S9, S12, S15, S21, and S24). We also included continental-level dummies (Africa, Oceania, South America, and Eurasia) to address concerns about shared cultural phylogenies, and we used clustered robust standard errors (clustering on site) to control for the potential nonindependence of individual observations within our sites (tables S5 to S25). The findings for MI and CS are robust to these checks. However, when continental controls are applied, the effects of WR on offers disappear because of the highly uneven distribution of populations containing individuals with WR = 0 (tables S5, S8, and S11, Model 1). We reran our models for Africa only and generally reconfirmed the above findings (32). Nevertheless, because of the rather small sample of individuals with WR = 0, any conclusions about the effect of WR must remain quite tentative (39).
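Clustering on site allows residuals to be correlated within a site. The Python sketch below shows the standard sandwich (CR0) estimator for ordinary least squares using NumPy. It omits the small-sample adjustment that statistical packages commonly apply. It is a generic illustration, not the study's estimation code, and the site structure and data in the usage lines are synthetic.

```python
import numpy as np


def ols_cluster_se(X, y, clusters):
    """OLS coefficients with cluster-robust (CR0) standard errors.

    X        -- (n, k) design matrix, including a column of ones
    y        -- (n,) outcome
    clusters -- (n,) cluster labels, e.g. site identifiers
    No small-sample correction is applied; many packages multiply the
    variance by G/(G-1) * (n-1)/(n-k).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        mask = clusters == g
        score = X[mask].T @ resid[mask]  # summed score for cluster g
        meat += np.outer(score, score)
    vcov = xtx_inv @ meat @ xtx_inv      # sandwich: bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))


# Synthetic data: 10 sites of 20 people, a site-level regressor, and a
# shared site-level shock that makes observations within a site correlated.
rng = np.random.default_rng(0)
n_sites, per_site = 10, 20
site = np.repeat(np.arange(n_sites), per_site)
mi = np.repeat(rng.uniform(0, 100, n_sites), per_site)
shock = np.repeat(rng.normal(0, 3, n_sites), per_site)
offer = 25 + 0.15 * mi + shock + rng.normal(0, 5, site.size)
X = np.column_stack([np.ones(site.size), mi])
beta, se = ols_cluster_se(X, offer, site)
print(beta, se)
```

When every observation is its own cluster, this estimator reduces to the heteroskedasticity-robust (HC0) form. That makes it a convenient check.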

Discussion. Our results on the relationship between MI and offers extend the findings of our previous project (29) in several ways, including using fresh samples, better measures, new experiments, and additional controls. Despite swapping in 10 new sites and using a different protocol, we have replicated our earlier UG findings both at the level of individual sites (in four sites) and in the positive relationship between UG offers and market integration, now assessed with a better measure. This relationship is now demonstrated for DG and TPG offers, each of which reveals similarly large effects and is robust to the inclusion of a range of socioeconomic variables not previously measured and to a suite of statistical checks (tables S5 to S25).

These findings also delineate two additional lines of research. First, our inclusion of participation in a world religion converges with other recent findings (21) and tentatively supports the notion that religion may have coevolved with complex societies to facilitate larger-scale interactions. Second, our analyses (tables S18 and S19) raise the question of why, when a third-party punisher is added to a DG to create a TPG, mean offers usually decrease, the predictive effects of WR disappear (Table 2), and economic variables emerge as potent correlates of offers (tables S14 to S16).

On the punishment side, our analyses of MAOs are consistent with the idea that as reputational systems break down in larger populations, increasing levels of diffuse costly punishment are required to sustain large harmonious communities. This extends earlier bivariate analyses using population means in the TPG (40) and converges with ethnographic data, suggesting that large communities, lacking sufficient and effective punishing mechanisms, fragment into smaller groups as their community sizes reach about 300 people (41).

Although our primary interest here is behavioral fairness, which includes whatever combination of motivations and expectations yields more equal divisions, two features of these results suggest that they may also hold when offers are treated as measures of intrinsic (internalized) motivations. First, because no punishment opportunities exist in the DG, this experiment likely provides a purer measure of intrinsic motivations for equal offers. Second, we reran the above offer regressions for UG and TPG offers including each population’s mean MAO as an additional predictor variable. If individuals are well calibrated to the threat of punishment posed by their local communities, then adding this variable should (at least partially) remove the impact of any threat of punishment on offers. The inclusion of mean MAO as a predictor variable does not change the coefficients on MI or WR (table S17). In addition, anonymity concerns regarding our experiments and results are taken up in the SOM text, where we discuss our double-blind treatments and related analyses, which indicate that differences in perceived anonymity are unlikely to explain our major findings.

These findings indicate that people living in small communities lacking market integration or world religions—absences that likely characterized all societies until the Holocene—display relatively little concern with fairness or punishing unfairness in transactions involving strangers or anonymous others. This result challenges the hypothesis that successful social interaction in large-scale societies—and the corresponding experimental findings—arise directly from an evolved psychology that mistakenly applies kin and reciprocity-based heuristics to strangers in vast populations (4, 5), without any of the “psychological workarounds” (42) that are created by norms and institutions. Moreover, it is not clear how this hypothesis can explain why we find so much variation among populations in our experimental measures and why this variation is so strongly related to MI, WR, and CS. The mere fact that the largest and most anonymous communities engage in substantially greater punishment relative to the smallest-scale societies, who punish very little, challenges this interpretation.

Methodologically, our findings suggest caution in interpreting behavioral experiments from industrialized populations as providing direct insights into human nature. Combining our findings with work on the links between behavioral experiments and real life (31, 43, 44) suggests that such experiments elicit norms, or reflect institutions, that have evolved to facilitate interactions among individuals not engaged in durable relationships. Given this, much current work using behavioral games appears to be studying the interaction between a particular set of norms and our evolved psychology, not tapping this psychology directly.

Overall, these findings lend support to the idea that the evolution of societal complexity, especially as it has occurred over the last 10 millennia, involved the selective spread of those norms and institutions that best facilitated the successful exchange and interaction in socioeconomic spheres well beyond local networks of durable kin and reciprocity-based relationships. Although differences in environmental affordances probably had a profound impact on the emergence of complex societies across the globe (2), the rate-determining step in societal evolution may have involved the assembly of the norms and institutions that are capable of harnessing and extending our evolved social psychology to accommodate life in large, intensely cooperative communities. Recent experimental work among diverse industrialized populations suggests that the gradual honing of these norms and institutions continues in modern societies (31).

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S4

Tables S1 to S25


References and Notes

  25. Diffuse punishment implies that the responsibility for punishing is spread over a large segment of the population.
  32. Background, methods, and supplemental analyses are available as supporting material on Science Online.
  34. Cash has value in all our societies. Everyone wants steel, salt, and sugar, if nothing else.
  36. We found no robust differences among different forms of Christianity (Catholics versus Protestants), or between Christianity and Islam.
  38. Traditional religions may sometimes galvanize local norms (SOM text), but they do not galvanize “market norms,” which are for interacting with strangers or anonymous others. When religions do underwrite norms, they must operate in concert with the local norm-stabilizing mechanisms.
  39. We did not use fixed effects for communities or populations in any of our regressions, because our theoretical focus is on differences in social norms. Because norms are group-level properties (different locally stable equilibria), much of the relevant norm-related variation occurs between, not within, communities or populations (SOM text).
We especially thank the communities that participated in our research. We also thank the National Science Foundation, the MacArthur Norms and Preferences Network, the Max Planck Institute for Social Anthropology, and the Russell Sage Foundation for funding this project. Thanks also to the many audiences and readers who contributed to improving our efforts.