Coordinated Punishment of Defectors Sustains Cooperation and Can Proliferate When Rare


Science  30 Apr 2010:
Vol. 328, Issue 5978, pp. 617-620
DOI: 10.1126/science.1183665


Because mutually beneficial cooperation may unravel unless most members of a group contribute, people often gang up on free-riders, punishing them when this is cost-effective in sustaining cooperation. In contrast, current models of the evolution of cooperation assume that punishment is uncoordinated and unconditional. These models have difficulty explaining the evolutionary emergence of punishment because rare unconditional punishers bear substantial costs and hence are eliminated. Moreover, in human behavioral experiments in which punishment is uncoordinated, the sum of costs to punishers and their targets often exceeds the benefits of the increased cooperation that results from the punishment of free-riders. As a result, cooperation sustained by punishment may actually reduce the average payoffs of group members in comparison with groups in which punishment of free-riders is not an option. Here, we present a model of coordinated punishment that is calibrated for ancestral human conditions and captures a further aspect of reality missing from both models and experiments: The total cost of punishing a free-rider declines as the number of punishers increases. We show that punishment can proliferate when rare, and when it does, it enhances group-average payoffs.

Humans are a uniquely cooperative species. In even the simplest societies, people cooperate in large groups of genealogically distant individuals (1–3). In the laboratory, subjects routinely cooperate in situations in which selfish agents would free-ride on the cooperation of others (4, 5). Recent theoretical studies provide an evolutionary explanation for such cooperative behavior: Punishment reduces the gain from free-riding, so groups with more punishers can sustain more cooperation (6–9). Punishment is costly, but unlike unconditional altruism its costs are greatly reduced when punishers are common because punishment then occurs at very low frequency, is effective, and its costs can be shared. As a result, a modest advantage of groups in which cooperation is sustained by the presence of punishers is sufficient to compensate them for the cost of punishment.

There are two important problems with this explanation of human cooperation. First, punishment can reduce the average payoffs of group members because the costs of punishment may exceed the gains from cooperation (5). This problem is exacerbated when punishers target cooperative group members, as sometimes occurs in experiments (10–12). Second, the initial emergence of punishment remains a puzzle. In order to survive, punishers must engage in enough punishment of defectors so that the induced cooperation more than offsets the cost of punishing. Rare punishers do not have the benefit of outnumbering their targets, so the cost of punishing a free-rider is substantial. Moreover, they usually bear this cost alone rather than sharing it with other punishers (13–16).

These problems are an artifact of the unrealistic way that punishment is implemented in existing models and in most experiments. In these models, punishment is an unconditional and uncoordinated individual action automatically triggered by defection. Similarly and with few exceptions (17), in experiments individuals cannot coordinate their punishment. In contrast, ethnographic evidence indicates that punishment is coordinated by means of gossip and other communication among punishers, is contingent on the expected effectiveness of punishment in inducing cooperation, and is not undertaken unless it is judged as legitimate by most group members (18–20). When it occurs, punishment is usually collective and conveys a message of peer condemnation. Consistent with the anthropological evidence, in behavioral experiments with communication or with the option of coordinating behavior, punishment is often highly effective in raising group average payoffs (21).

We analyzed a model of the evolution of punishment that incorporates two empirically based features absent from previous work. First, punishment is coordinated among group members so that it is contingent on the number of others predisposed to participate in the punishment. This means that when individuals willing to punish are rare, they demur and so bear only the cost of signaling their willingness to punish. They thus avoid the cost of punishing when it does not pay. Second, consistent with the “strength in numbers” and “divide and rule” maxims, punishment is characterized by increasing returns to scale, so the total cost of punishing a single free-rider declines as the number of punishers increases. Adding these two features resolves the problems with previous models. Our model shows that for levels of relatedness consistent with recent genetic data from hunter-gatherer populations (22), punishment can proliferate when rare, and when it is common it increases group-average fitness.

In our model, a large population of individuals interact repeatedly in groups of size n. Groups are randomly formed, so there is no genetic assortment. Later, we will introduce an empirically plausible degree of genetic assortment. The model is fully described in (23). After the formation of a group, there is an initial period of an interaction that has three stages. First is a signaling stage, in which individuals can signal their intent to punish defectors. The cost of signaling, q, is high enough so that it does not pay to signal and then fail to punish. There follows a cooperation stage, during which individuals can choose to cooperate or defect. Cooperation costs the cooperator c and benefits each member of the group b/n (b > c > b/n). Lastly, there is a punishment stage in which punishers can coordinate to inflict a cost p on the target at an expected cost to each punisher of k/n_p^a, where n_p is the number of punishers. Given that a greatly outnumbered target is unlikely to inflict costs on any of the punishers, it is plausible that a > 1: There are increasing returns to scale, so the punishers’ total cost of a punishment episode decreases as the number of punishers increases. During subsequent periods, there are only cooperation and punishment stages. The interaction continues to another period with probability (1 − 1/T), so T is the expected number of periods until the group disbands and new groups are drawn from the population.
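
The punishment-cost assumption above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the values c = 0.01, k = 1.5c, and a = 2 are the benchmark parameters listed in the caption of Fig. 1; the sketch covers only the cost of one punishment episode, not the full model in (23).

```python
def per_punisher_cost(k: float, n_p: int, a: float) -> float:
    """Expected cost to each punisher of punishing one target: k / n_p**a."""
    if n_p < 1:
        raise ValueError("n_p must be at least 1")
    return k / n_p ** a


def total_punishment_cost(k: float, n_p: int, a: float) -> float:
    """Cost summed over all n_p punishers for one target: k * n_p**(1 - a)."""
    return n_p * per_punisher_cost(k, n_p, a)


if __name__ == "__main__":
    c = 0.01      # cost of cooperating (benchmark value, Fig. 1)
    k = 1.5 * c   # punishment cost parameter (benchmark value, Fig. 1)
    a = 2         # returns-to-scale exponent (benchmark value, Fig. 1)
    for n_p in (1, 2, 4, 8):
        print(n_p, per_punisher_cost(k, n_p, a), total_punishment_cost(k, n_p, a))
```

Running the script lists the per-punisher and total cost for several coalition sizes, both of which shrink as punishers are added when a > 1.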

Population structures like this one, in which groups do not persist but are created anew for each interaction by drawing individuals from a larger population (24–26), are useful because they provide an analytically tractable approximation to more realistic structures. In the first interaction of such models, individuals have no common history (as they would if we modeled persistent groups) and hence cannot know anything about strategies of other group members. To address this information problem, we introduced a first “information gathering” period in which individuals know nothing about their group mates. This extreme assumption exaggerates the costs of signaling and establishing whether a quorum for punishment exists, but it captures an important fact: Even in the more realistic setting of persistent groups, individuals change, die, or leave the group and are replaced by migrants or offspring. This means that actors must deal with situations in which the past behavior of some group members is unknown, which is analogous to the first period in the present model. We believe that the present model represents a worst case for the evolution of punishment because it maximizes the level of uncertainty about the strategies of others.

Individuals have one of two heritable strategies: “punisher” and “nonpunisher.” Cooperation and free-riding are not inherited strategies. Rather, they are choices that individuals make in light of the incentives provided by the prospect of punishment. During the first interaction, punishers signal that they are willing to punish. Next, if at least τ (0 ≤ τ ≤ n – 1) other group members signal, punishers cooperate with probability 1 – e and defect with probability e and then punish any individual who did not cooperate. We refer to punishers with a threshold of τ as “τ-punishers.” If fewer than τ other individuals signaled during the first stage, punishers defect and do not punish. Nonpunishers do not signal, defect, and do not punish, and as a result are punished if there are at least τ + 1 punishers in the group. During subsequent periods, both types cooperate with probability 1 – e and defect with probability e if defectors were punished the last time a defection occurred. Punishers punish defectors if at least τ other individuals punished the last time a defection occurred. The cost of being punished to the target, p, is greater than the net cost of cooperating, c – b/n, so on average cooperation is the payoff-maximizing action if punishment is anticipated. A fraction e of individuals nonetheless defect, either due to error or because cooperation is more costly for some individuals and so it does not pay for them to cooperate, even if they expect to be punished. Nonpunishers are a plausible ancestral state for the evolution of punishment. They do not cooperate or punish, nor do they respond to unverified threats. However, once they have been punished they cooperate in subsequent periods in order to avoid more punishment.
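
The first-period rules just described can be written down as a small Python sketch. The class and function names are hypothetical and introduced only for illustration; the sketch encodes the decision rule and the incentive condition p > c – b/n, not the payoff bookkeeping of the full model in (23).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstPeriodPlan:
    signals: bool        # pays the signaling cost q
    may_cooperate: bool  # if True, cooperates with probability 1 - e
    punishes: bool       # punishes any group member who defected


def first_period_plan(is_punisher: bool, other_signalers: int, tau: int) -> FirstPeriodPlan:
    """First-period behavior of one individual, following the text."""
    if not is_punisher:
        # Nonpunishers do not signal, defect, and do not punish.
        return FirstPeriodPlan(signals=False, may_cooperate=False, punishes=False)
    quorum = other_signalers >= tau
    # Punishers always signal; they cooperate and punish only with a quorum.
    return FirstPeriodPlan(signals=True, may_cooperate=quorum, punishes=quorum)


def cooperation_pays_if_punished(b: float, c: float, n: int, p: float) -> bool:
    """True when the penalty p exceeds the net cost of cooperating, c - b/n."""
    return p > c - b / n
```

For example, `first_period_plan(True, 2, 3)` describes a τ = 3 punisher who sees only two other signals: the signal is paid for, but the individual then defects and does not punish. With the benchmark values from Fig. 1 (c = 0.01, b = 2c, n = 18, p = 1.5c), `cooperation_pays_if_punished` returns `True`.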

It has been argued that punishment can evolve only when it is linked to cooperation (27). After the first period, punishers and nonpunishers cooperate under exactly the same conditions, namely the presence of sufficiently many punishers in the group that free-riding does not pay. The linkage between cooperation and punishment is therefore very weak. In (23), we show that even this weak linkage is not necessary for the evolution of punishment.

After the social interaction just described, individuals reproduce at a rate proportional to their payoff relative to the population-average payoff. This leads to the equations (23) that describe how natural selection changes the frequencies of the two types through time.
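
The exact equations are given in (23). As a rough guide only, the standard discrete-time replicator update is one common way to formalize reproduction in proportion to relative payoff. It is shown here as an assumption rather than as the authors' own formulation, with x the frequency of punishers, W_p and W_n the expected fitnesses of punishers and nonpunishers, and \bar{W} the population-average fitness.

```latex
\begin{align}
\bar{W} &= x\,W_p + (1 - x)\,W_n, \\
x' &= x\,\frac{W_p}{\bar{W}}, \\
\Delta x &= x' - x = \frac{x\,(1 - x)\,(W_p - W_n)}{\bar{W}}.
\end{align}
```

Under this form, and provided \bar{W} > 0, the sign of W_p − W_n alone determines whether punishers increase or decrease in frequency, and interior equilibria occur where W_p = W_n.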

In the absence of genetic assortment, there are two long-run evolutionary outcomes (Fig. 1). First, a population of all nonpunishers is evolutionarily stable as long as solitary punishers do not punish (τ > 0). When punishers are rare in the population, they will most often be alone in a group. Thus, they pay the cost of signaling but do not reap the benefits of cooperation, and as a result will have lower fitness than nonpunishers. Punishers who are willing to punish alone (τ = 0) cannot invade a population of all nonpunishers unless the benefits from cooperation are so large that a single punisher can recoup the costs of signaling and punishing everyone else in the group. Here, we assume that this “Lone Ranger” condition is not satisfied so that only punishment by two or more punishers pays.

Fig. 1

Equilibrium frequencies of punishers with a threshold frequency of τ when group members are unrelated for two values of b. For each value of τ, the solid circles give locally stable equilibrium frequencies of the punishing type, and the open circles give interior unstable equilibrium frequencies. (A) b = 2c. For τ < 3, the only stable equilibrium is a population without punishers. For larger thresholds, there are two stable equilibrium frequencies, zero and a stable interior equilibrium at which punishers and nonpunishers coexist. The arrows indicate the effect of natural selection at points above and below the solid and open circles. In these cases, the unstable equilibria mark the frequency that punishers must achieve before they are favored by selection. (B) b = 4c. Now, there are two equilibria for all values τ > 0. Benchmark parameters are c = 0.01, q = k = p = 1.5c, r = 0, a = 2, e = 0.1, n = 18, and T = 25. The parameter r is the genetic relatedness among group members.

Mixtures of punishers and nonpunishers can also be evolutionarily stable. Punishers have an advantage over nonpunishers only in groups in which there are exactly τ + 1 punishers because in such “threshold groups,” each punisher is necessary to sustain punishment and therefore cooperation. In groups with fewer than τ + 1 punishers, punishers pay the cost of signaling, but because they do not punish they (like all group members) enjoy no cooperative benefits. In groups with more than the critical number of punishers, a punisher who switched to nonpunishing would enjoy the same payoff from cooperation as other group members without paying the costs of signaling and punishment. This means that selection cannot favor τ-punishers unless they are in groups in which there are exactly τ + 1 punishers and the benefits from cooperation are enough to compensate punishers for the costs of signaling and punishment. Moreover, the advantage enjoyed by punishers in these critical groups must be large enough to offset the payoff disadvantages suffered by punishers in groups with fewer or more than the critical numbers of punishers.

The existence of a stable mixture of punishers and nonpunishers depends on the value of the punishment threshold, τ. When the threshold is too low, punishment does not pay even at the threshold, and nonpunishment is the only evolutionarily stable strategy. At higher thresholds, punishment does pay in threshold groups, and this means that punishment may be favored if such groups are sufficiently common. Thus, as the frequency of punishers in the metapopulation increases from zero, the fraction of groups with the threshold number of punishers increases, and so does the expected fitness of punishers (Fig. 2). Once the fraction of threshold groups is high enough, the punishers’ advantage in these groups offsets their disadvantage in all other groups. Then, natural selection will increase the frequency of punishers. This marks the unstable equilibria (open circles) shown in Fig. 1 and the leftmost zero intercept on the horizontal axis for each of the functions in Fig. 2.

Fig. 2

The difference in fitness of punishers (W_p) and nonpunishers (W_n) as a function of the frequency of punishers. When this difference is positive, punishers increase in frequency, and when it is negative punishers decrease in frequency. Equilibria occur when this difference is zero (evolutionarily stable when the function intersects the horizontal axis from above and unstable otherwise). When τ = 1, punishment at the threshold does not pay for any frequency of punishers, and thus increasing the frequency of punishers from zero decreases their relative fitness. For larger values of τ, punishment at the threshold does pay, and thus increasing the frequency of punishers increases their fitness. This leads to a stable polymorphic equilibrium at which punishers and nonpunishers coexist. b = 2c; other parameters are as in Fig. 1A.

Further increases in the metapopulation frequency of punishers eventually decrease the fraction of threshold groups. When, as a result, the fitness of punishers and nonpunishers is equalized, there is a stable polymorphic equilibrium (Fig. 1, solid circles, and Fig. 2, rightmost horizontal-axis intersection). As τ increases, the frequency of punishers at the polymorphic equilibrium also increases, but the minimum initial frequency of punishers required for selection to move a population to this equilibrium also increases, making it less accessible if punishers are initially rare.
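
The rise and subsequent fall in the share of threshold groups can be illustrated with a short Python sketch. It assumes random group formation, so the number of punishers among a focal punisher's n − 1 group mates is binomially distributed; the function name is hypothetical, and the sketch tracks only group composition, not the payoffs that determine the equilibria in Figs. 1 and 2.

```python
from math import comb


def prob_threshold_group(tau: int, n: int, x: float) -> float:
    """P(exactly tau of a focal punisher's n - 1 group mates are punishers).

    Assumes groups are formed at random, so the number of punishers among
    the other n - 1 members is Binomial(n - 1, x).
    """
    others = n - 1
    return comb(others, tau) * x ** tau * (1 - x) ** (others - tau)


if __name__ == "__main__":
    n, tau = 18, 3  # benchmark group size and an example threshold
    for x in (0.01, 0.05, 0.10, 0.20, 0.40, 0.80):
        print(f"x = {x:.2f}  P(threshold group) = {prob_threshold_group(tau, n, x):.4f}")
```

In this sketch the probability is largest at x = τ/(n − 1), roughly 0.18 for n = 18 and τ = 3, and falls away on either side.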

At the stable polymorphic equilibrium, punishment is not altruistic: A punisher that switched to nonpunishing would experience no change in payoff. When groups are formed at random, averaged over all groups, the long-run benefits of punishment exactly compensate for the costs. However, punishment is mutually beneficial to the group (Fig. 3) in that populations with the equilibrium frequency of punishers have higher average fitness than populations without punishers. We show below that modest amounts of positive assortment in the formation of groups allow for the evolution of altruistic punishment.

Fig. 3

The difference in average fitness between the polymorphic equilibrium at which punishers are present and the monomorphic nonpunishing equilibrium. Whenever the polymorphic equilibrium exists, it has higher average fitness, but near-maximum benefit differences occur for relatively low thresholds. Benchmark parameters are as in Fig. 1A.

The results presented so far depend critically on two parameters: the extent of economies of scale in punishment, a, and the cost punishers have to pay to signal their willingness to punish, q. Considering the first, were we to assume a = 1 (constant returns to scale) the total cost of punishing defectors would be independent of the number of punishers, and much higher frequencies of punishment would be required before punishment would become evolutionarily stable (23). This supports the intuition that increasing returns is crucial, and therefore the notion of coordinated punishment is important.
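
The role of a can be made explicit with a short calculation from the per-punisher cost stated in the model description. Here C(n_p) is a label introduced only for illustration, denoting the punishers' total cost of punishing one target, and n_p is treated as continuous for the purpose of the derivative.

```latex
\begin{align}
C(n_p) &= n_p \cdot \frac{k}{n_p^{a}} = k\,n_p^{\,1-a}, \\
\frac{dC}{dn_p} &= (1 - a)\,k\,n_p^{-a}.
\end{align}
```

For a = 1 the total cost is C(n_p) = k whatever the number of punishers, whereas for a > 1 the derivative is negative and the total cost falls as punishers are added. At the benchmark value a = 2, C(n_p) = k/n_p.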

To determine the minimum cost of signaling, q, necessary to ensure that the signal is honest, we introduced a third strategy: “liar,” who may benefit by “turning on” the punishment process without paying the costs. During the first period, liars signal that they are punishers, incurring the signaling cost, and then cooperate so as to avoid punishment during the first period. However, they do not punish, and therefore avoid the associated costs. In subsequent periods, liars count the number of other group members that signaled in the first period and cooperate if the number of such signalers is greater than τ + 1. Because liars never punish, after the first period they behave like nonpunishers and so receive the nonpunisher payoff. At equilibrium, punishers and nonpunishers have the same fitness, and thus liars can invade if their expected payoff during the first period is greater than the expected payoff of nonpunishers during the first period. This leads to a minimum cost of signaling, given in (23). The value of q used in our calculations satisfies this condition for all results presented here.

Although punishment is evolutionarily stable in this model, so is nonpunishment. A complete account of the evolution of cooperation must explain how punishing strategies can increase when rare. In their classic work on pairwise reciprocity, Axelrod and Hamilton (24) showed that a small amount of nonrandom assortment, such as interaction between weakly related group members, destabilizes noncooperative equilibria but not cooperative equilibria. This principle holds in a wide range of pairwise cooperative interactions, but not in larger groups (13–15).

To explore the effects of genetic assortment, we dropped our assumption that groups are formed at random and assumed that the relatedness within groups is r > 0, so that individuals are more likely to interact with individuals similar to themselves than expected by chance. Figure 4 shows the equilibrium behavior assuming that r = 0.07, which is a rough estimate of the average relatedness within human foraging groups (22). For low thresholds (τ ≤ 3), the only stable equilibrium is a mixture of punishers and nonpunishers, which means that punishers invade when rare. And because of the population structure (between-group genetic differences), punishment may also be altruistic at the polymorphic equilibrium.
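
A rough sketch can show why even modest relatedness matters for rare punishers. It assumes, purely for illustration and not from the source, that each of a focal punisher's n − 1 group mates is independently a punisher with probability r + (1 − r)x, a common textbook way to represent assortment; the model in (23) may treat assortment differently. The function name is hypothetical.

```python
from math import comb


def prob_quorum(tau: int, n: int, x: float, r: float) -> float:
    """P(at least tau of a focal punisher's n - 1 group mates are punishers).

    ASSUMPTION (not from the source): each mate is independently a punisher
    with probability r + (1 - r) * x, where x is the population frequency
    of punishers and r is within-group relatedness.
    """
    m = n - 1
    s = r + (1 - r) * x
    below = sum(comb(m, j) * s ** j * (1 - s) ** (m - j) for j in range(tau))
    return 1.0 - below


if __name__ == "__main__":
    n, x = 18, 0.001  # benchmark group size; punishers assumed very rare
    for r in (0.0, 0.07):
        for tau in (1, 2, 3):
            print(f"r = {r:.2f}  tau = {tau}  P(quorum) = {prob_quorum(tau, n, x, r):.4f}")
```

Under this assumption, a rare punisher in randomly formed groups (r = 0) almost never finds a quorum, whereas with r = 0.07 the chance of meeting enough fellow punishers is no longer negligible, which is the qualitative effect the text attributes to assortment.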

Fig. 4

Equilibrium frequencies of punishers with a threshold frequency of τ with modest assortment (r = 0.07) and two values of b. As in Fig. 1, for each value of τ the solid circles give locally stable equilibrium frequencies of the punishing type, and the open circles give unstable equilibrium frequencies. (A) b = 2c. As in the case with no assortment, for large enough values of τ there are two equilibria, but punishers cannot invade and increase when rare. (B) b = 4c. Now for 0 < τ ≤ 3, rare punishers invade a population of nonpunishers, and the only stable equilibrium is a mixture of punishers and nonpunishers in which cooperation is sustained in most groups. For larger thresholds, there are two stable equilibrium frequencies, zero and a mixed strategy at which punishers and nonpunishers coexist. In these cases, the unstable equilibria (open circles) mark the frequency that punishers must achieve before they are favored by selection. Benchmark parameters are as in Fig. 1A, except r = 0.07.

This result persists when groups are much larger (n = 72) and for lower levels of relatedness if the benefit-cost ratio is somewhat higher (23). However, modest assortment does not allow punishment strategies with higher thresholds to invade populations of punishers with lower thresholds, so there is no evolutionary process in this model that would ratchet up the threshold levels. Thus, consistent with ethnographic observation, the model predicts that only some individuals will engage in punishment. However, even when τ = 3 (meaning that a minimum of four out of 18 individuals punish), groups achieve about two thirds of the maximum gains from cooperation attainable with higher thresholds (Fig. 3).

Unlike many models of the evolution of punishment, this one does not suffer from a “second-order free-rider” problem in which individuals who cooperate but do not punish out-compete the punishers. To see why, consider a new strategy: “contingent cooperators,” who cooperate during the first period if there are τ + 1 signaling individuals but do not punish. Contingent cooperators avoid punishment during the first period and otherwise behave like nonpunishers, and thus have higher fitness than nonpunishers. As a result, they invade the polymorphic punisher-nonpunisher equilibrium, replacing the nonpunishers. However, because they still respond to punishment, and punishment still benefits punishers, the population evolves to a stable equilibrium at which punishers and contingent cooperators coexist and that cannot be invaded by other second-order free-riding types. The frequency of punishers at this new equilibrium is approximately the same as in the original punisher-nonpunisher equilibrium (23).

In our model, the initial proliferation of punishment occurs under plausible levels of group genetic differences and results in persistent and high levels of cooperation. This result depends on the contingent nature of punishment and the existence of increasing returns to punishment. It differs from the model of Hauert et al. (28), in which the population cycles between periods of cooperation, defection, and opting-out of the interaction entirely, the latter strategy invading the all-defect phase of the cycle and subsequently being invaded by cooperators. Although their model applies to some forms of cooperation, the present model is a more realistic representation of the nature and dynamics of human cooperation (29, 30).

Supporting Online Material

Materials and Methods

Figs. S1 to S7


References and Notes

  1. Materials and methods are available as supporting material on Science Online.
  2. We thank the Behavioral Sciences Program of the Santa Fe Institute, the U.S. National Science Foundation, the European Science Foundation, and the University of Siena for research support. The authors declare no competing interests.