Via Freedom to Coercion: The Emergence of Costly Punishment


Science  29 Jun 2007:
Vol. 316, Issue 5833, pp. 1905-1907
DOI: 10.1126/science.1141588


In human societies, cooperative behavior in joint enterprises is often enforced through institutions that impose sanctions on defectors. Many experiments on so-called public goods games have shown that in the absence of such institutions, individuals are willing to punish defectors, even at a cost to themselves. Theoretical models confirm that social norms prescribing the punishment of uncooperative behavior are stable—once established, they prevent dissident minorities from spreading. But how can such costly punishing behavior gain a foothold in the population? A surprisingly simple model shows that if individuals have the option to stand aside and abstain from the joint endeavor, this paves the way for the emergence and establishment of cooperative behavior based on the punishment of defectors. Paradoxically, the freedom to withdraw from the common enterprise leads to enforcement of social norms. Joint enterprises that are compulsory rather than voluntary are less likely to lead to cooperation.

An impressive body of evidence shows that many humans are willing to pay a personal cost in order to punish wrongdoers (1–8). In particular, punishment is an effective mechanism to ensure cooperation in public goods interactions (9–11). All human populations seem willing to use costly punishment to varying degrees, and their willingness to punish correlates with the propensity for altruistic contributions (12). This raises an evolutionary problem: In joint enterprises, free-riding individuals who do not contribute, but who exploit the efforts of others, fare better than those who pay the cost of contributing. If successful behavior spreads, for instance through imitation, these defectors will eventually take over. Punishment reduces the defectors' payoff, and thus may solve the social dilemma. However, because punishment is costly, it also reduces the punishers' payoff. This raises a “second-order social dilemma”: Costly punishment seems to be an altruistic act, given that individuals who contribute but do not punish are better off than the punishers. The emergence of costly punishing behavior is acknowledged to be a major puzzle in the evolution of cooperation. “We seem to have replaced the problem of explaining cooperation with that of explaining altruistic punishment” (13).

This puzzle can be solved in situations where individuals can decide whether to take part in the joint enterprise. We considered four strategies. The nonparticipants (individuals who, by default, do not join the public enterprise) rely on some activity whose payoff is independent of the other players' behavior. Those who participate include defectors, who do not contribute but exploit the contributions of the others; cooperators, who contribute but do not punish; and punishers, who not only contribute to the commonwealth but also punish the defectors. We showed that in such a model, punishers will invade and predominate. However, in the absence of the option to abstain from the joint enterprise, punishers are often unable to invade, and the population is dominated by defectors. This means that if participation in the joint enterprise is voluntary, cooperation-enforcing behavior emerges. If participation is obligatory, then the defectors are more likely to win.

This result was originally presented by Fowler (14), but he based his argument on a model that lacked an explicit microeconomic foundation. It assumes (i) that single cooperators can play the public goods game alone, which fails to recognize that contributing to a joint effort is a risky investment, the return of which depends on the behavior of other players, and (ii) that cooperators will be punished, even in the absence of defectors, which fails to recognize that the cooperators' unwillingness to punish cannot be observed in that case. Correcting for these two assumptions leads to a dynamic that is structurally unstable for infinitely large populations and hence inconclusive (15). It is thus necessary to tackle the stochastic dynamics of finite populations.

We considered a well-mixed population of constant size M, the members of which live on a small but fixed income σ. In this situation, N individuals are randomly selected and offered the option to participate instead in a risky, but potentially profitable, public goods game. Those who participate can decide whether or not to contribute an investment at a cost c to themselves. All individual contributions are added up and multiplied with a factor r > 1. This amount is then divided equally among all participants of the public goods game. After this interaction, each contributor can impose a fine β upon each defector, at a personal cost γ for each fine. By x we denote the total number of cooperators, by y that of defectors, by z that of the nonparticipants, and by w the number of punishers. Thus, M = x + y + z + w.

Among the random sample of size N, there will be Nx cooperators, Ny defectors, Nz nonparticipants, and Nw punishers. These are random variables distributed according to a multivariate hypergeometric distribution, which describes sampling without replacement. Each nonparticipant receives a constant payoff σ. The group of those willing to participate in the public goods game has size S = Nx + Ny + Nw. If S > 1, each participant of the public goods game obtains an income r(Nx + Nw)c/S. The payoff for the contributors (i.e., the cooperators and the punishers) is reduced by c. The payoff for the defectors is reduced by βNw, and the payoff for punishers by γNy. The social enterprise is risky in the sense that if all defect, the payoff is below that of the nonparticipants; it is promising in the sense that if all cooperate, the payoff is larger than that of the nonparticipants. This means that 0 < σ < (r − 1)c. This assumption offers players a nontrivial choice: to stick with a safe, self-sufficient income or to speculate on a joint effort whose outcome is uncertain because it depends on the decisions of others. (If S = 1, then the public goods game does not take place. In this case, a single player who volunteers for the joint effort receives the default payoff σ.)
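The payoff rules above can be written out for a single sampled group. The following is a minimal sketch, not the authors' code: the function name `group_payoffs` and the dictionary keys are illustrative choices, and the default parameter values are simply those quoted in the caption of Fig. 1.

```python
def group_payoffs(n_x, n_y, n_z, n_w,
                  r=3.0, c=1.0, sigma=1.0, beta=1.0, gamma=0.3):
    """Payoff per strategy in one sampled group.

    n_x, n_y, n_z, n_w: numbers of cooperators, defectors,
    nonparticipants and punishers in the sample of size N.
    Returns a dict keyed by "x", "y", "z", "w". Entries for strategies
    that are absent from the group are hypothetical values only.
    """
    s = n_x + n_y + n_w  # number of players willing to participate
    if s <= 1:
        # The game does not take place: everyone keeps the default payoff.
        return {"x": sigma, "y": sigma, "z": sigma, "w": sigma}
    share = r * (n_x + n_w) * c / s  # equal share of the multiplied pot
    return {
        "x": share - c,                # contributes, does not punish
        "y": share - beta * n_w,       # fined by every punisher
        "z": sigma,                    # stays out of the game
        "w": share - c - gamma * n_y,  # contributes and pays for each fine
    }


if __name__ == "__main__":
    print(group_payoffs(2, 1, 1, 1))
```

With these defaults, a group of five defectors earns 0 each and a group of five cooperators earns 2 each, which brackets the nonparticipant payoff σ = 1 as required by 0 < σ < (r − 1)c.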

We next specify how strategies propagate within the population. We only need to assume that players can imitate each other and are more likely to imitate those with a higher payoff. This can be done in various ways (16, 17). For simplicity, let us assume here that players can update their strategy from time to time by imitating a player chosen with a probability that is linearly increasing with that player's payoff. In addition, we shall assume that with a small probability μ, a player can switch to another strategy irrespective of its payoff (we refer to this as “mutation” without implying a genetic cause; it simply corresponds to blindly experimenting with the alternatives).
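One way to picture a single update event is sketched below. It is an illustration under stated assumptions rather than the rule used for the figures: the weighting `1 - s + s * payoff` is one possible choice of a probability that increases linearly with payoff, and the names `update_step` and `STRATEGIES` and the form of the `payoffs` argument are hypothetical.

```python
import random

STRATEGIES = ("x", "y", "z", "w")  # cooperator, defector, nonparticipant, punisher


def update_step(population, payoffs, mu=0.001, s=0.249, rng=random):
    """One imitation-or-mutation event; returns a new list of strategies.

    population: list of strategy labels, one per player.
    payoffs: dict mapping each strategy label to its current average payoff.
    """
    new_population = list(population)
    focal = rng.randrange(len(population))  # the player who updates
    if rng.random() < mu:
        # "Mutation": switch blindly to one of the other strategies.
        others = [t for t in STRATEGIES if t != population[focal]]
        new_population[focal] = rng.choice(others)
    else:
        # Imitation: the role model is drawn with a weight that grows
        # linearly with payoff (assumed form: 1 - s + s * payoff).
        weights = [1.0 - s + s * payoffs[t] for t in population]
        if min(weights) < 0:
            raise ValueError("selection strength s too large for these payoffs")
        model = rng.choices(range(len(population)), weights=weights, k=1)[0]
        new_population[focal] = population[model]
    return new_population
```

With `mu=0` a monomorphic population never changes under this step, which is the absorbing property of the homogeneous states.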

The analysis of the corresponding stochastic dynamics is greatly simplified in the limiting case μ → 0. The population consists almost always of one or two types at most. Indeed, for μ = 0, the four monomorphic states are absorbing: If all individuals use the same strategy, imitation will not introduce any change. For sufficiently small μ, the fate of a mutant (i.e., its elimination or fixation) is settled before the next mutant appears (18). This allows us to calculate the probability that the population is in the vicinity of a pure state (i.e., composed almost exclusively of one type) (17). Computer simulations show that the approximation also holds for larger mutation rates (on the order of 1/M).
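In this limit the dynamics can be pictured as a Markov chain that jumps between the monomorphic states, with jump probabilities proportional to the chance that a single mutant takes over. The sketch below assumes that each of the other strategies is equally likely to appear as the mutant and finds the stationary distribution by plain power iteration. The function name and the numbers in the usage example are illustrative and are not taken from the paper.

```python
def stationary_distribution(rho, iterations=10_000):
    """Stationary distribution over the monomorphic states.

    rho[i][j] is the probability that a single j-mutant takes over a
    population consisting of type i (diagonal entries are ignored).
    """
    d = len(rho)
    # Off-diagonal jump probabilities; each mutant type is assumed to be
    # equally likely, hence the factor 1 / (d - 1).
    T = [[rho[i][j] / (d - 1) if i != j else 0.0 for j in range(d)]
         for i in range(d)]
    for i in range(d):
        T[i][i] = 1.0 - sum(T[i])  # probability of staying put
    pi = [1.0 / d] * d
    for _ in range(iterations):
        # One step of pi <- pi T (left multiplication).
        pi = [sum(pi[i] * T[i][j] for i in range(d)) for j in range(d)]
    return pi


if __name__ == "__main__":
    # Two-state toy example with made-up takeover probabilities.
    print(stationary_distribution([[0.0, 0.2], [0.1, 0.0]]))
```

The fixed number of iterations is a simplification. When takeover probabilities are very small, convergence is slow and a direct linear solve would be the more robust choice.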

The outcome is notable: In the limit of rare mutations, the system spends most of the time in the homogeneous state with punishers only, irrespective of the initial composition of the population. For large populations (M = 1000 can be considered large for most of our prehistory) and small mutation rates, the system spends most of the time in or near the punisher state (Figs. 1A and 2A; fig. S1). The outcome is robust with respect to changes in σ and r (fig. S1).

Fig. 1.

Punishment and abstaining in joint-effort games. (A) Simulations of finite populations consisting of four types of players show that after some initial oscillations, punishers usually dominate the population. In longer runs, their regime can occasionally break down as a result of cooperators invading by neutral drift, but after another series of oscillations punishers will emerge again. The transient oscillations generally display a rock-paper-scissors–like succession of cooperators, defectors, and nonparticipants. When nonparticipants are frequent, groups are small, and punishing therefore is less costly, so that punishers have a chance to invade. (B) If participation is compulsory (no nonparticipants), defectors take over in the long run, even if the population consisted initially of punishers. Parameter values are M = 100, N = 5, r = 3, σ = 1, γ = 0.3, β = 1, c = 1, and μ = 0.001.

Fig. 2.

Stationary probability distributions, transition probabilities, and fixation times can be computed analytically for sufficiently small mutation rates, if we assume that players update their strategies according to some specified rule. [In all figures, we use a Moran process with selection strength s = 0.249 (17) (SOM text).] The dynamics are reduced to transitions between homogeneous population states consisting entirely of cooperators (C), defectors (D), nonparticipants (N), or punishers (P). The transition probabilities ρ denote the probabilities that a single mutant takes over; the conditional fixation time t indicates the average number of periods required for a single mutant to reach fixation, provided that the mutant takes over. (A) Voluntary participation in the joint-effort game with punishment. Parameter values are N = 5, r = 3, σ = 1, γ = 0.3, β = 1, c = 1, and M = 100. (B) Compulsory participation in a joint-effort game with punishment, for the same parameter values.
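For a Moran process with frequency-dependent fitness, the probability that a single mutant takes over has a standard closed form: ρ = 1 / (1 + Σ_{k=1}^{M−1} Π_{i=1}^{k} g_i / f_i), where f_i and g_i are the fitnesses of mutants and residents when i mutants are present. The sketch below evaluates that textbook expression. How fitness is derived from payoffs (for instance as 1 − s + s × expected payoff) is left to the caller and is an assumption here; the details of the computation behind the figures are in (17) and the SOM text.

```python
def fixation_probability(M, mutant_fitness, resident_fitness):
    """Probability that one mutant takes over a population of size M.

    mutant_fitness(i) and resident_fitness(i) return the (positive)
    fitness of each type when the population holds i mutants,
    for 1 <= i <= M - 1.
    """
    total = 1.0    # running value of 1 + sum of products
    product = 1.0  # running product of g_i / f_i
    for i in range(1, M):
        product *= resident_fitness(i) / mutant_fitness(i)
        total += product
    return 1.0 / total


if __name__ == "__main__":
    # Neutral mutant in a population of 100.
    print(fixation_probability(100, lambda i: 1.0, lambda i: 1.0))
```

A neutral mutant fixes with probability 1/M. This is the situation of a cooperator appearing among punishers: without defectors nobody is fined, so both types earn the same payoff and cooperators can spread only by drift.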

The situation is very different in the traditional case of a public goods game where participation is compulsory. If only cooperators and defectors are present, defectors obviously win. Adding the punishers as a third strategy does not change the qualitative outcome: In the limit of rare mutations, the system spends most of the time in or near the state with defectors only. For the same parameter values as before, defectors dominate the population most of the time, and there is hardly any economic benefit from the interaction (Figs. 1B and 2B; fig. S2).

Volunteering in the absence of punishment leads to a more cooperative outcome than for the obligatory game, but not to the fixation of the cooperative state (Fig. 3A). Instead, the system exhibits a strong tendency to cycle (from cooperation to defection to nonparticipation and back to cooperation), as a result of a rock-paper-scissors mechanism (19–21). If there are many defectors, it does not pay to participate in the joint enterprise, but if most players refuse to participate, then the typical group size can become sufficiently small such that the social dilemma disappears: Cooperators earn on average more than defectors (and nonparticipants). However, this is a fleeting state only; cooperators spread quickly, group size increases, the social dilemma returns, and the cycle continues.

Fig. 3.

Punishment is best directed at defectors only. (A) Same as in Fig. 2A, but without punishers. The three remaining strategies supersede each other in a rock-paper-scissors type of cycle. (B) Same as in Fig. 2A, but assuming that punishers equally punish the nonparticipants. This makes it more difficult for punishers to dominate.

The gist of the analysis for small mutation rates is captured in Fig. 2. The effect of substantial mutation rates can only be handled by numerical simulations (17, 22). In the absence of punishers, defectors do worst, whereas nonparticipants and cooperators perform comparably well. In the compulsory game, punishers do not prevail, except for large mutation rates, in which case mutational drift supplying defectors keeps the punishers active and prevents them from being undermined by cooperators. If all four types are admitted, punishers prevail.

This result remains unaffected if we assume that the punishers are also punishing the cooperators (who are not punishing defectors, and thus can be viewed as second-order defectors). It is well known that any norm that includes the rule to punish those who deviate is evolutionarily stable—once established, it cannot be displaced by an invading minority of dissidents (9). But how can such punishing behavior gain a foothold in the population? The trait has to be rare, initially, and thus will incur huge costs by ceaselessly punishing. To model this situation, it seems plausible to assume that for this second type of punishment, fines and costs are reduced by a factor α, with 0 ≤ α ≤ 1 (14). Thus the payoff for cooperators is reduced by αβNw, and that for punishers by αγNx, provided that Ny > 0 (if there are no defectors in the group, nonpunishing behavior will go unnoticed). As it turns out, whether cooperators who fail to punish are punished plays a surprisingly small role. The parameter α has little influence on the dynamics (17). The reason is that for small μ, the three types of punishers, cooperators, and defectors rarely coexist. Hence, punishers cannot hold cooperators accountable for not punishing defectors. Interestingly, experimental evidence for the punishment of nonpunishers (i.e., for nonvanishing α) seems to be lacking (23).

We could also assume that punishers penalize nonparticipants, with a fine δβ and the cost to the punisher δγ (with 0 ≤ δ ≤ 1). Although this further stabilizes punishment once it is established, it also hinders the emergence of punishment (Fig. 3B) (17). It follows that resorting to stricter forms of social coercion may not be an efficient way to increase cooperation. Second-order punishment (α > 0) barely affects the outcome, whereas punishing nonparticipants (δ > 0) can even lead to contrary effects. The system responds to an increase in compulsion with a decrease in cooperation.
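Collecting the two extensions, the payoffs in a sampled group with S = N_x + N_y + N_w > 1 participants can be summarized as follows. This is a sketch assembled from the verbal description, not a formula quoted from the paper. The indicator notation is introduced here, and it is assumed that fines on nonparticipants are levied per punisher and per nonparticipant, in analogy with the fines on defectors.

```latex
\begin{align*}
P_z &= \sigma - \delta\beta N_w, \\
P_y &= \frac{r c\,(N_x + N_w)}{S} - \beta N_w, \\
P_x &= \frac{r c\,(N_x + N_w)}{S} - c - \alpha\beta N_w\,\mathbf{1}[N_y > 0], \\
P_w &= \frac{r c\,(N_x + N_w)}{S} - c - \gamma N_y
       - \alpha\gamma N_x\,\mathbf{1}[N_y > 0] - \delta\gamma N_z .
\end{align*}
```

Here \mathbf{1}[N_y > 0] equals 1 if the group contains at least one defector and 0 otherwise, which encodes the condition that nonpunishing behavior goes unnoticed when there is nobody to punish. Setting α = δ = 0 recovers the basic model.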

When punishers are common, individual-level selection against them is weak (because only little punishment occurs) and may be overcome by selection among groups (11). Several other models confirm that the punishment of defectors is stable provided that it is the prevalent norm. This happens, for example, if some degree of conformism in the population is assumed (10); individuals preferentially copy what is frequent. Similarly, cooperation in the public goods game can also be stabilized through additional rounds of pairwise interactions based on indirect reciprocity. In this case, players can reward contributors (24, 25). Even so, in each case, the emergence of the prosocial norm remains an open problem (26, 27).

Our model, in contrast, shows that even when initially rare, punishing behavior can be advantageous and is likely to become fixed. We consider the most challenging scenario, namely, a single well-mixed population whose members imitate preferentially the behavior that fares better, not the behavior that is more common. Once established, group selection, conformism, and reputation effects may maintain prosocial norms and promote their spreading. Eventually, institutions for punishing free-riders may arise, or genetic predispositions to punish dissidents.

Recent experiments show that if players can choose between joining a public goods game either with or without punishment, they prefer the former (28). The interpretation seems clear: Whoever freely accepts that defection may be punished is unlikely to be a defector. For contributors, it is thus less risky to join such a group. Players voluntarily commit themselves to sanctioning rules. This voluntary submission is not immediate, however. In the majority of cases, it requires a few preliminary rounds. Many players appear to have initial reservations against the possibility of sanctions and need a learning phase. In another series of experiments, it has been shown that a threat of punishment can decrease the level of cooperation in trust games (29). Experimental evidence for costly punishment can also be found in the ultimatum game (rejecting an unfair offer is costly to both players) (2) and in indirect reciprocity (by not helping defectors, players reduce their own chances of being helped) (30). If punishment is combined with rewarding through indirect reciprocity, punishment is focused on the worst offenders and is otherwise strongly reduced in favor of rewarding contributors (31). In all of these investigations, and in the experiments on voluntary public goods games without punishment (21), there is ample evidence that players can adapt their strategy from one round to the next, as a reaction to the current state of the population. Our model is based on this aptitude for social learning.

In our framework, the joint effort represents an innovation, a new type of interaction that improves the payoff of participants if it succeeds, but costs dearly if it fails. Abstaining from such a risky enterprise does not mean living a hermit's life. It means collecting mushrooms instead of participating in a collective hunt, remaining at home in lieu of joining a raiding party, dispersing in the woods rather than erecting a stronghold against an invader, and growing potatoes on one's plot of land instead of handing it over to a commons likely to be ruined by overgrazing.

Our model predicts that if the joint enterprise is optional, cooperation backed by punishment is more likely than if the joint enterprise is obligatory. Sometimes, there is no way to opt out of a public goods project—the preservation of our climate is one example (32). In that case, participation is obligatory, and defection widespread.

Reports from present-day hunter-gatherer societies often stress their egalitarian and “democratic” features: Individuals have a great deal of freedom (33). This creates favorable conditions for voluntary participation. On the other hand, ostracism was probably an early form of severe punishment. There seems to be a smooth transition between choosing not to take part in a joint enterprise and being excluded. Together, these two alternatives may explain the emergence of rule-enforcing institutions promoting prosocial behavior, following Hardin's recipe for overcoming the “tragedy of the commons”: mutual coercion, mutually agreed upon (34).

Supporting Online Material

SOM Text

Figs. S1 to S5


References and Notes
