# The Prisoner's Dilemma

Shaped in the 1950s by Merrill Flood and Melvin Dresher, two employees of the RAND-corporation (a think tank for consulting the US-forces), and named by the US-mathematician Albert William Tucker, the Prisoner's Dilemma (PD) is often presented with the following story:

A crime has been committed. Two suspects are arrested and imprisoned. Each prisoner is in solitary confinement with no means to communicate to the other. The police has not enough evidence to convict them, so it offers each prisoner a bargain. Each prisoner is given the opportunity either to betray the other, by testifying that the other committed the crime, or to cooperate with the other by remaining silent. The following rules apply:

If A and B both betray each other, each of them serves 5 years in prison

If A betrays B but B remains silent, A will be set free and B will serve 10 years in prison - and vice versa.

If A and B both remain silent, both of them will serve 1 year in prison

Rapoport, Anatol / Chammah, Albert M. (1965): Prisoner's Dilemma. University of Michigan Press.

Note that in this situation betrayal (aka defection or non-cooperation) is the the dominant strategy. Betrayal is definitely a prisoner’s best choice if the opponent should remain silent (cooperate) - the prisoner is freed immediately -, and it yields still a higher payoff (less years in prison) if the opponent should betray too (defect). This is true for both players alike. If both remain silent however, both could go home after one year in prison. Cooperation hence would benefit them, but defection is dominant. The question is hence, how the step from individually beneficial defection to communally beneficial cooperation can be accomplished.

Formally, a Prisoner’s Dilemma is defined as a symmetric two-player normal form game in which the payoff of the action of one player depends on the choice of action of the other player. The particular PD-rules define the payoff for one-sided defection (“temptation”, T) to be higher than the “reward” (R) for mutual cooperation which in its turn has to be higher than “punishment” (P) for mutual defection which again has to be higher than one-sided cooperation (“sucker’s payoff”, S). In short: T > R > P > S.

 prisoner 2 cooperate defect prisoner 1 cooperate -1 / -1 -10 / 0 defect 0 / -10 -5 / -5

## Solutions

The Prisoner’s Dilemma knows the following solutions:

Dominant strategy: "defect"

Common good solution: "cooperate"/"cooperate", (-1/ -1)

Nash-equilibrium: "defect"/"defect" (-5/ -5)

A Nash-equilibrium is a situation in which nobody benefits from taking another decision as long as the opponent does not change behavior too. With defection being a dominant strategy for both prisoners, the behavior of the prisoners will results in a Nash-equilibrium "defect"/"defect" when the PD is repeated. The Nash-equilibrium serves as an attractor for the game

Note, that since defection is dominant but cooperation would be more profitable if one could know whether the opponent will cooperate too, the persistence of the sub-optimal Nash-equilibrium depends on information. As long as there is no information about the behavior of the other prisoner, it is profitable to defect.

The question "how cooperation emerges" hence can be reformulated as the question about "how information about the opponent's behavior emerges"

## The iterated Prisoner's dilemma

Robert Axelrod (1984, 1997) found this information (and hence cooperation) emerging when iterating, i.e. repeating the Prisoner's Dilemma (IPD). Prisoners are assumed to memorize unprofitable encounters and realign their strategy when put into the dilemma again. In respect to this possibility, Axelrod invited academic colleagues all over the world to devise software versions of strategies to compete in a computer-based iterated PD tournament.* A wide variety of programs were sent differing significantly in the complexity of their memorizing strategies. Surprisingly however, a very simple strategy, sent in by Anatol Rapoport, won the tournament: the strategy tit-for-tat, which simply suggests cooperation on the first iteration of a game and then simply copies the behavior of the opponent on the previous move.

*In order to prevent that alternating between cooperation and defection yields a greater reward than mutual cooperation, the iterated PD requires that 2R > T + S

The following interactive model simulates a simplified version of this tournament with just four competing strategies: pure cooperation, pure defection, pure randomness and tit-for-tat. The sliders allow to adjust the number of players for each strategy and the number of iterations played.

As can be learnd from the model, defection starts out strong and even pure randomness is doing better than cooperation and tit-for-tat when the game is played only a small number of times (e.g. 100 times with 30 players of each strategy=. However, cooperation and tit-for-tat catch up soon when the number of iterations is increased (e.g. to 500), eventually leaving no chance to defectors. In the long run, even randomness is doing better than pure defection.

Axelrod, Robert / Hamilton, William D. (1981): The Evolution of Cooperation; in: Science, 211 (4489): p. 1390-1396.

Axelrod R. (1984) The evolution of cooperation. Basic Books, New York.

Axelrod R. (1997) The complexity of cooperation. Agent-based models of competition and collaboration. Princeton University Press, New Jersey.

### Tit-for-Tat in sticklebacks

That Tit-for-Tat is an efficient strategy in iterated prisoner's dilemmas has been shown in an exciting experiments with sticklebacks - a family of fish. Such sticklebacks were confronted with a Cichlid (a predator) put behind a glass wall in their aquarium. At the same time, a companion of the same species was simulated to the fish by a system of mirrors. This companion either behaved "cooperative" or "deceiving".

Usually, sticklebacks try to get information about potential enemies first. In order to do so, (if they are a pair or a group) they alternately jerkily swim ahead, exchanging the leading position after each single action. For each fish it would be safer to let another fish swim ahead and not to take the leading position, i.e. "deceive". In the long run, this will be paid with a lower information-payoff (as a public good).

In the experiment, a mirror projection presented "cooperative" as well as "deceiving" behavior to the fish. Sticklebacks with a "cooperating mirror" stayed in the (unsafe) front half of the aquarium twice as often as sticklebacks with a "deceiving mirror". They seemed to adapt the strategy of tit-for-tat.

Hence, in the case of direct reciprocity, "cooperating" fish approached considerably closer to the predator than "deceiving" fish. Their level of information about the predator was significantly higher.

Milinski, Manfred (1987): TIT FOR TAT in sticklebacks and the evolution of cooperation, Nature 325, p. 433-435.

## The evolutionary Prisoner's Dilemma

In a slightly more complex suggestion for testing the evolution of cooperation Robert Axelrod (1987) considered a population of software agents endowed with a "chromosome" defining the strategy of an agent. In this scenario, a strategy, determined from the outcome of the last three confrontations ("memory three-strategies"), provides information about the next move in the next confrontation.

Starting with an initial population of 20 agents with randomly generated 70-bit chromosomes (or chromosomes with 70 genes, respectively) playing against each other, an agent's "fitness" is determined by the average success in 151 confrontations. Agents reproduce in respect to their "fitness" and their chromosomes are inherited via "crossover" and "mutation", two technologies often deployed in computational evolution.

During crossover, two parent chromosomes are cut at the same position and combined with each other to yield two new, genetically different chromosomes, thus bequeathing part of the mother's and part of the father's chromosome to the agent - a simplified version of a biological recombination of genes. Thereafter, a small part of the chromosome (one or a few genes) is altered via mutation.

Axelrod, Robert (1987) The Evolution of Strategies in the iterated Prisoner's Dilemma; in: Davis, L. (ed.): Genetic Algorithms and Simulated Annealing. London.

When running this model, the agents initially tend to drift away from cooperative behavior. Less cooperative strategies fare better, because at this early point there are only few other cooperative strategies they can score with. After 10 to 20 generations however, some strategies develop patterns to react reciprocally to other cooperative agents. The more these patterns stabilize, the better performs cooperation. And with better performance, the strategy is increasingly inherited (see plot below).

## The demographic Prisoner's Dilemma

Yet another variant of the prisoner's dilemma has been proposed by Joshua Epstein (1998). In this version computer-generated agents move on a 30x30 grid-torus and interact with their neighbors. Every agent is provided with the attributes "vision", "wealth", "age", and "strategy" and follows the rule: move to a free site within your vision and play your strategy ("cooperate" or "defect") against all your neighbors there. The agent and the neighbors then receive pay-offs according to the rule T > R > 0 > P > S, implying that (in contrast to Axelrod's model) pay-offs can also be negative (i.e. T = 6, R = 5, P = -5, S = -6). The agents' wealth is calculated by accumulating these pay-offs. If the pay-off decreases to a point below 0, the agent dies (is removed from the game); if the pay-off increases above 10, the agent "gives birth" to an offspring, which inherits her strategy, vision and part of her wealth (6 points).

Epstein, Joshua M. (1998) Zones of Cooperation in Demographic Prisoner's Dilemma. Complexity 4 (2): p. 36-48.

Epstein, Joshua M. (2006) Generative Social Science: Studies in Agent-Based Computational Modeling. Princeton.

In this setting, after 15 steps, clearly visible neighborhoods of cooperators (white) start to form. After 50 steps, they occupy almost all of the torus, only punctuated with single islands of defectors (red); a formation that remains stable. In this topological version of the prisoner's dilemma cooperation prevails and dominates among agents without any memory.