# A Genetic Algorithm Approach to the Iterated Prisoners’ Dilemma

Charlie Soeder

## Outline

• A genetic algorithm (GA) was used to generate strategies in the iterated prisoners’ dilemma (IPD).
• Strategies were represented as binary strings; chromosomes consisting of these strings were assigned fitness based upon their performance in IPD.

## GA: Overview

Initial Population → Input Population → (fitness, differential reproduction) → Breeding Population → (crossover, mutation) → Altered Population → (reintegration) → Output Population → Final Population

Iteration: the Output Population is fed back in as the next Input Population.
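The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which selection, crossover, or reintegration schemes were used, so fitness-proportionate selection, single-point crossover, per-bit mutation, and "replace the least fit" reintegration are all choices made here, and every function and parameter name (`run_ga`, `n_breeders`, `mut_rate`, ...) is hypothetical.

```python
import random

def select_breeders(population, fitnesses, n_breeders, rng):
    """Differential reproduction (assumed scheme): sample parents
    with probability proportional to fitness."""
    weights = [f + 1e-9 for f in fitnesses]  # avoid an all-zero weight vector
    return rng.choices(population, weights=weights, k=n_breeders)

def crossover(a, b, rng):
    """Single-point crossover of two equal-length bit lists (assumed operator)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, mut_rate, rng):
    """Flip each bit independently with probability mut_rate."""
    return [1 - bit if rng.random() < mut_rate else bit for bit in chrom]

def ga_iteration(population, fitness_fn, n_breeders, mut_rate, rng):
    """One pass: input population -> breeding -> altered -> output population."""
    fitnesses = [fitness_fn(c) for c in population]
    breeders = select_breeders(population, fitnesses, n_breeders, rng)
    altered = []
    for a, b in zip(breeders[0::2], breeders[1::2]):
        for child in crossover(a, b, rng):
            altered.append(mutate(child, mut_rate, rng))
    # Reintegration (assumed): altered chromosomes replace the least-fit members.
    ranked = [c for _, c in sorted(zip(fitnesses, population),
                                   key=lambda fc: fc[0], reverse=True)]
    survivors = ranked[:len(population) - len(altered)]
    return survivors + altered

def run_ga(population, fitness_fn, iterations, n_breeders=4, mut_rate=0.05, seed=0):
    """Iterate the GA; the output population becomes the next input population."""
    rng = random.Random(seed)
    for _ in range(iterations):
        population = ga_iteration(population, fitness_fn, n_breeders, mut_rate, rng)
    return population
```

Any function mapping a chromosome to a number in [0, 1] can serve as `fitness_fn`; in the setting of these slides it would be the IPD-based fitness rather than a toy function.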

## IPD Overview

• Two options: (Cooperate, Defect) = (0, 1)
• Payoff (to self) for a round given by the payoff matrix:

| | Other’s Play: 0 | Other’s Play: 1 |
|---|---|---|
| Self’s Play: 0 | 3 | 0 |
| Self’s Play: 1 | 5 | 1 |

• A game in IPD is a certain number of successive rounds of self vs. other.
• Cumulative payoffs determine success of a strategy.
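As a small illustration of how cumulative payoffs come out of the matrix, the sketch below scores two fixed sequences of plays. It assumes the standard reading of the payoff matrix (mutual cooperation 3, mutual defection 1, a lone defector 5, a lone cooperator 0), which matches the fitness values reported later in the slides; the names `PAYOFF` and `play_game` are made up for this example.

```python
COOPERATE, DEFECT = 0, 1

# PAYOFF[self_play][other_play] is the payoff to self for one round
# (assumed standard prisoner's dilemma values).
PAYOFF = [[3, 0],
          [5, 1]]

def play_game(self_moves, other_moves):
    """Return (self_total, other_total) over a game of successive rounds."""
    self_total = sum(PAYOFF[s][o] for s, o in zip(self_moves, other_moves))
    other_total = sum(PAYOFF[o][s] for s, o in zip(self_moves, other_moves))
    return self_total, other_total

# Three rounds of always-defect against always-cooperate.
print(play_game([DEFECT] * 3, [COOPERATE] * 3))
```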

## Implementation

• Each chromosome consisted of a $k + 2^k$ binary string.
• The first $k$ bits record a history of the game. Chromosomes start with a “false history.”
• The next $2^k$ bits describe plays based on the history.

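• Generic chromosome at round $r$: $C_r = a_0 a_1 \dots a_{k-1}\; b_0 b_1 \dots b_{2^k-1}$
• $J$ = integer value of the history bits $a_0 a_1 \dots a_{k-1}$
• Self’s play = $b_J$
• Other’s play = $p$
• Chromosome at round $r + 1$: $C_{r+1} = a_1 a_2 \dots a_{k-1}\, p\; b_0 b_1 \dots b_{2^k-1}$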
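The encoding can be made concrete with a short sketch. The slides do not preserve how $J$ is computed from the history bits, so reading them as a binary number with $a_0$ as the most significant bit is an assumption here, and the helper names (`history_index`, `self_play`, `advance`) are hypothetical.

```python
def history_index(history):
    """Read the history bits as a binary integer J
    (assumed bit order: a_0 is the most significant bit)."""
    j = 0
    for bit in history:
        j = (j << 1) | bit
    return j

def self_play(chromosome, k):
    """Look up self's play b_J from the 2^k play bits."""
    assert len(chromosome) == k + 2 ** k
    history, plays = chromosome[:k], chromosome[k:]
    return plays[history_index(history)]

def advance(chromosome, k, other_play):
    """Chromosome at round r + 1: drop a_0, append the other's play p."""
    history, plays = chromosome[:k], chromosome[k:]
    return history[1:] + [other_play] + plays

# With k = 2 and this bit order, "copy the other's most recent play"
# is encoded by the play bits 0, 1, 0, 1; the false history is 0, 0.
copycat = [0, 0, 0, 1, 0, 1]
after_defect = advance(copycat, 2, 1)
print(self_play(copycat, 2), self_play(after_defect, 2))
```

Under these assumptions the history holds only the other player's last $k$ plays, so the chromosome's response is a pure lookup on what the opponent did recently.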

• Strategic performance in a game was used to determine fitness:

$$C.\mathrm{Fitness} := \frac{\text{C’s gains in game}}{(\text{Number of rounds per game})\,(\text{Maximum payoff per round})}$$

• This fitness function fed into the GA.
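A minimal sketch of this fitness calculation against a fixed opponent follows. It assumes the standard payoff values (3, 0, 5, 1 with a maximum of 5 per round), a history read with $a_0$ as the most significant bit, and $k \ge 1$; the opponent functions and the name `fitness` are illustrative, not taken from the original code.

```python
PAYOFF = [[3, 0], [5, 1]]  # PAYOFF[self_play][other_play]; 0 = cooperate, 1 = defect
MAX_PAYOFF = 5

# Hypothetical fixed opponents; each sees the chromosome's past plays.
def always_cooperate(self_history):
    return 0

def always_defect(self_history):
    return 1

def tit_for_tat(self_history):
    return self_history[-1] if self_history else 0

def fitness(chromosome, k, opponent, rounds):
    """Gains in one game divided by (rounds * maximum payoff per round)."""
    history, plays = list(chromosome[:k]), chromosome[k:]
    self_history, gains = [], 0
    for _ in range(rounds):
        j = int("".join(map(str, history)), 2)  # assumed: a_0 most significant
        mine = plays[j]
        theirs = opponent(self_history)
        gains += PAYOFF[mine][theirs]
        self_history.append(mine)
        history = history[1:] + [theirs]  # shift in the other's play
    return gains / (rounds * MAX_PAYOFF)

# k = 1 chromosomes: [false history, play after C, play after D]
defector = [0, 1, 1]
cooperator = [0, 0, 0]
print(fitness(defector, 1, always_cooperate, 10))
print(fitness(defector, 1, always_defect, 10))
print(fitness(cooperator, 1, tit_for_tat, 10))
print(fitness(defector, 1, tit_for_tat, 10))
```

With these assumptions, an always-defect chromosome reaches the ceiling of 1.0 against always cooperate and 0.2 against always defect, while steady cooperation with tit for tat gives 0.6 and defecting against it scores lower.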

## Results: Versus Always Cooperate

• Initial population is randomly generated strings.
• Competing against a constant strategy: always cooperate (AC).
• Proof of concept: if all goes well, the population should quickly learn to exploit AC.
• Some debugging later…

• After 5 GA iterations: every listed output chromosome has fitness 1.0. Entries marked * are from a different run with the same settings. [Chromosome bit strings omitted.]
• Compare differences in metastrategy among the listed chromosomes.

## Results: Versus Always Defect

• Should quickly learn to defend self.
• Results are analogous:
  – V1: fitness 0.2
  – V2: fitness 0.2
• Robustness of V1: two further listed chromosomes, each with fitness 0.2. [Chromosome bit strings omitted.]

## Results: Versus Tit for Tat

• T4T is itself encoded as a chromosome. [Bit string omitted.]
• Output: listed chromosomes have fitnesses between 0.56 and 0.6. [Chromosome bit strings omitted.]
• Trend is towards AC.
• Attempts to exploit initial cooperation are suboptimal.

## Versus assorted single strategies

• Suspicious Tit for Tat:
  – Forgives initial defect
• Grudger:
  – Trend towards AC
• Random:
  – Defect-heavy strategies, often with high fitnesses

## Population versus Self

• Random vs. round-robin competition?
• Fluctuating population fitnesses:

| Iteration | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Population Fitness | 0.42 | 0.4682 | 0.4424 | 0.3864 | 0.374 | 0.4324 | 0.298 | 0.44 | 0.3088 | 0.5088 | 0.4968 |
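One way to compute fitness in a round-robin setting is sketched below: every chromosome plays one game against every other, and its gains are normalized by the maximum it could have earned. The slides do not describe how self-competition was scored, so the normalization over all games, the bit order used for the history index, the standard payoff values, and all names are assumptions.

```python
from itertools import combinations

PAYOFF = [[3, 0], [5, 1]]  # PAYOFF[self_play][other_play]; 0 = cooperate, 1 = defect
MAX_PAYOFF = 5

def history_index(history):
    """History bits as a binary integer (assumed: first bit most significant)."""
    j = 0
    for bit in history:
        j = (j << 1) | bit
    return j

def play_pair(chrom_a, chrom_b, k, rounds):
    """Play one game between two chromosomes; return (gains_a, gains_b)."""
    hist_a, plays_a = list(chrom_a[:k]), chrom_a[k:]
    hist_b, plays_b = list(chrom_b[:k]), chrom_b[k:]
    gains_a = gains_b = 0
    for _ in range(rounds):
        move_a = plays_a[history_index(hist_a)]
        move_b = plays_b[history_index(hist_b)]
        gains_a += PAYOFF[move_a][move_b]
        gains_b += PAYOFF[move_b][move_a]
        hist_a = hist_a[1:] + [move_b]  # each side records the other's play
        hist_b = hist_b[1:] + [move_a]
    return gains_a, gains_b

def round_robin_fitness(population, k, rounds):
    """Fitness of each chromosome, averaged over games against all others."""
    totals = [0] * len(population)
    for i, j in combinations(range(len(population)), 2):
        gains_i, gains_j = play_pair(population[i], population[j], k, rounds)
        totals[i] += gains_i
        totals[j] += gains_j
    games = len(population) - 1
    return [t / (games * rounds * MAX_PAYOFF) for t in totals]

# A k = 1 population of tit-for-tat-like chromosomes with a cooperative false history.
tft_population = [[0, 0, 1] for _ in range(4)]
fits = round_robin_fitness(tft_population, 1, 10)
print(sum(fits) / len(fits))
```

In this sketch a population whose members never actually defect against one another averages 0.6, the value obtained when every round is mutual cooperation.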

## Fitness Fluctuations

[Figure: Average Fitness versus Iteration for Round-Robin Population Competition. x-axis: Iteration Number (0 to 30); y-axis: Fitness (0.25 to 0.65).]

• At least two somewhat stable equilibria.
  – Avg. fitness of 0.6 corresponds to a de facto AC population
  – Avg. fitness of 0.6 corresponds to CD
  – 0.44: CCDD has popped up as well.
  – 0.36 seems somewhat stable as well; these populations tend to play CCD or DCC against one another.
• Differences in stability? Why?

## Tit for Tat Population

• What is the long term behavior of a population consisting entirely of tit for tat?
  – In a de facto AC population, is there advantage to retaining Defect plays in the strategy?
  – Apparently.
• Variation of mutRate and iteration number

## Further Research

• Application of probability arguments to metastrategies?
• Are there other stable populations in self vs. self competition?
  – Can strength of equilibrium attraction be characterized?
  – Influx of new strateg(ies) to stable population?