Exercise Sheet E

Exercise

At this point, there is a change in the 'rules'. Up to now, each exercise was submitted twice, with feedback being given prior to the second submission. Henceforth, there will be only one submission. Your first submission of an exercise is your final submission.

Your work for this exercise sheet will be automatically collected in (if you put it in the right place) at 5 p.m. on Friday 14th February.

Put your work into a subdirectory of your cs2000 directory called sheetE.

Make sure that, by the submission date, your sheetE directory contains only the files you want me to mark. Do not leave old versions of the program or versions of other programs in the directory.

Background Information

The purpose of this exercise is for you to put into practice some of the most recent things that were covered in the lectures: class hierarchies, inheritance, polymorphism, dynamic method binding, etc.

You are going to write a program that plays a simple 'game'. The 'game' is called Prisoner's Dilemma. The game seems unbelievably simple, but it has generated huge quantities of research. It's a game whose principles may have wide-ranging explanatory power. For example, some biologists believe "that many wild animals and plants are engaged in ceaseless games of Prisoner's Dilemma". (See chapter 12 of the 2nd edition of Richard Dawkins' The Selfish Gene for a clear and fascinating explanation.) The game may also explain aspects of economic, social and political behaviour.

Here's the game.

You and your opponent are prisoners, suspected of collaborating in a crime. You have no way of communicating with each other. Your gaolers offer both of you a choice of two actions: Cooperate or Defect. Here are the outcomes (in terms of points scored) of the different pairs of actions:

		What your opponent does
		Cooperate	Defect
What you do	Cooperate	3	0
What you do	Defect	5	1

So if you both choose the Cooperate action, you both get 3 points. If you choose the Cooperate action but your opponent chooses to Defect, then you get 0 and your opponent gets 5. And vice versa: if you Defect and your opponent Cooperates, you get 5 and s/he gets 0. If you both Defect, you both get 1 point.

There is only one rational way to play this game: always Defect. (If you think your opponent will Cooperate, then you can get a higher score by choosing Defect. If you think your opponent will Defect, the best you can do is also to Defect.) Yet, and this is the dilemma, if you had both Cooperated, each of you would have done better! If only you could have reached a trust-worthy agreement...

As the game stands, it's only mildly interesting: as we've seen, rational players must both Defect. But there's another version of the game called Iterated Prisoner's Dilemma. In this version, you simply play the game repeatedly, an indefinite number of times. "The successive rounds of the game give us the opportunity to build up trust or mistrust, to reciprocate or placate, forgive or avenge." (Dawkins, p.206)

Now strategies other than `always Defect' might be worth playing. Consider a prisoner who plays the strategy known as 'nice_tit_for_tat'. In this strategy, the prisoner's first move is Cooperate. On subsequent bouts, this prisoner simply copies the previous move of the other prisoner.

Suppose two prisoners both playing 'nice_tit_for_tat' meet. They both start by Cooperating, and then they both copy each other's previous move. So they end up always Cooperating. And so they score n * 3, where n is the number of bouts they play. This is better than both always Defecting, which would have given them a score of n * 1.

In this exercise, we'll build a system that would allow a biologist, economist or political scientist to experiment with different strategies, with a view to seeing how and whether cooperation can emerge. (Our system is not unlike the system that was built by one of the pioneers of the field: Robert Axelrod.)

The Task

Read the entire exercise before writing any code. Draw a UML Class Diagram showing classes and class hierarchies. Show your diagram to me or one of my helpers before proceeding.

Prisoner

A Prisoner will have a name and a total score. In each bout of the game that the prisoner plays, s/he will have to decide whether to Cooperate or Defect. There are several types of prisoner, each playing different strategies:

rand: a prisoner who plays this strategy chooses randomly between Cooperate and Defect, each having a probability of 0.5.
nice_tit_for_tat: see above.
nasty_tit-for_tat: as per nice_tit_for_tat but the first move is Defect.
soft_majo: a prisoner who plays this strategy will choose his/her opponent's majority move, i.e. if the majority of the opponent's moves so far have been Cooperates, then Cooperate is chosen; if the majority have been Defects, then Defect is chosen; if it's a tie, then Cooperate is chosen.
hard_majo: as per soft_majo but, in the event of a tie, Defect is chosen.

Bouts and Round Robin Tournaments

A bout is a single encounter between two prisoners. In a bout, each prisoner chooses a move. The move is scored using the table given earlier. Each prisoner must then record his/her score.

In a round robin tournament several prisoners play each other repeatedly.

Each prisoner is paired with each other prisoner (excluding itself). The pair then play a game of Iterated Prisoner's Dilemma, lasting n iterations. They then move on to the play the next prisoner. (Eventually every prisoner will have gone n bouts with every other prisoner.)

Finally, you print out a league table that shows how well the different strategies have performed.

Just to give you an idea, here's the main method for my version of this program:

public static void main(String[] args)
{  Prisoner p1 = new TitForTat(Move.COOPERATE); // nice_tit_for_tat
   Prisoner p2 = new TitForTat(Move.DEFECT);    // nasty_tit_for_tat
   Prisoner p3 = new Rand();                    // rand
   Prisoner p4 = new Majority(Move.COOPERATE);  // soft_majo
   Prisoner p5 = new Majority(Move.DEFECT);     // hard_majo
   Prisoner[] prisoners = {p1, p2, p3, p4, p5};
   RoundRobinTournament rrt = 
      new RoundRobinTournament(200, prisoners); // 200 iterations
   rrt.run();
   rrt.displayLeagueTable();
}

Because one of the prisoners has a random strategy, the league table will be different each time you run the program. Here's what I got on one occasion:

nice_tit_for_tat: 2037
nasty_tit_for_tat: 1658
rand: 1788
soft_majo: 1939
hard_majo: 1769

And here's the fascinating thing. It is very very likely that the more cooperative strategies (nice_tit_for_tat and soft_majo) will beat the less cooperative strategies (nasty_tit_for_tat and hard_majo). (rand could come anywhere in the league table.)

When you think about this, it's remarkable! The rational thing to do when playing Prisoner's Dilemma is always Defect. But, when playing Iterated Prisoner's Dilemma, while you mustn't be a total sucker, a degree of selective cooperation pays off!

If you're interested enough to read more, here are a few links:

The Ethical Spectacle, vol. I, No. 9 September 1995. This gives examples of the Prisoner's Dilemma in business, the Holocaust, litigation, love, the movies, politics, software development and the welfare state!
Chaos, cheating and cooperation: potential solutions to the Prisoner's Dilemma, by Bjorn Brembs. This mentions some other strategies, and gives a few examples of where the game can be found in nature.
Entry in the Stanford Encyclopedia of Philosophy. This is a somewhat technical treatment of the Dilemma and many of its variants.

There are numerous other sites and papers on this topic. Put the topic through your search engine if you want to find out more. (Be prepared to be overwhelmed.)

Challenge Exercise

As usual, make a subdirectory of your sheetE directory called challenge and do your challenge work in this directory. The deadline for this challenge exercise is also 5 p.m. on February 14th.

Here are two ideas.

Extend your program and experiment with it.
- If you look at the literature, you'll find lots more strategies that you could implement. A strategy called pavlov, for example, Cooperates only if both prisoners made the same move in the previous bout. Periodic strategies, such as repeatedly playing C,C,D or C,D or D,D,C, have also been suggested. (If you implement the periodic strategies, try to do so in a general way, rather than requiring one class definition per periodic strategy.)
- In Axelrod's original formulation, a prisoner could play against him/herself. But we want to score strategies, not individual prisoners. And when a prisoner plays him/herself, we don't want him/her to receive both sets of points. Make a change to your program to handle this.
There is an alternative to Round Robin Tournaments. Let's refer to the alternative as an Evolution Tournament.
In an evolution tournament, there is an initial population of prisoners, Maybe the initial population contains one prisoner per strategy. Call this generation 0.

The prisoners in generation 0 play each other in round robin fashion.

At the end of this tournament, a new population of prisoners, generation 1, is created. It contains the same number of prisoners as in generation 0. But, points means offspring! So the number of nice_tit_for_tats in generation 1 will depend on tit_for_tat's score in generation 0; and similarly for each of the other strategies.

Repeat this for a large number of generations. What kind of population mix do you end up with? Do the nice guys come out on top? Does the population stabilise (i.e. no changes in the make-up of successive generations)?