# 6/1/12

**CS212 Unit 5 - Udacity Wiki**


CS212 Unit 5

Contents

1. 01 Welcome Back
2. 02 Porcine Probability
3. 03 q The State of Pig
4. 03 s The State of Pig
5. 04 l Concept Inventory
6. 05 p Hold and Roll
7. 05 s Hold and Roll
8. 06 l Named Tuples
9. 07 p Clueless
10. 07 s Clueless
11. 08 p Hold At Strategy
12. 08 s Hold At Strategy
13. 09 p Play Pig
14. 09 s Play Pig
15. 10 l Dependency Injection
16. 11 p Loading the Dice
17. 11 s Loading the Dice
18. 12 q Optimizing Strategy
19. 12 s Optimizing Strategy
20. 13 l Utility
21. 14 q Game Theory
22. 14 s Game Theory
23. 15 q Break Even Point
24. 15 s Break Even Point
25. 16 q Whats your Crossover
26. 17 l Optimal Pig
27. 18 l Pwin
28. 19 p Maxwins
29. 19 s Maxwins
30. 20 l Impressing Pig Scouts
31. 21 p Maximizing Differential
32. 21 s Maximizing Differential
33. 22 l Being Careful
34. 23 p Legal Actions
35. 23 s Legal Actions
36. 24 l Using Tools
37. 25 l Telling A Story
38. 26 q Simulation vs Enumeration
39. 26 s Simulation vs Enumeration
40. 27 l Conditional Probability
41. 28 q Tuesday
42. 28 s Tuesday
43. 29 Summary

1. 01 Welcome Back

Hey, welcome back. As we've said, this class is all about managing complexity. Many types of software manage complexity by trying to artificially rule out any type of uncertainty. Say you have a checkbook-balancing program, and it says you've got to enter the exact amount. You've got to say $39.27. You can't say, "Oh, I don't know, about $40." It's easier to write programs that deal that way, but it constrains what you can do. So, in this unit we're going to learn how the laws of probability can allow you to deal with uncertainty in your programs. Now, the truly amazing thing is that you can allow uncertainty in what you know about the world (what's true right now) and uncertainty in your actions (if the program does something, what happens next?). Even though both of those are uncertain, you can still use the laws of probability to calculate what it means to do the right thing. That is, we can have clarity of action. We can know exactly what the best thing to do is even though we're uncertain about what's going to happen. So follow along with this unit, and we'll learn how to do that.

2. 02 Porcine Probability

This unit is about probability, which is a tool for dealing with uncertainty. Once you understand probability, you'll be able to tackle a much broader range of problems than you could with programs that don't understand probability. Often when we have problems with uncertainty, we're dealing with search problems. Recall, in a search problem, we are in a current state. There are other states that we can transition into, and we're trying to achieve some goal, but we can't do it all in one step. We have to paste together a sequence

wiki.udacity.com/CS212 Unit 5


of steps. In doing that, we're building up a search frontier that we're continuing to explore from. Now, uncertainty can come into play in two ways. 1) We can be uncertain about the current state. Rather than knowing exactly where we are, it may be that we start off in one of four possible states and all we know is that we're somewhere in there, but we're not sure exactly where we are. 2) The other place uncertainty can come in is when we apply an action. Say this action here--action A--it may be that we don't get to one specific state but, rather, we're uncertain as to what the action will do, and we might end up in this state or this state or this state instead of the one that we were aiming at. And so we'll see techniques for dealing with both of these types of uncertainty.

Now, one place where people are used to dealing with uncertainty is in playing games that employ dice. In particular, we're going to play a dice game which is called Pig. And that's what we're going to deal with. I don't know why the game is called Pig. I can guarantee no porcine creatures were harmed in the creation of this unit.

Here's how the game works: There are two players, although you could play with more. The players take turns, and on his turn a player has the option to roll the dice--a single die--as often as he wants or to hold--to stop rolling. And the object of the game is to score a certain number of points. We're going to say 50 points. 100 is more common, but 50 will be easier on the Udacity servers in terms of the amount of computation required.

So here's a scoreboard. We'll have players with the imaginative names of player 0 and player 1, and we have a score. And the score starts off 0 to 0. Let's say it's my turn. I pick up the die. I roll it, and let's say I get a 5. Now there's another part of the scoreboard that is not part of the player's score. We'll call that the pending score. Then 5 goes into the pending score, but I don't score any points yet. Do I roll or do I hold--stop rolling? Let's say I want to roll again. This time I get a 2, so I add 2 to the pending score. I get 7. Let's say I roll again. I get a 6. I add 6 to the pending. I get 13. And I'm going great. I'm lucky, so I roll again, and this time I get a 1. And a 1 is special. A 1 is called a pig out, and when you roll a pig out it means you lose all the pending points, and for your hand you score not this total in pending, but just the 1. So my score would be just the 1.

Now the other player, player number 1, goes. Let's say player number 1 says, "I'm going to roll," gets a 5. "I'm going to roll again," gets a 4. "I'm going to roll again," gets a 3. So now we have 12 in the pending, and now player number 1 says, "I think I've had enough. I'm going to hold," and that means we take these points from the pending, the 12 points, and put them up on the board for player 1's score. And now player 1's turn ends, and it's player 0's turn. Now it's my turn again.

So that's how the game of Pig works. Your turn continues until you either hold or pig out, and your score for the turn is the sum of your rolls if you decided to hold, that is, if you didn't pig out, and the score is just 1 if you pigged out. And you keep on taking turns until somebody reaches the target--here, 50.

3. 03 q The State of Pig

Now let's go to try to describe the game in a form that we can program. So as usual, we're going to make an inventory of concepts in the game. This time I'm going to try to break things out a little bit, and I'm going to talk about low-level concepts, high-level concepts, and mid-level concepts. As we saw in the discussion forums there's always a question of where do you want to start. Do you want to describe the low level first and build up from there? Do you want to describe the high level first and build down? I think for this case we'll take more of a middle out approach.

So, at the mid level there's the concept of current state of the game. We're sort of inching towards a search problem, and we know that we have to represent states for a search problem. So, we want to know the current state of the game. If we're thinking of search problems then we also have to know about actions we can take. We know that there are two actions: Roll and hold.

Now, here's some candidates for what's in the current state. First, the scoreboard, the things that were on the scoreboard. Then the player whose turn it is. The previous roll of the dice: whether I just rolled a five or something else, we might want that to be part of the state. The previous turn score: how much did the other player just make on their turn? So, that might be part of the state. All of these are possibilities. You might be able to think of other possibilities. I want you to tell me which one of these are necessary to describe the state of the game. Which of these are necessary for the current state?

4. 03 s The State of Pig

Well, we certainly have to know the score. The scoreboard, remember, had three things: that player's score, the other player's score, and the pending score that hasn't been reaped yet. We have to know how much is pending, because that's going to affect the score. We have to know what player is playing. Now these things, the previous roll of the dice, the previous turn score, what happened before, they might be interesting, but they don't really help us define the current state. So those are unnecessary.

I guess I should say here that we're assuming that the goal of the game, the number of points you need to win, we're assuming that's constant and doesn't need to be represented in each individual state. We just represent it once for the whole game. So, the state's going to end up being something like a four tuple: the player to move, I've written it as p, me, you, pending.

5. 04 l Concept Inventory

At the low level--I count as low-level things like the roll of a die, the implementation of scores, the implementation of the players and of the player to move, the goal--so these are all things that we're going to have to represent. I don't see any difficulties in implementing any of these pieces. If I thought there was something down here, at the low level, that was difficult to deal with, I might spend more time now, trying to resolve what the right representation is for one of these difficult pieces, and that would inform my high-level decisions. But since I don't see any difficulty, I'm going to jump to the high level. So I started sort of middle-out saying these are the kinds of things I think I'm going to need. Now I have a good enough feel for them that I feel confident in moving top-down. If I start at the top, then I'll be able to make choices later on without feeling constrained.

And then at the high level, I'm going to have a function play-pig, that plays a game between two players, and I have the notion of a strategy--a strategy that a player is taking in order to play the game. Now let's think about how to implement these things. I'm going to move top-down when I'm doing the implementation.
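The scoring rules from the walkthrough of the game can be sketched in a few lines. This is only an illustration, not code from the lecture: `turn_score` is a hypothetical helper that replays the rolls of a single turn, assuming the player holds after the last roll unless a 1 ends the turn first.

```python
def turn_score(rolls):
    """Score for one turn of Pig, given the rolls made before holding.

    Hypothetical helper for illustration only: a 1 is a pig out and
    scores just 1 point; otherwise the turn scores the sum of the rolls.
    """
    pending = 0
    for d in rolls:
        if d == 1:
            return 1      # pig out: all pending points are lost
        pending += d      # accumulate the roll in the pending score
    return pending        # the player holds and reaps the pending points


# The two turns narrated in the walkthrough:
first_turn = turn_score([5, 2, 6, 1])   # 13 pending, then a pig out
second_turn = turn_score([5, 4, 3])     # player 1 holds with 12 pending
print(first_turn, second_turn)
```

Keeping the pending total separate from the score is what lets a single bad roll wipe out a turn without touching points already on the scoreboard.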

Now, what's play-pig? Well, I think that's going to be a function, and let's just say that its input is two players, A and B. And its output is--let's say it's going to be the winner of the game. Maybe A is the winner. And we'll have to make a choice of how we represent these players, and we haven't decided yet how we're going to represent those.

Now what's a strategy? Well, a strategy--people sometimes use the word "policy" for that. The strategy is a function, which takes a state as input, and it returns an action or a move in the game. In this game we said that the actions are roll and hold. Let's just say now how are we going to represent these actions? Well, we can call the actions just by strings, so we use the strings "roll" and "hold" and that could be what the strategy returns. But then we'll also need something that implements these actions, so we'll have to have something that's a function that says--let's say--the function "roll" takes a state and returns a new state, and the function "hold" takes a state and returns a new state.

But that doesn't seem quite right. There's a problem here. What about the die? That seems to have an effect: roll by itself is not a function from state to state. Rather, roll--if we wanted to specify it--would be a function from a state to a set of states. So the dice can come up as D, one of the numbers 1 to 6, and that represents the fundamental uncertainty. That's why we have an uncertain or a nondeterministic domain: an action doesn't have a single result. Rather, it can have a set of possible results. That's why we need probability to deal with it. In some cases it makes sense to go ahead and implement these actions as functions that look like that, that return sets of states. And I considered that as a possibility, but I ended up with an implementation where I talk about the different possibilities for the dice. So, roll, from a particular state with the particular die roll, is going to return a single state rather than a set of states. And I just think it's easier to deal this way, although in other applications you might want to deal that way.

Now the rest seems to be pretty easy. We're starting to move down. The die can be represented as an integer. Scores can be represented as integers. And the players themselves? Well, the simplest way to do it is just to represent the player by their strategy. We can also represent that as a function. So players will be strategy functions. We could have something more complex, but it seems like we don't need anything more than that. The player to move--we can represent that as an index, 0 or 1, into an array of players. Likewise the goal. We can define it ahead of time.

6. 05 p Hold and Roll

Now you're probably itching to write some code by now--so let's get started. So, again let's figure out what's in the state. We look at the state, which is a tuple that has four components. A state is represented by this four tuple of p, me, you, and pending. p is the player. It's either zero or one, and that could represent the player. In the subsequent state it would remain the same if the player continues and would swap between one and the other otherwise. Me and you are two integers indicating the score, the score of the player to play and the score of the other player. And then pending, which is score accumulated so far but not yet put onto the scoreboard.

What I want you to do first is write these two action functions, hold and roll, which take a state and return a state. Go ahead and write those functions.

7. 05 s Hold and Roll

Here's my solution: So, one comment on style. I have my state--I just broke it up into pieces so that I know what I'm talking about. Right here I'm taking this state, and I'm breaking it up like this into its four components: p, me, you, and pending.

Here is the state that results from holding. If I hold, it becomes the other player's turn. Here's just a way to map from one player to the other: if it was zero it becomes one, if it was one it becomes zero. So now remember the second place is the score of the player whose turn it is. That's the other player, that was you previously. And then the score that I got--I just add in the pending. I reap all of those, and the pending gets reset to zero.

And now roll. Here is the state that results from rolling and getting a d. When I roll, if the roll is one that's a pig out. It becomes the other player's turn, and I only got one lousy point. Pending gets reset to zero. If the roll is not a one then it's still my turn. I don't change my score so far. The other player's score is the same as it was before, but I add d onto the pending.

It's always a great idea to write some test cases. So I can just go ahead and make that assertion.

8. 06 l Named Tuples

Now here's an alternative. When you have four components that's probably okay, but it's getting to worry me a little bit that maybe I won't be able to remember which part of the state is which. If I had more than four, if I had five or six components, I really start to worry about that. So there are other possibilities where we can be more explicit about what the state is rather than just have it be a set of undifferentiated elements of a tuple that we then define like this. Instead of just defining a state by just creating a tuple and then getting at the fields of a state by doing an assignment, we can use something called a namedtuple that gives a name to the tuple itself as well as to the individual elements.

We can define a new data type called State and use capitalized letters for data types. Say State is equal to a namedtuple, and the name of the data type is state, and the fields of the data type are p, me, you, and pending. Namedtuple is in a module: from collections import namedtuple gives me access to it. Now I can say s = State(1, 2, 3, 4), and I can ask for the components of s by name: s.p, s.me, s.you, s.pending and so on.

Here's what hold and roll look like in this new notation. They look fairly similar. You notice the lines are a little bit longer in terms of we're being more explicit: hold--where we're explicitly creating a new state, and similarly for roll.

How would I choose between this representation for states and the normal tuple representation? Well the namedtuple had a couple of advantages. It's explicit about the types. It helps you catch errors. So if you ask for the p field of something that's not a state that would give you an error. Whereas if you just broke up something that was four elements into these components that would work even if it didn't happen to be a proper state. There are a few negatives as well. It's a little bit more verbose, although not so much, and it may be unfamiliar to some programmers. It may take them a while to understand what namedtuples mean. I should say we could also do the same type of thing by defining a class. That has all the same positives, and it's certainly familiar to most Python programmers, but it would be even more verbose. I'm sort of up in the air whether this representation is better than the previous representation with tuples. I could go either way.

9. 07 p Clueless

Now I'm going to talk about strategies for a minute. Remember a strategy is a function. So it's a function that takes a state as input, and it's going to return one of the possible moves, roll or hold. It takes as input a state, and its output is one of the action names, roll or hold. I want you to write a strategy function, which we're calling clueless. It does that by ignoring the state and just choosing one of the possible moves at random. So go ahead and write that.

10. 07 s Clueless

Here's my solution: I gave you the hint of importing the random module. I just call the random choice function, which takes a set of possible moves and picks one at random.

11. 08 p Hold At Strategy

Now I want to describe a family of strategies that I'm calling hold at n, where n is an integer. For example, hold at 20 is a strategy that keeps on rolling until the pending score is greater than or equal to 20, and then it holds. So hold at 10, hold at 20, hold at 30 describes that family of strategies. The point of this strategy is you get points by rolling, but you risk points by rolling as well. The higher the pending score is, the more you're risking. So there should be some point at which you're saying that's too much of a risk. I've accumulated so much pending that I don't want to risk any more, and then I'm going to hold.

I want you to go ahead and implement that. Since hold at x is a whole family of strategy functions, hold at x is not going to be a strategy function. Rather, it's going to return a strategy function. So I've given you this outline of saying we're going to define a strategy function, then we're going to fix up its name a little bit to describe it better, and then we're going to return it. You have to write the code within the strategy function.

I should say there's one subtlety here that we'll build in to hold at, which is let's say that the goal is 50, and my score when I start my round is say 40. Then let's say I roll a 6 and a 4. According to hold at 20 I should keep on rolling because my pending score is only 10. I haven't gotten up to 20 yet, but it would be silly for me to keep rolling at that point. I would risk pigging out and only scoring one point and getting to 41. Whereas I know if I hold now I have 40 + 6 + 4 is 50. I've already won the game. Another way to say that is, hold at 20 will hold when pending is greater than or equal to 20, or when you win by holding.

12. 08 s Hold At Strategy

Here's my solution: I break up the state into its components. Then I hold if the pending score is greater than or equal to x, or if I've already won, that is, if my current score plus the pending score is already greater than or equal to the goal. Otherwise, I return roll as my move.

```python
def hold_at(x):
    """Return a strategy that holds if and only if
    pending >= x or player reaches goal."""
    def strategy(state):
        (p, me, you, pending) = state
        # goal is the target score, defined once for the whole game.
        return 'hold' if (pending >= x or (pending + me >= goal)) else 'roll'
    strategy.__name__ = 'hold_at(%d)' % x
    return strategy
```

13. 09 p Play Pig

Now let's talk about the design of the function, Play Pig, which plays a single game of Pig. We had to figure out what the actions were. We decided that this is a two player game, player "A" and "B," and we decided that we're going to represent this as a function, so "A" and "B" are going to be strategy functions that we pass in. At some point in the future we might want to allow multiplayer games with more than two players, but we're not going to want to worry about that for now.

So let's make a list of what the function has to do. It has to keep score--it needs the score for player "A" and for player "B" and for pending. It has to take turns. It has to figure out whose turn it is, and that that turn keeps going until they hold or pig out. It has to call the strategy functions: tell me everything I know about the current state, pass that state to the strategy function for the appropriate player whose turn it is, and then that will give back an action, either roll or hold. Then it has to do the action. Apply the action, the roll or hold, and that will generate a new state, and we have to keep track of the state we're in. It has to keep track of the current state.

But there's one more trick here--when we were doing a normal search, there's a single successor for each action. But here there's multiple successors for an action. And so we have to do one more thing, which is roll the die, and that disambiguates the action of rolling and makes it not generate a set of possible states. Rather, the action plus the die--that generates a single state.

So all of that is managing the current state: keeping track of what's going on--the score for "A," the score for "B," the pending, and whose turn it is--rolling the dice as necessary, updating the state, and then when one player wins, that was it. We're going to stick with the representations of states, where state is a four tuple of the player to move, me and you score, and the pending score. I want you to write the function "Play Pig," which takes two strategies as input and plays the game.

14. 09 s Play Pig

Here's my solution. I put the strategies into a list because we're going to be indexing into that. I start out in the start state. No points awarded. Nothing has happened. Player number 0, that is "Player A," is the player to move. Then I repeat this loop. If the score of the player whose turn it is, is greater than or equal to the goal, then that player wins and we return that player, either "A" or "B": "Player 0" or "Player 1"--"A" or "B." If "P" is "0," then strategy "0" is "A." If "P" is "1," then strategy "1" is "B." Same if the other player is greater than or equal to the goal, then that player wins. Otherwise, I pick out the strategy function for the player to play. Apply that strategy function to the state and if it's hold, I apply the hold action to the state to get a new state. That will give me the new state. And if it doesn't return hold, I'll give it the benefit of the doubt there. I assume that the strategy function is legal, then it does return roll, and I perform the roll action on the state and on a random integer from 1 to 6, inclusive. When you get to the next state, I continue until we find a winner.

```python
import random

def play_pig(A, B):
    # Uses hold, roll, other, and goal as defined elsewhere in the unit.
    strategies = [A, B]
    state = (0, 0, 0, 0)
    while True:
        (p, me, you, pending) = state
        if me >= goal:
            return strategies[p]
        elif you >= goal:
            return strategies[other[p]]
        elif strategies[p](state) == 'hold':
            state = hold(state)
        else:
            state = roll(state, random.randint(1, 6))
```

One thing I note is--I don't have any tests here. The reason is it's hard to test this. It's hard to write a deterministic test because part of playing the game is rolling the die, and that won't be the same every time. We'll talk in a bit about how to test programs like this.
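Because the die is rolled inside the game loop, a game of Pig is hard to check deterministically. One way around that, sketched here under stated assumptions rather than taken from the lecture's listing, is to pass the rolls in as an extra argument. The parameter name `die_rolls`, the helper `random_rolls`, and the choice of `None` as the default are all assumptions for this illustration; `die_rolls` must be an iterator, since `next()` is applied to it.

```python
import random

goal = 50
other = {0: 1, 1: 0}

def hold(state):
    (p, me, you, pending) = state
    return (other[p], you, me + pending, 0)

def roll(state, d):
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me + 1, 0)   # pig out
    return (p, me, you, pending + d)

def hold_at(x):
    """Strategy factory: hold once pending >= x, or when holding wins."""
    def strategy(state):
        (p, me, you, pending) = state
        return 'hold' if (pending >= x or me + pending >= goal) else 'roll'
    strategy.__name__ = 'hold_at(%d)' % x
    return strategy

def random_rolls():
    """Endless stream of fair die rolls, used when no rolls are injected."""
    while True:
        yield random.randint(1, 6)

def play_pig(A, B, die_rolls=None):
    """Play one game between strategies A and B; return the winner."""
    if die_rolls is None:
        die_rolls = random_rolls()
    strategies = [A, B]
    state = (0, 0, 0, 0)
    while True:
        (p, me, you, pending) = state
        if me >= goal:
            return strategies[p]
        elif you >= goal:
            return strategies[other[p]]
        elif strategies[p](state) == 'hold':
            state = hold(state)
        else:
            state = roll(state, next(die_rolls))

# Eight 6s give 48 pending; a 2 brings it to 50, so hold_at(50) holds and wins.
A, B = hold_at(50), hold_at(20)
winner = play_pig(A, B, iter([6, 6, 6, 6, 6, 6, 6, 6, 2]))
print(winner.__name__)   # hold_at(50)
```

The same block also exercises the subtlety built into hold at: with 40 points banked and 10 pending, `hold_at(20)` holds even though pending is below 20, because holding reaches the goal.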

15. 10 l Dependency Injection

Now, how can I test a function like this, that includes this nondeterministic component? One thing we want to be able to do is inject into here some deterministic numbers to say this is the sequence of "die rolls" I want to give you and then, from that, tell me what happens. Then I can check if it's doing the right thing. This is an example of a concept called Dependency Injection, which has a rather scary and intimidating-sounding name, but it's actually a pretty simple idea. The idea is we've got a function like this, with this play pig, it's a big complicated function and way down somewhere inside, there's something that we want to affect, something we want to monitor or track or change. Dependency Injection says this function depends on this random number generator, so let's be able to inject that. How do we inject something into a function? Well, we just add it as an argument. So let's add in the argument here, and let's call it "die rolls" and say, that's going to be a sequence or an iterable that will generate possible "die rolls." By default, that will just be random numbers exactly like it was before. But if I specify it, then I have control over it. We can just pass in a list saying what happens if the "die rolls" are a 6 and a 1 and then a 3 and a 5, and so on.

16. 11 p Loading the Dice

So now, I still have the regular arguments "A" and "B." There's an optional argument, "die rolls." If I leave that out it should behave exactly like it did before. So "die rolls" should be an iterable that generates rolls. In the normal case, "die rolls" just says we're going to generate an infinite sequence of random integers from 1 to 6, but when we want to test the function we can inject the "die rolls" that we want.

And then, with the dependency injection, here's a test that I can actually run. So "A" and "B" are going to be my two contestants. "A" is hold at 50, which is equivalent to saying never hold until you win. "B" is the clueless function, the one that acts at random. And rolls is going to be an iterator of some list of numbers, maybe 1, 2, 3 or whatever you want, but I want you to write in there the list which is the shortest possible list, or one of the shortest possible lists that allows "A" to win, and then you can check: play a game of Pig between "A" and "B" with these rolls and make sure that "A" wins, with the goal being 50.

17. 11 s Loading the Dice

Here's my answer. So here's my implementation of the Dependency Injected Play Pig. Here we go down and we ask for the next one out of those rolls and get it back. Oops. I think I misspoke there. I think I said that "die rolls" have to be an iterable. Actually, what it has to be is an iterator such as a generator expression or something else, in order for it to have the next apply to it. I've rolled eight 6s. That gives me 48 points, and then a 2--that gives me 50--and that allows "A" to win. There are other sequences that are of the same length, but none that are shorter.

18. 12 q Optimizing Strategy

So we've seen several different strategies and we've compared them and tried to find one that was better. And we could keep doing that, trying to improve and make a strategy better and better, but what if we could make a leap? Instead of incrementally coming up with a slightly better strategy, would it be possible to leap to the best strategy? To make it sound more mathematical, we can call it the optimal strategy. Can we do that and what would it even mean? On the surface it's not exactly clear.

So when we were doing search, we didn't know what our first action was. We started out in some state and we knew there were several different states we could go to, and from there, there were other states we could go to. We don't care what they are. All we knew is that we were trying to arrive at some goal location. But we knew if we just specified how the domain works, how you get from one state to the next, and if we specified an algorithm that found the best path, the shortest path, or the least cost path, that eventually, by applying that algorithm to that description of the world, we would arrive at the best possible solution. So maybe we can do the same thing here. Even though we're dealing with uncertainty, maybe we can still define what the world looks like and discover the optimal solution.

Now we've gone beyond search in two ways. The most obvious is we're dealing with probability, so we've got dice or whatever other random element there is. And then in addition to that, we introduced another complication, which is our opponent. When we did search, it was always sort of one agent doing the searching. When we start out, it's "me" and I have options I can take--roll or hold--and I go in one direction and I get to a point where it's the dice's turn to roll, and that can have six outcomes, and let's say ends up here, and now it's my opponent's turn to move and my opponent makes a choice, hold and roll, and they could go in either way, and so on. And I look at all these paths through that keep on going until they get to the end of the game, and I can describe the value of those paths.

So now we have a way of describing the world. And now this question of what each of these three are trying to do. So let's call that "me," and what am I looking for? Am I looking for the best path or the worst path? Well, obviously, we're looking for the best path. And I want you to tell me, the question is, is our opponent trying to get the best, and that means best score for "me," or is the opponent trying to get the worst score for "me," or is the opponent trying to come up with the outcome that is average? And tell me the same for the dice. Is the dice with "me" in trying to get the best result for "me?" Is the dice plotting against "me" in trying to get the worst result for "me?" Or is the dice going to average out? Go ahead and click the appropriate boxes there.

19. 12 s Optimizing Strategy

And the answer is, in the game of pig the opponent is trying to defeat "me," assuming we're diametrically opposed. So the worst score for "me" would be the best score for the opponent, so they want the worst for "me." The dice has no intentionality. Everything is equal in terms of outcome for the dice, so that works out to average. So that's saying, if I always choose the best, and if my opponent always chooses the worst for "me," and if the dice average out, then I can describe all the paths to the end, for the big game, and we can describe that, and once we've got that description we've got to search it outward.

20. 13 l Utility

Now in economics and in game theory, the value of a state is called its utility. It's just a number. So we're at the end of the game. Let's say there's one here, and if there's 1 state where we win, we'll give that a utility of 1. If there's another state where we lose, we'll give that a utility of 0. Now if I have a choice here--it's my turn to move--I have a choice to go either way. I'm going to maximize my choice, and I'm going to move there. That means the utility of this state is going to be 1 because I know I can get 1 by taking the optimal strategy. That's if it was my choice here. If it was my opponent's choice, then my opponent is going to minimize my score or maximize their score and go in this direction, forcing me to lose and allowing them to win. So we'll say the utility of this state is 0 for me. For the dice, rather than trying to choose the best, we'll just average over all of them. We keep backing up the tree that way.

And I want to also introduce here another idea called the quality, which is the function on a state and an action and gives us a number--a utility number. So that's saying, what's the quality of this action in this particular state? So if these were the actions, then we'd say for my opponent the quality of rolling from this state would give us this utility, and the quality of holding would give us this utility.
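The backing-up rule (I take the maximum, my opponent takes the minimum for me, and the dice average out) can be illustrated on a toy game tree. This is a sketch for illustration only, not the lecture's code: the tagged-tuple tree encoding and the function name `utility` are assumptions.

```python
def utility(node):
    """Back up utilities through a small game tree.

    Hypothetical tree encoding, chosen for illustration:
      ('leaf', u)         end of the game, utility u for "me" (1 win, 0 loss)
      ('me', children)    my move: I choose the child with maximum utility
      ('opp', children)   opponent's move: they choose the minimum for "me"
      ('dice', children)  dice: average over equally likely outcomes
    """
    kind, payload = node
    if kind == 'leaf':
        return payload
    values = [utility(child) for child in payload]
    if kind == 'me':
        return max(values)
    if kind == 'opp':
        return min(values)
    if kind == 'dice':
        return sum(values) / float(len(values))
    raise ValueError('unknown node kind: %r' % (kind,))


win, lose = ('leaf', 1), ('leaf', 0)

# My choice between a win and a loss is worth 1; the opponent's is worth 0.
my_choice = utility(('me', [win, lose]))
opp_choice = utility(('opp', [win, lose]))
# A fair two-way random outcome between a win and a loss averages to 0.5.
dice_choice = utility(('dice', [win, lose]))
print(my_choice, opp_choice, dice_choice)
```

In this encoding the quality of an action is simply the utility of the child node it leads to, so choosing the best action at a `'me'` node and backing up its utility are the same computation.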

backing up the tree to say. I can go backwards and say. 14 q Game Theory
Now when you have a decision under uncertainty and there's an opponent. but you can ignore that.
holding would give us this utility. that was just hold and gamble. sort of the arithmetic function.
. what's my--and given utility. Now you're given a decision. yeah. now we're saying what's the best_action in a particular state if you tell me what the available actions are.then from the dice we're going to average over all the possibilities. but only describes it if I have a utility function. what's the value of this start state? I can collect those values.5 times better than $1 million. Now my best_action tells me that what I should be doing is holding. we're finding our way from the start to the goal. So that describes the quality of the state. it tells me the best_action is gamble. you get nothing.that value of the amount of money that I currently have to say that I'm indifferent between the 2-. What's the value of a random dice roll? I know that. So the gamble is more. 15 q Break Even Point
So now I want to ask you a question. assuming my value of money is still logarithmic. if I hold. but the idea is the same. and there are many. what's the value of one of my moves? Oh! I know that. and then let the decision theory decide. now I'm starting to look at money as more closely linear again. So in any state. and we never have to program anything again because programming can be difficult. which is defined by the game.' there must be a point at which there is a crossover-. and what's the quality of holding from this state. And if I gamble. it's usually called decision under uncertainty. For the game we defined. It has to tell you one answer is right or wrong. So what I want you to tell me is: to the nearest million. 14 s Game Theory
So I predict that most people say they would hold. if it was the dice's turn--and let's say there are 6 outcomes. If roll has the highest quality from this state. so the utility of this state is 1/2. what the best action is for this state.. then my quality
What's the value of my opponent's? I know that. There are other names. but to all the outcomes that were covered by all the possibilities of the dice. First. That going from $100 to a $1 million is a big. So now we have a way--if we know the value of the end states. what's the quality of rolling from this state. Expected meaning average. So given this quality function Q. and you want to say. Going from $1 million to $3 million is a smaller jump than that. But that's only true if $3 million really is 3 times better than $1 million. I have to know how much do I like money? Well. what if I had $10 million already. then would I take the bet. and we'll see--in the game of pig--the start state for the first player has a little bit better than 50% chance just because they go first. what should I do? Should I keep the million or should I go for the 3 million? What I'm going to do is code up a model for this.that if I have C dollars.could be anything. under the identity function. For most people. however many dollars I have in my pocket-. 50% that. and I don't mind not gaining the additional $1 million. Here's the decision I'm going to give you. given that state action pair under the utility function? And that means that the Q had to deal with the averaging. So you analyze the outcome of this and you believe that this is a choice by the coin that has a 50% probability of each outcome.
23. meaning if you have a certain amount of money. from the end state. But the amazing thing is is that we can completely generalize this. We can work out. utility is used with the abbreviation U and quality with abbreviation Q. and that doesn't mean that people are irrational in any way. That works for any possible domain that you can define.5 on average. since we saw that for some values of this state the best action-. and some utility for that. $3 million is 3 times better than $1 million. where we're only going to deal with 1 state.now we have a way of essentially searching backwards to say. and it did that. You're on a game show. And given that state. It's an amazing thing that we solved all the problems at once. then I can tell you what the best_action is. what is that crossover point C? That value of state-. I'm risking my $1 million. I should. and best_action tells me that yes. what state is worth to me that's going to tell me the value of that state action pair? And the actions available to me are holding and gambling. but we make this perfectly general. and that's more than 1. Now here's--the amazing thing is. and you get a prize of 1 million dollars or euros or whatever currency you want to use. the actions available are holding and gambling. I can also ask. the simplest choice for utility function is the identity function.
21. and then the optimal strategy is just the one that says choose the move which has the highest quality. So what's the average utility of each of the actions. And the state that I start with is. Say the identity function just takes any input x and returns x. even though arithmetically it's more. It's the input itself. and assuming that our utility function is the log function. You can keep the $1 million. And a 50% chance that I get nothing more than I have now. which stands for Expected Utility. So we haven't solved everything. then that's the move that we should do. and then there's some utility on that--how much do I value having what I have now plus 1 million. defined by when we win and lose and how many points each player gets for that-. If you are faced with that problem. So I'm going to define here a quality function that says. and why is that? Well. and the value of the state of having a million is a million. So the complications going to be more complex. We can also work out the quality for each of these moves. and you won. It's the maximum. so it's half of one and half of 0. what's the best_action starting from $100 in my pocket. so I'm willing to take that bet. it seems like less. If I have $10 million. and I've defined the average utility as the quality of that state. so if we just add in parameters. but valuing money with logarithmic function rather than with the identity or linear function. many problems which you can describe.54 utility for the player who goes first. or the host will flip a coin and you believe it to be a fair coin. but 3 of them lead to this state and 3 of them lead to this state-. So let's try again. what the quality of each state action pair is. There's some utility for that. well. I could say. There's some problems that don't fit into this type of formulation. Let's go ahead and solve it. That corresponds to my intuition. If there's no opponent. it's called game theory. It's the minimum. and let's say I start off with $100. 
and if you call it correctly you get $3 million. Now traditionally. what's my best_action? Then when I run that. I've already got $10. and now I'm going to ask. Rather it's something more like a logarithmic function. Now we could go all the way back. and I can have a utility for this start state. Similarly to the way in search where we had 1 best_search algorithm that could solve all of the search problems. It said. I just define a variable million because it's hard to see the number of 0's and count correctly. the value of money is not a linear function. the value of the state of having nothing is 0. my state is going to be increased by $1 million. I'm at this stage where logarithmic function is approximately linear locally. So let's look at an example of that first before we get back to the game of pig. Now that doesn't sound quite right to me. that's not true. or would you hold with 1 million? And there's no right or wrong answer to this despite what the interface has to do. Finally. I just want to collect some data on how many people think that they would gamble in that situation and how many people think they would hold. It's the average. It's a good bet because if I win. Let's solve this problem. So just as we did in search. and what it's going to be is the maximum over all the possible actions from the state. there's a 50% chance that I get 3 million more than I have now. in terms of what you can do. Now it doesn't mean that we're done.where the two values are approximately equal. We can do that with random problems. I can just write out what the optimal strategy is. you don't get twice as much value out of having that money. I'm going to import the math module. That that's the right thing. Would you take the gamble--try to go for the 3 million. and so half of $3 million is 1. but it is amazing how much we can do with just this 1 short function. That's the value of gambling. I think it works out to about 54%-. If I've got $10 million.
but we have to find a way not to just 1 individual outcome. and what the utility is over states. Now this best_action function solves this particular problem.the action with the highest quality is 'hold' and other values is 'gamble. big jump. Instead what it means is that for people. assuming you had $100 to your name. if you double that money. given a state and an action. maximized by EU. and so we could say. but that's no big deal. rather you just get 1 increment more of having that money. 50% this. if I start with nothing. I get 3 or 0--that's 1. but which you can't solve in a feasible amount of time. but if you get it wrong. Let's go ahead and make that explicit.
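To make the million-dollar decision concrete, here is a minimal sketch of the quality function Q, the utility U, and best_action as the lecture describes them. The exact function bodies are a reconstruction from the spoken description, not the instructor's verbatim code; the state is assumed to be simply the number of dollars in my pocket, and the `identity` helper and the printed checks are illustrative assumptions.

```python
import math

million = 1000000  # easier to read than counting zeros

def Q(state, action, U):
    """Expected utility of taking `action` with `state` dollars in hand."""
    if action == 'hold':
        return U(state + 1 * million)
    if action == 'gamble':
        # Fair coin: half the time we gain 3 million, half the time nothing.
        return 0.5 * U(state + 3 * million) + 0.5 * U(state)
    raise ValueError('unknown action: %r' % (action,))

def actions(state):
    """The actions available in this one-state problem."""
    return ['hold', 'gamble']

def identity(x):
    """Linear value of money: a dollar is always worth a dollar."""
    return x

def best_action(state, actions, Q, U):
    """Return the action with the highest expected utility (EU)."""
    def EU(action):
        return Q(state, action, U)
    return max(actions(state), key=EU)

print(best_action(100, actions, Q, identity))             # linear utility
print(best_action(100, actions, Q, math.log10))           # logarithmic utility
print(best_action(10 * million, actions, Q, math.log10))  # already wealthy
```

With linear utility the sketch picks gamble from $100; with log10 utility it picks hold from $100 and gamble from $10 million, and at exactly $1 million the two qualities come out equal, which matches the crossover discussed in this unit.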

we play a tournament with these strategies. we just take all the possibilities for the other die rolls. strategies that try to solve the problem in 4 chunks. So just tell me what your crossover point is.the best possible strategy for Pig. his probability of winning because either one player or the other has to win So our probability of winning is 1 minus our opponent's probability of winning. You wouldn't be able to hit the run button and do this because it would time out. I think the best thing to use is the probability of winning because we get 1 point for winning and no points for losing. And it turns out that if we increase
I just said.
24. Here.and that's a two-value tuple-. the die gets to roll. Then I win automatically just by reaping those pending. but still it's nice to know that no strategy can do better. then we pig out. that should be worth 0 points. It seems like we've got a lot of work to do. and I'll let you finish it off. And so the probability of winning is our score. and it's our probability of winning from the result of rolling in that state. there is no right or wrong answer. don't write 1. look for the Q value of that action-.
26. The Q function we'll call Q_pig. What value of C to the nearest million is that true of? And just to make it easier for those who can't divide by e in your head.
25. so our utility would be 1 minus the opponent's utility-. And what is that? Well. if you prefer that--at which you'd be indifferent between accepting this gamble and holding? And put that in as an integer number. and that's going to be my probability of winning. it's more complicated.
28. We've defined the problem. Now. That's a good way of thinking about the game. then that should be worth 1 point. We just call best action from the current state using the Pig actions. and we have to assign utilities. It takes a state and an action and evaluates the quality of that against the utility function. so it gets a Pwin of 0. that's just folded in because rather than explicitly worrying about me and my opponent. and we still have our hold and roll functions that tell us what state we get to when we hold. Let's get started on that. I can roll or hold. I'm going to raise the value. so that takes care of the averaging-. but what's the number of dollars-. which is the same as probability of winning. So we need a Q and a U function. If we win none of the time. Tell me what the crossover point is to the nearest million. but actually. If we roll. So I'm going to define a set of strategies-. and you can write it in terms of the functions we've defined above and in terms of a call to best action. and divide by 6. 17 l Optimal Pig
Now let's get to work on defining an optimal pig strategy. I'm making the best choice by maximizing.and we see. then we're turning it over to our opponent.the clueless strategy we expect to do the worst. So we add them all up. and if I ask for the quality of C gambling versus hold with the log10 utility function-. the probability of scoring-. if we hold. And otherwise.and go ahead and write your code there. My probability of winning is 0 if your score is greater than the goal and I haven't won. my probability of winning is the probability that I get by taking the best action. What are the actions in this state? Well.wins 325. and he's going to do his best.--at least the number of expected wins-. I'm just going to roll. "What's the end point?" So remember.or euros. in fact. we've almost solved the whole thing.only wins 23 games out of 500. they're the same.try to maximize that. but if you bring it in to your own development environment. So you can see that the clueless strategy does very poorly-. using the quality function for Pig and trying to maximize the probability of winning. 15 s Break Even Point
And the answer is crossover C is 1 million. So this is a losing state. The max win strategy does the best of all-.and what about the worst choice that the opponent makes? Well. So our utility is just the probability of winning. you can do that. I can use That's the probability of the opponent winning. and then we have some end positions where the game is over.not that much worse off than the optimal strategy. And so our expected outcome is going to be somewhere between 0 and 1. and here's the results we get back. If we win all of the time. So we have to average them. We assign all of those.of winning is 1. And here we see max wins gets So only a couple percent better for max wins over hold at 20. And if the action wasn't hold or roll.among all the actions I can do. And otherwise.udacity. That's the only thing that makes sense to do. if there's some pending numbers. and that's six probabilities all together. which is either 0 or 1. and it's 1 minus our opponent's probability of winning because it's his turn. We'll call this function "max_wins"-. This is a winning state. and an actions function. and if they're not.the strategy function that maximizes the number of wins-. We've defined how the game works. we start out in the start position. Let's see how that works.from the current state according to the utility function-. Here. 16 q Whats your Crossover
Now I want to gather a little bit more data.
for gambling is the same or approximately the same as my value for holding. So.
27. 19 s Maxwins
And this is all it is. 19 p Maxwins
So now we're almost there. in 3 chunks in 2 chunks. and we're averaging-. I just want to do this sort of as a sociological experiment to see where people are.
if your crossover point is 1 million. So we said that we had 3 choice points.what he can do best. It gets Pwin of 1. and then the max win strategy. And that holds up if we play a tournament with more games just to get a little bit more accuracy. write 1 million here. and here.
30. there is no right or wrong answer even though the interface may tell you that your answer was right or wrong. and then all the other states that depend on these-.we've already figured that out in terms of the Q function. All we have to do is say. and that's like a probability. So hold at 20 wins 314-. we'll use the log10 logarithms rather than the natural logarithms so that the log10 logarithm of a million is 6.
29. but there's some competitors that are pretty close.we're summing them all up and dividing by 6. not the number of millions. and we're ready to write the optimal strategy-. "Well. So for all the actions-. And then it's our opponent's. 20 l Impressing Pig Scouts
Now let's see how we did. So that's saying I can make the best choice that I can. And what should we use for the utility function? Well. Again. again. So the probability of winning is 1 if my current score plus the pending is greater than or equal to goal. If we roll a 1. 18 l Pwin
Now what's the probability of winning from a state? It seems complicated. and to solve it all in one win.
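The pieces named above (Pwin as the utility, Q_pig as the quality function, the Pig actions, and max_wins built on best_action) can be sketched as follows. This is a reconstruction under stated assumptions, not the lecture's verbatim code: the state is assumed to be a tuple (p, me, you, pending), the goal is assumed to be 40, and the `memo` decorator and `other` lookup are small helpers written here for illustration.

```python
from functools import wraps

goal = 40               # assumed target score
other = {0: 1, 1: 0}    # maps a player index to the opponent's index

def memo(f):
    """Cache results so each state is evaluated only once."""
    cache = {}
    @wraps(f)
    def wrapper(state):
        if state not in cache:
            cache[state] = f(state)
        return cache[state]
    return wrapper

def hold(state):
    """Bank the pending points and hand the turn to the opponent."""
    p, me, you, pending = state
    return (other[p], you, me + pending, 0)

def roll(state, d):
    """Apply die roll d: a 1 is a pig out (score 1, lose the turn)."""
    p, me, you, pending = state
    if d == 1:
        return (other[p], you, me + 1, 0)
    return (p, me, you, pending + d)

def pig_actions(state):
    """Holding is only allowed when there are pending points."""
    pending = state[3]
    return ['roll', 'hold'] if pending else ['roll']

def Q_pig(state, action, Pwin):
    """Expected probability of winning after taking action in state."""
    if action == 'hold':
        return 1 - Pwin(hold(state))  # the opponent moves next
    if action == 'roll':
        return (1 - Pwin(roll(state, 1))
                + sum(Pwin(roll(state, d)) for d in (2, 3, 4, 5, 6))) / 6.0
    raise ValueError('unknown action: %r' % (action,))

@memo
def Pwin(state):
    """Probability that the player to move wins, with optimal play."""
    p, me, you, pending = state
    if me + pending >= goal:
        return 1
    if you >= goal:
        return 0
    return max(Q_pig(state, action, Pwin) for action in pig_actions(state))

def best_action(state, actions, Q, U):
    """Return the action with the highest expected utility."""
    return max(actions(state), key=lambda action: Q(state, action, U))

def max_wins(state):
    """Strategy that maximizes the probability of winning."""
    return best_action(state, pig_actions, Q_pig, Pwin)

print(Pwin((0, 0, 0, 0)))       # a little better than 0.5 for the first player
print(max_wins((0, 30, 0, 10)))  # enough pending points to win outright
```

Memoizing Pwin matters because the definition is recursive and the same states come up again and again; without the cache the computation would not finish in a reasonable time.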

34. So I start off by defining a bunch of states. And what it says is if we're at the end of the game. and pending. but if I won by a lot-. and then it comes down here.then maybe this guy would take notice." And given a state. and I made a mistake in naming them max_diffs and win_diff. maybe your utility would be to maximize the differential. If it's roll. I think these function names are too similar. In other languages. does that number equal to hold?" No. otherwise-. if somebody's won. it doesn't work. watching the game with excitement. You can do it in terms of what we've defined before
32. "If I just won the game by a couple points. but the annoying thing was that mistake went unnoticed for a while. and play_pig just says. I want to count up. roll. because they're recursive. and I'm going to reap the pending-. and try to go for the maximizing your differential. you. there's a scout from the NPA--the National Pig Association. there's only 2 actions you could do. you don't have that protection. and I'll show you part of the problem. maybe 1 of them is going to be more aggressive or taking more chances than the other. The actions and the Q function are just like before.
36. Otherwise. and there we have
31. and when I was playing with this I put in the utility function where I meant to put in the strategy function. and then I go through all the states. "Well. then before.
. so probably most of the time. and it applies the appropriate strategy function to a state. how do these 2 compare? When are they different. You'd get an error message before you ran it.it's returning a number rather than an action-. you should maximize the probability of winning. The question is. Here's the play_pig function. they're recalling themselves over and over again. You type something in. These have to be strategy functions. I called this utility function "The Winning Differential.if I really clobbered my opponent-. and the way the game is defined. And what you want to do is not just win the game-. So then I'm just going to assume that you meant roll. Probably I should have come up with better names. we defined our utility as a probability of winning. I have the same problem over and over again. you lose. to say.well. Let's see if we can analyze that. Otherwise. it tells me what that differential is-expected differential for that state. and I'm just going to look from 1 player's point of view. nobody is going to notice.
35. That makes sense. remember that our utility function was 0 or 1. that's really the only sensible one. Maybe you're in a big Pig tournament. and that would be worth more to me. and we tried out the logarithmic utility. etc. and that's fine as long as I know how to correct them. This was the utility function over states. it's not. They sound too much alike. then we do the hold action. I should have called this win_diff_utility or something like that to make it clear that this is the utility function and this is the strategy function. so you've got to build in the protection yourself. what's the utility function going to return? Well. and the program where you accidentally used a utility function where you expected a strategy function-. 21 p Maximizing Differential
So here I've written the utility function. that's one of the complaints that people have about Python is that it's too easy to make that mistake because you don't have to declare for each function what are its inputs and what are its outputs. hold. we just do the same thing with a Q function that we did before. and it says if that strategy function decides to hold.
33.that program wouldn't even compile. So if you make an illegal move. so we memoize them so we only have to do each computation once.because lots of people are going to win the games-. 22 l Being Careful
Now. and in the stands are lots of spectators. but here utility function is my score-. And note that we're always careful to memoize these functions. 23 p Legal Actions
So what I'd like you to do is update the play_pig function so that it looks at the result that comes back from the strategy function. 21 s Maximizing Differential
And the answer is you call best action from the state.
and that's an easy mistake to make because they sound the same. We don't want to repeat those computations. 24 l Using Tools
So now let's go back and analyze this maximize differential strategy versus the maximizing probability of winning strategy. then that's all good down here. I added a variable called "action" to hold the result of the strategies. and when are they the same? If you're trying to impress the scouts. Now. collect all those states.000 of them. What about here? Well. and I increment the count for a result for the tuple of the action that's taken by max_wins and the action that's taken by max_diffs. it's going to return a number." So you'd give up on the goal of just winning. I know you guys are. but the utility function we're trying to maximize is the differential. we hold. We tried out the linear utility. what I want you to do is write the strategy function. rolling the dice. If instead of passing in a strategy function you accidentally pass in a utility function. This is going to be hold. If you're trying to win the game. And you know that somewhere in the stands. right? Now here's the problem.which is me. you'd expect the 2 strategies to agree. I keep making mistakes. It turns out that there's 35. but some of the time. and make sure that it's either hold or roll and if it's not one of those.but you really want to get the attention of that NPA scout so that you can move on and have a professional career. No numbers are equal to hold. So maybe what your utility function would be would not just be to win the game.went completely unnoticed. roll. we roll.Udacity Wiki
the goal and made a longer game than just playing to 40 points-.minus your score. you lose the game right there. Then I define a variable r to be a default dictionary. so for all these values of me. But maybe your only goal isn't just to maximize the probability of winning. and I typed the wrong function at one point. 23 s Legal Actions
Here's how I did it--makes the function just a little more complicated. but I'm making mistakes all the time. and let's convert r back to standard dict. You say strategies of P apply to state. in general. now I want you to write the strategy function. In the betting game. we had different utility functions. It takes two strategies: A and B. you would do that. which means that the other strategy wins. Now.that the advantage for max wins over any of these other strategies would only increase. If it's hold. which counts up integers. and your seated at the Pig table. and instead what my strategy was-the utility function that returns a number acted as if it was a strategy function that always said roll. I want to say right here that I made a mistake. so I'll do the roll action.udacity. then let's decide that what we do is that that strategy automatically loses a game. and that's really applying a utility function to state. In Python. so it starts at 0. and I haven't talked about this very much over the course of these lectures. It doesn't really matter to have both since it's symmetric. Here the mistake I made is--the mechanical mistake was I messed up. you're not going to be making some crazy moves. And so the fact that I passed in a completely wrong function that's doing nothing related to strategy-.
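Here is a minimal sketch of a play_pig that checks the action coming back from the strategy, so that anything other than hold or roll loses the game on the spot. It is an illustration under assumptions rather than the lecture's exact code: the state layout (p, me, you, pending), the goal of 40, the parameter name `rolls`, and the helpers `random_rolls`, `hold_at`, `always_roll`, and `not_a_strategy` are all chosen here for the example. The dice are injected as an iterator so a test can supply a fixed sequence.

```python
import random

goal = 40               # assumed target score
other = {0: 1, 1: 0}    # maps a player index to the opponent's index

def hold(state):
    p, me, you, pending = state
    return (other[p], you, me + pending, 0)

def roll(state, d):
    p, me, you, pending = state
    if d == 1:
        return (other[p], you, me + 1, 0)  # pig out
    return (p, me, you, pending + d)

def random_rolls():
    """Endless stream of fair die rolls (hypothetical helper)."""
    while True:
        yield random.randint(1, 6)

def play_pig(A, B, rolls=None):
    """Play a game between strategies A and B; return the winner."""
    rolls = rolls if rolls is not None else random_rolls()
    strategies = [A, B]
    state = (0, 0, 0, 0)
    while True:
        p, me, you, pending = state
        if me >= goal:
            return strategies[p]
        if you >= goal:
            return strategies[other[p]]
        action = strategies[p](state)
        if action == 'hold':
            state = hold(state)
        elif action == 'roll':
            state = roll(state, next(rolls))
        else:
            # Illegal action: this strategy loses immediately.
            return strategies[other[p]]

def hold_at(x):
    """Strategy: hold once pending reaches x or the goal is in reach."""
    def strategy(state):
        p, me, you, pending = state
        return 'hold' if pending >= x or me + pending >= goal else 'roll'
    return strategy

def always_roll(state):
    return 'roll'

def not_a_strategy(state):
    """Stand-in for a utility function passed by mistake: returns a number."""
    return 0.5

print(play_pig(not_a_strategy, always_roll) is always_roll)
```

With this check in place, passing a utility function where a strategy was expected shows up right away as a loss instead of silently behaving like an always-roll strategy.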

how many times did max_wins decide to roll versus how many times did max_diffs decide to roll? And just to consider the ones in which the 2 differ. and in some languages. and then we got into things like the utility function and the quality function. some problems are so complex that it would take forever to do that. first it tells us that there is an interesting distinction about how we wrote our function to maximize differential. play_pig by itself--the top level function we define--that's not going to help us. so I've got to keep rolling. so that's worth doing. but 10 times more often. If he's trying to maximize the probability of winning. the 2 strategies agree. I'll sacrifice winning in order to maximize the differential. it's all about building the tower. let's group the states in terms of the number of pending points in that state." So look what the story told us.
38. and max_diff not at all. both strategies agree that hold is the right thing to do. and that's the power of breaking up these aspects into what's happening versus maximizing. at the top. and yes. 301 states all together and they differ on 3975 + 381. they will be helpful. What's the probability of having 2 boys given that there is at least 1 boy in the family? And the universe of possibilities is only the families that have exactly 2 children. 25 l Telling A Story
So here's what I did. and. I still want to win the game. Why could that be? I think I might know the answer. but computers are much better at it than people are. I was able to do that because I had the tools around. hey. and then Event A might be the probability of having a boy or a girl. he might say. and here's my idea. So the results that came back were surprising to me because I didn't really understand how the what and the how interacted. they differ. The story I want to tell is. It's rolling more often. and say he's accumulated 30 points. it's a good design and strategy to say let's just build up components along the way so that we--yes. we see it's max_wins who's willing to roll. it's all a loss. and we can still call that function. I thought that was going to be more aggressive. but it's worth it for that small chance of winning. but we can also go out in other directions. if I can get 30 points rather than 1. even though I didn't understand what that program is actually doing. So Event B might be the event of the family having exactly 2 children. But for now we're only interested in the children that live in those houses. I expected maximize differential to be aggressive. 26 q Simulation vs Enumeration
So we've talked about some probability problems that we can handle with simulation--that is. So there's 35. now we're trying to analyze the situation to understand this story of why are these 2 different? Well. and we have things like the dice and the score. most of the time. It does tell a story. in real life. So it wasn't just that I built a monolithic program that could do 1 thing. over all the states. and one assumption we can make is that it's exactly 50% probable that you get a boy and 50% probable that you get a girl and that one birth is independent from another. "Even though I'm risking 24. and he's willing to roll. max_wins says hold and max_diffs says roll. the opponent's going to win on the next move." Max_diff says. it's common and in many languages. So we could put that here in the condition as well-.000--both strategies agree that roll is the right thing to do. you get a close enough representation. I thought it was going to be rolling trying to rack up a really big score. It must be because the opponent is just about to win and he says. and hoping that they're representative of the problem. So sometimes. But I still need an idea. Maximize differential. if you're in such and such a position. and then we want to be able to ask questions of them. It's not trying to rack up a really big score. I wrote this little function to tell a story. we're going to write specific rules for pig to say. we're only interested in the houses that have exactly 2 children. right? So that's the one that's trying to impress the scouts. where we actually go over all the possibilities and we can compute an exact probability.700 out of the 35. Then another 1200 times. So throw away all of the states in which they take the same action. and yes. Here's the result. The second part that's interesting here is that I was able to do exploration. to try to rack up the big points.Udacity Wiki
it. So we didn't go in and say. Look what's happening. but at the bottom. that cuts the differential way down. I thought that this man in the arena who was playing the max_diff strategy was going to impress the NPA scouts by playing aggressively. figure out what pending is and increment the pending count for the person who decided to roll. but I don't know yet.
37. So I start off--I have a default dictionary and the default is that I have 2 values-. we're building from the ground up. That happened 381 times. But in Python.
. The maximize differential strategy would say. So we're going to not consider some of them. Here's what I'm trying to do. and so it's a powerful strategy. He says. So what is the probability of Event A given Event B? And an event is just a state of affairs. you don't care if you lose by 1 or if you lose by 40. I don't have that good of a chance of winning. This one has a zebra. and we don't even quite know what questions to ask. so we built all these up. so that's a perfect segregation between the 2 in this crossover point between 13 and 16. When I wanted to understand something that was different from what I originally did.at least 1 boy and 2 children total. it's max_wins that says roll and max_diff that says hold. and then for each of those number of pending points. What do I mean by that? Well. and then I just go and print them out. we can do enumeration if we make certain assumptions. and so on. then we can quickly assemble pieces from down here and build something that can address that. and they can scale up to more complex ones. It's that I build a set of tools that could explore the area. It makes it easy to explore. we have all these useful tools. That actually surprised me. and these houses have differeing poperties. and that was the perfectly general best actions. Let me just describe briefly what I do. When you're done. And so we've crossed off these houses. he would keep on rolling. say all the states for which there are 10 points pending. We're here. well--say he's behind 39-0 in a game to 40.Then I go through--get the 2 strategy functions to apply to the state. So what's going on? Well. Let's see what that looks like.udacity. So I've got all the pieces available. What do you think this probability is equal to? Let's put your answer here and enter it in the form of
But max_diff is never willing to do that. That's what the maximize win probability strategy would do. but all these little tools that we built down here. And if you choose enough. So let's imagine all the possible families living in houses. But as the number of points increase--the number of pending points increase.
CS212 Unit 5 . it's the max_diff strategy that's deciding to roll. that's all you have. If they're different. Max_wins is rolling all the time with very high pending amounts. and that's 12% of the states that they differ on. So now when we're not just about playing pig. In mathematics. I thought it was going to be the max_diffs strategy. Rather we just said. But in 2 cases. Well. We'll show you some simple examples. so I've got to roll. we have a play_pig function. I may lose. which is the rules of the game for how pig is played. So what does that tell us. Now. This one is colored red and has an Englishman. We can start to put them together and explore. but I was still able to write a program that maximized the differential. So most of the time. So remember we always start our design with an inventory of concepts. but I'm really going to cut down that differential. We're going to constrain ourselves to ask conditional probability questions. we have the tower. When I have a small number of points pending. first it might be nice just to quantify how different they are since I kind of asked that question. An alternative strategy is enumeration. Now that's a suggestion of a story. but all that counts is winning. But no! So the data tells a different story. So it's the max_wins strategy that's really more aggressive. So we built this tower. And we can ask probability questions. You do the rest. and it turned out that the story was completely different. Note that the way we wrote it is we completely separated the what. and we're only taking these other ones. I think it might be that the maximized differential is more willing to lose rather than more excited about winning by a lot. 29. and consider the other ones that have exactly 2 children. If I stop now. So let's address the question. do such and such. If we're interested--not just in playing pig--but we're interested in figuring out this story. we could do simulation by going out and polling and asking people. 
Is that the right story? Let's find out. in fact. and I found out that it was actually maximizing the probability of winning that was more aggressive that rolled more often. well. here's how pig works. from the how of how does it make decisions. The maximized differential--if he's losing by a fair amount. some samples. So what must be happening here is 300 times max_wins has 24 pending points. And yes. but it's here that some of our design choices start to pay off. choosing using a random number generator. "Are you crazy? I got 24 points on the board. So what's the story? Where do those 12% of the states come from? We still don't know. and the tower built up to define the play pig function. Probably I'll pig out and only get 1 point. Now.
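
As a rough illustration of the simulation side of the simulation-versus-enumeration contrast, the sketch below estimates, by making repeated random choices, how often a two-child family has two boys given that it has at least one boy. The function name, the trial count, and the fixed seed are assumptions made for this example; they are not taken from the lecture code.

```python
import random

def simulate_two_boys_given_one_boy(trials=100_000, seed=0):
    """Estimate P(two boys | at least one boy) by random sampling.

    Hypothetical helper for illustration: each child is 'B' or 'G'
    with equal probability, chosen independently.
    """
    rng = random.Random(seed)          # seeded so the run is repeatable
    at_least_one_boy = 0
    two_boys = 0
    for _ in range(trials):
        family = rng.choice('BG') + rng.choice('BG')
        if 'B' in family:              # keep only families matching the condition
            at_least_one_boy += 1
            if family == 'BB':
                two_boys += 1
    return two_boys / at_least_one_boy

print(simulate_two_boys_given_one_boy())
```

A simulation like this only gives an approximation that improves with more trials, which is the motivation for enumerating the possibilities when there are few enough of them.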

39. 26 s Simulation vs Enumeration

Here's the way to look at it: we count up the number of equally probable events on this side, and then we count up, out of those, how many appear on this side. What does that look like? Well, we draw 2 circles: one for the right-hand side of the conditional probability, the event, and this one. 1, 2, 3, all equally probable, and of these, just one, so the answer is one-third. Seems hard to argue with.

40. 27 l Conditional Probability

Now if you could do all of these just by writing with a pen, you wouldn't be in a computer class. So let's start modeling this. And so we're going to do our concept inventory. So what do we have? Well, first, the whole universe is called the sample space, and then these individual sets of circles are called events, like the event of having 2 boys, or the event of having at least 1 boy. And an event consists of a collection of sample points. So a sample point is BG, GB or BB. And secondly, these individual results here come from random variables. So a random variable is like the first child born, which can be a boy or a girl.

In terms of representation, we're going to just represent sample points as strings. So we'll have, like, the string 'GG', and we can represent events two ways: as a collection of strings or as a predicate, a function which is true of certain strings and not of others. That would be True here.

So here's what I did: I imported itertools because we're going to need that. I searched for, and found, a new module called fractions, and there's a function or a constructor within the fractions module, called Fraction, which produces an exact fraction. And the reason I wanted that is because when the answer is 1/3, I didn't want to see that it's .33333. I wanted to see that it was exactly 1/3. So all a fraction is is a numerator and a denominator, paired together, and they know how to do arithmetic.

So here's my random variable. I represent that as a collection of possibilities in here; I just strung the possibilities together. I could have said set of 'BG' or the list of 'B', 'G', but I just put them together, as a string. Then I said we can combine random variables with a cartesian product, and I used itertools.product, which produces tuples, and I want to look at them as strings. So now two_kids is the product of two children and we're looking at their sex. And one_boy is just the points in two_kids that have at least one boy in the string. And then I can define two_boys as a predicate. That's saying that's True when the count of the number of boys is equal to 2.

And finally, I can define my conditional probability. That says what's the probability of p, given e, where e is an event specified as a list of sample points, and predicate is a predicate that returns True or False of elements of that event. And so the True elements are just the ones for which it is True, and then I returned a fraction: how many out of the event satisfy the predicate. Then I can ask, what's the conditional probability of two_boys, given one_boy, and the answer is 1/3. So that's what we expected. Both the drawing it out with a pen and the computing worked out to the same answer.

41. 28 q Tuesday

Now let's move on to a slightly more complicated question: out of all the families with two kids, with at least 1 boy born on a Tuesday, what's the probability of two boys? Now, you might think that the answer should be the same: it should still be 1/3, because why does Tuesday matter? After all, the kid's gotta be born sometime, and if it happens to be Tuesday, why would that be any different than any other day? So is it 1/3? Well, as Gottfried Leibniz said, "Let us calculate." So we have the technology to model that.

And so, like before, a random variable for day of the week, and I had to fool around with the capitalization there to make sure that we have 7 distinct letters: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday. Plus a sample space of two_kids_bday: one kid with their day of birth, and the second kid, with their day of birth. So if I evaluate that, I get this collection. It's this huge thing of 2 x 7 x 2 x 7 entries. The first one: boy born on Sunday, boy born on Sunday, all the way through to the last one: girl born on Saturday, girl born on Saturday. Then a boy born on Tuesday is all the elements of this where "BT" appears in the string. So either "BT" will be the first 2 characters or the last 2 characters.

And now we're finally at the point where we can say: given at least 1 boy_tuesday, what's the probability of two_boys? And before I show the results, I'm going to ask you what you think it is. You could follow along, either with pencil and paper or do the computation or just think it out in your head. So enter as a fraction. So if you think it's one-half, put 1 and then 2. If you think it's 1/3, put a 1 here and a 3 here, or whatever. If you think it's 11/17ths, put 11 and 17.

42. 28 s Tuesday

If I go ahead and execute this and print the result, it comes out: 13/27. Wow! Where did that come from? So that's surprising. First of all, not only is it not 1/3, which you might have thought should be the answer if you believe the argument that Tuesday doesn't matter, it's not 1/3, but it's much closer to 1/2 than it is to 1/3. So just having the birthday there really changed things a lot.

How did that happen? Well, I wrote up a little function here to report my findings, and here's its arguments. You can give it a bunch of cases that you care about, the predicate that you care about, and whether you want the results to be verbose or not. And it just prints out some information, and then I just said: get all results. I also looked, by the way, at the question of what's the probability of two_boys, given that there's one boy born in December, so I threw that in as well. Here's what I see: the probability of 2 boys, given 1 boy, is 1/3, and born on Tuesday is 13/27, and born in December is 23/47.

I can turn on the verbose option to report. In that case, here's the output I get: 2 boys, given at least 1 boy born on Tuesday, is 13/27. And here's the reason: at least 1 boy, born on Tuesday, has 27 elements, and there they are, and of these, 13 are 2 boys, and there they are. You can go through and you can make sure that that's correct, and you can look at the other elements of the sample space and say no, we didn't miss any, so that's got to be the right answer. You can't really argue with that.

So that's the result. It's not quite intuitive yet, and I'd like to define my report function so that it gives me that intuition, but right now I don't have the right visualization. So I've got to do some of the work myself. And here's what I came up with: we still have the four possibilities that we showed before, but now we're interested not just in boys; we're interested in boys born on Tuesday. So really, we should draw this state as either a boy born on Tuesday, followed by another boy, or a boy, followed by a boy born on Tuesday. So there's going to be some others over here where there's, say, a boy born on Wednesday, along with some other partner, maybe a boy born on Saturday. But we're not even considering them; we're throwing all those out. We're just considering the ones that match here.

And how many of those are there? Well, there's 7 possibilities here because the boy has to be born on Tuesday (there's only 1 way to do that) but there's 7 ways for the girls to be born. So there's 7 elements of the sample space there; likewise, 7 elements over here. Now how many elements over here? Well, here, either one of the 2 can be a boy born on Tuesday. So it would be this one, or this one. And so how many of those are there? Well, there's 7 of these by the same argument we used in the other case, and there's also 7 of these, but now I've double-counted, because one of these 14 cases is a boy born on Tuesday, followed by a boy born on Tuesday. So I'll just count 6 here. And so now it should be clear: 7, 14, 21, 27. There's 27 on the right-hand side, and then what's the probability of 2 boys, given this event of at least 1 boy born on Tuesday? Well, 2 boys, that's here, so it's 13 out of the 27.

Now why is it that we have a strong intuition that knowing the boy was born on Tuesday shouldn't make any difference? I think the answer is because we're associating that fact with an individual boy. We're like taking that fact and nailing it on to him, and it's true: if we did that, that wouldn't make any difference. If we did that, the computation wouldn't change. But, in this situation, that's not what we're doing. We're not saying anything about any individual boy. Rather, we're making this assertion that at least one was born on Tuesday: not about boys, but about pairs. And we just don't have very good intuitions about what it means to say something about a pair of people, rather than about an individual person. And that's why the answer comes out to 13/27.
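
The enumeration described for the two-kids questions can be sketched in Python 3 roughly as follows. The names `two_kids`, `one_boy`, `two_boys`, `two_kids_bday` and `boy_tuesday` come from the lecture; the helper names `product` and `cond_p`, the exact letter encodings for `day` and `month`, and the `two_kids_bmonth` / `boy_december` names are assumptions made for this sketch, so the original course code may differ in detail.

```python
import itertools
from fractions import Fraction

sex = 'BG'               # random variable: sex of one child
day = 'SMTWtFs'          # assumed encoding: 7 distinct letters, Sunday..Saturday
month = 'JFMAmjlasOND'   # assumed encoding: 12 distinct letters, January..December

def product(*variables):
    """Cartesian product of the variables, each sample point joined into a string."""
    return [''.join(point) for point in itertools.product(*variables)]

def cond_p(predicate, event):
    """P(predicate | event): exact fraction of the sample points in event
    for which predicate is True (all points assumed equally probable)."""
    satisfying = [s for s in event if predicate(s)]
    return Fraction(len(satisfying), len(event))

def two_boys(s):
    """True when the sample point contains exactly two boys."""
    return s.count('B') == 2

# Two kids, looking only at their sex.
two_kids = product(sex, sex)
one_boy = [s for s in two_kids if 'B' in s]

# Two kids, each with a day of birth: 2 x 7 x 2 x 7 sample points.
two_kids_bday = product(sex, day, sex, day)
boy_tuesday = [s for s in two_kids_bday if 'BT' in s]

# Two kids, each with a month of birth.
two_kids_bmonth = product(sex, month, sex, month)
boy_december = [s for s in two_kids_bmonth if 'BD' in s]

print(cond_p(two_boys, one_boy))        # 1/3
print(cond_p(two_boys, boy_tuesday))    # 13/27
print(cond_p(two_boys, boy_december))   # 23/47
```

Because the day and month letters never include 'B', the substring tests `'BT' in s` and `'BD' in s` can only match at the start of the first or second child, which is what makes the string representation safe here.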

43. 29 l Summary

So let's summarize what we did in this Unit. We learned that probability is a powerful tool for tackling problems with uncertainty. We learned that we can do Search with uncertainty. That is, like we did Search in the previous Unit over exact, certain domains, here we can handle uncertainty in our Search. We learned that the notion of Utility gives us a powerful and beautiful general approach to solving the Search problems. It gives us the best-action function with which we can solve any problem that can be specified in the form that best-action expects, and that's a wide variety of problems. Now, some of them are so complex that they can't be computed in a feasible amount of time. And there are more advanced techniques for dealing with approximations to that. But it's incredibly powerful because it separates out the How versus the What. You only have to tell the computer what the situation is. You don't have to tell it how to find the best answer, and it automatically finds the best answer.

And we learned you can deal with probability through simulation, making repeated random choices, and just counting up in how many one answer occurs versus another. And we learned that if the total number of possibilities is small, you can just enumerate them. You can count them all, and you can get an exact answer, as an exact fraction rather than an approximation.

And we learned some general strategies that don't have to do with probability. When we were trying to figure out how to add printing to our game, we looked at the notion of a wrapper function, how we inject functionality into an existing function by sneaking it in on top of one of the arguments. And this is an example of aspect-oriented programming, where we take the aspect of printing out what's happening and keep that separate from the main logic of the program.

We learned that you can do exploratory data analysis. When I was looking at the two strategies for playing PIG and where they differed, that was a completely different question than what I'd designed the PIG program for. Because I had put together the right pieces, it was easy to do the exploration and come to an understanding.

And finally, we learned (or, at least, I learned, because I was the one who made the mistake) that errors can pop up, particularly in the types of arguments and results that functions expect in return, and that you have to be careful to deal with that, because Python doesn't give you the seatbelts that other languages have. So you have to be vigilant, on your own, to protect yourself from those types of errors.

So if you followed along all of that, congratulations for the work you've done. You've learned a lot. That was a lot to cram into one Unit. Have fun with the homework, and we'll see you in the next Unit.

last edited 2012-05-21 03:04:24 by Ed Grochowski