
Probability and Random Variables

We define the probability of an event A to be its long-run frequency.

Example: The Truth about Cats and Dogs. Suppose we conduct a very large survey of EMBA students and their state of dog/cat ownership. The results of one such study are summarized in the 2×2 contingency table shown below. [1] The position of each cell indicates the particular state, and the percentage in the cell indicates the fraction of EMBAs who fell in that state. What must these four percentages add up to?

             Cats      No Cats
Dogs         7.5%      41%
No Dogs      11.5%     40%



To simplify the notation, let C = the event that an EMBA owns a cat and D = the event that an EMBA owns a dog.

1. What is the probability that an EMBA owns a cat (= Pr(C))? (Ans. 19%)
2. What is the probability that an EMBA owns a dog (= Pr(D))? (Ans. 48.5%)
3. What is the probability that a person owns both a cat and a dog (= Pr(C ∩ D))? (Note: the symbol ∩ means "and", so C ∩ D means cats and dogs.) (Ans. 7.5%)
4. Given the person is a cat owner, what is the probability that they own a dog? (Ans. 7.5/[7.5 + 11.5] = 39.5%)
5. Given the person is a dog owner, what is the probability that they own a cat? (Ans. 7.5/[7.5 + 41] = 15.5%)

The first probability is called the marginal probability of owning a cat (= 19%). The second is called the marginal probability of owning a dog (= 48.5%). The third probability is called the joint probability of owning a dog and a cat because it depends on two random events occurring jointly (dog ownership and cat ownership). In general, marginal probabilities capture the probability of one random event (e.g., cat ownership) without reference to any other random event (e.g., dog ownership). In contrast, a joint probability captures the likelihood of two (or more) random events occurring jointly.

The fourth and fifth probabilities are called conditional probabilities (or posterior probabilities) because they are based on, or conditioned on, some other set of information. A conditional probability can be thought of as the relative probability of something happening, restricted to a particular subset of possibilities. For example, the probability of owning a dog given the person is a cat owner, denoted by Pr(D|C), is the relative percentage of dog owners among cat owners (the percentage of Ds out of the Cs). You could probably figure this out by brute force; it's 7.5/[7.5 + 11.5] = 39.5%. The general formula is Pr(D|C) = Pr(D ∩ C)/Pr(C) = 7.5/[7.5 + 11.5] = 39.5%. Similarly, the probability of owning a cat given the person owns a dog is Pr(C|D) = Pr(C ∩ D)/Pr(D). This is the relative percentage of cat owners among dog owners (the percentage of Cs out of the Ds). Using either the formula or brute force, you can calculate this to be 7.5/[7.5 + 41] = 15.5%. Conditional probabilities are important tools in marketing, especially when you are trying to identify consumers who are more apt to buy a product.

[1] For your amusement, these are the actual percentages based on informal polls of previous SMU EMBA students.

Copyright John Semple 2007

Example: AIDS Testing (From Chapter 6 of Statistics for Management and Economics, by G. Keller and B. Warrack, Duxbury Press, 1997). We are given a test for the AIDS virus that is always right if a person is truly infected but gives a false positive 0.5% of the time (probability = .005) for non-infected individuals. If 5% of the general population is infected, what is the probability that a person receiving a positive test result really is infected?
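One way to approach the AIDS testing question is to apply the conditional probability formula to the two ways a positive result can arise: an infected person testing positive, or a non-infected person getting a false positive. The sketch below is only an illustration under the figures stated in the example; the function name and parameter names are made up for this note and do not come from the textbook.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Pr(infected | positive) = Pr(infected and positive) / Pr(positive)."""
    infected_and_positive = prevalence * sensitivity
    healthy_and_positive = (1 - prevalence) * false_positive_rate
    return infected_and_positive / (infected_and_positive + healthy_and_positive)


# Figures from the example: 5% of the population is infected, the test is
# always right for infected people, and 0.5% of non-infected people
# receive a false positive.
posterior = posterior_given_positive(
    prevalence=0.05, sensitivity=1.0, false_positive_rate=0.005
)
print(f"Pr(infected | positive) = {posterior:.3f}")  # about 0.913
```

Lowering the prevalence argument shows how quickly the answer drops when the condition is rare, even with the same test.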

The Concept of Independence

In English, people say two things are independent to mean they have nothing to do with one another. This means the same thing in statistics. But how do statisticians formalize this idea?

Recall that 19% (= Pr(C)) of EMBAs own cats, and 48.5% (= Pr(D)) of EMBAs own dogs. If cat ownership and dog ownership are independent of each other, then we would expect 19% of dog owners to also own cats (or 48.5% of cat owners to also own dogs). In other words, the frequency of one random event (cat ownership) is unaffected by knowledge of the other (dog ownership). This implies the percentage of people who own both dogs and cats (= Pr(C ∩ D)) should be .19 × .485 (= Pr(C) × Pr(D)) = .092 = 9.2%. In other words, independence in statistics means a particular mathematical condition must hold. This notion is formalized in the following definition.

Definition. We say two events C and D are independent if and only if Pr(C ∩ D) = Pr(C) × Pr(D).

This is not a formula but rather a condition that we must check if we want to claim two events are independent. If the condition is true, then the events are independent. If not, they are dependent. For example, are dog ownership and cat ownership independent in our previous example? Let's check. Pr(C ∩ D) = 7.5%, and Pr(C) × Pr(D) = .19 × .485 = 9.2%. These percentages are not equal, so dog ownership and cat ownership are not independent.
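The marginal, joint, and conditional probabilities from the cats and dogs example, together with the independence check, can be reproduced in a few lines. This is a rough sketch: the 11.5% cell comes from the worked answers, while the 40% "no cat, no dog" cell is an assumption, taken as whatever remains so that the four cells sum to 100%. The variable names are illustrative only.

```python
# Joint probabilities keyed by (cat status, dog status).
joint = {
    ("cat", "dog"): 0.075,
    ("no cat", "dog"): 0.41,
    ("cat", "no dog"): 0.115,
    ("no cat", "no dog"): 0.40,  # assumed: remainder so the cells sum to 1
}

# Marginals: add up the cells that share the status of interest.
pr_c = sum(p for (cat, _), p in joint.items() if cat == "cat")
pr_d = sum(p for (_, dog), p in joint.items() if dog == "dog")
pr_c_and_d = joint[("cat", "dog")]

# Conditionals: joint probability divided by the conditioning marginal.
pr_d_given_c = pr_c_and_d / pr_c
pr_c_given_d = pr_c_and_d / pr_d

# Independence check: compare Pr(C and D) with Pr(C) * Pr(D).
independent = abs(pr_c_and_d - pr_c * pr_d) < 1e-9

print(f"Pr(C) = {pr_c:.3f}, Pr(D) = {pr_d:.3f}")
print(f"Pr(D|C) = {pr_d_given_c:.3f}, Pr(C|D) = {pr_c_given_d:.3f}")
print(f"Pr(C)Pr(D) = {pr_c * pr_d:.3f} vs Pr(C and D) = {pr_c_and_d:.3f}")
print("independent" if independent else "not independent")
```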

The Distribution of a Random Variable


In this course, 95% of our efforts will be devoted to events that have numeric outcomes (e.g., the return on an asset, the selling price of a house, the monthly demand for a product or service, etc.). Probabilistic events that have numerical outcomes are called random variables. To describe a random variable completely, we need to know two things: (1) each possible outcome; and (2) the probability of each outcome. If we have a complete description of both, then we say we know the random variable's distribution.

There are two general types of random variables (or distributions) encountered in this course: discrete and continuous. A discrete random variable is one whose outcomes can be listed (like the roll of a die). A continuous random variable is one whose outcomes are so numerous they cannot be listed. An example of a continuous random variable would be the time elapsed between customers entering a retail establishment. If measured to infinitesimal accuracy, one could not list all of the possibilities. However, if we only measured elapsed times to the nearest second (or minute), then the distribution of elapsed times would be discrete. In practice, continuous distributions are often used as approximations to discrete distributions in instances where the number of possible outcomes is so large that a continuous distribution makes the analysis easier.

Example: A Discrete Distribution. Define a random variable whose value is the sum of the dots obtained from rolling a pair of dice. Construct the probability distribution for this random variable.

Outcomes     Probabilities

We frequently summarize information for a discrete random variable by means of its probability histogram. The probability histogram is simply a visual display of the outcomes (plotted along the x-axis), and their associated probabilities, which are represented by bars (graphed along the y-axis).

[Figure: Histogram for Sum of Dice Roll. x-axis: Sum (2 through 12); y-axis: Probability.]
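The distribution behind the dice histogram can be built by brute force: list all 36 equally likely pairs of faces and count how many give each sum. The snippet below is a small sketch of that counting argument; the names `ways` and `dice_dist` are illustrative, and the row of `#` characters is only a crude text stand-in for the histogram bars.

```python
from collections import Counter
from itertools import product

# Count how many of the 36 equally likely (die 1, die 2) pairs give each sum.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each sum is (number of ways) / 36.
dice_dist = {total: ways[total] / 36 for total in sorted(ways)}

for total, prob in dice_dist.items():
    bar = "#" * ways[total]  # one '#' per way of rolling this sum
    print(f"{total:>2}  {ways[total]}/36  {prob:.3f}  {bar}")
```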

Measures of a Distribution: Expectation and Variance (Discrete Case)


The expected value or mean of a discrete random variable (r.v. for short) X is denoted by E(X) or μ and given by the formula
$$\mu = E(X) = \sum_i x_i p_i .$$

The expected value is the theoretical average obtained by weighting each outcome by its respective probability and then summing. For the sum of the dice we have

Possible values (x_i)    Probability (p_i)    Product (x_i p_i)
 2                       1/36                  2/36
 3                       2/36                  6/36
 4                       3/36                 12/36
 5                       4/36                 20/36
 6                       5/36                 30/36
 7                       6/36                 42/36
 8                       5/36                 40/36
 9                       4/36                 36/36
10                       3/36                 30/36
11                       2/36                 22/36
12                       1/36                 12/36

Sum of third column (= μ = E(X))              252/36 = 7

Plot the value 7 on the probability histogram. Observe that E(X) is a measure of centrality.

Example. You sell big electric motors. During a given week, demand for your 100-hp motor is 0, 1, or 4 (4 come on a pallet). The distribution is described below.

Demand    Probability
0         .45
1         .40
4         .15

What is your expected demand for a week? E(X) = (0)(.45) + (1)(.40) + (4)(.15) = 1.00.

Another measure of interest is the expected value of the expression (X − E(X))², called the variance of X, and given by the formula
$$\mathrm{Var}(X) = E\big[(X - E(X))^2\big] = \sum_i (x_i - E(X))^2 \, p_i .$$

The variance is also denoted by σ² (the Greek letter sigma, squared). The formula looks bad, but a few simple examples will clarify its calculation and help us understand what it tells us. Note that the mean of X is needed before computing the variance. Recall that p_i is the probability that X takes on the value x_i.

Example. Calculate the variance of demand for the previous motor problem.

Step 1. What is E(X)? From a previous calculation, it is 1.00.

Step 2. List all outcomes for (X − E(X))², their associated probabilities, and the products.

Possible outcomes for (X − E(X))²    Probability (p_i)    Product (x_i − μ)² p_i
(0 − 1)² = 1                         .45                  1 × .45 = .45
(1 − 1)² = 0                         .40                  0 × .40 = .00
(4 − 1)² = 9                         .15                  9 × .15 = 1.35

Step 3. Sum the products: Sum = σ² = Var(X) = 1.80.
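The three-step variance calculation translates almost line for line into code. The following is a minimal sketch for any discrete distribution stored as an outcome-to-probability mapping; the helper names `mean` and `variance` are chosen here for illustration and are not part of the course materials.

```python
def mean(dist):
    """E(X): weight each outcome by its probability and sum."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X): probability-weighted average of squared deviations from E(X)."""
    mu = mean(dist)  # Step 1: the mean is needed first
    return sum((x - mu) ** 2 * p for x, p in dist.items())  # Steps 2 and 3


# Weekly demand for the 100-hp motor.
demand = {0: 0.45, 1: 0.40, 4: 0.15}

print(f"E(X)   = {mean(demand):.2f}")      # 1.00
print(f"Var(X) = {variance(demand):.2f}")  # 1.80
```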

The variance of a random variable is a measure of its dispersion. There are other rules for calculating means and variances for constant multiples and translations of random variables. These can help save time. I will demonstrate these using Excel on the motor example above. You may want to follow along on your laptops. The rules are summarized below.

Some Rules for Means and Variances

1. If X is a random variable with mean E(X) and variance Var(X), then for any constant k, kX is a (new) random variable with mean kE(X) and variance k²Var(X).
2. If X is a random variable with mean E(X) and variance Var(X), then for any constant k, k + X is a (new) random variable with mean k + E(X) and variance Var(X).

Example: Take the previous motor problem. Calculate the mean and variance of 3X and 2 + 7X directly and by using the formulas above.
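The two rules can be checked numerically on the motor problem by computing each mean and variance twice: directly from the transformed outcomes, and via the rules. The sketch below assumes the rules are applied in sequence for 2 + 7X (rule 1 for the factor 7, then rule 2 for the shift by 2); the helper `shift_and_scale` is a hypothetical name introduced only for this illustration.

```python
def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def shift_and_scale(dist, shift, scale):
    """Distribution of shift + scale * X: same probabilities, new outcomes."""
    out = {}
    for x, p in dist.items():
        y = shift + scale * x
        out[y] = out.get(y, 0.0) + p
    return out


demand = {0: 0.45, 1: 0.40, 4: 0.15}
ex, varx = mean(demand), variance(demand)

for shift, scale in [(0, 3), (2, 7)]:  # 3X and 2 + 7X
    new_dist = shift_and_scale(demand, shift, scale)
    direct = (mean(new_dist), variance(new_dist))
    by_rule = (shift + scale * ex, scale**2 * varx)
    print(f"{shift} + {scale}X: direct = ({direct[0]:.2f}, {direct[1]:.2f}), "
          f"rules = ({by_rule[0]:.2f}, {by_rule[1]:.2f})")
```

Both routes should agree: the shift moves the mean but leaves the variance alone, while the scale factor enters the variance squared.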

The variance, as a measure of dispersion, is hard to interpret. An easier measure to interpret is the standard deviation, which is the square root of the variance. The standard deviation is denoted by the Greek letter σ [2]; for the random variable in the preceding example, σ = √1.80 ≈ 1.34. The standard deviation is a measure that helps us determine which outcomes typically occur (probabilistically speaking). For most distributions encountered in practice, about 68% of all outcomes typically occur within one standard deviation (±1σ) of the mean; about 95% of all outcomes typically occur within two standard deviations (±2σ) of the mean. Virtually 100% of all outcomes typically occur within three standard deviations (±3σ) of the mean.
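The 68% / 95% / 100% rule of thumb can be tried out on the sum of two dice, where every probability is known exactly. The sketch below is only an illustration on that one distribution (the rule is approximate, and other distributions will match it more or less closely); the function name `coverage` is made up for this note.

```python
from itertools import product
from math import sqrt

# Distribution of the sum of two dice: each of the 36 pairs has probability 1/36.
dist = {}
for a, b in product(range(1, 7), repeat=2):
    dist[a + b] = dist.get(a + b, 0.0) + 1 / 36

mu = sum(x * p for x, p in dist.items())
sigma = sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))


def coverage(k):
    """Probability that the outcome lies within k standard deviations of the mean."""
    return sum(p for x, p in dist.items() if abs(x - mu) <= k * sigma)


print(f"mean = {mu:.2f}, standard deviation = {sigma:.3f}")
for k in (1, 2, 3):
    print(f"within {k} sd of the mean: {coverage(k):.1%}")
```

For the dice, the one- and two-standard-deviation intervals capture roughly two thirds and roughly 94% of the probability, which is close to the rule of thumb.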

Means and Variances: Applications to Business


[2] Note that this is consistent with using σ² for the variance.


Example: Inventory/Sales Analysis. The file CigaretteSales.xls describes the distribution of daily sales for a particular brand of cigarettes (this was based on data from a convenience store in Dallas, TX). The store currently stocks 100 packs per day. Suppose the store intends to implement a new policy of stocking only 50 packs per day.

(a) What is the probability of running out of cigarettes? (You will need to use Excel)
(b) What is the expected number of sales (per day)? (You will need to use Excel)
(c) What is the expected number of lost sales? (You will need to use Excel)

Example: Risk Perception. In a recent article, "Cross-cultural Differences in Risk Perception, but Cross-cultural Similarities in Attitudes Towards Perceived Risk" (Management Science, 44(9), pp. 1205-1217, 1998), E.U. Weber and C. Hsee investigate cultural differences in risk preferences. They surveyed individuals from four countries: America, Germany, Poland and China. They posed the following 12 investment options to each individual, all re-stated in US dollar equivalents. Each respondent was told they had $20,000 to invest of their own money. Note: EV is the expected value of the option, SD is the standard deviation (σ) of the option.


Investment
Option    Outcome 1 ($)   P1     Outcome 2 ($)   P2     Outcome 3 ($)   P3     EV ($)   SD ($)
 1             3500       .79        -5300       .20       -16000       .01     1544     3937
 2              400       .56         -150       .38         -750       .06      122      342
 3             1700       .01          800       .20          -50       .79      137      374
 4             1250       .56         -450       .38        -2200       .06      397     1073
 5             2600       .11          950       .44         -650       .45      411     1077
 6             9300       .11         3400       .44        -2400       .45     1439     4022
 7             4700       .01         2300       .20         -120       .79      412     1058
 8             1000       .79        -1400       .20        -4800       .01      462     1094
 9              900       .11          350       .44         -200       .45      163      367
10              350       .79         -400       .20        -1600       .01      180      349
11             4600       .56        -1700       .38        -8100       .06     1444     3847
12            17200       .01         8300       .20         -450       .79     1476     3836

Respondents were asked to indicate their maximum willingness to pay (WTP) for each option. This payment is like the ante in a poker game, i.e., you have to pay to play. For example, a respondent might be willing to pay a maximum of $750 for investment option 1, which has an expected payoff of $1544. On average, this respondent would come out ahead by $1544 - $750 = $794. For the 12 options listed above, the following average WTPs were observed:

Nationality    WTP
American       $320
German         $315
Polish         $352
Chinese        $487

One can check that the average EV of the 12 options is $682. What does this say about risk preferences by nationality?
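The EV and SD figures for the 12 options, and the average EV, can be recomputed from the outcomes and probabilities with the same mean and variance formulas used earlier. The sketch below assumes the table is read row by row as (Outcome 1, P1, Outcome 2, P2, Outcome 3, P3); the helper name `ev_and_sd` is illustrative. Because the published EV and SD values appear to be rounded, the recomputed numbers should be expected to match only approximately, and the unrounded average EV can differ from $682 by about a dollar.

```python
from math import sqrt

# (Outcome 1, P1, Outcome 2, P2, Outcome 3, P3) for options 1 through 12.
options = [
    (3500, .79, -5300, .20, -16000, .01),
    (400, .56, -150, .38, -750, .06),
    (1700, .01, 800, .20, -50, .79),
    (1250, .56, -450, .38, -2200, .06),
    (2600, .11, 950, .44, -650, .45),
    (9300, .11, 3400, .44, -2400, .45),
    (4700, .01, 2300, .20, -120, .79),
    (1000, .79, -1400, .20, -4800, .01),
    (900, .11, 350, .44, -200, .45),
    (350, .79, -400, .20, -1600, .01),
    (4600, .56, -1700, .38, -8100, .06),
    (17200, .01, 8300, .20, -450, .79),
]


def ev_and_sd(row):
    """Expected value and standard deviation of one three-outcome option."""
    outcomes, probs = row[0::2], row[1::2]
    ev = sum(x * p for x, p in zip(outcomes, probs))
    var = sum((x - ev) ** 2 * p for x, p in zip(outcomes, probs))
    return ev, sqrt(var)


results = [ev_and_sd(row) for row in options]
for number, (ev, sd) in enumerate(results, start=1):
    print(f"Option {number:>2}: EV = {ev:8.1f}   SD = {sd:8.1f}")

average_ev = sum(ev for ev, _ in results) / len(results)
print(f"Average EV across the 12 options = {average_ev:.1f}")
```

Comparing the average EV with each nationality's average WTP gives a starting point for the question about risk preferences.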

Discrete Distributions: The Binomial Distribution

A common discrete distribution occurring in practice is the Binomial distribution. This distribution can be motivated and developed through an actual example.

Example (From: Brian Downs, Aspen Technologies, greatly modified). You are managing a company that sells shares in corporate jets. You have sold 3 shares to clients, and each share entitles a client to 1/5 of the aircraft's flight time. If the probability that a client wants to use the aircraft on any given day is 1/5 and requests are independent of one another, what is the probability that exactly 2 clients request jet service on a particular day?

Answer: Let Y = client requests jet service, N = client does not request jet service on a given day. The possible outcomes and their probabilities can be represented by the following tree.

Tree of client-level responses (each row is one path through the tree):

Client 1    Client 2    Client 3    Total Ys    Probability
Y           Y           Y           3 (YYY)     (1/5)³
Y           Y           N           2 (YYN)     (1/5)²(4/5)¹
Y           N           Y           2 (YNY)     (1/5)²(4/5)¹
Y           N           N           1 (YNN)     (1/5)¹(4/5)²
N           Y           Y           2 (NYY)     (1/5)²(4/5)¹
N           Y           N           1 (NYN)     (1/5)¹(4/5)²
N           N           Y           1 (NNY)     (1/5)¹(4/5)²
N           N           N           0 (NNN)     (4/5)³

Observe that there are three different combinations of client-level responses (Ys and Ns) that result in exactly 2 clients requesting service. Each of these client-level responses has probability (1/5)²(4/5)¹. The probability of exactly 2 clients requesting jet service is therefore 3(1/5)²(4/5)¹ = .096.

Problems involving the binomial distribution are essentially coin-flipping experiments. A coin is flipped n times and we want to know the probability of getting k heads. The only significant difference in a binomial problem is that the probability of getting heads on any flip need not be 50%, as it is for a fair coin. The rest is largely language and notation.

In a binomial problem, the coin flips are called trials, and n is used to represent the number of trials (flips). In the jet problem, n = 3. Instead of the outcome of a trial being a head or a tail, outcomes are labeled a success or a failure. In the jet problem, it seems reasonable to label a request for jet service as a success and the absence of a request as a failure. In a binomial problem, the probability of success is denoted by p and it must be constant across all trials. In the jet problem, the probability of requesting service is p = 1/5, and it is indeed constant for all clients. In a binomial problem, the trial results must be independent of one another. In the jet problem, this means the clients' requests are unrelated to one another, which seems fairly plausible.

Finally, in binomial problems one typically wants the probability of observing a certain number of successes, called k, out of n trials. In our jet problem, we wanted the probability of exactly k = 2 successes out of n = 3 trials where the probability of success is p = 1/5 on each (independent) trial. We calculated this to be .096.
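The tree argument for the jet problem can be mirrored in code by listing every sequence of Ys and Ns, multiplying the per-client probabilities along each path, and adding up the paths with exactly two Ys. The sketch below also compares the result with the counting shortcut (number of qualifying paths times the probability of one such path). Variable names are illustrative only.

```python
from itertools import product
from math import comb

n, k, p = 3, 2, 1 / 5  # 3 clients, exactly 2 requests, Pr(request) = 1/5

# Walk every path through the tree and keep those with exactly k requests.
prob_by_enumeration = 0.0
for path in product("YN", repeat=n):
    path_prob = 1.0
    for response in path:
        path_prob *= p if response == "Y" else (1 - p)  # independent clients
    if path.count("Y") == k:
        print("".join(path), f"{path_prob:.3f}")
        prob_by_enumeration += path_prob

# Shortcut: (number of paths with k Ys) * p^k * (1 - p)^(n - k).
prob_by_formula = comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"enumeration: {prob_by_enumeration:.3f}, formula: {prob_by_formula:.3f}")
```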

Using Excel to Calculate Binomial Probabilities


Once you identify the particular binomial distribution in your problem, you can calculate binomial probabilities using Excel's built-in binomial probability function, BINOMDIST(k, n, p, True/False). The first argument, k, is the number of successes. The second argument, n, is the number of trials. The third argument, p, is the probability of success on any given trial. The final argument is an option regarding whether you want the probability of exactly k successes or the cumulative probability of k successes. Typing false or 0 for this option results in the probability of exactly k successes. Typing true or 1 for this option results in the cumulative probability of k successes, which will be discussed shortly. In the jet problem, we wanted the probability of exactly k = 2 successes in n = 3 trials with a probability of success of p = .2 on each trial. You should check that when you type =BINOMDIST(2,3,.2,0) in a cell you get .096.

Example. In an MBA STAT course with 20 students, the probability that a student gets an A is .4. (i) What is the probability that between 5 and 9 students (inclusive) will receive A's? (ii) What is the probability that more than seven students receive A's?

Answer. Both parts of this problem are made easier if you know something about cumulative probabilities. The cumulative probability of k successes in n trials is the probability of getting k or fewer successes in n trials (compare this with the statement "exactly k successes in n trials"). For example, if we equate success with getting an A, then the cumulative probability of k = 4 A's in n = 20 trials with probability p = .4 on each trial is

Probability of exactly 0 successes in 20 trials
+ Probability of exactly 1 success in 20 trials
+ Probability of exactly 2 successes in 20 trials
+ Probability of exactly 3 successes in 20 trials
+ Probability of exactly 4 successes in 20 trials
= BINOMDIST(0,20,.4,false) + BINOMDIST(1,20,.4,false) + BINOMDIST(2,20,.4,false) + BINOMDIST(3,20,.4,false) + BINOMDIST(4,20,.4,false) = .0510.

You can do the same calculation faster by setting the cumulative option of BINOMDIST to TRUE. When you do this, Excel computes the cumulative probability automatically. You should check that BINOMDIST(4,20,.4,TRUE) = .0510.

Answer to (i) (Using Excel)

Answer to (ii) (Using Excel)
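For readers without Excel at hand, both parts can be set up with a small Python stand-in for BINOMDIST. This is a sketch, not Excel's implementation: the function `binomdist` below is a hypothetical helper written for this note that follows the same argument order (k, n, p, cumulative). Part (i) is a difference of two cumulative probabilities, and part (ii) is the complement of one.

```python
from math import comb


def binomdist(k, n, p, cumulative):
    """Stand-in for BINOMDIST(k, n, p, cumulative): exact or cumulative probability."""
    def exactly(j):
        return comb(n, j) * p**j * (1 - p) ** (n - j)

    if cumulative:
        return sum(exactly(j) for j in range(k + 1))  # k or fewer successes
    return exactly(k)


n, p = 20, 0.4

# (i) Between 5 and 9 A's inclusive: Pr(X <= 9) minus Pr(X <= 4).
part_i = binomdist(9, n, p, True) - binomdist(4, n, p, True)

# (ii) More than seven A's: 1 minus Pr(X <= 7).
part_ii = 1 - binomdist(7, n, p, True)

print(f"(i)  Pr(5 <= X <= 9) = {part_i:.4f}")
print(f"(ii) Pr(X > 7)       = {part_ii:.4f}")
```

The same two expressions can be typed into Excel as =BINOMDIST(9,20,.4,TRUE)-BINOMDIST(4,20,.4,TRUE) and =1-BINOMDIST(7,20,.4,TRUE).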

The Mean and Variance of a Binomial Random Variable

Suppose X is a random variable having a binomial distribution with n trials and probability of success p on any trial. This means X has n + 1 possible outcomes corresponding to the number of successes in those n trials, namely, X = 0, X = 1, X = 2, ..., X = n. For example, in the jet problem, X = 0, 1, 2, 3 (see the previous tree diagram). The corresponding probabilities are BINOMDIST(0,n,p,False), BINOMDIST(1,n,p,False), BINOMDIST(2,n,p,False), etc. We could then compute the expected value (μ = E(X)) and variance (σ² = E[(X − μ)²]) for X using our previous formulas. However, there are some shortcut formulas that simplify these calculations.

The Expected Value (Mean) and Variance of a Binomial Distribution (Shortcut Formulas)

The mean and variance of a Binomial distribution with n trials and probability p of success on any trial are
$$\mu = np \quad \text{and} \quad \sigma^2 = np(1 - p).$$
Observe that this spares us having to compute the mean and variance for each possible binomial distribution the long way as you did on previous problems. All we need to know are the values of n and p.
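As a quick sanity check on the shortcut formulas, the mean and variance for the jet problem (n = 3, p = 1/5) can be computed the long way from the full list of outcomes and probabilities and then compared with np and np(1 − p). The sketch below is one illustrative way to do this; the variable names are arbitrary.

```python
from math import comb

n, p = 3, 0.2  # the jet problem

# Full distribution: Pr(X = k) for k = 0, 1, ..., n.
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# The long way: apply the general mean and variance formulas.
mean_long = sum(k * q for k, q in pmf.items())
var_long = sum((k - mean_long) ** 2 * q for k, q in pmf.items())

# The shortcut formulas.
mean_short = n * p
var_short = n * p * (1 - p)

print(f"mean:     long way = {mean_long:.2f}, shortcut = {mean_short:.2f}")
print(f"variance: long way = {var_long:.2f}, shortcut = {var_short:.2f}")
```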

An Application of the Binomial Distribution

Example (From: Steve Patterson, President, Vinson & Company). You are working as a jury consultant for the defense in a billion-dollar patent infringement lawsuit. Your survey data has revealed that 40% of the people are initially biased in favor of the defense and 60% are initially biased in favor of the plaintiffs. You know that once a pool of prospective jurors is assembled, you can identify the particular initial bias of each individual based on their written responses to court questionnaires as well as their verbal responses to direct (pretrial) questioning. The plaintiffs' attorneys have likewise retained a jury consultant, and it is safe to assume that they can identify each person's initial bias as well. Consequently, once the pool is drawn, both sides have a pretty good idea of what they are up against.

The defense's goal is to seat a final jury with at least one juror who is initially biased in their favor. This should help bring about a hung jury, which is typically considered a win in these cases. The original proposal for seating a jury of 10 individuals was to draw a pool of 20 people and allow each side 5 strikes (i.e., they can each eliminate 5 individuals from the pool that they don't want seated on the final jury). The plaintiffs' attorneys have proposed a new format: draw a pool of 30 people and allow each side 10 strikes. The defense attorneys all like the new format because it gives them more strikes, which in turn gives them a greater sense of control. Which format would you recommend for the defense? Does it really make any difference? What is the probability that the defense can accomplish their goal under each format?

Solution: Calculate the probability of getting at least one juror biased in favor of the defense under each seating process. In the first format, the plaintiffs will strike up to 5 jurors biased in favor of the defense, hence the defense needs at least 6 people out of the initial pool of 20 who are biased in their favor. This is equivalent to P(X ≥ 6) (or equivalently 1 − P(X ≤ 5)) where X is Binomial with parameters n = 20 and p = 0.4. In the second format, the plaintiffs will strike up to 10 jurors biased in favor of the defense, hence the defense needs at least 11 people out of the initial pool of 30 who are biased in their favor. This is equivalent to P(X ≥ 11) (or equivalently 1 − P(X ≤ 10)) where X is Binomial with parameters n = 30 and p = 0.4.
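The two probabilities in the solution can be evaluated side by side. The sketch below assumes the setup exactly as stated (each prospective juror independently favors the defense with probability 0.4, and the other side uses all of its strikes on defense-leaning jurors); the helper `prob_at_least` is a hypothetical name introduced for illustration.

```python
from math import comb


def prob_at_least(k, n, p):
    """Pr(X >= k) for X Binomial(n, p), summed directly over k, k+1, ..., n."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p_defense = 0.4

# Original format: pool of 20, 5 strikes per side, so at least 6 are needed.
original_format = prob_at_least(6, 20, p_defense)

# Proposed format: pool of 30, 10 strikes per side, so at least 11 are needed.
proposed_format = prob_at_least(11, 30, p_defense)

print(f"Pool of 20, 5 strikes:  Pr(X >= 6)  = {original_format:.3f}")
print(f"Pool of 30, 10 strikes: Pr(X >= 11) = {proposed_format:.3f}")
```

Under these assumptions the smaller pool gives the defense the noticeably better chance, despite the greater "sense of control" that more strikes seems to offer.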



Homework #1
1. Book, 4.42 (Chapter 4, problem 42). Note: posterior probability = conditional probability
2. Book, 5.19
3. Book, 5.20
4. Book, 5.21
5. Book, 5.32
6. Book, 5.35

7. 5 pt. Bonus (Note: This problem is hard). Dr. Sagi's BreastCare Test (Pamela Druckerman, Wall Street Journal, January 6, 1999). The following paragraphs from the WSJ describe Dr. Sagi's BreastCare test:

The evidence Dr. Sagi cites for BreastCare goes back to the early 1980's. At New York's Memorial Sloan-Kettering Cancer Center and other hospitals, the product, then called the Breast Cancer Screening Indicator, was tried on 179 women who were scheduled for biopsies. It registered positive for 83% of women whose subsequent biopsies showed they had cancer. For women who had tiny cancers less than 1 centimeter across, it registered positive nearly 90% of the time. But the test also came out positive for many women who didn't have cancer, nearly half of such patients. And it gave negative readings to 55 women, one third of whom in fact did have cancer, as shown by later biopsies. Says David Dershaw, director of Sloan-Kettering's breast-imaging department: "It's bad to tell women that don't have cancer that they do have cancer. It's worse to tell women that do have cancer that they don't. This does both."

(1) Using the data above, what is the probability that a person has cancer if they test positive using BreastCare? Hint: Use a 2×2 table as we did for the AIDS example. Figure out how the 179 cases break out into the 4 cells. You do not need all the statements in the text; choose those that are most definitive, avoid those that are approximations.

(2) Again using the data above, what is the probability that a person who does not have cancer tests positive using BreastCare?
