
Probability Distributions

Random Variables
Expected Value
And Normal Distributions
Random Variable

Random variable
Outcomes of an experiment expressed
numerically
e.g.: Toss a die twice; count the number
of times the number 4 appears (0, 1 or 2
times)
Discrete Random Variable

Discrete random variable


Obtained by counting (1, 2, 3, etc.)
Usually a finite number of
different values
e.g.: Toss a coin five times;
Count the number of tails
(0, 1, 2, 3, 4, or 5 times)
Discrete Probability
Distribution Example
Event: Toss 2 coins. Count the number of tails.
Probability Distribution
Values   Probability
0        1/4 = .25
1        2/4 = .50
2        1/4 = .25
Discrete Probability Distribution

List of all possible [X_j, P(X_j)] pairs
X_j = value of random variable
P(X_j) = probability associated with value
Mutually exclusive (nothing in common)
Collectively exhaustive (nothing left out)
$0 \le P(X_j) \le 1$, $\sum_j P(X_j) = 1$
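A minimal sketch of how such a distribution can be built and checked, assuming two fair coins and using exact fractions. The names `outcomes` and `dist` are illustrative, not from the slides.

```python
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of tossing two fair coins.
outcomes = list(product("HT", repeat=2))

# Tally P(X = k), where X counts the tails in an outcome.
dist = {}
for outcome in outcomes:
    tails = outcome.count("T")
    dist[tails] = dist.get(tails, 0) + Fraction(1, len(outcomes))

for value in sorted(dist):
    print(value, dist[value])

# The two defining properties: each probability lies in [0, 1], and they sum to 1.
print(all(0 <= p <= 1 for p in dist.values()), sum(dist.values()) == 1)
```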
Summary Measures

Expected value (the mean)
Weighted average of the probability distribution:
$\mu = E(X) = \sum_j X_j P(X_j)$
Summary Measures
(continued)
Example of expected value (the mean):
Toss two coins, count the number of tails, compute expected value
$\mu = \sum_j X_j P(X_j) = (0)(.25) + (1)(.5) + (2)(.25) = 1$
Summary Measures
(continued)
Variance
Weighted average of squared deviations about the mean
$\sigma^2 = E[(X - \mu)^2] = \sum_j (X_j - \mu)^2 P(X_j)$
Summary Measures
(continued)
Example of variance:
Toss two coins, count number of tails, compute variance
$\sigma^2 = \sum_j (X_j - \mu)^2 P(X_j) = (0 - 1)^2(.25) + (1 - 1)^2(.5) + (2 - 1)^2(.25) = .5$
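The two summary measures can be sketched in a few lines for the two-coin example. The helper names `expected_value` and `variance` are illustrative assumptions, not part of the slides.

```python
# Distribution of the number of tails in two coin tosses (values -> probabilities).
dist = {0: 0.25, 1: 0.50, 2: 0.25}

def expected_value(dist):
    """Probability-weighted average of the values."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Probability-weighted average of squared deviations about the mean."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

mu = expected_value(dist)
var = variance(dist)
print(mu, var)
```

The printed values should match the slide results of 1 for the mean and .5 for the variance.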
Covariance and its Application
$\sigma_{XY} = \sum_{i=1}^{N} [X_i - E(X)][Y_i - E(Y)] P(X_i Y_i)$
X: discrete random variable
X_i: i-th outcome of X
Y: discrete random variable
Y_i: i-th outcome of Y
P(X_i Y_i): probability of occurrence of the i-th outcome of X and the i-th outcome of Y
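A sketch of the covariance formula, applied to the three-scenario investment-returns figures used in this deck. The slides state the means and variances for these figures but not the covariance, so the covariance printed here is computed, not quoted. Variable names are illustrative.

```python
import math

# (probability, return on X, return on Y) per economic condition.
scenarios = [
    (0.2, -100, -200),  # recession
    (0.5, 100, 50),     # stable economy
    (0.3, 250, 350),    # expanding economy
]

mean_x = sum(p * x for p, x, _ in scenarios)
mean_y = sum(p * y for p, _, y in scenarios)
var_x = sum(p * (x - mean_x) ** 2 for p, x, _ in scenarios)
var_y = sum(p * (y - mean_y) ** 2 for p, _, y in scenarios)

# Covariance: probability-weighted product of the paired deviations.
cov_xy = sum(p * (x - mean_x) * (y - mean_y) for p, x, y in scenarios)

print(round(mean_x, 2), round(mean_y, 2))
print(round(var_x, 2), round(math.sqrt(var_x), 2))
print(round(var_y, 2), round(math.sqrt(var_y), 2))
print(round(cov_xy, 2))
```

A positive covariance indicates that the two investments tend to move in the same direction across the economic conditions.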


Computing the Mean for
Investment Returns
Return per $1,000 for two types of investments

                              Investment
P(XiYi)  Economic condition   Dow Jones fund X  Growth Stock Y
.2       Recession            -$100             -$200
.5       Stable Economy       +$100             +$50
.3       Expanding Economy    +$250             +$350

$E(X) = \mu_X = (-100)(.2) + (100)(.5) + (250)(.3) = \$105$
$E(Y) = \mu_Y = (-200)(.2) + (50)(.5) + (350)(.3) = \$90$


Computing the Variance for
Investment Returns
                              Investment
P(XiYi)  Economic condition   Dow Jones fund X  Growth Stock Y
.2       Recession            -$100             -$200
.5       Stable Economy       +$100             +$50
.3       Expanding Economy    +$250             +$350

$\sigma_X^2 = (.2)(-100 - 105)^2 + (.5)(100 - 105)^2 + (.3)(250 - 105)^2 = 14{,}725$, $\sigma_X = 121.35$
$\sigma_Y^2 = (.2)(-200 - 90)^2 + (.5)(50 - 90)^2 + (.3)(350 - 90)^2 = 37{,}900$, $\sigma_Y = 194.68$
Important Discrete
Probability Distributions
Discrete Probability
Distributions

Binomial Hypergeometric Poisson


The Binomial Random Variable

Binomial Random variable


– An experiment of n identical trials
– 2 possible outcomes on each trial, denoted as
S (success) and F (failure)
– Probability of success (p) is constant from trial
to trial. Probability of failure (q) is 1 − p
– Trials are independent
– Binomial random variable – number of S’s in n
trials
The Binomial Random Variable

Computer retailer selling desktop (D) and laptop (L) PCs online. Sales are 80% desktop, 20% laptop.
What is the probability that the next 4 sales are laptops?
Sample points for next 4 online purchases
DDDD   LDDD   LLDD   DLLL   LLLL
       DLDD   LDLD   LDLL
       DDLD   LDDL   LLDL
       DDDL   DLLD   LLLD
              DLDL
              DDLL
The Binomial Random Variable

Use multiplicative rule to calculate probabilities of the possible outcomes
$P(DDDD) = (.8)(.8)(.8)(.8) = (.8)^4 = .4096$
$P(LDDD) = (.2)(.8)(.8)(.8) = (.2)(.8)^3 = .1024$
…..
$P(LLLL) = (.2)(.2)(.2)(.2) = (.2)^4 = .0016$
The Binomial Random Variable

What is the probability that 3 of the next 4 online sales are laptops?
P(3 of the next 4 customers purchase laptops) = $4(.2)^3(.8) = 4(.0064) = .0256$
What is the probability that 3 of the next 4 online sales are desktops?
P(3 of the next 4 customers purchase desktops) = $4(.8)^3(.2) = 4(.1024) = .4096$
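One way to check these figures is brute-force enumeration: list all 16 sample points, apply the multiplicative rule to each, and add up the points that match. This is only a sketch, and the helper names `prob` and `prob_count` are assumptions.

```python
from itertools import product

P = {"D": 0.8, "L": 0.2}  # probability of each type of sale

def prob(seq):
    """Multiplicative rule: sales are independent, so multiply."""
    result = 1.0
    for sale in seq:
        result *= P[sale]
    return result

# All 2**4 = 16 sample points for the next four purchases.
points = ["".join(s) for s in product("DL", repeat=4)]

def prob_count(letter, k):
    """Total probability of the sample points containing exactly k of `letter`."""
    return sum(prob(s) for s in points if s.count(letter) == k)

print(len(points))
print(round(prob_count("L", 3), 4))  # exactly 3 laptops
print(round(prob_count("D", 3), 4))  # exactly 3 desktops
```

Each answer is (number of matching sample points) × (probability of any one of them), which is the pattern the binomial formula captures.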

Do you see a pattern?


The Binomial Random Variable

Formula for the probability distribution p(x)
$p(x) = \binom{n}{x} p^x q^{n-x}$
Where p = probability of success on single trial
q = 1 − p
n = number of trials
x = number of successes in n trials
$\binom{n}{x} = \frac{n!}{x!(n-x)!}$
The Binomial Random Variable

Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard deviation: $\sigma = \sqrt{npq}$
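A short sketch that evaluates the binomial probability function and checks the mean and variance shortcuts by direct summation. It uses n = 4 and p = .2 from the laptop example; `binom_pmf` is an illustrative name.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.2
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed from the definition, for comparison with np and npq.
mean = sum(x * px for x, px in enumerate(pmf))
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(round(pmf[3], 4))                      # P(3 laptops in 4 sales)
print(round(mean, 4), round(n * p, 4))
print(round(var, 4), round(n * p * (1 - p), 4))
print(round(sqrt(var), 4))
```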


The Binomial Random Variable

Using Binomial Tables


Binomial tables are cumulative tables, entries
represent cumulative binomial probabilities
Make use of additive and complementary
properties to calculate probabilities of individual
x’s, or x being greater than a particular value.
The Binomial Random Variable
If x ≤ 2, and p = .2, n = 10, then P(x ≤ 2) = .678
If x = 2, and p = .2, n = 10, then P(x = 2) = P(x ≤ 2) − P(x ≤ 1) = .678 − .376 = .302
If x > 2, and p = .2, n = 10, then P(x > 2) = 1 − P(x ≤ 2) = 1 − .678 = .322

Cumulative binomial probabilities P(x ≤ k) for n = 10 (partial table)

       p
k      .01     .05     .10     .20     .30
0      .904    .599    .349    .107    .028
1      .996    .914    .736    .376    .149
2      1.000   .988    .930    .678    .383
3      1.000   .999    .987    .879    .650
4      1.000   1.000   .998    .967    .850
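The table entries and the three lookups can be reproduced by summing binomial probabilities. This is a sketch, and `binom_cdf` is an assumed helper name, not a library function.

```python
from math import comb

def binom_cdf(k, n, p):
    """Cumulative probability P(x <= k) for a binomial(n, p) variable."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

n, p = 10, 0.2
at_most_2 = binom_cdf(2, n, p)                      # read directly from the table
exactly_2 = binom_cdf(2, n, p) - binom_cdf(1, n, p) # additive property
more_than_2 = 1 - at_most_2                         # complementary property

print(round(at_most_2, 3), round(exactly_2, 3), round(more_than_2, 3))

# Rebuild the p = .20 column of the partial table.
print([round(binom_cdf(k, n, p), 3) for k in range(5)])
```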


Expected Values of Discrete
Random Variables
Probability Rules for a Discrete Random Variable



                            Chebyshev’s Rule     Empirical Rule
                            Applies to any       Applies to mound-shaped
                            distribution         and symmetric distributions
P(µ − σ < x < µ + σ)        ≥ 0                  ≈ .68
P(µ − 2σ < x < µ + 2σ)      ≥ 3/4                ≈ .95
P(µ − 3σ < x < µ + 3σ)      ≥ 8/9                ≈ 1.00
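The two columns can be compared numerically. Chebyshev's bound is 1 − 1/k² for an interval of k standard deviations. The empirical-rule figures are approximated here by a normal curve, which is an assumption: the rule itself covers any mound-shaped, symmetric distribution.

```python
from statistics import NormalDist

def chebyshev_bound(k):
    """Lower bound on P(|x - mu| < k*sigma), valid for any distribution."""
    return max(0.0, 1 - 1 / k**2)

def normal_within(k):
    """P(|x - mu| < k*sigma) when x is normally distributed."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(chebyshev_bound(k), 4), round(normal_within(k), 4))
```

Chebyshev's rule is much weaker because it makes no assumption about the shape of the distribution.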
Poisson Distribution
Poisson process:
Discrete events in an “interval”
The probability of one success in an interval is stable
The probability of more than one success in an interval approaches 0 as the interval shrinks
The probability of success is independent from interval to interval
$P(X = x \mid \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}$
e.g.: The number of customers arriving in 15 minutes
e.g.: The number of defects per case of light bulbs
Poisson Distribution
Characteristics
Mean: $\mu = E(X) = \lambda = \sum_{i=1}^{N} X_i P(X_i)$
Standard deviation and variance: $\sigma = \sqrt{\lambda}$, $\sigma^2 = \lambda$
[Figure: bar charts of P(X) against X for λ = 0.5 and λ = 6]
Poisson Probability
Distribution Function
$P(X) = \frac{e^{-\lambda} \lambda^X}{X!}$
P(X): probability of X "successes" given λ
X: number of "successes" per unit
λ: expected (average) number of "successes"
e: 2.71828 (base of natural logs)
e.g.: Find the probability of four customers arriving in three minutes when the mean is 3.6.
$P(X = 4) = \frac{e^{-3.6} (3.6)^4}{4!} = .1912$
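A minimal sketch of the Poisson probability function, checked against the worked example. `poisson_pmf` is an illustrative name, and the sums are truncated at 60 terms, which is an assumption that is ample for λ = 3.6.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 3.6
p4 = poisson_pmf(4, lam)
print(round(p4, 4))

# Mean and variance by direct summation; both should be close to lam.
terms = [(x, poisson_pmf(x, lam)) for x in range(60)]
mean = sum(x * p for x, p in terms)
var = sum((x - mean) ** 2 * p for x, p in terms)
print(round(mean, 4), round(var, 4))
```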
Continuous Random Variables
Continuous Probability Distributions

Continuous Probability Distribution – areas under curve correspond to probabilities for x
Area A corresponds to the probability that x lies between a and b
Do you see the similarity in shape between the continuous and discrete
probability distributions?
The Normal Distribution

“Bell shaped”
Symmetrical
Mean, median and mode are equal
Interquartile range equals 1.33 σ
Random variable has infinite range
[Figure: bell-shaped density f(X) plotted against X, with the mean, median and mode all at µ]

The Mathematical Model
$f(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}(X - \mu)^2}$
f(X): density of random variable X
π = 3.14159; e = 2.71828
µ: population mean
σ: population standard deviation
X: value of random variable (−∞ < X < ∞)
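A sketch of the density function, with a crude numerical check that areas under it behave like probabilities. The midpoint-rule helper `area` and its step count are assumptions chosen for illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def area(a, b, mu, sigma, steps=10_000):
    """Midpoint-rule approximation of the area under the density on [a, b]."""
    width = (b - a) / steps
    heights = (normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

mu, sigma = 0.0, 1.0
print(round(normal_pdf(mu, mu, sigma), 4))                        # height at the mean
print(round(area(mu - 8 * sigma, mu + 8 * sigma, mu, sigma), 4))  # essentially all the area
print(round(area(mu - sigma, mu + sigma, mu, sigma), 4))          # within one sigma
```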
The Normal Distribution

A normal random variable has a probability distribution called a normal distribution

The Normal Distribution


Bell-shaped curve
Symmetrical about its mean μ
Spread determined by the value
of its standard deviation σ
The Normal Distribution

The mean and standard deviation affect the flatness and center of the curve, but not the basic shape
The Normal Distribution
Probabilities associated with values or ranges of a random
variable correspond to areas under the normal curve
Calculating probabilities can be simplified by working with a
Standard Normal Distribution
A Standard Normal Distribution is a Normal distribution with
µ =0 and σ =1
The standard normal
random variable is
denoted by the
symbol z
The Normal Distribution
Table for Standard Normal Distribution contains probability for the area between 0 and z
Partial table below shows components of table
The value of z is a combination of column and row. Each entry is the probability associated with a particular z value; in this case z = .13, p(0 < z < .13) = .0517

Z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
.0   .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
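The table can be regenerated from the error function, since the standard normal area between 0 and z equals erf(z/√2)/2. This is a sketch, and `area_0_to_z` is an assumed helper name.

```python
from math import erf, sqrt

def area_0_to_z(z):
    """P(0 < Z < z) for a standard normal Z and z >= 0."""
    return 0.5 * erf(z / sqrt(2))

# Rebuild the partial table: the row gives the first decimal of z, the column the second.
for row in (0.0, 0.1, 0.2, 0.3):
    cells = [f"{area_0_to_z(row + col / 100):.4f}" for col in range(10)]
    print(f"{row:.1f}", " ".join(cells))

print(round(area_0_to_z(0.13), 4))
```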
Many Normal Distributions
There are an infinite number of normal
distributions

By varying the parameters σ and µ, we


obtain different normal distributions
Finding Probabilities

Probability is the area under the curve! $P(c \le X \le d) = ?$
[Figure: normal density f(X) against X, with the area between c and d shaded]
Which Table to Use?

An infinite number of normal distributions means an infinite number of tables to look up!
Solution: The Cumulative Standardized
Normal Distribution
Cumulative Standardized
Normal Distribution Table
(Portion), $\mu_Z = 0$, $\sigma_Z = 1$

Z     .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255

For Z = 0.12, the cumulative probability is .5478 (shaded area, exaggerated in the figure).
Only one table is needed.

The Normal Distribution

What is P(-1.33 < z < 1.33)?


Table gives us area A1
Symmetry about the mean
tells us that A2 = A1

P(-1.33 < z < 1.33) = P(-1.33 < z < 0) +P(0 < z < 1.33)=
A2 + A1 = .4082 + .4082 = .8164
The Normal Distribution

What is P(z < .67)?


Table gives us area A1
Symmetry about the mean
tells us that A2 = .5

P(z < .67) = A1 + A2 = .2486 + .5 = .7486


The Normal Distribution

What is P(|z| > 1.96)?


Table gives us area .5 - A2
=.4750, so A2 = .0250
Symmetry about the mean
tells us that A2 = A1
P(|z| > 1.96) = A1 + A2 = .0250 + .0250 =.05
The Normal Distribution

What if values of interest were not standardized? We want to know
P(8 < x < 12), with μ = 10 and σ = 1.5
Convert to standard normal using
$z = \frac{x - \mu}{\sigma}$
P(8<x<12) = P(-1.33<z<1.33) = 2(.4082) = .8164
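A sketch of the same conversion using Python's standard-library `statistics.NormalDist`. Rounding z to two decimals mimics a table lookup and is an assumption made here for comparison. The unrounded answer differs slightly in the third decimal.

```python
from statistics import NormalDist

mu, sigma = 10, 1.5
a, b = 8, 12

# Standardize both endpoints: z = (x - mu) / sigma.
z_a = (a - mu) / sigma
z_b = (b - mu) / sigma

std = NormalDist()  # standard normal
table_style = std.cdf(round(z_b, 2)) - std.cdf(round(z_a, 2))  # z rounded as in a table
exact = std.cdf(z_b) - std.cdf(z_a)                            # no rounding of z

print(round(z_a, 2), round(z_b, 2))
print(round(table_style, 3), round(exact, 3))
```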
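Standardizing Example
$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12$
Normal Distribution: $\mu = 5$, $\sigma = 10$, X = 6.2
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$, Z = 0.12
[Figure: the two curves side by side; shaded area exaggerated]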
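Example:
$P(2.9 \le X \le 7.1) = .1664$
$Z = \frac{X - \mu}{\sigma} = \frac{2.9 - 5}{10} = -.21$, $Z = \frac{X - \mu}{\sigma} = \frac{7.1 - 5}{10} = .21$
Normal Distribution: $\mu = 5$, $\sigma = 10$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$
[Figure: shaded area between 2.9 and 7.1 on the X scale, equivalently between −0.21 and 0.21 on the Z scale; each half has area .0832]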
Example:
$P(2.9 \le X \le 7.1) = .1664$ (continued)
Cumulative Standardized Normal Distribution Table (Portion), $\mu_Z = 0$, $\sigma_Z = 1$

Z     .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255

For Z = 0.21, the cumulative probability is .5832 (shaded area, exaggerated in the figure).
Example:
$P(2.9 \le X \le 7.1) = .1664$ (continued)
Cumulative Standardized Normal Distribution Table (Portion), $\mu_Z = 0$, $\sigma_Z = 1$

Z      .00     .01     .02
-0.3   .3821   .3783   .3745
-0.2   .4207   .4168   .4129
-0.1   .4602   .4562   .4522
0.0    .5000   .4960   .4920

For Z = −0.21, the cumulative probability is .4168 (shaded area, exaggerated in the figure), so the required probability is .5832 − .4168 = .1664.
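The interval probability is a difference of two cumulative values, which can be sketched with the standard library's `statistics.NormalDist`. The table rounds each cumulative value to four places, so the computed answer may differ from .1664 in the last digit.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=5, sigma=10)
lower, upper = 2.9, 7.1

# Standardized endpoints, for comparison with the table lookups.
z_lower = (lower - x_dist.mean) / x_dist.stdev
z_upper = (upper - x_dist.mean) / x_dist.stdev

# P(lower <= X <= upper) = F(upper) - F(lower), with F the cumulative distribution.
prob = x_dist.cdf(upper) - x_dist.cdf(lower)

print(round(z_lower, 2), round(z_upper, 2))
print(round(x_dist.cdf(upper), 4), round(x_dist.cdf(lower), 4))
print(round(prob, 3))
```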
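Example:
$P(X \ge 8) = .3821$
$Z = \frac{X - \mu}{\sigma} = \frac{8 - 5}{10} = .30$
Normal Distribution: $\mu = 5$, $\sigma = 10$
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$
[Figure: shaded upper-tail area .3821 to the right of X = 8, equivalently to the right of Z = 0.30]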
Example:
$P(X \ge 8) = .3821$ (continued)
Cumulative Standardized Normal Distribution Table (Portion), $\mu_Z = 0$, $\sigma_Z = 1$

Z     .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255

For Z = 0.30, the cumulative probability is .6179, so $P(X \ge 8) = 1 - .6179 = .3821$.
Finding Z Values for Known
Probabilities
What is Z given probability = 0.1217 (area between 0 and Z, so cumulative probability = .5 + .1217 = .6217)?
Cumulative Standardized Normal Distribution Table (Portion), $\mu_Z = 0$, $\sigma_Z = 1$

Z     .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255

The entry .6217 lies in row 0.3, column .01, so Z = .31 (shaded area exaggerated in the figure).
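Recovering X Values for Known
Probabilities
Normal Distribution: $\mu = 5$, $\sigma = 10$, X = ?
Standardized Normal Distribution: $\mu_Z = 0$, $\sigma_Z = 1$, Z = 0.30
[Figure: cumulative area .6179 to the left of Z = 0.30 and upper-tail area .3821 to the right]
$X = \mu + Z\sigma = 5 + (.30)(10) = 8$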
The Normal Distribution

Steps for Finding a Probability Corresponding to a Normal Random Variable
•Sketch the distribution, locate mean, shade area
of interest
•Convert to standard z values using $z = \frac{x - \mu}{\sigma}$
•Add z values to the sketch
•Use tables to calculate probabilities, making use
of symmetry property where necessary
The Normal Distribution

Given a normally distributed variable x with mean 100,000 and standard deviation of 10,000, what value of x identifies the top 10% of the distribution?
$P(x \le x_0) = P\left(z \le \frac{x_0 - \mu}{\sigma}\right) = P\left(z \le \frac{x_0 - 100{,}000}{10{,}000}\right) = .90$
The z value corresponding with an area of .40 between 0 and z (cumulative area .90) is 1.28. Solving for x0:
x0 = 100,000 + 1.28(10,000) = 100,000 + 12,800 = 112,800
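The reverse lookup can be sketched with `NormalDist.inv_cdf` from the standard library. It returns an unrounded z, so the cutoff comes out slightly above the table-based 112,800.

```python
from statistics import NormalDist

mu, sigma = 100_000, 10_000

# z such that 90% of the standard normal distribution lies below it.
z = NormalDist().inv_cdf(0.90)

# Convert back to the original scale: x0 = mu + z * sigma.
x0 = mu + z * sigma

print(round(z, 2))
print(round(x0))
print(round(NormalDist(mu, sigma).cdf(x0), 4))  # check: should be 0.9
```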
Assessing Normality

Not all continuous random variables are normally distributed
It is important to evaluate how well the data
set is approximated by a normal distribution
Assessing Normality
(continued)
Construct charts
For small- or moderate-sized data sets, do
stem-and-leaf display and box-and-whisker
plot look symmetric?
For large data sets, does the histogram or
polygon appear bell-shaped?
Compute descriptive summary measures
Do the mean, median and mode have similar
values?
Is the interquartile range approximately 1.33 σ?
Is the range approximately 6 σ?
Assessing Normality
(continued)
Observe the distribution of the data set
Do approximately 2/3 of the observations lie
between mean ± 1 standard deviation?
Do approximately 4/5 of the observations lie
between mean ± 1.28 standard deviations?
Do approximately 19/20 of the observations lie
between mean ± 2 standard deviations?
Evaluate normal probability plot
Do the points lie on or close to a straight line
with positive slope?
Assessing Normality
(continued)
Normal probability plot
Arrange data into ordered array
Find corresponding standardized normal
quantile values
Plot the pairs of points with observed data
values on the vertical axis and the
standardized normal quantile values on the
horizontal axis
Evaluate the plot for evidence of linearity
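The steps above can be sketched as follows. The plotting position i/(n + 1) is one common convention and is an assumption here, since the slides do not say which one to use. The sample data are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each ordered observation with a standardized normal quantile."""
    ordered = sorted(data)  # ordered array
    n = len(ordered)
    std = NormalDist()
    # Assumed plotting position: the i-th smallest value gets cumulative area i/(n+1).
    quantiles = [std.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))  # (horizontal Z, vertical X) pairs

sample = [62, 48, 75, 55, 68, 59, 51, 71, 65, 58]  # hypothetical observations
points = normal_plot_points(sample)
for z, x in points:
    print(f"{z:6.2f}  {x}")
```

If the data are approximately normal, the printed pairs rise along a roughly straight line when plotted.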
Assessing Normality
(continued)
Normal Probability Plot
for Normal Distribution
[Figure: normal probability plot, X (30 to 90) on the vertical axis against Z (−2 to 2) on the horizontal axis]
Look for a straight line
Normal Probability Plot
[Figure: four normal probability plots, each X (30 to 90) against Z (−2 to 2). Panels: Left-Skewed, Right-Skewed, Rectangular, U-Shaped]
Example
e.g.: Customers arrive at the checkout line of a supermarket at the rate of 30 per hour. What is the probability that the arrival time between consecutive customers is greater than 5 minutes?
$\lambda = 30$, $X = 5/60$ hours
$P(\text{arrival time} > X) = 1 - P(\text{arrival time} \le X) = 1 - \left(1 - e^{-30(5/60)}\right) = .0821$
Descriptive Methods for Assessing
Normality
•Evaluate the shape from a histogram or
stem-and-leaf display
•Compute intervals about mean x ± s, x ± 2s, x ± 3s
and corresponding percentages
•Compute IQR and divide by standard
deviation. Result is roughly 1.3 if normal
•Use statistical package to evaluate a
normal probability plot for the data
Approximating a Binomial Distribution with a
Normal Distribution

You can use a Normal Distribution as an approximation of a Binomial Distribution for large
values of n
Often needed given limitation of binomial tables
Need to add a correction for continuity, because of
the discrete nature of the binomial distribution
Correction is to add .5 to x when converting to
standard z values
Rule of thumb: interval µ ± 3σ should be within
range of binomial random variable (0 to n) for normal
distribution to be adequate approximation
Approximating a Binomial Distribution with a
Normal Distribution

Steps
Determine n and p for the binomial distribution
Calculate the interval $\mu \pm 3\sigma = np \pm 3\sqrt{npq}$
Express binomial probability in the form P(x ≤ a) or
P(x ≤ b) – P(x ≤ a)
Calculate z value for each a, applying continuity
correction
Sketch normal distribution, locate a’s and use table
to solve
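The steps above can be sketched as follows, comparing the approximation with the exact binomial value. The choice of n = 100, p = .5 and a = 45 is hypothetical, and the helper names are assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact cumulative binomial probability P(x <= k)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(x <= k), with the +.5 continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = (k + 0.5 - mu) / sigma
    return NormalDist().cdf(z)

n, p, a = 100, 0.5, 45   # hypothetical example values
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Rule of thumb: mu +/- 3 sigma should fall inside the range 0..n.
interval_ok = 0 <= mu - 3 * sigma and mu + 3 * sigma <= n
exact = binom_cdf(a, n, p)
approx = normal_approx_cdf(a, n, p)

print(interval_ok)
print(round(exact, 4), round(approx, 4))
```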