
# Probability Distributions

Random Variables
Expected Value
And Normal Distributions
Random Variable

Random variable
Outcomes of an experiment expressed
numerically
e.g.: Toss a die twice; count the number
of times the number 4 appears (0, 1 or 2
times)
Discrete Random Variable

## Discrete random variable

Obtained by counting (1, 2, 3, etc.)
Usually a finite number of
different values
e.g.: Toss a coin five times;
Count the number of tails
(0, 1, 2, 3, 4, or 5 times)
Discrete Probability
Distribution Example
Event: Toss 2 coins. Count the number of tails.
Probability Distribution

| Values (number of tails) | Probability |
|---|---|
| 0 | 1/4 = .25 |
| 1 | 2/4 = .50 |
| 2 | 1/4 = .25 |
Discrete Probability Distribution

## List of all possible [Xj, P(Xj)] pairs

Xj = value of random variable
P(Xj) = probability associated with value
Mutually exclusive (nothing in common)
Collectively exhaustive (nothing left out)

$$0 \le P(X_j) \le 1, \qquad \sum_j P(X_j) = 1$$
Summary Measures

## Expected value (the mean)

Weighted average of the probability
distribution:

$$\mu = E(X) = \sum_j X_j P(X_j)$$
Summary Measures
continued

## Example of expected value (the mean):

Toss two coins, count the number of
tails, compute expected value

$$\mu = \sum_j X_j P(X_j) = (0)(.25) + (1)(.5) + (2)(.25) = 1$$
Summary Measures
(continued)
Variance
Weighted average squared deviation about
the mean

$$\sigma^2 = E\left[(X - \mu)^2\right] = \sum_j (X_j - \mu)^2 P(X_j)$$
Summary Measures
(continued)
Example of variance:
Toss two coins, count number of tails,
compute variance

$$\sigma^2 = \sum_j (X_j - \mu)^2 P(X_j) = (0-1)^2(.25) + (1-1)^2(.5) + (2-1)^2(.25) = .5$$
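The two-coin mean and variance can be checked with a few lines of Python. This is a minimal sketch; the function names `expected_value` and `variance` are illustrative choices, not something the slides define.

```python
# Distribution of the number of tails in two coin tosses (from the example).
values = [0, 1, 2]
probs = [0.25, 0.50, 0.25]

def expected_value(xs, ps):
    """Weighted average of the values: sum of X_j * P(X_j)."""
    return sum(x * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """Weighted average squared deviation about the mean."""
    mu = expected_value(xs, ps)
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))

mean = expected_value(values, probs)
var = variance(values, probs)
print(mean, var)
```

The probabilities must sum to 1 for these formulas to be meaningful, which is the "collectively exhaustive" condition stated earlier.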
Covariance and its Application
$$\sigma_{XY} = \sum_{i=1}^{N} \left[X_i - E(X)\right]\left[Y_i - E(Y)\right] P(X_i Y_i)$$

$X$ : discrete random variable
$X_i$ : i th outcome of X
$Y$ : discrete random variable
$Y_i$ : i th outcome of Y
$P(X_i Y_i)$ : probability of occurrence of the i th outcome of X and the i th outcome of Y

Computing the Mean for
Investment Returns
Return per \$1,000 for two types of investments

| P(XiYi) | Economic condition | Dow Jones fund X | Growth stock Y |
|---|---|---|---|
| .2 | Recession | -\$100 | -\$200 |
| .5 | Stable economy | +\$100 | +\$50 |
| .3 | Expanding economy | +\$250 | +\$350 |

$$E(X) = \mu_X = (-100)(.2) + (100)(.5) + (250)(.3) = \$105$$

$$E(Y) = \mu_Y = (-200)(.2) + (50)(.5) + (350)(.3) = \$90$$
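As a hedged illustration, the covariance formula defined earlier can be applied to the investment table in Python. The helper names `mean` and `covariance` are assumptions made for this sketch; the covariance of a variable with itself gives its variance.

```python
# Return per $1,000 under each economic condition (values from the table).
probs = [0.2, 0.5, 0.3]       # recession, stable economy, expanding economy
fund_x = [-100, 100, 250]     # Dow Jones fund X
stock_y = [-200, 50, 350]     # growth stock Y

def mean(values, ps):
    """Expected value: sum of each outcome times its probability."""
    return sum(v * p for v, p in zip(values, ps))

def covariance(xs, ys, ps):
    """Probability-weighted sum of products of deviations from the means."""
    mx, my = mean(xs, ps), mean(ys, ps)
    return sum((x - mx) * (y - my) * p for x, y, p in zip(xs, ys, ps))

mu_x = mean(fund_x, probs)
mu_y = mean(stock_y, probs)
cov_xy = covariance(fund_x, stock_y, probs)
var_x = covariance(fund_x, fund_x, probs)    # variance of X
var_y = covariance(stock_y, stock_y, probs)  # variance of Y
print(mu_x, mu_y, cov_xy, var_x, var_y)
```

A positive covariance indicates that the two returns tend to move in the same direction across economic conditions.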

Computing the Variance for
Investment Returns
Using the same returns and probabilities as in the table for the means:

$$\sigma_X^2 = (.2)(-100-105)^2 + (.5)(100-105)^2 + (.3)(250-105)^2 = 14{,}725, \qquad \sigma_X = 121.35$$

$$\sigma_Y^2 = (.2)(-200-90)^2 + (.5)(50-90)^2 + (.3)(350-90)^2 = 37{,}900, \qquad \sigma_Y = 194.68$$
Important Discrete
Probability Distributions

Discrete probability distributions: Binomial, Hypergeometric, Poisson

The Binomial Random Variable

## Binomial Random variable

– An experiment of n identical trials
– 2 possible outcomes on each trial, denoted as S (success) and F (failure)
– Probability of success (p) is constant from trial to trial. Probability of failure (q) is 1 − p
– Trials are independent
– Binomial random variable: the number of S's in n trials
The Binomial Random Variable

Computer retailer selling desktop (D) and laptop
(L) PCs online. Sales are 80% desktop, 20% laptop.
What is the probability that the next 4 sales are
laptops?
Sample points for the next 4 online purchases, grouped by number of laptops:

| Laptops | Sample points |
|---|---|
| 0 | DDDD |
| 1 | LDDD, DLDD, DDLD, DDDL |
| 2 | LLDD, LDLD, LDDL, DLLD, DLDL, DDLL |
| 3 | DLLL, LDLL, LLDL, LLLD |
| 4 | LLLL |
The Binomial Random Variable

Use the multiplicative rule to calculate probabilities of
the possible outcomes
P(DDDD) = .8*.8*.8*.8 = (.8)^4 = .4096
P(LDDD) = .2*.8*.8*.8 = .2*(.8)^3 = .1024
…
P(LLLL) = .2*.2*.2*.2 = (.2)^4 = .0016
The Binomial Random Variable

What is the probability that 3 of the next 4 online
sales are laptops?
P(3 of the next 4 customers purchase laptops) =
4(.2)^3(.8) = 4(.0064) = .0256
What is the probability that 3 of the next 4 online
sales are desktops?
P(3 of the next 4 customers purchase desktops) =
4(.8)^3(.2) = 4(.1024) = .4096

Do you see a pattern?

The Binomial Random Variable

## Formula for the probability distribution p(x)

$$p(x) = \binom{n}{x} p^x q^{n-x}$$

Where p = probability of success on single trial
q = 1 − p
n = number of trials
x = number of successes in n trials

$$\binom{n}{x} = \frac{n!}{x!(n-x)!}$$
The Binomial Random Variable

Mean: µ = np

Variance: σ² = npq

Standard deviation: σ = √(npq)
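The binomial formula and its summary measures can be sketched in Python using the standard library's `math.comb`. The function name `binomial_pmf` is an illustrative assumption; the inputs are the laptop example's n = 4 and p = .2.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """p(x) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.2  # 4 sales, success = laptop sale
p_three_laptops = binomial_pmf(3, n, p)    # 4(.2)^3(.8)
p_three_desktops = binomial_pmf(1, n, p)   # 3 desktops means exactly 1 laptop

mean = n * p                 # mu = np
variance = n * p * (1 - p)   # sigma^2 = npq
std_dev = sqrt(variance)     # sigma = sqrt(npq)
print(p_three_laptops, p_three_desktops, mean, variance, std_dev)
```

Summing `binomial_pmf(x, n, p)` over x = 0, ..., n should give 1, which is a quick sanity check on any implementation.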

The Binomial Random Variable

## Using Binomial Tables

Binomial tables are cumulative tables, entries
represent cumulative binomial probabilities
Make use of additive and complementary
properties to calculate probabilities of individual
x’s, or x being greater than a particular value.
The Binomial Random Variable
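If x ≤ 2, and p = .2, n = 10, then P(x ≤ 2) = .678
If x = 2, and p = .2, n = 10, then P(x = 2) = P(x ≤ 2) − P(x ≤ 1) = .678 − .376 = .302
If x > 2, and p = .2, n = 10, then P(x > 2) = 1 − P(x ≤ 2) = 1 − .678 = .322

## Binomial probabilities for n=10 (partial table)

Entries are cumulative probabilities P(x ≤ k).

| k | p = .01 | p = .05 | p = .10 | p = .20 | p = .30 |
|---|---|---|---|---|---|
| 0 | .904 | .599 | .349 | .107 | .028 |
| 1 | .996 | .914 | .736 | .376 | .149 |
| 2 | 1.000 | .988 | .930 | .678 | .383 |
| 3 | 1.000 | .999 | .987 | .879 | .650 |
| 4 | 1.000 | 1.000 | .998 | .967 | .850 |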
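The additive and complementary properties used with the cumulative table can be reproduced numerically. This is a small sketch, assuming a hypothetical helper `binomial_cdf` that sums the binomial formula from 0 to k.

```python
from math import comb

def binomial_cdf(k, n, p):
    """Cumulative probability P(x <= k) for a binomial(n, p) variable."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

n, p = 10, 0.2
p_at_most_2 = binomial_cdf(2, n, p)                          # table entry, k = 2
p_exactly_2 = binomial_cdf(2, n, p) - binomial_cdf(1, n, p)  # additive property
p_more_than_2 = 1 - p_at_most_2                              # complement
print(round(p_at_most_2, 3), round(p_exactly_2, 3), round(p_more_than_2, 3))
```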

Expected Values of Discrete
Random Variables
## Probability Rules for a Discrete Random Variable

| Interval | Chebyshev's Rule (applies to any distribution) | Empirical Rule (applies to mound-shaped and symmetric distributions) |
|---|---|---|
| P(µ − σ < x < µ + σ) | ≥ 0 | ≈ .68 |
| P(µ − 2σ < x < µ + 2σ) | ≥ 3/4 | ≈ .95 |
| P(µ − 3σ < x < µ + 3σ) | ≥ 8/9 | ≈ 1.00 |
Poisson Distribution
Poisson process
Discrete events in an "interval"
The probability of one success in an interval is stable
The probability of more than one success in this interval is 0
The probability of success is independent from interval to interval

$$P(X = x \mid \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}$$

e.g.: The number of customers arriving in 15 minutes
e.g.: The number of defects per case of light bulbs
Poisson Distribution
Characteristics
Mean

$$\mu = E(X) = \lambda = \sum_{i=1}^{N} X_i P(X_i)$$

Standard deviation and variance

$$\sigma^2 = \lambda, \qquad \sigma = \sqrt{\lambda}$$

[Figure: Poisson probabilities P(X) against X, one panel for λ = 0.5 and one for λ = 6]
Poisson Probability
Distribution Function

$$P(X) = \frac{e^{-\lambda} \lambda^X}{X!}$$

P(X) : probability of X "successes" given λ
X : number of "successes" per unit
λ : expected (average) number of "successes"
e : 2.71828 (base of natural logs)

e.g.: Find the probability of four customers arriving in three minutes when the mean is 3.6.

$$P(X = 4) = \frac{e^{-3.6} (3.6)^4}{4!} = .1912$$
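The Poisson probability in this example can be checked with a short Python sketch. The function name `poisson_pmf` is an assumption made for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# Four customers arriving in three minutes when the mean is 3.6.
p_four = poisson_pmf(4, 3.6)
print(round(p_four, 4))
```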
Continuous Random Variables
Continuous Probability Distributions

Continuous probability distribution: areas under the
curve correspond to probabilities for x
Area A corresponds to the probability that x lies
between a and b
Do you see the similarity in shape between the continuous and discrete
probability distributions?
The Normal Distribution

"Bell shaped"
Symmetrical
Mean, median and mode are equal
Interquartile range equals 1.33 σ
Random variable has infinite range

[Figure: bell-shaped density f(X) against X, with the mean, median and mode all located at µ]

The Mathematical Model

$$f(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}(X-\mu)^2}$$

f(X) : density of random variable X
π = 3.14159; e = 2.71828
µ : population mean
σ : population standard deviation
X : value of random variable (−∞ < X < ∞)
The Normal Distribution

## A normal random variable has a probability

distribution called a normal distribution

## The Normal Distribution

Bell-shaped curve
Spread is determined by the value of its standard deviation σ
The Normal Distribution

The mean and standard deviation affect the
flatness and center of the curve, but not the
basic shape
The Normal Distribution
Probabilities associated with values or ranges of a random
variable correspond to areas under the normal curve
Calculating probabilities can be simplified by working with a
Standard Normal Distribution
A Standard Normal Distribution is a Normal distribution with
µ =0 and σ =1
The standard normal
random variable is
denoted by the
symbol z
The Normal Distribution
Table for Standard Normal Distribution contains probability
for the area between 0 and z
Partial table below shows components of table

The value of z is read as a combination of row and column. Each entry is the
probability associated with a particular z value; in this case z = .13 gives
p(0 < z < .13) = .0517

| Z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| .0 | .0000 | .0040 | .0080 | .0120 | .0160 | .0199 | .0239 | .0279 | .0319 | .0359 |
| .1 | .0398 | .0438 | .0478 | .0517 | .0557 | .0596 | .0636 | .0675 | .0714 | .0753 |
| .2 | .0793 | .0832 | .0871 | .0910 | .0948 | .0987 | .1026 | .1064 | .1103 | .1141 |
| .3 | .1179 | .1217 | .1255 | .1293 | .1331 | .1368 | .1406 | .1443 | .1480 | .1517 |
Many Normal Distributions
There are an infinite number of normal
distributions

## By varying the parameters σ and µ, we

obtain different normal distributions
Finding Probabilities

Probability is the area under the curve!

$$P(c \le X \le d) = \,?$$

[Figure: density f(X) against X with the area between c and d shaded]
Which Table to Use?

## An infinite number of normal distributions

means an infinite number of tables to look
up!
Solution: The Cumulative Standardized
Normal Distribution
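Cumulative Standardized
Normal Distribution Table
(Portion), with µ_Z = 0 and σ_Z = 1

| Z | .00 | .01 | .02 |
|---|---|---|---|
| 0.0 | .5000 | .5040 | .5080 |
| 0.1 | .5398 | .5438 | .5478 |
| 0.2 | .5793 | .5832 | .5871 |
| 0.3 | .6179 | .6217 | .6255 |

The entry for Z = 0.12 is .5478, the area to the left of Z = 0.12 (the shaded area is exaggerated in the figure).

Only one table is needed.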

The Normal Distribution

## What is P(-1.33 < z < 1.33)?

Table gives us area A1 = P(0 < z < 1.33) = .4082
Symmetry about the mean tells us that A2 = A1

P(-1.33 < z < 1.33) = P(-1.33 < z < 0) + P(0 < z < 1.33) =
A2 + A1 = .4082 + .4082 = .8164
The Normal Distribution

## What is P(z < .67)?

Table gives us area A1 = P(0 < z < .67) = .2486
Symmetry about the mean tells us that A2 = P(z < 0) = .5

P(z < .67) = A1 + A2 = .2486 + .5 = .7486

The Normal Distribution

## What is P(|z| > 1.96)?

Table gives us area .5 − A2 = .4750, so A2 = .0250
Symmetry about the mean tells us that A2 = A1
P(|z| > 1.96) = A1 + A2 = .0250 + .0250 = .05
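The three table lookups above can be cross-checked without a table. The sketch below builds the standard normal cumulative probability from the error function in Python's `math` module; the name `phi` is an illustrative choice.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative standard normal probability P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

p_between = phi(1.33) - phi(-1.33)   # P(-1.33 < z < 1.33)
p_below = phi(0.67)                  # P(z < .67)
p_tails = 2 * (1 - phi(1.96))        # P(|z| > 1.96), using symmetry
print(round(p_between, 4), round(p_below, 4), round(p_tails, 4))
```

Small differences in the fourth decimal place relative to the table are expected, because table entries are rounded before being added.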
The Normal Distribution

What if values of interest were
not normalized? We want to know
P(8 < x < 12), with μ = 10 and σ = 1.5
Convert to standard normal using

$$z = \frac{x - \mu}{\sigma}$$

P(8 < x < 12) = P(-1.33 < z < 1.33) = 2(.4082) = .8164

Standardizing Example
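$$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12$$

[Figure: normal distribution of X with µ = 5 and σ = 10, with X = 6.2 marked, beside the standardized normal distribution with µ_Z = 0 and σ_Z = 1, with Z = 0.12 marked]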
Example:
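$$P(2.9 \le X \le 7.1) = .1664$$

$$Z = \frac{X - \mu}{\sigma} = \frac{2.9 - 5}{10} = -.21, \qquad Z = \frac{X - \mu}{\sigma} = \frac{7.1 - 5}{10} = .21$$

[Figure: normal distribution of X with µ = 5 and σ = 10, with area .0832 on each side of the mean between 2.9 and 7.1, beside the standardized normal distribution with µ_Z = 0 and σ_Z = 1, between Z = -0.21 and Z = 0.21]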
Example:
P(2.9 ≤ X ≤ 7.1) = .1664 (continued)
Cumulative Standardized
Normal Distribution Table (Portion), with µ_Z = 0 and σ_Z = 1

| Z | .00 | .01 | .02 |
|---|---|---|---|
| 0.1 | .5398 | .5438 | .5478 |
| 0.2 | .5793 | .5832 | .5871 |
| 0.3 | .6179 | .6217 | .6255 |

The area to the left of Z = 0.21 is .5832.
Example:
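P(2.9 ≤ X ≤ 7.1) = .1664 (continued)
Cumulative Standardized
Normal Distribution Table (Portion), with µ_Z = 0 and σ_Z = 1

| Z | .00 | .01 | .02 |
|---|---|---|---|
| -0.2 | .4207 | .4168 | .4129 |
| -0.1 | .4602 | .4562 | .4522 |
| 0.0 | .5000 | .4960 | .4920 |

The area to the left of Z = -0.21 is .4168, so P(2.9 ≤ X ≤ 7.1) = .5832 − .4168 = .1664.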
Example:
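$$P(X \ge 8) = .3821$$

$$Z = \frac{X - \mu}{\sigma} = \frac{8 - 5}{10} = .30$$

[Figure: normal distribution of X with µ = 5 and σ = 10, with upper-tail area .3821 beyond X = 8, beside the standardized normal distribution with µ_Z = 0 and σ_Z = 1, with Z = 0.30 marked]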
Example:
P(X ≥ 8) = .3821 (continued)
Cumulative Standardized
Normal Distribution Table (Portion), with µ_Z = 0 and σ_Z = 1

| Z | .00 | .01 | .02 |
|---|---|---|---|
| 0.1 | .5398 | .5438 | .5478 |
| 0.2 | .5793 | .5832 | .5871 |
| 0.3 | .6179 | .6217 | .6255 |

The area to the left of Z = 0.30 is .6179, so P(X ≥ 8) = 1 − .6179 = .3821.
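As a cross-check on the standardizing examples, Python's standard-library `statistics.NormalDist` evaluates the cumulative probability directly for µ = 5 and σ = 10. This is a sketch for verification, not the table-based method the slides use.

```python
from statistics import NormalDist

# Normal distribution from the examples: mean 5, standard deviation 10.
x_dist = NormalDist(mu=5, sigma=10)

# P(2.9 <= X <= 7.1): difference of two cumulative probabilities.
p_interval = x_dist.cdf(7.1) - x_dist.cdf(2.9)

# P(X >= 8): complement of the cumulative probability at 8.
p_upper = 1 - x_dist.cdf(8)

print(round(p_interval, 4), round(p_upper, 4))
```

The interval result can differ from .1664 in the last digit, since the table values .5832 and .4168 are each rounded to four places.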
Finding Z Values for Known
Probabilities
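What is Z given probability = 0.1217?
An area of .1217 between 0 and Z corresponds to a cumulative area of .5 + .1217 = .6217.

Cumulative Standardized Normal Distribution Table (Portion), with µ_Z = 0 and σ_Z = 1

| Z | .00 | .01 | .02 |
|---|---|---|---|
| 0.0 | .5000 | .5040 | .5080 |
| 0.1 | .5398 | .5438 | .5478 |
| 0.2 | .5793 | .5832 | .5871 |
| 0.3 | .6179 | .6217 | .6255 |

The entry .6217 sits in row 0.3, column .01, so Z = .31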
Recovering X Values for Known
Probabilities
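[Figure: normal distribution of X with µ = 5 and σ = 10, with area .6179 to the left and .3821 to the right of the unknown X, beside the standardized normal distribution with µ_Z = 0 and σ_Z = 1, with Z = 0.30 marked]

$$X = \mu + Z\sigma = 5 + (.30)(10) = 8$$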
The Normal Distribution

## Steps for Finding a Probability Corresponding to a

Normal Random Variable
•Sketch the distribution, locate mean, shade area
of interest
•Convert to standard z values using z = (x − µ)/σ
•Add z values to the sketch
•Use tables to calculate probabilities, making use
of symmetry property where necessary
The Normal Distribution

Given a normally distributed
variable x with mean 100,000 and
standard deviation of 10,000, what
value of x identifies the top 10%
of the distribution?

$$P(x \le x_0) = P\left(z \le \frac{x_0 - \mu}{\sigma}\right) = P\left(z \le \frac{x_0 - 100{,}000}{10{,}000}\right) = .90$$

The z value corresponding to an area of .40 between 0 and z (.90 − .50) is 1.28. Solving for x0:
x0 = 100,000 + 1.28(10,000) = 100,000 + 12,800 = 112,800
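Going from a known probability back to a value can also be sketched with `statistics.NormalDist.inv_cdf` from Python's standard library. The variable names are illustrative, and the results differ slightly from the slide values because the table rounds z to two decimals.

```python
from statistics import NormalDist

# z value with cumulative area .90 (the table gives 1.28).
z_90 = NormalDist().inv_cdf(0.90)

# Cutoff for the top 10% when the mean is 100,000 and sigma is 10,000.
x0 = NormalDist(mu=100_000, sigma=10_000).inv_cdf(0.90)

# Recovering X from cumulative area .6179 when the mean is 5 and sigma is 10.
x_recovered = NormalDist(mu=5, sigma=10).inv_cdf(0.6179)

print(round(z_90, 2), round(x0), round(x_recovered, 1))
```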
Assessing Normality

Not all continuous random variables are
normally distributed
It is important to evaluate how well the data
set is approximated by a normal distribution
Assessing Normality
(continued)
Construct charts
For small- or moderate-sized data sets, do
stem-and-leaf display and box-and-whisker
plot look symmetric?
For large data sets, does the histogram or
polygon appear bell-shaped?
Compute descriptive summary measures
Do the mean, median and mode have similar
values?
Is the interquartile range approximately 1.33 σ?
Is the range approximately 6 σ?
Assessing Normality
(continued)
Observe the distribution of the data set
Do approximately 2/3 of the observations lie
between mean ± 1 standard deviation?
Do approximately 4/5 of the observations lie
between mean ± 1.28 standard deviations?
Do approximately 19/20 of the observations lie
between mean ± 2 standard deviations?
Evaluate normal probability plot
Do the points lie on or close to a straight line
with positive slope?
Assessing Normality
(continued)
Normal probability plot
Arrange data into ordered array
Find corresponding standardized normal
quantile values
Plot the pairs of points with observed data
values on the vertical axis and the
standardized normal quantile values on the
horizontal axis
Evaluate the plot for evidence of linearity
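The steps for building a normal probability plot can be sketched as follows. Two things here are assumptions rather than content from the slides: the plotting position i/(n + 1) used to pick the quantiles (statistical packages use several variants), and the sample data, which are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each ordered observation with a standardized normal quantile.

    Assumes plotting positions i / (n + 1) for i = 1..n.
    Returns (quantile, observation) pairs: horizontal then vertical axis.
    """
    ordered = sorted(data)                      # ordered array
    n = len(ordered)
    std_normal = NormalDist()                   # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

sample = [52, 47, 61, 55, 49, 58, 50]  # hypothetical data, not from the source
points = normal_plot_points(sample)
for z, x in points:
    print(f"{z:6.2f}  {x}")
```

Plotting these pairs with any charting tool and checking whether they fall near a straight line is the final evaluation step.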
Assessing Normality
(continued)
Normal Probability Plot for Normal Distribution
Look for a straight line

[Figure: normal probability plot of X (vertical axis) against Z (horizontal axis) for normally distributed data]

[Figure: normal probability plots of X against Z in four panels: Left-Skewed, Right-Skewed, Rectangular, U-Shaped]
Example
e.g.: Customers arrive at the checkout
line of a supermarket at the rate of 30
per hour. What is the probability that
the arrival time between consecutive
customers will be greater than 5
minutes?

$$\lambda = 30, \qquad X = 5/60 \text{ hours}$$

$$P(\text{arrival time} > X) = 1 - P(\text{arrival time} \le X) = 1 - \left(1 - e^{-30(5/60)}\right) = .0821$$
Descriptive Methods for Assessing
Normality
•Evaluate the shape from a histogram or
stem-and-leaf display
•Compute intervals about mean x ± s, x ± 2s, x ± 3s
and corresponding percentages
•Compute IQR and divide by standard
deviation. Result is roughly 1.3 if normal
•Use statistical package to evaluate a
normal probability plot for the data
Approximating a Binomial Distribution with a
Normal Distribution

You can use a Normal Distribution as an
approximation of a Binomial Distribution for large
values of n
Often needed given limitation of binomial tables
Need to add a correction for continuity, because of
the discrete nature of the binomial distribution
Correction is to add .5 to x when converting to
standard z values
Rule of thumb: interval µ ± 3σ should be within
range of binomial random variable (0 to n) for the normal
approximation to be adequate
Approximating a Binomial Distribution with a
Normal Distribution

Steps
Determine n and p for the binomial distribution
Calculate the interval µ ± 3σ = np ± 3√(npq)
Express binomial probability in the form P(x ≤ a) or
P(x ≤ b) − P(x ≤ a)
Calculate z value for each a, applying continuity
correction
Sketch normal distribution, locate a's and use table
to solve
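The steps above can be sketched in Python and compared with the exact binomial probability. The values n = 100, p = .3 and a = 25 are hypothetical, chosen only for illustration, and the function names are assumptions of this sketch.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_binomial_cdf(a, n, p):
    """Exact P(x <= a) from the binomial formula."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(a + 1))

def three_sigma_interval(n, p):
    """Interval mu +/- 3 sigma = np +/- 3 sqrt(npq)."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return mu - 3 * sigma, mu + 3 * sigma

def normal_approx_cdf(a, n, p):
    """Approximate P(x <= a), adding .5 to a as the continuity correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    z = (a + 0.5 - mu) / sigma
    return NormalDist().cdf(z)

n, p, a = 100, 0.3, 25                   # hypothetical example values
low, high = three_sigma_interval(n, p)
rule_ok = 0 <= low and high <= n         # rule of thumb from the slides
approx = normal_approx_cdf(a, n, p)
exact = exact_binomial_cdf(a, n, p)
print(rule_ok, round(approx, 4), round(exact, 4))
```

Comparing `approx` with `exact` for other choices of n and p shows how the approximation degrades when the µ ± 3σ interval falls outside 0 to n.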