
Theoretical Probability Distributions
Lecture Topics
• The Probability of a Discrete Random Variable
• Binomial Distribution
• Poisson Distribution
• The Normal Distribution
• The Standardized Normal Distribution
• Evaluating the Normality Assumption
Random Variable
A variable x is a random variable if the value that it assumes, corresponding to the outcome of an experiment, is a chance or random event.

You can think of many examples of random variables:
• x = number of defects on a randomly selected piece of furniture
• x = STATs score for a randomly selected MPH applicant
• x = number of telephone calls received by a crisis intervention hotline during a randomly selected time period
Discrete Random Variable
• Discrete Random Variable
  • Obtained by counting (0, 1, 2, 3, etc.)
  • Usually a finite number of different values
  • E.g., toss a coin 5 times; count the number of tails (0, 1, 2, 3, 4, or 5 tails)
• Discrete Probability Distributions
  • The probability distribution of a discrete random variable is a table, graph, or formula that gives the probability associated with each possible value that the variable can assume.
Discrete Probability Distribution Example
Event: Toss 2 coins and count the number of tails.

Probability Distribution
Values   Probability
0        1/4 = .25
1        2/4 = .50
2        1/4 = .25

This uses the a priori classical probability approach.
• The distribution can also be shown in the form of a table:

X        0     1     2
P(X=x)   0.25  0.50  0.25

• Or graphically
[Figure: bar chart of P(X=x) against X]
Discrete Probability Distribution
• List of All Possible [X_j, P(X_j)] Pairs
  • X_j = Value of random variable
  • P(X_j) = Probability associated with value
• Mutually Exclusive (Nothing in Common)
• Collectively Exhaustive (Nothing Left Out)

$0 \le P(X_j) \le 1$, $\sum_j P(X_j) = 1$
Discrete Probability Distribution
• Suppose that X has the following properties:
  • It is a discrete variable
  • It can only assume values x1, x2, x3, ..., xn
  • The probabilities associated with these values are p1, p2, p3, ..., pn
• Then X is a discrete random variable if p1 + p2 + p3 + ... + pn = 1
Probability Distribution
• Consider the table

X        0     1     2     3
P(X=x)   1/10  3/10  5/10  1/10

• The table represents the probability distribution of X.
• The function that allocates the probabilities is known as the probability mass function (p.m.f.) of X; some texts also call it the probability density function (p.d.f.) of X.
• Note that 1/10 + 3/10 + 5/10 + 1/10 = 10/10 = 1
• The fact that the probabilities sum to ONE is a useful check that the distribution has been properly defined.
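The check described above can be sketched in a few lines of Python. This is only an illustration: the helper name `is_valid_distribution` is made up here, and exact fractions are used so that the sum is not affected by floating-point rounding.

```python
from fractions import Fraction

def is_valid_distribution(probs):
    """Return True if every probability lies in [0, 1] and they sum to exactly 1."""
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

# Probabilities from the table above: P(X=0), P(X=1), P(X=2), P(X=3)
table = [Fraction(1, 10), Fraction(3, 10), Fraction(5, 10), Fraction(1, 10)]
print(is_valid_distribution(table))
```

A list such as 1/2, 1/4 would fail the check because its probabilities do not sum to one.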
Example 1
• A discrete random variable X has the following probability distribution, where a is a constant.

X        0     1     2     3
P(X=x)   1/18  7/18  5/18  a

Find (i) the value of a
     (ii) P(X>2)
• A die with faces numbered 1, 2, 3, 4, 5, 6 is thrown twice. If X is the random variable ‘the total score when two dice are thrown’, tabulate the distribution of X. Hence calculate P(X>=7).
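One possible way to check answers to both exercises is sketched below. It assumes the two throws are independent and the die is fair, so that all 36 ordered outcomes are equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Part 1: the probabilities must sum to 1, so a is whatever is left over.
known = [Fraction(1, 18), Fraction(7, 18), Fraction(5, 18)]
a = 1 - sum(known)
p_x_gt_2 = a  # X > 2 only happens when X = 3

# Part 2: enumerate the 36 equally likely outcomes of two throws (fair die assumed).
totals = {}
for first, second in product(range(1, 7), repeat=2):
    score = first + second
    totals[score] = totals.get(score, 0) + Fraction(1, 36)

p_total_ge_7 = sum(p for total, p in totals.items() if total >= 7)
print(a, p_x_gt_2, p_total_ge_7)
```

The dictionary `totals` holds the tabulated distribution of the total score, and its values pass the sum-to-one check.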
Expectation of X, E(X)
• The expectation of X (or expected value), written as E(X), is given by:

$E(X) = \sum_{\text{all } x} x \, P(X = x)$

• A discrete random variable X has the following probability distribution.

X        1    2    3    4
P(X=x)   0.1  0.4  0.2  0.3

• Find E(X).
Variance of X, Var(X)

• The variance of X, written as Var(X), is given by

$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \sum_{\text{all } x} x^2 \, P(X = x) - [E(X)]^2$

Find Var(X) for the previous problem.

The standard deviation is the square root of the variance.
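As a rough check on the expectation and variance formulas, the sketch below applies them to the distribution with values 1, 2, 3, 4 and probabilities 0.1, 0.4, 0.2, 0.3. The function names are illustrative, not part of the lecture.

```python
import math

values = [1, 2, 3, 4]
probs = [0.1, 0.4, 0.2, 0.3]

def expectation(values, probs):
    """E(X): sum of x * P(X = x) over all x."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Var(X) = E(X^2) - [E(X)]^2."""
    e_x2 = sum(x**2 * p for x, p in zip(values, probs))
    return e_x2 - expectation(values, probs) ** 2

mean = expectation(values, probs)
var = variance(values, probs)
sd = math.sqrt(var)
print(round(mean, 4), round(var, 4), round(sd, 4))
```

By hand, E(X) = 0.1 + 0.8 + 0.6 + 1.2 = 2.7 and E(X^2) = 0.1 + 1.6 + 1.8 + 4.8 = 8.3, so Var(X) = 8.3 - 7.29 = 1.01.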


Binomial Probability Distribution
• There is a fixed number, n, of identical trials
  • E.g., 15 tosses of a coin; 10 light bulbs taken from a warehouse
• 2 mutually exclusive outcomes on each trial
  • E.g., heads or tails in each toss of a coin; defective or not defective light bulb
  • These two outcomes are referred to as success (with probability p) and failure (with probability q)
  • Note p + q = 1, hence q = 1 - p
Binomial Probability Distribution (continued)
• Trials are independent
  • The outcome of one trial does not affect the outcome of any other. This means p and q remain constant for each trial.
• Constant probability for each trial
  • E.g., the probability of getting a tail is the same each time we toss the coin
• The binomial question is: What is the probability that r successes will occur in n trials of the process under study?
Binomial Probability Distribution Function

$P(X) = \frac{n!}{X!\,(n - X)!} \, p^X (1 - p)^{n - X}$

P(X): probability of X successes given n and p
X: number of "successes" in sample (X = 0, 1, ..., n)
p: the probability of each "success"
n: sample size

Tails in 2 Tosses of Coin
X    P(X)
0    1/4 = .25
1    2/4 = .50
2    1/4 = .25
Compute the probability of X successes, using the
binomial formula.
a. n = 6, X = 3, p = 0.03
b. n = 4, X = 2, p = 0.18
c. n = 5, X = 3, p = 0.63
d. n = 9, X = 0, p = 0.42
e. n = 10, X = 5, p = 0.37

Chap 5-16
© 2003 Prentice-Hall, Inc.
In a survey, 30% of the people interviewed said that they
bought most of their books during the last three months of
the year (October, November, December). If nine people are
selected at random, find the probability that exactly three of
these people bought most of their books during October,
November and December.
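A minimal sketch of the binomial formula in Python follows. The function name `binomial_pmf` is chosen here for illustration, and `math.comb` supplies the n! / (X!(n - X)!) term. The first check reproduces the two-coin table, and the second applies the formula to the survey question with n = 9, p = 0.30 and X = 3.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Tails in 2 tosses of a fair coin: should match .25, .50, .25
coin = [binomial_pmf(x, 2, 0.5) for x in range(3)]

# Survey question: exactly 3 of 9 people, with p = 0.30
survey = binomial_pmf(3, 9, 0.30)
print(coin, round(survey, 4))
```

The same function can be called with the (n, X, p) triples from exercises a to e.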

Binomial Distribution Characteristics
• Mean
  $\mu = E(X) = np$
  • E.g., $\mu = np = 5(.1) = .5$
• Variance and Standard Deviation
  $\sigma^2 = np(1 - p)$
  $\sigma = \sqrt{np(1 - p)}$
  • E.g., $\sigma = \sqrt{np(1 - p)} = \sqrt{5(.1)(1 - .1)} = .6708$

[Figure: bar chart of P(X) against X = 0, 1, ..., 5 for n = 5, p = 0.1]
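The shortcut formulas for the mean and standard deviation can be checked against the general definitions of E(X) and Var(X). The sketch below does this for the n = 5, p = 0.1 case; it is an illustration rather than a proof.

```python
from math import comb, sqrt

n, p = 5, 0.1

# Full binomial distribution: P(X = 0), ..., P(X = 5)
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# General definitions applied to the tabulated distribution
mean = sum(x * px for x, px in enumerate(pmf))
var = sum(x**2 * px for x, px in enumerate(pmf)) - mean**2

print(round(mean, 4), round(sqrt(var), 4))        # from the definitions
print(round(n * p, 4), round(sqrt(n * p * (1 - p)), 4))  # from the shortcuts
```

Both lines should agree, which is why the shortcuts np and np(1 - p) are used in practice instead of summing over the whole table.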
Poisson Distribution
• Measures the number of occurrences of a particular outcome of a discrete random variable in a predetermined time, space or volume for which an average number of occurrences of the outcome can be determined.
• Examples:
  • The number of telephone calls received in 1 hour
  • The number of typing errors on a page
  • The number of planes arriving at an airport in two hours
• The Poisson question is: What is the probability of r occurrences of a given outcome being observed in a predetermined time, space or volume interval?
The Poisson Distribution

$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$

• Where λ (lambda) = the mean number of occurrences per predetermined time interval
• e = a mathematical constant approximately equal to 2.71828
• x = the number of occurrences for which a probability is required.
Example
• A textile producer has established that a spinning machine stops randomly due to thread breakages at an average rate of 5 stoppages per hour. What is the probability that in a given hour
a) 3 stoppages will occur
b) at most 2 stoppages will occur
c) more than four stoppages will occur
d) not more than 1 stoppage will occur.
Solution
• The random variable, ‘machine stoppages’, fits the Poisson process for the following reasons:
  • The random variable is discrete. It measures the number of machine stoppages per hour
  • The problem is a Poisson process as it describes the number of occurrences (stoppages) in a predetermined time interval (1 hour)
  • The average number of stoppages can be determined or is given
• Solutions in class
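For anyone checking their working afterwards, here is a hedged sketch of how the four probabilities could be computed with λ = 5. It reads part (c) as "more than four stoppages", which is an assumption about the intended wording, and the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = lam**x * e**(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 5  # average stoppages per hour

p_a = poisson_pmf(3, lam)                             # exactly 3
p_b = sum(poisson_pmf(x, lam) for x in range(3))      # at most 2: x = 0, 1, 2
p_c = 1 - sum(poisson_pmf(x, lam) for x in range(5))  # more than 4 (assumed reading)
p_d = sum(poisson_pmf(x, lam) for x in range(2))      # not more than 1: x = 0, 1

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

Part (c) uses the complement rule, since summing P(x) for every x above 4 would need infinitely many terms.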
Continuous Probability Distributions
• Continuous Random Variable
  • Values from an interval of numbers, such as heights and weights, income, blood cholesterol level, experimental laboratory error
  • Can assume infinitely many values corresponding to points on a line interval
• Continuous Probability Distribution
  • Distribution of a continuous random variable
• Most Important Continuous Probability Distribution
  • The normal distribution
The Normal Distribution
• “Bell Shaped”
• Symmetrical
• Mean, Median and Mode are Equal
• Interquartile Range Equals 1.33 σ
• Random Variable Has Infinite Range

[Figure: bell-shaped density f(X) against X, with the mean, median and mode at the centre]
The Mathematical Model

$f(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-(1/2)\left[(X - \mu)/\sigma\right]^2}$

f(X): density of random variable X
π = 3.14159; e = 2.71828
μ: population mean
σ: population standard deviation
X: value of random variable (−∞ < X < ∞)
Many Normal Distributions
There are an Infinite Number of Normal Distributions

Varying the Parameters σ and μ, We Obtain Different Normal Distributions
The Standard Normal Distribution

If x is normally distributed with mean μ and standard deviation σ, then

$z = \frac{x - \mu}{\sigma}$

is normally distributed with mean 0 and standard deviation 1, a standard normal distribution.
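The density formula and the standardizing transformation can be tried out numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names and the values μ = 5, σ = 10, x = 6.2 are chosen purely for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sqrt(2 * pi) * sigma)

def z_score(x, mu, sigma):
    """Standardize x: z = (x - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma, x = 5.0, 10.0, 6.2  # illustrative values
z = z_score(x, mu, sigma)

# Standardizing leaves the cumulative probability unchanged.
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)
print(round(z, 2), round(p_original, 4), round(p_standard, 4))
```

Because the two cumulative probabilities agree, a single table for the standard normal distribution is enough for any μ and σ.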
Importance of Normal Distribution
• Most widely used continuous distribution
• Many phenomena in medicine and nature follow (or closely approximate) the normal distribution
• Possesses mathematical properties that make it easy to interpret and manipulate
• Forms the basis for inferential statistics
Properties of Normal Distribution
• Bell-shaped curve
  • Not every bell-shaped curve is a normal distribution
• Extends infinitely in both directions
  • The curve does not intercept the abscissa
• Symmetrical about the mean
• All follow a particular distribution for area under the curve
  • Regardless of the magnitude of the mean and standard deviation, the area between any two points, measured in standard deviations from the mean, is the same
• Total area under the curve is equal to 1
The Empirical Rule
• Knowing the value of the mean and the
value of the standard deviation for a data
set can provide a great deal of information
about the data set.
• In particular, if the data set has a single
mound and is symmetrical (“bell-shaped”),
then one can generalize some properties of
the distribution.
• One such generalization is called the
Empirical Rule.
Empirical Rule

• One Sigma Rule – Approximately 68% of the data values will lie within one standard deviation from the mean.
• That is, one can expect a deviation of more than one sigma from the mean to occur once in every three observations.
• This is true because approximately 32% (approximately 1/3) of the values are outside one standard deviation from the mean.
Empirical Rule - One Sigma Rule

Graphical Display of the One Sigma Rule


Empirical Rule

• Two Sigma Rule – Approximately 95% of the data values will lie within two standard deviations from the mean.
• That is, one can expect a deviation of more than two sigma from the mean to occur once in every twenty observations.
• This is true because approximately 5% (1/20) of the values are outside two standard deviations from the mean.
Empirical Rule - Two Sigma Rule

Graphical Display of the Two Sigma Rule


Empirical Rule

• Three Sigma Rule – Approximately 99.7% of the data values will lie within three standard deviations from the mean.
• That is, one can expect a deviation of more than three sigma from the mean to occur once in every 333 observations.
• This is true because approximately 0.3% (1/333) of the values are outside three standard deviations from the mean.
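The three percentages can be recovered from the standard normal curve itself. A short sketch, assuming an exactly normal distribution and using the standard-library `statistics.NormalDist` (the helper name `within_k_sigma` is illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def within_k_sigma(k):
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    inside = within_k_sigma(k)
    print(f"within {k} sigma: {inside:.4f}, outside: {1 - inside:.4f}")
```

The rule-of-thumb figures of 68%, 95% and 99.7% are rounded versions of these areas, so real data that are only roughly bell-shaped will match them only approximately.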
Standard Normal Distribution
• Has a mean of 0 (zero) and a standard deviation of 1
• Allows us to use one table and one curve for all calculations and hypothesis tests

To find the standardized (transformed) z score we use:

$z = \frac{x - \mu}{\sigma}$
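Areas Under the Normal Curve
$N(\mu, \sigma^2) \rightarrow N(0, 1)$

[Figure: standard normal curve with z marked from -3 to 3; area 0.5000 on each side of z = 0]

Area between z values (same on each side of 0)
0 to 1     0.3413
1 to 2     0.1359
2 to 3     0.0215

Total area
-1 to 1    0.6826
-2 to 2    0.9544
-3 to 3    0.9974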
Finding Probabilities
Probability is the area under the curve!

$P(c \le X \le d) = ?$

[Figure: density f(X) against X, with the points c and d marked on the X axis]
Which Table to Use?

Infinitely Many Normal Distributions Means Infinitely Many Tables to Look Up!
Solution: The Cumulative Standardized Normal Distribution
Cumulative Standardized Normal Distribution Table (Portion), $\mu_Z = 0$, $\sigma_Z = 1$

Z     .00    .01    .02
0.0   .5000  .5040  .5080
0.1   .5398  .5438  .5478
0.2   .5793  .5832  .5871
0.3   .6179  .6217  .6255

The table entries are cumulative probabilities. For Z = 0.12 (row 0.1, column .02) the probability is .5478.

Only One Table is Needed
Standardizing Example

$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12$

[Figure: Normal Distribution with μ = 5, σ = 10 and X = 6.2 marked, beside the Standardized Normal Distribution with μ_Z = 0, σ_Z = 1 and Z = 0.12 marked]
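Example: $P(2.9 \le X \le 7.1) = .1664$

$Z = \frac{X - \mu}{\sigma} = \frac{2.9 - 5}{10} = -.21$
$Z = \frac{X - \mu}{\sigma} = \frac{7.1 - 5}{10} = .21$

[Figure: Normal Distribution with μ = 5, σ = 10 and the area between 2.9 and 7.1, beside the Standardized Normal Distribution with μ_Z = 0, σ_Z = 1 and the area between -0.21 and 0.21, made up of .0832 on each side of 0]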
Example: $P(2.9 \le X \le 7.1) = .1664$ (continued)
Cumulative Standardized Normal Distribution Table (Portion), $\mu_Z = 0$, $\sigma_Z = 1$

Z     .00    .01    .02
0.0   .5000  .5040  .5080
0.1   .5398  .5438  .5478
0.2   .5793  .5832  .5871
0.3   .6179  .6217  .6255

For Z = 0.21 (row 0.2, column .01) the cumulative probability is .5832.
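Example: $P(2.9 \le X \le 7.1) = .1664$ (continued)

Cumulative Standardized Normal Distribution Table (Portion), $\mu_Z = 0$, $\sigma_Z = 1$

Z      .00    .01    .02
-0.3   .3821  .3783  .3745
-0.2   .4207  .4168  .4129
-0.1   .4602  .4562  .4522
0.0    .5000  .4960  .4920

For Z = -0.21 (row -0.2, column .01) the cumulative probability is .4168. Subtracting this from the cumulative probability .5832 for Z = 0.21 gives .5832 - .4168 = .1664.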
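The table-based answer can be cross-checked in software. The sketch below uses the standard-library `statistics.NormalDist` in place of the printed table, so it is a check on the arithmetic rather than the method taught here.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=5, sigma=10)

# Standardize the two end points
z_low = (2.9 - 5) / 10
z_high = (7.1 - 5) / 10

# Table route: difference of the two cumulative probabilities read from the table
p_table = 0.5832 - 0.4168

# Software route: difference of the two cumulative probabilities of X itself
p_exact = x_dist.cdf(7.1) - x_dist.cdf(2.9)
print(round(z_low, 2), round(z_high, 2), round(p_table, 4), round(p_exact, 4))
```

The two routes differ slightly in the fourth decimal place because the table entries are rounded to four digits.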
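Example: $P(X \ge 8) = .3821$

$Z = \frac{X - \mu}{\sigma} = \frac{8 - 5}{10} = .30$

[Figure: Normal Distribution with μ = 5, σ = 10 and the area .3821 to the right of X = 8, beside the Standardized Normal Distribution with μ_Z = 0, σ_Z = 1 and the same area to the right of Z = 0.30]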
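Example: $P(X \ge 8) = .3821$ (continued)

Cumulative Standardized Normal Distribution Table (Portion), $\mu_Z = 0$, $\sigma_Z = 1$

Z     .00    .01    .02
0.0   .5000  .5040  .5080
0.1   .5398  .5438  .5478
0.2   .5793  .5832  .5871
0.3   .6179  .6217  .6255

For Z = 0.30 (row 0.3, column .00) the cumulative probability is .6179, so the area to the right is 1 - .6179 = .3821.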
Finding Z Values for Known Probabilities
What is Z Given Probability = 0.6217?

Cumulative Standardized Normal Distribution Table (Portion), $\mu_Z = 0$, $\sigma_Z = 1$

Z     .00    .01    .02
0.0   .5000  .5040  .5080
0.1   .5398  .5438  .5478
0.2   .5793  .5832  .5871
0.3   .6179  .6217  .6255

The probability .6217 sits in row 0.3, column .01, so Z = .31.
Recovering X Values for Known Probabilities

[Figure: Normal Distribution with μ = 5, σ = 10, area .6179 to the left of the unknown X and .3821 to the right, beside the Standardized Normal Distribution with μ_Z = 0, σ_Z = 1 and Z = 0.30]

$X = \mu + Z\sigma = 5 + .30(10) = 8$
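Reading the table in reverse corresponds to evaluating the inverse of the cumulative distribution function. A small sketch, using `inv_cdf` from the standard-library `statistics.NormalDist` as a stand-in for the reverse table lookup:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
mu, sigma = 5, 10

# Z value whose cumulative probability is 0.6217 (table gives .31)
z_for_6217 = std_normal.inv_cdf(0.6217)

# Recover X from a cumulative probability of 0.6179: X = mu + Z * sigma
z_for_6179 = std_normal.inv_cdf(0.6179)
x_value = mu + z_for_6179 * sigma

print(round(z_for_6217, 2), round(z_for_6179, 2), round(x_value, 2))
```

The results land very close to, but not exactly on, .31, .30 and 8 because the table probabilities are rounded to four digits.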
More Examples: Normal Distribution
A set of exam grades in this year’s class was found to be normally distributed with a mean of 73 and a standard deviation of 8.
What is the probability of getting a grade no higher than 91 on this exam?

$X \sim N(73, 8^2)$, $P(X \le 91) = ?$

Mean                  73
Standard Deviation    8

Probability for X <= X Value
X Value               91
Z Value               2.25
P(X<=91)              0.9877756

[Figure: normal curve with μ = 73, σ = 8 and X = 91 marked, corresponding to Z = 2.25 on the standardized scale]
More Examples: Normal Distribution (continued)

What percentage of students scored between 65 and 89?

$X \sim N(73, 8^2)$, $P(65 \le X \le 89) = ?$

Probability for a Range
From X Value          65
To X Value            89
Z Value for 65        -1
Z Value for 89        2
P(X<=65)              0.1587
P(X<=89)              0.9772
P(65<=X<=89)          0.8186

[Figure: normal curve with μ = 73 and the area between X = 65 and X = 89, corresponding to Z = -1 and Z = 2]
More Examples: Normal Distribution (continued)

Only 5% of the students taking the test scored higher than what grade?

$X \sim N(73, 8^2)$, $P(? \le X) = .05$

Find X and Z Given Cum. Pctage.
Cumulative Percentage   95.00%
Z Value                 1.644853
X Value                 86.15882

[Figure: normal curve with μ = 73 and the unknown grade ? = 86.16 marked, corresponding to Z = 1.645]
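The three exam-grade results can be reproduced with a few lines of Python. This sketch uses the standard-library `statistics.NormalDist` rather than whatever spreadsheet tool produced the worksheets above, so small differences in the last digit are possible.

```python
from statistics import NormalDist

grades = NormalDist(mu=73, sigma=8)

# Probability of a grade no higher than 91
p_at_most_91 = grades.cdf(91)

# Proportion of students scoring between 65 and 89
p_65_to_89 = grades.cdf(89) - grades.cdf(65)

# Grade exceeded by only 5% of students (95th percentile)
cutoff_top_5 = grades.inv_cdf(0.95)

print(round(p_at_most_91, 4), round(p_65_to_89, 4), round(cutoff_top_5, 2))
```

The range question subtracts two cumulative probabilities, and the top-5% question runs the calculation backwards from a cumulative probability of 0.95.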
Assessing Normality
• Not All Continuous Random Variables are Normally Distributed
• It is Important to Evaluate How Well the Data Set is Approximated by a Normal Distribution
Assessing Normality (continued)
• Construct Charts
  • For small- or moderate-sized data sets, do the stem-and-leaf display and box-and-whisker plot look symmetric?
  • For large data sets, does the histogram or polygon appear bell-shaped?
• Compute Descriptive Summary Measures
  • Do the mean, median and mode have similar values?
  • Is the interquartile range approximately 1.33 σ?
Assessing Normality (continued)
• Observe the Distribution of the Data Set: Remember the Empirical Rule?
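The numerical checks listed for assessing normality can be sketched as below. The data set is simulated purely for illustration (1000 draws from a normal distribution with mean 73 and standard deviation 8, with a fixed seed); with real data, the same summary measures would be computed on the observed values.

```python
import random
from statistics import mean, median, quantiles, stdev

random.seed(0)
data = [random.gauss(73, 8) for _ in range(1000)]  # hypothetical sample

m, s = mean(data), stdev(data)

# Check 1: mean and median should be similar
gap = abs(m - median(data))

# Check 2: interquartile range should be roughly 1.33 standard deviations
q1, _, q3 = quantiles(data, n=4)
iqr_ratio = (q3 - q1) / s

# Check 3: empirical rule, share of values within 1, 2 and 3 standard deviations
within = {k: sum(abs(x - m) <= k * s for x in data) / len(data) for k in (1, 2, 3)}

print(round(gap, 2), round(iqr_ratio, 2), within)
```

These checks are informal: values close to 1.33 and to 68%, 95%, 99.7% are consistent with normality but do not prove it.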
Lecture Summary
• Addressed the Probability of a Discrete Random Variable
• Discussed Binomial Distribution
• Discussed Continuous Random Variables
• Discussed the Normal Distribution
• Described the Standard Normal Distribution
• Evaluated the Normality Assumption
