
Lesson 6, Part 1: Normal Distribution

Probability Review

Lesson 5 covered these probability topics:
- Addition rule for mutually exclusive events
- Joint, marginal and conditional probability for two non-mutually exclusive events
- Identifying independent events
- Screening test measures as examples of conditional probabilities
- Application of Bayes' Theorem to calculate PPV and NPV

PubH 6414 Lesson 6 Part 1 2

Lesson 6 Overview
- Lesson 6 describes probability distributions for numerical variables
  - Part 1: Distributions for continuous data
  - Part 2: Distributions for discrete data

Lesson 6 Outline
- Probability Distributions
- Normal Distribution
- Standard Normal (Z) Distribution
- Excel functions for normal distribution probabilities
  - NORMDIST and NORMSDIST functions
  - NORMSINV and NORMINV functions


Probability Distributions
- Any characteristic that can be measured or categorized is called a variable.
- If the variable can assume a number of different values such that any particular outcome is determined by chance, it is called a random variable.
- Every random variable has a corresponding probability distribution.
- The probability distribution applies the theory of probability to describe the behavior of the random variable.

Random Nominal Variable
- Blood type is a random nominal variable.
- The blood type of a randomly selected individual is unknown, but the distribution of blood types in the population can be described.

[Figure: Distribution of blood types in the US; bar chart of probability (0% to 50%) by blood type: O, A, B, AB]

Probability Distributions for Continuous Data
- Continuous data can take on any value within the range of possible values, so describing the distribution of continuous data in a table is not very practical.
- One solution is a histogram.


Probability Density Curve
- Definition: a probability density curve is a curve describing a continuous probability distribution.
- Unlike a histogram of continuous data, the probability density curve is a smooth line.
- Imagine a histogram in which the width of the intervals decreases (to almost 0) while the sample size gets larger and larger: its outline approaches a smooth density curve.


Probability Density Curve
- The probability density curve is a smooth, idealized curve that shows the shape of the distribution of a random variable in the population.
[Figure: The probability density curve for systolic BP values for an infinite sample of men; x-axis Systolic BP (mmHg), 80 to 180]

Area under a Probability Density Curve
- The total area under a probability density curve = 1.0.
- The probability density curve in the systolic blood pressure example has the bell shape of a normal distribution. Not all probability density curves have the shape of a normal distribution.

Shapes of Probability Density Curves
- There are many possible shapes for the probability density curves of continuous data:
  - Right skewed
  - Left skewed
  - Bell-shaped
  - Bimodal
  - Multimodal
- The most commonly used probability distribution in the study of statistics is the normal distribution.

Normal Distribution
- The normal distribution is also called the Gaussian distribution, after Carl Friedrich Gauss, a German mathematician (1777–1855).
- Characteristics of any normal distribution:
  - Bell-shaped curve
  - Unimodal: the peak is at the mean
  - Symmetric about the mean
  - Mean = median = mode
  - The tails of the curve extend to infinity in both directions

Normal Distribution
Q: Is every variable normally distributed?
A: No. There are skewed (asymmetric) distributions and there are bimodal distributions.
Q: Then why do we spend so much time studying the normal distribution?
A: Two answers:
1. Many variables in health research are approximately normally distributed.
2. More importantly: many statistical tests are based on the normal distribution.


Symbol Notation
A convention in statistical notation is to use Roman letters for sample statistics and Greek letters for population parameters. Since the density curve describes the population, Greek letters are used for the mean (μ, "mu") and the SD (σ, "sigma").

Symbol               Sample   Density curve
Mean                 x̄        μ
Standard deviation   s        σ

Describing Normal Distributions
- Every normal distribution is uniquely described by its mean (μ) and standard deviation (σ).
- The notation for a normal distribution is N(μ, σ).
- N(125, 4) refers to a normal distribution with mean = 125 and standard deviation = 4 (variance = 4² = 16).
[Figures: normal density with mean = 5 and σ = 1; two normal densities with different mean values and the same σ; two normal densities with different σ and the same mean]

The 68-95-99.7 Approximation for all Normal Distributions
Regardless of the mean and standard deviation of the normal distribution:
- 68% of the observations fall within one standard deviation of the mean
- 95% of the observations fall within approximately* two standard deviations of the mean
- 99.7% of the observations fall within three standard deviations of the mean

- A very small % of the observations are beyond 3 standard deviations of the mean.
* More precisely, 95% of the observations fall within 1.96 SD of the mean.

Distributions of Blood Pressure
Systolic blood pressure (SBP) in men: μ = 125 mmHg, σ = 14 mmHg.
[Figure: The 68-95-99.7 rule applied to the distribution of systolic blood pressure in men; x-axis ticks at 83, 97, 111, 125, 139, 153, 167 mmHg, with the 68%, 95% and 99.7% intervals marked]
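As a rough check of the 68-95-99.7 approximation, the sketch below computes the exact area within k standard deviations of the mean for the SBP distribution N(125, 14). It assumes Python 3.8+ and its standard-library `statistics.NormalDist` class as a stand-in for Excel; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

# SBP in men, modeled as normal with mean 125 mmHg and SD 14 mmHg (values from the slides)
sbp = NormalDist(mu=125, sigma=14)

def area_within(dist: NormalDist, k: float) -> float:
    """Hypothetical helper: probability that an observation lies within k SDs of the mean."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

for k in (1, 1.96, 2, 3):
    lo = sbp.mean - k * sbp.stdev
    hi = sbp.mean + k * sbp.stdev
    print(f"within {k} SD ({lo:.1f} to {hi:.1f} mmHg): {area_within(sbp, k):.4f}")
```

The exact areas are about 0.6827, 0.9545 and 0.9973 for 1, 2 and 3 SD, and about 0.9500 for 1.96 SD, which is why the footnote ties the 95% figure to 1.96 SD rather than exactly 2 SD.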

Calculating Probabilities from a Normal Distribution Curve
- The total area under the curve = 1.0, which is the total probability.
- Areas for intervals under the curve can be interpreted as probabilities.
- What is the probability that a man has blood pressure between 111 and 139 mmHg?
  - 111 = 125 - 14 and 139 = 125 + 14, so this interval is the mean ± one standard deviation.
  - Using the 68% rule for a normal distribution and the mean and standard deviation for SBP (previous slide), we know the probability that a randomly selected man has blood pressure between 111 and 139 mmHg = 0.68.

Areas under the Curve
- What if you wanted to find the probability of a man having SBP < 105 mmHg?
- 105 mmHg is not a whole number of standard deviations from the mean, so the 68-95-99.7 rule can't be used to find this area.
[Figure: normal curve of SBP (mmHg) with x-axis ticks at 83, 97, 105, 111, 125, 139, 153, 167; we want the area under the curve below 105]

Calculating the Areas under the Curve
- Calculating area (or probability) under a normal distribution curve is a numeric problem involving integration of the formula for the normal distribution (see page 77 of text). This is not an easy calculation. Other options are:
  - Table A-2 in the text is a table of areas under the standard normal curve: the normal distribution with mean = 0 and standard deviation = 1.
  - The NORMDIST function in Excel can be used to find the area under a normal distribution density curve.
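For reference, the formula for the normal distribution is the density below, with mean μ and standard deviation σ; the probability of a value below some cutoff a is the area under that density from negative infinity to a. This integral cannot be written in terms of elementary functions, which is why tables or software such as NORMDIST are used.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},
\qquad
P(X < a) = \int_{-\infty}^{a} f(x)\, dx
```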

NORMDIST function in Excel
- NORMDIST returns the cumulative area from the far left (negative infinity) of the normal density curve to the value specified (X). This is equal to the probability of being less than the indicated value.
- You need to provide the value, the mean, the standard deviation and an indicator (1 or TRUE) to request this cumulative area.
  - =NORMDIST(X, μ, σ, 1) returns the probability of having a value less than X.
  - =1-NORMDIST(X, μ, σ, 1) returns the probability of having a value greater than X.

Using NORMDIST function
- What is the probability that a randomly selected man has SBP < 105 mmHg?
- SBP ~ N(125, 14)
- For area less than some value use =NORMDIST(value, μ, σ, 1)
- =NORMDIST(105, 125, 14, 1) = 0.077
- The probability that a randomly selected man has SBP < 105 mmHg = 0.077
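Outside Excel, the same cumulative area can be computed with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a sketch of an equivalent, not one of the course tools; the helper name `normdist_cumulative` is hypothetical and is meant to mirror `=NORMDIST(x, mean, sd, 1)`.

```python
from statistics import NormalDist

def normdist_cumulative(x: float, mean: float, sd: float) -> float:
    """Hypothetical helper mirroring Excel's =NORMDIST(x, mean, sd, 1):
    area under the normal curve from negative infinity up to x."""
    return NormalDist(mu=mean, sigma=sd).cdf(x)

# P(SBP < 105) for SBP ~ N(125, 14)
p_below_105 = normdist_cumulative(105, 125, 14)
print(round(p_below_105, 3))  # 0.077
```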


Areas under the Curve
- What if you wanted to find the probability of a man having SBP > 150?
[Figure: normal curve of SBP (mmHg) with x-axis ticks at 83, 97, 111, 125, 139, 153, 167; we want the area above 150]

Using NORMDIST function
- What is the probability that a man has SBP > 150 mmHg?
- Mean = 125, standard deviation = 14
- For area greater than some value, use =1-NORMDIST(value, μ, σ, 1)
- =1-NORMDIST(150, 125, 14, 1) = 0.037
- The probability that a randomly selected man has SBP > 150 = 0.037
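The complement rule behind `=1-NORMDIST(...)` can be sketched the same way, again assuming Python 3.8+ with `statistics.NormalDist` as a stand-in for Excel.

```python
from statistics import NormalDist

sbp = NormalDist(mu=125, sigma=14)  # SBP in men, N(125, 14) from the slides

# Area above 150 = total area (1.0) minus the cumulative area below 150,
# the same logic as =1-NORMDIST(150, 125, 14, 1)
p_above_150 = 1 - sbp.cdf(150)
print(round(p_above_150, 3))  # 0.037
```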
Areas under the Curve
- What if you wanted to find the probability of a man having SBP between 115 and 135?
[Figure: normal curve of SBP (mmHg) with x-axis ticks at 83, 97, 111, 115, 125, 135, 139, 153, 167; we want the area between 115 and 135]

Using NORMDIST function
- What is the probability that a man has SBP between 115 and 135 mmHg?
- For area between two values, subtract the area to the left of the smaller value from the area to the left of the larger value:
  =NORMDIST(135, 125, 14, 1) - NORMDIST(115, 125, 14, 1) = 0.52
- The probability that a man has SBP between 115 and 135 mmHg = 0.52
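The "subtract the smaller cumulative area from the larger one" rule can be sketched as a small function. This assumes Python 3.8+ `statistics.NormalDist`; `area_between` is a hypothetical helper, not an Excel function.

```python
from statistics import NormalDist

sbp = NormalDist(mu=125, sigma=14)  # SBP in men, N(125, 14)

def area_between(dist: NormalDist, a: float, b: float) -> float:
    """Hypothetical helper: area between a and b, computed as
    (cumulative area below the larger value) - (cumulative area below the smaller value)."""
    lo, hi = min(a, b), max(a, b)
    return dist.cdf(hi) - dist.cdf(lo)

p_115_135 = area_between(sbp, 115, 135)
print(round(p_115_135, 2))  # 0.52
```

Sorting the two endpoints inside the helper means the result does not depend on argument order, whereas the Excel formula relies on the user putting the larger value first.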


NORMDIST summary
- In this course, we won't be using the table of areas under the standard normal curve to find probabilities.
- Instead, use NORMDIST to find the area (probability) under a normal distribution density curve:
  - For area < x: =NORMDIST(x, μ, σ, 1)
  - For area > x: =1-NORMDIST(x, μ, σ, 1)
  - For area between a and b with b > a: =NORMDIST(b, μ, σ, 1) - NORMDIST(a, μ, σ, 1)

Standard Normal Distribution
- The Standard Normal Distribution is the normal distribution with
  - Mean = 0
  - Standard deviation = 1
- The notation for the Standard Normal Distribution is N(0,1).
  - Note: the 1 refers to the SD.
- In the Standard Normal Distribution, the variance is equal to the standard deviation, since both equal 1 (1² = 1).

Standard Normal Transformation
- Any normal distribution of some variable X can be transformed to a standard normal distribution by the following calculations:
  - Subtract the mean (μ) from each value of X
  - Divide each difference by the standard deviation (σ)
- These transformed variables are called z-scores.
  - Sometimes referred to as z-variables, z-values or standard scores

Formula for Z-score
$$Z = \frac{X - \mu}{\sigma}$$
- Z is calculated by subtracting the mean (μ) from X and dividing by the standard deviation (σ).
- Subtracting the mean centers the distribution at 0.
- Dividing by σ rescales the standard deviation to 1.
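The transformation can be sketched in a few lines of Python. The SBP values come from the examples in this lesson; the second list of values is hypothetical and is only there to show that standardizing a set of numbers by its own mean and SD produces z-scores with mean 0 and SD 1. The helper name `z_score` is an illustration, not a library function.

```python
from statistics import mean, pstdev

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

# SBP values from the examples, standardized with mu = 125 and sigma = 14
for value in (105, 111, 125, 139, 150):
    print(value, round(z_score(value, 125, 14), 2))
# 105 -1.43
# 111 -1.0
# 125 0.0
# 139 1.0
# 150 1.79

# Hypothetical values standardized by their own mean and (population) SD
values = [118.0, 131.0, 125.0, 142.0, 109.0]
m, s = mean(values), pstdev(values)
z = [z_score(v, m, s) for v in values]
print(abs(mean(z)) < 1e-9, abs(pstdev(z) - 1) < 1e-9)  # True True
```

`pstdev` (population SD) is used here so that the rescaled SD comes out exactly 1; with the sample SD (`stdev`) it would differ slightly.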

Standard Normal Scores
[Figure: subtracting the mean shifts the curve so that mean = 0; dividing by the standard deviation rescales it so that SD = 1, giving the standard normal curve]
- The z-score is interpreted as the number of SDs an observation is from the mean.
  - Z = 1: The observation lies one SD above the mean
  - Z = 2: The observation lies two SD above the mean
  - Z = -1: The observation lies one SD below the mean
  - Z = -2: The observation lies two SD below the mean

Standard Normal Distribution
[Figure: standard normal curve with 50% of the area < 0 (probability = 0.5) and 50% of the area > 0 (probability = 0.5)]
Since the area under the curve = 1.0 and the curve is symmetric about the mean, 50% of the area is on either side of the mean. Therefore, the probability of an observation being greater than 0 = 0.50 and the probability that an observation is less than 0 = 0.50.

[Figure: Standard normal distribution with 95% of the area marked between -1.96 and 1.96, and 2.5% of the area in each tail]
95% of the probability is between z = -1.96 and z = 1.96 on the standard normal curve.

Standard Normal Scores
- Example: Male systolic blood pressure has mean = 125 and standard deviation = 14 mmHg.
- For SBP = 150 mmHg, what is the z-score?
$$Z = \frac{150 - 125}{14} = 1.79$$
- The probability of having SBP > 150 is equal to the area under the standard normal curve > 1.79.
[Figure: standard normal curve with the area > 1.79 shaded]

NORMSDIST function in Excel
- The NORMSDIST function in Excel can be used to find the area under a standard normal curve.
  - You can remember that NORMSDIST is for the Standard normal distribution because of the S.
- NORMSDIST(Z) gives the area to the left of the indicated z-score. The mean and standard deviation do not need to be specified since they are known (μ = 0 and σ = 1).
- For areas greater than a z-score, use 1-NORMSDIST(Z).

Using NORMSDIST
- What is the probability that a man has SBP > 150?
- First calculate the z-score for 150:
$$Z = \frac{150 - 125}{14} = 1.79$$
- In Excel use =1 - NORMSDIST(1.79) = 0.0367
- The probability that a man has SBP > 150 = 0.037. This is the same as the probability obtained using the NORMDIST function.

- What is the probability that a man has SBP < 105?
- Calculate the z-score for 105 from the normal distribution with μ = 125 and σ = 14:
$$Z = \frac{105 - 125}{14} = -1.43$$
- In Excel use =NORMSDIST(-1.43) = 0.0764
- This agrees, up to the rounding of the z-score to -1.43, with the result using =NORMDIST(105, 125, 14, 1) = 0.0766.
- The probability that a randomly selected man has SBP < 105 ≈ 0.077.
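The claim that the z-score route and the direct route give the same answer can be checked numerically. This sketch assumes Python 3.8+ `statistics.NormalDist`; `normsdist` is a hypothetical stand-in for Excel's NORMSDIST.

```python
from statistics import NormalDist

std_normal = NormalDist()            # N(0, 1): the distribution NORMSDIST uses
sbp = NormalDist(mu=125, sigma=14)   # SBP in men

def normsdist(z: float) -> float:
    """Hypothetical stand-in for Excel's =NORMSDIST(z): area to the left of z."""
    return std_normal.cdf(z)

z_150 = (150 - 125) / 14   # 1.7857..., shown as 1.79 on the slide
z_105 = (105 - 125) / 14   # -1.4286..., shown as -1.43

# With unrounded z-scores the two routes agree to floating-point precision
assert abs((1 - normsdist(z_150)) - (1 - sbp.cdf(150))) < 1e-12
assert abs(normsdist(z_105) - sbp.cdf(105)) < 1e-12

# Rounding z to two decimals causes the small differences seen on the slides
print(round(1 - normsdist(1.79), 4), round(1 - sbp.cdf(150), 4))  # 0.0367 0.0371
print(round(normsdist(-1.43), 4), round(sbp.cdf(105), 4))         # 0.0764 0.0766
```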

SBP between 115 and 135
- What is the probability of having SBP between 115 and 135?
- Find the z-scores for SBP = 115 and SBP = 135:
$$Z = \frac{115 - 125}{14} = -0.71, \qquad Z = \frac{135 - 125}{14} = 0.71$$
[Figure: area under the standard normal curve between -0.71 and 0.71]

Using NORMSDIST
- For areas between two values, subtract the area to the left of the smaller value from the area to the left of the larger value.
- Use the z-scores with NORMSDIST:
  =NORMSDIST(0.71) - NORMSDIST(-0.71) = 0.52
- Compare this to the result obtained using NORMDIST:
  =NORMDIST(135, 125, 14, 1) - NORMDIST(115, 125, 14, 1) = 0.52
- The probability that a man has SBP between 115 and 135 = 0.52.

Interpreting results from NORMSDIST
- The interpretation of the probability is always stated in terms of the original data scale, not the transformed z-distribution.
- Why bother with the extra step of calculating the z-score?
  - z-scores are used to find probabilities from standard normal tables such as Table A-2 in the text appendix.
  - The z-score represents the number of standard deviations an observation is from the mean, which can be useful in understanding and visualizing the data.
  - z-scores are unitless, so they allow comparisons between groups.

The Inverse problem
- What if, instead of finding the area for a z-score, you want to know the z-score for a specified area?
  - NORMSINV in Excel can find the z-score for a specified area.
  - NORMINV in Excel can find the x-value for a specified area from any normally distributed variable.

Inverse problem: Ex. 1
- Find a z-value such that the probability of obtaining a larger z-score = 0.10.
[Figure: standard normal curve with an upper-tail area = 0.10 shaded; what is this z-score?]

NORMSINV function in Excel
- Find the z-score such that the probability of having a larger z-score = 0.10.
- NORMSINV(p) returns the z-score such that the probability of being < Z is p, so NORMSINV(0.10) would give the z-score with area 0.10 below it, not above it.
- If the area above the z-score = 0.10, then the area below the z-score = 1 - 0.10 = 0.90.
- Use NORMSINV(0.9) = 1.28.
- The probability that a z-score is greater than 1.28 = 0.10.
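In this sketch, `statistics.NormalDist.inv_cdf` (Python 3.8+) plays the role of NORMSINV; like NORMSINV, it takes the area below the z-score, so an upper-tail area has to be converted first.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, N(0, 1)

upper_tail_area = 0.10
# inv_cdf works with the area BELOW the z-score (like Excel's NORMSINV),
# so convert the upper-tail area first: 1 - 0.10 = 0.90
z_star = std_normal.inv_cdf(1 - upper_tail_area)
print(round(z_star, 2))  # 1.28

# Check: the area above z_star is 0.10
print(round(1 - std_normal.cdf(z_star), 2))  # 0.1
```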


Inverse problem: Ex. 2
- Find a z-value such that the probability of obtaining a smaller z-score = 0.25.
[Figure: standard normal curve with a lower-tail area = 0.25 shaded; what is this z-score?]

NORMSINV
- Find the z-score such that the probability of a smaller z-score = 0.25.
- NORMSINV(0.25) returns the z-score for this probability: NORMSINV(0.25) = -0.67.
- The probability that a z-score is less than -0.67 = 0.25.
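For a lower-tail area no conversion is needed. The sketch below (again assuming Python 3.8+ `statistics.NormalDist` as a stand-in for NORMSINV) also shows that `inv_cdf` and `cdf` undo each other, and that by symmetry the z-score with 75% of the area below it is the same distance above 0.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Lower-tail area: pass it to inv_cdf directly, as with =NORMSINV(0.25)
z_25 = std_normal.inv_cdf(0.25)
print(round(z_25, 2))  # -0.67

# inv_cdf and cdf undo each other: the area below z_25 is back to 0.25
print(round(std_normal.cdf(z_25), 4))  # 0.25

# Symmetry about 0: the 75th percentile is +0.67
print(round(std_normal.inv_cdf(0.75), 2))  # 0.67
```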

NORMINV function in Excel
- The NORMINV function is used to return the x-value for a specified area under any normal distribution curve. The mean and standard deviation need to be specified with the NORMINV function.
  - =NORMINV(area, μ, σ) will return the x-value with the indicated area less than this x.
  - =NORMINV(1-area, μ, σ) will return the x-value with the indicated area greater than this x.
- Example: SBP for men is normally distributed with mean = 125 and standard deviation = 14.
  - Find the SBP value such that 10% of men have a value higher than this.
  - =NORMINV(0.90, 125, 14) = 142.9
  - 10% of men have SBP > 142.9.
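A sketch of the NORMINV example, assuming Python 3.8+ `statistics.NormalDist` as the stand-in. The second calculation shows the same cutoff obtained by inverting the z-score formula, x = μ + zσ, which is how a standard normal table would be used.

```python
from statistics import NormalDist

sbp = NormalDist(mu=125, sigma=14)  # SBP in men

# Value with 90% of men below it (10% above), like =NORMINV(0.90, 125, 14)
cutoff = sbp.inv_cdf(0.90)
print(round(cutoff, 1))  # 142.9

# Same value via the z-score route: x = mu + z * sigma
z_90 = NormalDist().inv_cdf(0.90)
print(round(125 + z_90 * 14, 1))  # 142.9
```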

Human Conception & the Normal Curve
- The length of time from human conception to birth varies according to a distribution that is approximately normal with mean 266 days and standard deviation 16 days.
- X ~ N(266, 16)

Using the 68%-95%-99.7% approximation:

What percent of the data fall above 266 days?
1. 5%
2. 34%
3. 50%
4. 68%
5. 81.5%

What percent of the data fall below 234 days?
1. 2.5%
2. 5%
3. 34%
4. 50%
5. 81.5%

What percent of the data fall between 250 days and 298 days?
1. 34%
2. 50%
3. 68%
4. 81.5%
5. 95%

The top 16% of pregnancies last approximately how many days?
1. 266 days
2. 282 days
3. 298 days
4. 314 days
5. Cannot be determined from this information

[Figure: The standard normal curve, with a red shaded area]
WHICH EQUATION WILL GIVE YOU THE RED SHADED AREA UNDER THE CURVE?

1. P(Z<1)
2. P(0<Z<1)
3. P(Z>1)
4. P(Z=1)
5. P(Z1)

WHICH EQUATION WILL GIVE YOU THE RED SHADED AREA UNDER THE CURVE?
1. P(Z<1)
2. P(0<Z<1)
3. P(Z>-1)
4. P(Z=-1)
5. P(Z-1)

WHICH EQUATION WILL GIVE YOU THE RED SHADED AREA UNDER THE CURVE?
1. P(Z2)
2. P(Z<-1)-P(Z<2)
3. 1-P(Z<-1)
4. P(Z<2)-P(Z-1)
5. 1-P(Z-2)

WHICH EQUATION CAN BE USED TO SOLVE FOR THE z* VALUE?
1. P(Z0.62)=z*
2. P(Z>z*)=0.62
3. 1-0.62=P(Z<z*)
4. P(Z<z*)=0.62
5. 1- P(Zz*)= 0.62

Readings and Assignments
- Reading
  - Chapter 4, pgs. 76–80: Normal Distribution
- Lesson 6 Practice Exercises
- Work through the Excel Module 6 examples
- Start Homework 4
