
# Probability Review

## Lesson 6, Part 1: Normal Distribution

## Lesson 5 covered these Probability topics

- Addition rule for mutually exclusive events
- Joint, marginal and conditional probability for two non-mutually exclusive events
- Identifying independent events
- Screening test measures as examples of conditional probabilities
- Application of Bayes' Theorem to calculate PPV and NPV

## Lesson 6 Overview

- Lesson 6 describes probability distributions for numerical variables
  - Part 1: Distributions for continuous data
  - Part 2: Distributions for discrete data

## Lesson 6 Outline

- Probability Distributions
- Normal Distribution
- Standard Normal (Z) Distribution
- Excel functions for normal distribution probabilities
  - NORMDIST and NORMSDIST functions
  - NORMSINV and NORMINV functions

## Probability Distributions

- Any characteristic that can be measured or categorized is called a variable.
- If the variable can assume a number of different values such that any particular outcome is determined by chance, it is called a random variable.
- Every random variable has a corresponding probability distribution.

## Random Nominal Variable

- Blood type is a random nominal variable.
- The blood type of a randomly selected individual is unknown, but the distribution of blood types in the population can be described.

[Figure: Distribution of blood types in the US. Bar chart of probability (0% to 50%) by blood type (O, A, B, AB).]

PubH 6414 Lesson 6 Part 1

## Probability Distributions for Continuous Data

- Continuous data can take on any value within the range of possible values, so describing the distribution of continuous data in a table is not very practical.

## Probability Density Curve

- Definition: a probability density curve is a curve describing a continuous probability distribution.
- Unlike a histogram of continuous data, the probability density curve is a smooth line.
- Imagine a histogram as the width of the intervals decreases (to almost 0) and the sample size gets larger and larger: its outline approaches the smooth density curve.

## Area under a Probability Density Curve

- The probability density is a smooth idealized curve that shows the shape of the distribution of a random variable in the population.
- The total area under a probability density curve = 1.0.
- The probability density curve in the systolic blood pressure example has the bell shape of a normal distribution. Not all probability density curves have a normal distribution.

[Figure: The probability density curve for BP values for an infinite sample of men. x-axis: Systolic BP (mmHg), 80 to 180; y-axis: Percentage.]

## Shapes of Probability Density Curves

- There are many possible shapes for the probability density curves of continuous data.
  - Right skewed
  - Left skewed
  - Bimodal
  - Multimodal
- The most commonly used probability distribution in the study of statistics is the normal distribution.

## Normal Distribution

- The Normal Distribution is also called the Gaussian Distribution after Carl Friedrich Gauss, a German mathematician (1777-1855).
- Characteristics of any Normal Distribution
  - Bell-shaped curve
  - Unimodal: the peak is at the mean
  - Mean = Median = Mode
  - Tails of the curve extend to infinity in both directions

## Normal Distribution

Q: Is every variable normally distributed?

A: No. There are skewed (asymmetric) distributions and there are bimodal distributions.

Q: Then why do we spend so much time studying the normal distribution?

1. Many variables in health research are normally distributed.
2. More importantly, many statistical tests are based on the normal distribution.

## Symbol Notation

A convention in statistics notation is to use Roman letters for sample statistics and Greek letters for population parameters. Since the density curve describes the population, Greek letters are used for the mean (μ, mu) and SD (σ, sigma).

| Symbol | Sample | Density Curve |
| --- | --- | --- |
| Mean | X̄ | μ |
| Standard Deviation | s | σ |

## Describing Normal Distributions

- Every Normal distribution is uniquely described by its mean (μ) and standard deviation (σ).
- The notation for a normal distribution is N(μ, σ).
- N(125, 4) refers to a normal distribution with mean = 125 and standard deviation = 4, so variance = 16.
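
The N(μ, σ) notation can be mirrored in code. The sketch below is not part of the lesson, which uses Excel; it assumes Python's standard-library `statistics.NormalDist`, which takes the same two parameters, and the variable name `dist` is only illustrative.

```python
from statistics import NormalDist

# N(125, 4): the second parameter is the standard deviation, not the variance.
dist = NormalDist(mu=125, sigma=4)

print(dist.mean)      # 125.0
print(dist.stdev)     # 4.0
print(dist.variance)  # 16.0 (the SD squared)
```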

[Figures: a normal density with mean = 5 and σ = 1; two normal densities with different mean values and the same σ; two normal densities with different σ and the same mean.]

## The 68-95-99.7 Approximation for all Normal Distributions

Regardless of the mean and standard deviation of the normal distribution:

- 68% of the observations fall within one standard deviation of the mean
- 95% of the observations fall within approximately* two standard deviations of the mean
- 99.7% of the observations fall within three standard deviations of the mean
- A very small % of the observations are beyond 3 standard deviations of the mean

Note (*): 95% of the observations fall within 1.96 SD of the mean.

## Distributions of Blood Pressure

[Figure: The 68-95-99.7 rule applied to the distribution of systolic blood pressure in men, with μ = 125 mmHg and σ = 14 mmHg. x-axis ticks at 83, 97, 111, 125, 139, 153, 167; the bands within 1, 2 and 3 SD of the mean are labeled 68%, 95% and 99.7%.]
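
The percentages in the 68-95-99.7 approximation can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` rather than the Excel functions used in this lesson; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

def area_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Probability that a normal variable falls within k SDs of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# SBP parameters from the slides; any other mean and SD give the same areas.
for k in (1, 1.96, 2, 3):
    print(k, round(area_within(k, mu=125, sigma=14), 4))
```

The areas come out near 0.68, 0.95 (for both 1.96 and 2 SD, the latter slightly above) and 0.997, which is why "two standard deviations" is only an approximation for the 95% band.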

## Calculating Probabilities from a Normal Distribution Curve

- The total area under the curve = 1.0, which is the total probability.
- Areas for intervals under the curve can be interpreted as probability.
- What is the probability that a man has blood pressure between 111 and 139 mmHg?

Using the 68% rule for a normal distribution and the mean and standard deviation for SBP (previous slide), we know the probability that a randomly selected man has blood pressure between 111 and 139 mmHg = 0.68.

## Areas under the Curve

- What if you wanted to find the probability of a man having SBP < 105 mmHg?
- We want the area below 105. The 68-95-99.7 rule can't be used to find this area under the curve.

[Figure: normal curve of SBP in mmHg with ticks at 83, 97, 111, 125, 139, 153, 167; the value 105 and the area below it are marked.]

## Calculating the Areas under the Curve

- Calculating area (or probability) under a normal distribution curve is a numeric problem involving integration of the formula for the normal distribution (see page 77 of text). This is not an easy calculation. Other options are:
  - Table A-2 in the text is a table of areas under the standard normal curve: the normal distribution with mean = 0 and standard deviation = 1.
  - The NORMDIST function in Excel can be used to find the area under a normal distribution density curve.
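
To make the "integration" remark concrete, here is a rough sketch that approximates the area below 105 by adding up thin trapezoids under the normal density, then compares it with a library routine. It assumes Python and its standard-library `statistics.NormalDist`; the function names, the step count and the choice to start 10 SD below the mean are illustrative assumptions, not the method of the text.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_below(x: float, mu: float, sigma: float, steps: int = 10_000) -> float:
    """Trapezoid-rule area under the density from mu - 10*sigma up to x."""
    lower = mu - 10 * sigma  # the area further left is negligible
    width = (x - lower) / steps
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(x, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lower + i * width, mu, sigma)
    return total * width

approx = area_below(105, mu=125, sigma=14)
exact = NormalDist(125, 14).cdf(105)
print(round(approx, 4), round(exact, 4))
```

Both numbers agree to several decimal places, which is the calculation that tables and spreadsheet functions do for you.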

## NORMDIST function in Excel

- NORMDIST returns the cumulative area from the far left (negative infinity) of the normal density curve to the value specified. This is equal to the probability of being less than the indicated value (X).
- You need to provide the value, the mean, the standard deviation and an indicator (1 or TRUE) to request this cumulative area.
  - =NORMDIST(X, μ, σ, 1) returns the probability of having a value less than X.
  - =1-NORMDIST(X, μ, σ, 1) returns the probability of having a value greater than X.

## Using NORMDIST function

- What is the probability that a randomly selected man has SBP < 105 mmHg?
- SBP ~ N(125, 14), that is, mean = 125 and standard deviation = 14 mmHg
- For area less than some value use =NORMDIST(value, μ, σ, 1)
- =NORMDIST(105, 125, 14, 1) = 0.076
- The probability that a randomly selected man has SBP < 105 mmHg = 0.076
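
For readers without Excel, the same cumulative area can be sketched in Python. This assumes the standard-library `statistics.NormalDist`; the wrapper `normdist` is a hypothetical helper written to resemble the Excel call, not an existing library function.

```python
from statistics import NormalDist

def normdist(x: float, mean: float, sd: float, cumulative: bool = True) -> float:
    """Hypothetical stand-in for Excel's NORMDIST(x, mean, sd, cumulative)."""
    dist = NormalDist(mean, sd)
    # cumulative=True: area to the left of x; False: height of the density at x.
    return dist.cdf(x) if cumulative else dist.pdf(x)

p_below_105 = normdist(105, 125, 14)
print(round(p_below_105, 4))  # about 0.0766; the slide reports 0.076
```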

## Areas under the Curve

- What if you wanted to find the probability of a man having SBP > 150?
- We want the area above 150.

[Figure: normal curve of SBP in mmHg with ticks at 83, 97, 111, 125, 139, 153, 167; the value 150 and the area above it are marked.]

## Using NORMDIST function

- What is the probability that a man has SBP > 150 mmHg?
- Mean = 125, standard deviation = 14
- For area greater than some value, use =1-NORMDIST(value, μ, σ, 1)
- =1-NORMDIST(150, 125, 14, 1) = 0.037
- The probability that a randomly selected man has SBP > 150 = 0.037
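
The "one minus the area to the left" step can be sketched as follows, assuming Python's standard-library `statistics.NormalDist` in place of Excel; the variable names are illustrative.

```python
from statistics import NormalDist

sbp = NormalDist(mu=125, sigma=14)  # SBP in men, parameters from the slides

p_below_150 = sbp.cdf(150)     # area to the left of 150
p_above_150 = 1 - p_below_150  # complement: area to the right of 150

print(round(p_above_150, 3))  # 0.037
```

Because the total area is 1.0, the two areas always add up to 1, so either one determines the other.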

## Areas under the Curve

- What if you wanted to find the probability of a man having SBP between 115 and 135?
- We want the area between 115 and 135.

[Figure: normal curve of SBP in mmHg with ticks at 83, 97, 111, 125, 139, 153, 167; the values 115 and 135 and the area between them are marked.]

## Using NORMDIST function

- What is the probability that a man has SBP between 115 and 135 mmHg?
- For area between two values, subtract the area to the left of the smaller value from the area to the left of the larger value
- =NORMDIST(135, 125, 14, 1) - NORMDIST(115, 125, 14, 1) = 0.52
- The probability that a man has SBP between 115 and 135 mmHg = 0.52
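
The subtraction of two left-hand areas can be written as a small function. This is a sketch under the assumption that Python's standard-library `statistics.NormalDist` replaces NORMDIST; `area_between` is a hypothetical helper name.

```python
from statistics import NormalDist

def area_between(a: float, b: float, mean: float, sd: float) -> float:
    """Area under the normal curve between a and b, with b > a."""
    if b <= a:
        raise ValueError("b must be larger than a")
    dist = NormalDist(mean, sd)
    # Area left of the larger value minus area left of the smaller value.
    return dist.cdf(b) - dist.cdf(a)

p_115_to_135 = area_between(115, 135, mean=125, sd=14)
print(round(p_115_to_135, 2))  # 0.52
```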

## NORMDIST summary

- In this course, we won't be using the table of areas under the standard normal curve to find probabilities.
- Instead, use NORMDIST to find the area (probability) under a normal distribution density curve.
  - For area < x: =NORMDIST(x, μ, σ, 1)
  - For area > x: =1-NORMDIST(x, μ, σ, 1)
  - For area between a and b with b > a: =NORMDIST(b, μ, σ, 1) - NORMDIST(a, μ, σ, 1)

## Standard Normal Distribution

- The Standard Normal Distribution is the normal distribution with
  - Mean = 0
  - Standard deviation = 1
- The notation for the Standard Normal Distribution is N(0, 1)
  - Note the 1 refers to the SD
- In the Standard Normal Distribution, the variance is equal to the standard deviation (both are 1).

## Standard Normal Transformation

- Any normal distribution of some variable X can be transformed to a standard normal distribution by the following calculations:
  - Subtract the mean (μ) from each value of X
  - Divide each result by the standard deviation (σ)
- These transformed variables are called Z-scores.
  - Sometimes referred to as z-variables, z-values or standard scores

## Formula for Z-score

$$Z = \frac{X - \mu}{\sigma}$$

Z is calculated by subtracting the mean (μ) from X and dividing by the standard deviation (σ).

- Subtracting the mean centers the distribution at 0
- Dividing by σ rescales the standard deviation to 1
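
As a small worked illustration of the transformation, the sketch below converts the SBP axis values from the earlier blood pressure figure into z-scores. It is written in Python as an assumption (the lesson itself uses Excel), and the function name `z_scores` is hypothetical.

```python
def z_scores(values, mu, sigma):
    """Subtract the mean from each value, then divide by the standard deviation."""
    return [(x - mu) / sigma for x in values]

# Tick marks from the SBP figure: the mean 125 plus or minus 1, 2 and 3 SDs of 14.
sbp_values = [83, 97, 111, 125, 139, 153, 167]
z = z_scores(sbp_values, mu=125, sigma=14)

print(z)  # [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
```

Each z-score is simply the number of standard deviations the original value lies from the mean.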

[Figure: Transforming a normal curve into the standard normal curve by subtracting the mean and then dividing by the standard deviation.]

## Standard Normal Scores

The z-score is interpreted as the number of SD an observation is from the mean.

- Z = 1: The observation lies one SD above the mean
- Z = 2: The observation lies two SD above the mean
- Z = -1: The observation lies one SD below the mean
- Z = -2: The observation lies two SD below the mean

[Figure: Standard normal curve split at 0. 50% of the area is < 0 (probability = 0.5) and 50% of the area is > 0 (probability = 0.5).]

Since the area under the curve = 1.0, 50% of the area is on either side of the mean. Therefore, the probability of an observation being greater than 0 = 0.50 and the probability that an observation is less than 0 = 0.50.

[Figure: Standard Normal Distribution with 95% area marked, leaving 2.5% of the area in each tail.]

95% of the probability is between z = -1.96 and z = 1.96 on the standard normal curve.

## Standard Normal Scores

- Example: Male Systolic Blood Pressure has mean = 125, standard deviation = 14 mmHg
- For SBP = 150 mmHg what is the Z-score?

$$Z = \frac{150 - 125}{14} = 1.79$$

- The probability of having SBP > 150 is equal to the area under the standard normal curve > 1.79

[Figure: standard normal curve with the area > 1.79 marked.]

## NORMSDIST function in Excel

- The NORMSDIST function in Excel can be used to find the area under a standard normal curve.
- You can remember that NORMSDIST is for the Standard Normal distribution because of the S.
- NORMSDIST(Z) gives the area to the left of the indicated z-score. The mean and standard deviation do not need to be specified since they are known (μ = 0 and σ = 1).
- For areas greater than a z-score, use 1-NORMSDIST(Z).

## Using NORMSDIST

- What is the probability that a man has SBP > 150?
- First calculate the Z-score for 150:

$$Z = \frac{150 - 125}{14} = 1.79$$

- In Excel use =1-NORMSDIST(1.79) = 0.0367
- The probability that a man has SBP > 150 = 0.037.
- This is the same as the probability obtained using the NORMDIST function.

## Using NORMSDIST

- What is the probability that a man has SBP < 105?
- Calculate the z-score for 105 from the normal distribution with μ = 125 and σ = 14:

$$Z = \frac{105 - 125}{14} = -1.43$$

- In Excel use =NORMSDIST(-1.43) = 0.076
- This is the same as the result using NORMDIST(105, 125, 14, 1) = 0.076
- The probability that a randomly selected man has SBP < 105 = 0.076
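
The agreement between the z-score route and the direct route can be checked with a short sketch. It assumes Python's standard-library `statistics.NormalDist`, where `NormalDist()` with no arguments is the standard normal distribution; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()      # mean 0, SD 1 (the role of NORMSDIST)
sbp = NormalDist(mu=125, sigma=14)  # original SBP scale (the role of NORMDIST)

# Route 1: convert 150 mmHg to a z-score, then use the standard normal curve.
z_150 = (150 - 125) / 14
p_above_via_z = 1 - standard_normal.cdf(z_150)

# Route 2: work directly on the original scale.
p_above_direct = 1 - sbp.cdf(150)

print(round(z_150, 2))           # 1.79
print(round(p_above_via_z, 3))   # 0.037
print(round(p_above_direct, 3))  # 0.037
```

The slide's 0.0367 comes from rounding the z-score to 1.79 before looking up the area; with the unrounded z-score both routes give the same value.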

## SBP between 115 and 135

- What is the probability of having SBP between 115 and 135?
- Find the Z-scores for SBP = 115 and SBP = 135:

$$Z = \frac{115 - 125}{14} = -0.71 \qquad Z = \frac{135 - 125}{14} = 0.71$$

[Figure: standard normal curve with the area between -0.71 and 0.71 marked.]

## Using NORMSDIST

- For areas between two values, subtract the area to the left of the smaller value from the area to the left of the larger value.
- Use the z-scores with NORMSDIST: =NORMSDIST(0.71) - NORMSDIST(-0.71) = 0.52
- Compare this to the result obtained using NORMDIST: =NORMDIST(135, 125, 14, 1) - NORMDIST(115, 125, 14, 1) = 0.52
- The probability that a man has SBP between 115 and 135 = 0.52.

## Interpreting results from NORMSDIST

- The interpretation of the probability is always stated in terms of the original data scale, not the transformed z-distribution.
- Why bother with the extra step of calculating the z-score?
  - z-scores are used to find probabilities from standard normal tables such as Table A-2 in the text appendix.
  - The z-score represents the number of standard deviations an observation is from the mean, which can be useful in understanding and visualizing the data.
  - The z-score is a unitless measure, which allows comparisons between groups.

## The Inverse problem

- What if, instead of finding the area for a z-score, you want to know the z-score for a specified area?
- NORMSINV in Excel can find the z-score for a specified area.
- NORMINV in Excel can find the x-value for a specified area from any normally distributed variable.

## Inverse problem: Ex. 1

- Find a z-value such that the probability of obtaining a larger z-score = 0.10.

[Figure: standard normal curve with an upper-tail area of 0.10 marked. What is this z-score?]

## NORMSINV function in Excel

- Find the z-score such that the probability of having a larger z-score = 0.10
- NORMSINV(0.10) returns the z-score such that the probability of being < Z = 0.10
- If the area greater than the z-score = 0.1, then the area less than the z-score = 1 - 0.1 = 0.9
- Use NORMSINV(0.9) = 1.28
- The probability that a z-score is greater than 1.28 = 0.10

## Inverse problem: Ex. 2

- Find a z-value such that the probability of obtaining a smaller z-score = 0.25

[Figure: standard normal curve with a lower-tail area of 0.25 marked. What is this z-score?]

## NORMSINV

- Find the z-score such that the probability of a smaller z-score = 0.25
- NORMSINV(0.25) returns the z-score for this probability
- NORMSINV(0.25) = -0.67
- The probability that a z-score is less than -0.67 = 0.25
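
Both inverse examples can be reproduced outside Excel. The sketch below assumes Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method takes the area to the left, just as NORMSINV does; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Upper-tail area of 0.10 means the area to the LEFT is 1 - 0.10 = 0.90.
z_upper_10 = standard_normal.inv_cdf(1 - 0.10)

# Lower-tail area of 0.25 is already an area to the left.
z_lower_25 = standard_normal.inv_cdf(0.25)

print(round(z_upper_10, 2))  # 1.28
print(round(z_lower_25, 2))  # -0.67
```

The common mistake is to pass the upper-tail area directly; the inverse functions always expect the area below the z-score.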

## NORMINV function in Excel

- The NORMINV function is used to return the x-value for a specified area under any normal distribution curve. The mean and standard deviation need to be specified with the NORMINV function.
  - =NORMINV(area, μ, σ) will return the x-value with the indicated area less than this X.
  - =NORMINV(1-area, μ, σ) will return the x-value with the indicated area greater than this X.

## NORMINV function in Excel

- SBP for men is normally distributed with mean = 125 and standard deviation = 14
- Find the SBP value such that 10% of men have a value higher than this
- =NORMINV(0.90, 125, 14) = 142.9
- 10% of men have SBP > 142.9
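
The SBP example can be sketched in Python as well, again assuming the standard-library `statistics.NormalDist` as a stand-in for NORMINV; the variable names are illustrative.

```python
from statistics import NormalDist

sbp = NormalDist(mu=125, sigma=14)  # SBP in men, parameters from the slides

# 10% of men above the cutoff means 90% of men below it.
cutoff_top_10 = sbp.inv_cdf(1 - 0.10)
print(round(cutoff_top_10, 1))  # 142.9

# Round trip: the area above the cutoff should be 0.10 again.
print(round(1 - sbp.cdf(cutoff_top_10), 2))  # 0.1
```

The same cutoff follows from the z-score route: the mean plus 1.28 standard deviations is 125 + 1.28 × 14, or about 142.9.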

## Human conception & the normal curve

- Length of time from human conception to birth varies according to a distribution that is approximately normal with mean 266 days and standard deviation 16 days.

X ~ N(266, 16)

## Using the 68% - 95% - 99.7% approximation

What percent of the data fall above 266 days?

1. 5%
2. 34%
3. 50%
4. 68%
5. 81.5%

What percent of the data fall below 234 days?

1. 2.5%
2. 5%
3. 34%
4. 50%
5. 81.5%

What percent of the data fall between 250 days and 298 days?

1. 34%
2. 50%
3. 68%
4. 81.5%
5. 95%

The top 16% of pregnancies last approximately how many days?

1. 266 days
2. 282 days
3. 298 days
4. 314 days
5. Cannot be determined from this information

## The standard normal curve

WHICH EQUATION WILL GIVE YOU THE RED SHADED AREA UNDER THE CURVE?

1. P(Z<1)
2. P(0<Z<1)
3. P(Z>1)
4. P(Z=1)
5. P(Z1)

WHICH EQUATION WILL GIVE YOU THE RED SHADED AREA UNDER THE CURVE?

1. P(Z<1)
2. P(0<Z<1)
3. P(Z>-1)
4. P(Z=-1)
5. P(Z-1)

WHICH EQUATION WILL GIVE YOU THE RED SHADED AREA UNDER THE CURVE?

1. P(Z2)
2. P(Z<-1)-P(Z<2)
3. 1-P(Z<-1)
4. P(Z<2)-P(Z-1)
5. 1-P(Z-2)

WHICH EQUATION CAN BE USED TO SOLVE FOR THE z* VALUE?

## Readings and Assignments