Normal Distribution
DR. LORI BOIES
ST. MARY’S UNIVERSITY
@BoiesBiology
Random Variables
A random variable can assume a number of possible values
where those values result from a random process (by chance)
We use a capital letter, like X, to denote a random variable
The values of a random variable are denoted with a lowercase letter, in
this case x
For example, P(X = x) denotes the probability that the random variable X
takes on the specific value x
Examples of types of random variables
There are two types of random variables:
Discrete random variables often take only integer values
Example: Number of patient visits to a screening clinic, Difference in
number of flu vaccinations in Bexar county in 2016 compared to 2017.
Continuous
◦ Normal (or Gaussian)
◦ Student’s t
◦ Uniform
◦ Chi-squared
Discrete
◦ Bernoulli
◦ Binomial
[Figure: density curve of a normal distribution, x-axis from -4 to 4]
http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating
Heights of Females
“When we looked into the data
for women, we were surprised to
see height exaggeration was just
as widespread, though without
the lurch towards a benchmark
height.”
http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating
Normal Distributions with Different Parameters
SAT scores are distributed nearly normally with mean 1500 and
standard deviation 300. ACT scores are distributed nearly normally
with mean 21 and standard deviation 5. A college admissions officer
wants to determine which of the two applicants scored better on
their standardized test with respect to the other test takers: Pam,
who earned an 1800 on her SAT, or Jim, who scored a 24 on his ACT?
Standardizing with Z scores
Since we cannot just compare these two raw scores, we instead compare how many standard
deviations beyond the mean each observation is.
Pam's score is (1800 - 1500) / 300 = 1 standard deviation above the mean.
Jim's score is (24 - 21) / 5 = 0.6 standard deviations above the mean.
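The two calculations above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name z_score is an illustrative choice, not something from the slides.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Values taken from the SAT/ACT example above.
pam_z = z_score(1800, mean=1500, sd=300)  # SAT
jim_z = z_score(24, mean=21, sd=5)        # ACT

print(f"Pam: Z = {pam_z:.2f}")
print(f"Jim: Z = {jim_z:.2f}")
if pam_z > jim_z:
    print("Pam scored better relative to the other test takers")
elif jim_z > pam_z:
    print("Jim scored better relative to the other test takers")
else:
    print("Pam and Jim scored equally well relative to the other test takers")
```

Because both scores are expressed in standard deviations from their own mean, they can be compared directly even though the SAT and ACT use different scales.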
These are called standardized scores, or Z scores.
The Z score of an observation is the number of standard deviations it falls above or below the mean.
Z scores are defined for distributions of any shape, but only when the distribution is normal
can we use Z scores to calculate percentiles.
Observations that are more than 2 SD away from the mean (|Z| > 2) are usually considered
unusual.
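The |Z| > 2 guideline can be written as a small check. This is a sketch under assumptions: the function name is_unusual and the SAT score of 2150 are hypothetical and used only for illustration; the mean and standard deviation come from the SAT example.

```python
def is_unusual(x, mean, sd, threshold=2.0):
    """Return True when x is more than `threshold` standard deviations from the mean."""
    z = (x - mean) / sd
    return abs(z) > threshold

# Pam's SAT score from the example: Z = 1, so it is not flagged.
print(is_unusual(1800, mean=1500, sd=300))

# Hypothetical SAT score of 2150: Z is about 2.17, so it is flagged.
print(is_unusual(2150, mean=1500, sd=300))
```

The cutoff of 2 is a rule of thumb rather than a strict definition, which is why it is exposed here as a parameter.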
Percentiles
The percentile of an observation is the percentage of observations that fall below it.
Graphically, the percentile is the area under the probability distribution curve to
the left of that observation.
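As an illustration of "area to the left", the percentiles for Pam (Z = 1) and Jim (Z = 0.6) can be computed with the standard normal cumulative distribution function. The sketch below assumes Python's standard-library statistics.NormalDist; a printed Z table would serve the same purpose.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# cdf(z) is the area under the curve to the left of z, i.e. the percentile.
pam_percentile = standard_normal.cdf(1.0)  # Pam's Z score
jim_percentile = standard_normal.cdf(0.6)  # Jim's Z score

print(f"Pam: {pam_percentile:.4f}")  # about 0.8413
print(f"Jim: {jim_percentile:.4f}")  # about 0.7257
```

Under the normal model, Pam scored better than roughly 84% of SAT takers and Jim better than roughly 73% of ACT takers.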
Calculating percentiles using tables
Let’s Practice!
BMI Example
Consider body mass index (BMI) in a population in which BMI is normally distributed with a
mean of 24 and a standard deviation of 6. What percent of people in this population have a
BMI below 24.9?
Recall:
z = (x − µ) / σ
where
z = z-score
x = experimental result
µ = mean value
σ = standard deviation
Let X = BMI level of a person: X ~ N(µ = 24, σ = 6)
What is the probability of having a BMI less than 24.9? P(X < 24.9)
This is equivalent to P(Z < 0.15), because
Z = (24.9 − 24) / 6 = 0.15
Let’s find the exact probability using the Z table:
P(X < 24.9) = P(Z < 0.15) = 0.5596
About 55.96% of people in this population have a BMI below 24.9.
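The Z-table lookup for this example can be cross-checked in software. The sketch below assumes Python's statistics.NormalDist as the tool (the slides themselves use a Z table) and shows that standardizing first gives the same area as working with X directly.

```python
from statistics import NormalDist

# X ~ N(mu = 24, sigma = 6), as stated in the BMI example.
bmi = NormalDist(mu=24, sigma=6)

# Route 1: area to the left of 24.9 under the BMI distribution itself.
p_direct = bmi.cdf(24.9)

# Route 2: standardize, then use the standard normal distribution.
z = (24.9 - 24) / 6
p_via_z = NormalDist().cdf(z)

print(f"P(X < 24.9) = {p_direct:.4f}")  # about 0.5596
print(f"P(Z < 0.15) = {p_via_z:.4f}")   # same area
```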
Let’s Try It Again!
BMI Example 2
According to the NIH, a BMI below 18.5 is considered underweight. What percent of
people in this population are underweight?
Let X = BMI level of a person: X ~ N(µ = 24, σ = 6)
P(X < 18.5) = ?
Z = (18.5 − 24) / 6 ≈ −0.92
P(X < 18.5) = P(Z < −0.92) = 0.1788
About 17.88% of people in this population are underweight.
R input to calculate the p-value, using the symmetry P(Z < −0.92) = P(Z > 0.92):
pnorm(q=0.92, lower.tail=FALSE)  # upper-tail area P(Z > 0.92), about 0.1788
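For readers without R, the same tail area can be sketched in Python. The use of statistics.NormalDist is an assumption here, not part of the slides. The sketch also shows that rounding Z to −0.92 for the table changes the answer slightly compared with using the unrounded value.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

# Lower tail at the rounded Z score, as read from a Z table.
p_table = std.cdf(-0.92)

# Upper tail at +0.92: the same quantity as pnorm(q=0.92, lower.tail=FALSE) in R.
p_upper = 1 - std.cdf(0.92)

# Without rounding Z: work directly with X ~ N(24, 6).
p_exact = NormalDist(mu=24, sigma=6).cdf(18.5)

print(f"P(Z < -0.92) = {p_table:.4f}")  # about 0.1788
print(f"P(Z >  0.92) = {p_upper:.4f}")  # equal by symmetry
print(f"P(X < 18.5) without rounding Z = {p_exact:.4f}")
```

The unrounded result is slightly larger (roughly 0.18) because the exact Z score, about −0.917, lies just to the right of −0.92.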
How about once more for fun?
BMI Example 3
According to the NIH, a BMI above 30 is considered obese. What percent of people in this
population are obese?
P(X > 30) = ?
Z = (30 − 24) / 6 = 1
P(X > 30) = P(Z > 1) = 1 − P(Z ≤ 1) = 1 − 0.8413 = 0.1587
15.87% of people in this population are obese.
R: for information on how to calculate p-values from a z-score in R, see
https://www.statology.org/p-value-of-z-score-r/
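The complement rule used in this example, P(Z > 1) = 1 − P(Z ≤ 1), can be checked with a short Python sketch. It assumes statistics.NormalDist from the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# X ~ N(mu = 24, sigma = 6), as in the BMI examples.
bmi = NormalDist(mu=24, sigma=6)

# cdf gives the area to the left, so the upper tail is its complement.
p_at_most_30 = bmi.cdf(30)
p_obese = 1 - p_at_most_30

print(f"P(X <= 30) = {p_at_most_30:.4f}")  # about 0.8413
print(f"P(X > 30)  = {p_obese:.4f}")       # about 0.1587
```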
How about even more fun… let’s change
it up a bit! - BMI Example 5
According to the NIH, a BMI between 18.5 and 24.9 is considered a healthy weight. What
percent of people in this population have a healthy weight?
P(18.5 < X < 24.9) = P(−0.92 < Z < 0.15)
Z = (18.5 − 24) / 6 ≈ −0.92
Z = (24.9 − 24) / 6 = 0.15
P(−0.92 < Z < 0.15)
= P(Z < 0.15) − P(Z < −0.92)
= 0.5596 − 0.1788 = 0.3808
38.08% of people in this population have a healthy weight.
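The "area between two values" calculation is a difference of two left-tail areas, which the following Python sketch reproduces. It assumes statistics.NormalDist; the result without rounding Z is included only to show the small effect of rounding to two decimals for the table.

```python
from statistics import NormalDist

std = NormalDist()                # standard normal
bmi = NormalDist(mu=24, sigma=6)  # X ~ N(24, 6)

# Using the rounded Z scores from the Z table: P(Z < 0.15) - P(Z < -0.92).
p_table = std.cdf(0.15) - std.cdf(-0.92)

# Using the BMI limits directly, without rounding Z.
p_exact = bmi.cdf(24.9) - bmi.cdf(18.5)

print(f"With rounded Z scores: {p_table:.4f}")  # about 0.3808
print(f"Without rounding Z:    {p_exact:.4f}")
```

Both routes give roughly 38%; the small gap comes from rounding (18.5 − 24) / 6 to −0.92.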
68 – 95 – 99.7 Rule
(Empirical Rule)
For nearly normally distributed data,
about 68% falls within 1 SD of the mean,
about 95% falls within 2 SD of the mean,
about 99.7% falls within 3 SD of the mean.
It is possible for observations to fall 4,
5, or more standard deviations away
from the mean, but these occurrences
are very rare if the data are nearly
normal.
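The 68, 95 and 99.7 figures are rounded values of areas under the standard normal curve. As a quick check (a sketch that assumes Python's statistics.NormalDist), the area within k standard deviations of the mean is P(Z < k) − P(Z < −k):

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std.cdf(k) - std.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"Within {k} SD of the mean: {area:.4f}")
# about 0.6827, 0.9545 and 0.9973
```

The same loop with k = 4 or 5 shows how quickly the remaining tail area shrinks, which is why such observations are very rare for nearly normal data.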
The Empirical Rule states that for a given dataset with a normal distribution, 95% of data values fall within
two standard deviations of the mean. This means that 47.5% of values fall between the mean and two
standard deviations below the mean.
In this example, 35.6 is located two standard deviations below the mean. Since we know that 50% of data
values fall above the mean in a normal distribution, a total of 50% + 47.5% = 97.5% of values fall above 35.6.
https://www.statology.org/empirical-rule-practice-problems/
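The 97.5% figure relies on the rounded "95% within 2 SD" value. A short Python sketch (assuming statistics.NormalDist) compares the rule-of-thumb answer with the area computed from the normal curve for an observation two standard deviations below the mean:

```python
from statistics import NormalDist

std = NormalDist()  # standard normal

# Empirical Rule: 50% above the mean plus half of the central 95%.
rule_of_thumb = 0.50 + 0.95 / 2

# Area above Z = -2 computed from the normal curve.
from_curve = 1 - std.cdf(-2)

print(f"Empirical Rule: {rule_of_thumb:.4f}")  # 0.9750
print(f"Normal curve:   {from_curve:.4f}")     # about 0.9772
```

The two values differ slightly because the Empirical Rule rounds the area within 2 SD to 95%.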