L3 Biostatistics NormalDistribution

Biostatistics for Life Science
Normal Distribution
DR. LORI BOIES
ST. MARY’S UNIVERSITY
@BoiesBiology
Random Variables
A random variable can assume a number of possible values, where those values result from a random process (by chance).
• We use a capital letter, like X, to denote a random variable.
• The values of a random variable are denoted with a lowercase letter, in this case x.
• For example, P(X = x) denotes the probability that the random variable X takes on the specific value x.
Examples of types of random variables
There are two types of random variables:
Discrete random variables often take only integer values
• Examples: number of patient visits to a screening clinic; difference in the number of flu vaccinations in Bexar County between 2016 and 2017.
Continuous random variables take real (decimal) values
• Examples: BMI; blood pressure difference between subjects on an antihypertensive treatment versus placebo; serum cholesterol level of adult males in the US.
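To make the discrete/continuous distinction concrete, here is a small R sketch that simulates both kinds of random variable. All numbers in it (the seed, 20 patients, a 0.3 chance, mean 24, sd 6) are illustrative assumptions, not data from the course.

```r
# Hypothetical illustration: simulated values, not real patient data.
set.seed(1)  # arbitrary seed so the draws are reproducible

# Discrete: number of positive screens out of 20 patients,
# assuming each patient has a 0.3 chance. Only whole numbers are possible.
positives <- rbinom(n = 5, size = 20, prob = 0.3)

# Continuous: BMI values, assuming a normal distribution with mean 24, sd 6.
# Any real (decimal) value is possible.
bmi <- rnorm(n = 5, mean = 24, sd = 6)

print(positives)
print(round(bmi, 2))
```

The first vector contains only integers, while the second contains decimals.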
Probability functions for random variables
Probability Density Function (pdf)
◦ Continuous random variables
◦ Because the variable is continuous, the probability of any single value is 0.
◦ Defines probabilities for a range of values of the random variable, P(X < x)
◦ Requires calculus to compute (but R can do it for us, or we can look up values in the tables provided in the book)
Probability Mass Function (pmf)
◦ Discrete random variables
◦ Defines probabilities for distinct events, P(X = x)
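As a rough sketch of the difference, the R snippet below evaluates a pmf at a single value and a pdf over a range. The binomial parameters (10 trials, success probability 0.5) and the cutoff of 1 are assumptions chosen only for illustration.

```r
# pmf (discrete): P(X = 3) for a binomial with 10 trials, success prob 0.5
p_exactly_3 <- dbinom(x = 3, size = 10, prob = 0.5)

# pdf (continuous): probability over a range, P(Z < 1), standard normal
p_below_1 <- pnorm(q = 1)

# The same area obtained by integrating the normal density (the calculus step)
area <- integrate(dnorm, lower = -Inf, upper = 1)$value

print(p_exactly_3)
print(p_below_1)
print(area)
```

The last two printed values agree up to numerical integration error, which is why pnorm can stand in for the calculus.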
https://mobilemonitoringsolutions.com/wp-content/uploads/2020/11/8145187464
https://i.ytimg.com/vi/AgSumauLegs/maxresdefault.jpg
Some Common Distributions
Continuous
◦ Normal (or Gaussian)
◦ Student’s t
◦ Uniform
◦ Chi-squared
Discrete
◦ Bernoulli
◦ Binomial
◦ Poisson (covered in more advanced courses)
◦ Multinomial (covered in more advanced courses)
[Figure: normal density curve; x-axis from -4 to 4, y-axis "Density" from 0 to .4]
Normal Distribution
• Also known as the Gaussian distribution
• Unimodal and symmetric, bell-shaped curve
• Many variables are nearly normal, but none are exactly normal
• Denoted as N(µ, σ) → Normal with mean µ and standard deviation σ (in the Rosner textbook, σ² is used as the variance of the normal distribution: N(µ, σ²))
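The two notations matter in practice because R's pnorm expects a standard deviation, not a variance. A minimal sketch, assuming a distribution written in the variance convention as N(24, 36):

```r
mu     <- 24   # assumed mean
sigma2 <- 36   # assumed variance, i.e. N(mu, sigma^2) notation

# Convert the variance to a standard deviation before calling pnorm
sigma <- sqrt(sigma2)

# P(X < 30) for X with mean 24 and standard deviation 6
p <- pnorm(q = 30, mean = mu, sd = sigma)
print(sigma)
print(p)
```

Passing 36 directly as sd would describe a much wider distribution and give a different probability.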
Height of Males
“The male heights on OkCupid very nearly
follow the expected normal distribution --
except the whole thing is shifted to the right of
where it should be. Almost universally guys like
to add a couple inches.”
“You can also see a more subtle vanity at work:
starting at roughly 5' 8", the top of the dotted
curve tilts even further rightward. This means
that guys as they get closer to six feet round up
a bit more than usual, stretching for that
coveted psychological benchmark.”
http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating
Heights of Females
“When we looked into the data
for women, we were surprised to
see height exaggeration was just
as widespread, though without
the lurch towards a benchmark
height.”
http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating
Normal Distributions with Different Parameters
SAT scores are distributed nearly normally with mean 1500 and
standard deviation 300. ACT scores are distributed nearly normally
with mean 21 and standard deviation 5. A college admissions officer
wants to determine which of the two applicants scored better on
their standardized test with respect to the other test takers: Pam,
who earned an 1800 on her SAT, or Jim, who scored a 24 on his ACT?
Standardizing with Z scores
Since we cannot just compare these two raw scores, we instead compare how many standard
deviations beyond the mean each observation is.
Pam's score is (1800 - 1500) / 300 = 1 standard deviation above the mean.
Jim's score is (24 - 21) / 5 = 0.6 standard deviations above the mean.
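The same comparison can be sketched in R. The helper name z_score is a hypothetical one introduced here for illustration; the means and standard deviations are those given for the SAT and ACT above.

```r
# Hypothetical helper: number of standard deviations x lies from the mean
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

pam_z <- z_score(1800, mu = 1500, sigma = 300)  # SAT
jim_z <- z_score(24,   mu = 21,   sigma = 5)    # ACT

print(pam_z)
print(jim_z)
print(pam_z > jim_z)  # TRUE means Pam scored better relative to her peers
```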
Standardizing with Z scores
These are called standardized scores, or Z scores.
The Z score of an observation is the number of standard deviations it falls above or below the mean.
• Z scores are defined for distributions of any shape, but only when the distribution is normal can we use Z scores to calculate percentiles.
• Observations that are more than 2 SD away from the mean (|Z| > 2) are usually considered unusual.
Percentiles
• A percentile is the percentage of observations that fall below a given data point.
• Graphically, a percentile is the area below the probability distribution curve to the left of that observation.
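Because a percentile is the area to the left of an observation, pnorm returns it directly. A short sketch using the SAT and ACT figures from the earlier example (treating both score distributions as exactly normal is an assumption):

```r
# Area to the left of each score = percentile (as a proportion)
pam_pct <- pnorm(q = 1800, mean = 1500, sd = 300)  # same as pnorm(1)
jim_pct <- pnorm(q = 24,   mean = 21,   sd = 5)    # same as pnorm(0.6)

print(round(pam_pct, 4))  # 0.8413
print(round(jim_pct, 4))  # 0.7257
```

Under that assumption, Pam is at roughly the 84th percentile and Jim at roughly the 73rd.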
Calculating percentiles using tables
Let’s Practice!
BMI Example
Consider body mass index (BMI) in a population in which BMI is normally distributed with a mean of 24 and a standard deviation of 6. What percent of people in this population have a BMI below 24.9?
Recall:
z = (x − µ) / σ
z = z-score
x = experimental result
µ = mean value
σ = standard deviation
Let’s Practice!
BMI Example
BMI below 24.9?
• Let X = BMI level of a person: X ~ N(µ = 24, σ = 6)
• What is the probability of having a BMI less than 24.9? P(X < 24.9)
[Figure: normal curve for X with 24 and 24.9 marked on the x-axis]
Let’s Practice!
BMI Example
BMI below 24.9?
• What is the probability of having a BMI less than 24.9? P(X < 24.9)
• This is equivalent to P(Z < 0.15):
Z = (24.9 − 24) / 6 = 0.15
[Figure: standard normal curve with 0 and 0.15 marked on the x-axis]
Let’s Practice!
BMI Example
Let’s find the exact probability using the Z table
Let’s Practice!
BMI Example
P(X < 24.9) = P(Z < 0.15), where Z = (24.9 − 24) / 6 = 0.15
P(Z < 0.15) = 0.5596
The percent of people who are below 24.9 BMI level is 55.96%.
[Figure: normal curve for X with 24 and 24.9 marked on the x-axis]
Let’s Practice!
BMI Example – This time fun with R!
In general: pnorm(q, mean = 0, sd = 1, lower.tail = TRUE)
q: the z-score (when the default mean and sd are used)
mean: the mean of the normal distribution (default is 0)
sd: the standard deviation of the normal distribution (default is 1)
lower.tail: If TRUE, the probability to the left of q in the normal distribution is returned. If FALSE, the probability to the right of q is returned (default is TRUE).
Left-tailed test: pnorm(q=z-score, lower.tail=TRUE)
Right-tailed test: pnorm(q=z-score, lower.tail=FALSE)
Two-tailed test: 2*pnorm(q=abs(z-score), lower.tail=FALSE)
P(X < 24.9), with Z = (24.9 − 24) / 6 = 0.15
R input to calculate the probability:
pnorm(q=0.15, lower.tail=TRUE)
P(Z < 0.15) = 0.5596
The percent of people who are below 24.9 BMI level is 55.96%.
[Figure: normal curve for X with 24 and 24.9 marked on the x-axis]
https://www.statology.org/p-value-of-z-score-r/ Information on how to calculate p-values from z-scores in R
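The calculation can be run end to end in R. The sketch below computes the same probability two ways: from the z-score with the default mean and sd, and directly from the raw BMI value by passing mean and sd. The variable names are illustrative.

```r
mu    <- 24   # population mean BMI (from the example)
sigma <- 6    # population standard deviation (from the example)

# Route 1: standardize first, then use the standard normal
z <- (24.9 - mu) / sigma
p_from_z <- pnorm(q = z, lower.tail = TRUE)

# Route 2: let pnorm standardize by supplying mean and sd
p_from_x <- pnorm(q = 24.9, mean = mu, sd = sigma)

print(round(z, 2))         # 0.15
print(round(p_from_z, 4))  # 0.5596
print(round(p_from_x, 4))  # 0.5596
```

Both routes give the same answer, so the manual z-score step is optional when working in R.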
Let’s Try It Again!
BMI Example 2
According to NIH, a BMI below 18.5 is considered underweight. What percent of people in this population are underweight?
• P(X < 18.5)?
Z = (18.5 − 24) / 6 ≈ −0.92
P(Z < −0.92)
[Figure: standard normal curve with −0.92 and 0 marked on the x-axis]
BMI Example 2
P(X < 18.5) = P(Z < −0.92), where Z = (18.5 − 24) / 6 ≈ −0.92
P(Z ≤ −0.92) = 0.1788
17.88% of people in this population are underweight.
R: pnorm(q=-0.92, lower.tail=TRUE)
BMI Example 2
We can work this another way! The standard normal distribution is symmetrical about the mean.
P(Z < −0.92) = P(Z > 0.92) = 1 − P(Z ≤ 0.92)
[Figure: standard normal curve showing P(Z > 0.92), with 0 and 0.92 marked on the x-axis]
BMI Example 2
P(Z > 0.92) = 1 − P(Z < 0.92)
[Figure: two standard normal curves, one showing P(Z > 0.92) and one showing P(Z < 0.92), each with 0 and 0.92 marked on the x-axis]
P(Z < −0.92) = P(Z > 0.92) = 1 − P(Z < 0.92) = 1 − 0.8212 = 0.1788
17.88% of people in this population are underweight.
BMI Example 2
The standard normal distribution is symmetrical about the mean. P(Z < −0.92) = P(Z > 0.92) = 1 − P(Z ≤ 0.92)
R input to calculate the probability directly:
pnorm(q=-0.92, lower.tail=TRUE)
R input to calculate the probability by symmetry:
pnorm(q=0.92, lower.tail=TRUE) gives P(Z < 0.92)
pnorm(q=0.92, lower.tail=FALSE) gives P(Z > 0.92)
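A quick R check that all three routes for BMI Example 2 agree. The last line is an extra comparison, not from the slides: it uses the unrounded z-score by passing the raw BMI to pnorm, so it differs slightly from the table answer.

```r
z <- -0.92  # rounded z-score for a BMI of 18.5

p_left       <- pnorm(q = z,  lower.tail = TRUE)       # P(Z < -0.92)
p_right      <- pnorm(q = -z, lower.tail = FALSE)      # P(Z > 0.92)
p_complement <- 1 - pnorm(q = -z, lower.tail = TRUE)   # 1 - P(Z <= 0.92)

print(round(c(p_left, p_right, p_complement), 4))  # 0.1788 0.1788 0.1788

# Without rounding z to two decimals (z = -5.5 / 6)
p_unrounded <- pnorm(q = 18.5, mean = 24, sd = 6)
print(p_unrounded)
```

The small gap between the last value and 0.1788 comes only from rounding z to −0.92 for the table lookup.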

How about once more for fun?
BMI Example 3
According to NIH, a BMI above 30 is considered obese. What percent of people in this population are obese?
• P(X > 30)?
Z = (30 − 24) / 6 = 1
[Figure: standard normal curve split at Z = 1 into P(Z ≤ 1) and P(Z > 1), with 0 and 1 marked on the x-axis]
BMI Example 3
P(X > 30) = P(Z > 1), where Z = (30 − 24) / 6 = 1
P(Z > 1) = 1 − P(Z ≤ 1) = 1 − 0.8413 = 0.1587
15.87% of people in this population are obese.
R: pnorm(q=1, lower.tail=FALSE)
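For an upper-tail probability like this one, lower.tail = FALSE does the subtraction from 1 for us. A minimal sketch with the values from BMI Example 3:

```r
# Upper tail directly from the raw BMI cutoff
p_obese <- pnorm(q = 30, mean = 24, sd = 6, lower.tail = FALSE)

# Equivalent: complement of the lower tail at Z = 1
p_obese_alt <- 1 - pnorm(q = 1)

print(round(p_obese, 4))      # 0.1587
print(round(p_obese_alt, 4))  # 0.1587
```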
How about even more fun… let’s change it up a bit! - BMI Example 5
According to NIH, a BMI between 18.5 and 24.9 is considered a healthy weight. What percent of people in this population have a healthy weight?
• P(18.5 < X < 24.9)
Z = (18.5 − 24) / 6 ≈ −0.92
Z = (24.9 − 24) / 6 = 0.15
P(−0.92 < Z < 0.15)
How about even more fun… let’s change it up a bit! - BMI Example 5
P(18.5 < X < 24.9) = P(−0.92 < Z < 0.15)
= P(Z < 0.15) − P(Z < −0.92)
= 0.5596 − 0.1788 = 0.3808
38.08% of people in this population have a healthy weight.
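The "area between two values" step is a difference of two pnorm calls. The helper p_between below is a hypothetical name introduced for this sketch; the second call uses the raw BMI cutoffs, so it avoids rounding z and differs slightly from the table answer.

```r
# Hypothetical helper: P(lower < X < upper) for a normal distribution
p_between <- function(lower, upper, mean = 0, sd = 1) {
  pnorm(upper, mean = mean, sd = sd) - pnorm(lower, mean = mean, sd = sd)
}

# Using the rounded z-scores from the slides
p_z <- p_between(-0.92, 0.15)
print(round(p_z, 4))  # 0.3808

# Using the raw BMI cutoffs (no rounding of z)
p_x <- p_between(18.5, 24.9, mean = 24, sd = 6)
print(p_x)
```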
68 – 95 – 99.7 Rule (Empirical Rule)
For nearly normally distributed data,
• about 68% falls within 1 SD of the mean,
• about 95% falls within 2 SD of the mean,
• about 99.7% falls within 3 SD of the mean.
It is possible for observations to fall 4, 5, or more standard deviations away from the mean, but these occurrences are very rare if the data are nearly normal.
https://www.statology.org/the-normal-distribution/ More detailed information on the Normal Distribution
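The three percentages in the rule can be recovered from pnorm as the area between −k and +k standard deviations. A short sketch:

```r
k <- 1:3  # number of standard deviations from the mean

# Area within k SD of the mean for a standard normal
within_k_sd <- pnorm(k) - pnorm(-k)

print(round(within_k_sd, 3))  # 0.683 0.954 0.997
```

The rule's 68, 95 and 99.7 are rounded versions of these areas.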

Let’s Practice the Empirical Rule!
The heights of plants in a certain garden are normally distributed with a mean of 41.2 inches and a standard deviation of 2.8 inches.
Use the Empirical Rule to estimate what percentage of plants are taller than 35.6 inches.
Let’s Practice the Empirical Rule!
The Empirical Rule states that for a dataset with a normal distribution, about 95% of data values fall within two standard deviations of the mean. By symmetry, about 47.5% of values fall between the mean and two standard deviations below the mean.
In this example, 35.6 is located two standard deviations below the mean: 41.2 − 2(2.8) = 35.6. Since 50% of data values fall above the mean in a normal distribution, a total of 50% + 47.5% = 97.5% of values fall above 35.6.
Thus, about 97.5% of plants are taller than 35.6 inches.
https://www.statology.org/empirical-rule-practice-problems/
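As a check on the Empirical Rule estimate, the sketch below compares it with the probability pnorm gives for the same garden. The variable names are illustrative.

```r
mu    <- 41.2  # mean plant height in inches (from the example)
sigma <- 2.8   # standard deviation in inches (from the example)

# 35.6 inches sits two standard deviations below the mean
z <- (35.6 - mu) / sigma
print(z)

# Empirical Rule estimate: half the distribution plus half of the middle 95%
estimate <- 0.50 + 0.95 / 2
print(estimate)  # 0.975

# Probability from the normal distribution itself
exact <- pnorm(q = 35.6, mean = mu, sd = sigma, lower.tail = FALSE)
print(round(exact, 4))  # 0.9772
```

The two answers differ slightly because the "95%" in the rule is itself a rounded value.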
