
Module #4: Comparing numerical values

Lecture 1: Normal distribution & Central Limit Theorem


Reading: Whitlock and Schluter Ch. 10* & 11*, pp 328-335

In the last module, we looked at the distributions and statistical tests for
discrete/categorical variables. Now we are going to look at a common
distribution for continuous variables, the normal distribution. Then we will
continue this module with the most basic type of statistical test, the t-test.
I. The Normal Distribution
The normal distribution is the iconic distribution for biological variables. Its bell
shape, with most observations falling close to the mean and fewer observations
falling away from the mean, is a close approximation to the frequency
distribution of many variables we see in nature.
[Figure: a normal curve; x-axis: Measurement, y-axis: Relative frequency. Annotations: the majority of observations fall at the peak height (mean = mode = median); fewer observations fall in the tails.]

I want you to take a minute and think about:


Why are so many biological variables described by the normal distribution?

The normal distribution often arises in nature when many factors each have additive effects of roughly equal magnitude, e.g., exam scores, or a trait controlled by many genes.
The normal distribution can be completely described by two parameters:
– mean (µ) - location
– standard deviation (σ) - spread

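These parameters define the normal distribution according to:

$$f(Y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(Y-\mu)^2}{2\sigma^2}}$$

(Sketch: two normal curves, one centred at µ = 10 and one at µ = 20.)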
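As a minimal sketch (assuming Python with only the standard library), the density formula translates directly into a function. The values µ = 10 and σ = 2 below are illustrative choices, not values from the notes.

```python
import math

def normal_pdf(y, mu, sigma):
    """Height of the normal curve f(Y) at y, for mean mu and standard deviation sigma."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((y - mu) ** 2) / (2.0 * sigma ** 2))

# The curve peaks at the mean and is symmetric about it.
print(normal_pdf(10, 10, 2))                        # peak of a curve centred at mu = 10
print(normal_pdf(8, 10, 2), normal_pdf(12, 10, 2))  # equal heights either side of the mean
```

Changing `mu` only shifts the curve along the x-axis; changing `sigma` widens or narrows it.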

A. Properties of the Normal distribution (p.277):


- Continuous, so probability is measured as the area under the curve
- Symmetrical
- Single mode (mean = median = mode): all three measures of central tendency are the same
- ~2/3 of the area under the normal curve lies within one standard deviation of the mean
- 95% of the area of a normal distribution lies within 1.96 standard deviations of the mean (±1.96σ)
- It is a theoretical distribution that many biological variables approximate!
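The "~2/3" and "95%" properties can be checked numerically. This sketch assumes Python's standard-library `math.erf` and uses the identity Pr(|Z| < k) = erf(k/√2) for the area within k standard deviations of the mean.

```python
import math

def prob_within(k):
    """Area under a normal curve within k standard deviations of the mean.

    Uses the identity Pr(|Z| < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2.0))

print(round(prob_within(1.0), 4))   # ≈ 0.6827, i.e. about 2/3 of the area
print(round(prob_within(1.96), 4))  # ≈ 0.95
```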
To make it easier to work with normal distributions with different means and
standard deviations, we convert the distribution we have to the standard
normal distribution.

- There are infinitely many normal distributions (one for every combination of µ and σ), but because they all share these properties, we can convert any of them into the standard normal distribution.
II. The Standard Normal Distribution
All normal distributions are shaped alike, just with different means and
variances. Any value from a normal distribution can be converted to the standard
normal distribution by:
$$Z = \frac{Y - \mu}{\sigma}$$

where Y is a particular value, µ is the population mean of our normal distribution, and σ is the population standard deviation.

[Figure: the standard normal distribution, with mean 0 and standard deviation 1.]

Z is called the standard normal deviate: it tells us how many standard deviations a value lies from the mean.
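A minimal sketch of the conversion to a standard normal deviate, in Python. The population values µ = 20 and σ = 4 are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def z_score(y, mu, sigma):
    """Standard normal deviate: how many standard deviations y lies from mu."""
    return (y - mu) / sigma

# Hypothetical population: mu = 20, sigma = 4.
print(z_score(28, 20, 4))  # 2.0  -> two standard deviations above the mean
print(z_score(14, 20, 4))  # -1.5 -> one and a half standard deviations below the mean
```

A negative Z simply means the value lies below the mean.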

We can also apply the standard normal distribution to the distribution of means
sampled from a population. For a normally distributed population with known
parameters, the sample means will be normally distributed with a standard
deviation (the standard error of the mean, SEM) of:

$$\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$$
Calculating the probability of observing a particular sample mean then becomes a Z-score calculation, using the SEM as the standard deviation of the sample statistic:

$$Z = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}}$$

where Ȳ is the sample mean, µ is the population mean, and σ_Ȳ is the standard deviation of the sampling distribution (the SEM).
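The two formulas above can be sketched in Python as follows. The population (µ = 50, σ = 10), the sample size n = 25 and the sample mean 53 are hypothetical numbers for illustration only.

```python
import math

def sem(sigma, n):
    """Standard error of the mean when the population sigma is known."""
    return sigma / math.sqrt(n)

def z_for_mean(ybar, mu, sigma, n):
    """Z-score of a sample mean, using the SEM as its standard deviation."""
    return (ybar - mu) / sem(sigma, n)

# Hypothetical population: mu = 50, sigma = 10; samples of n = 25.
print(sem(10, 25))                 # 2.0
print(z_for_mean(53, 50, 10, 25))  # 1.5
```

The only change from an ordinary Z-score is that σ is replaced by σ/√n.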


III. Applications of the Standard Normal Distribution
1. The natural log of growth (change in radius per year) of Engelmann spruce is
approximately normally distributed with a mean of 0.037 log units and a standard
deviation of 0.385. Following the steps below, determine the probability that a
tree has a bad year, defined as having growth less than -0.050 log units in a year.
a. Make a sketch of the normal distribution with the mean 0.037 and mark
the area we are trying to determine.

[Sketch: normal curve with µ = 0.037 and σ = 0.385; the tail to the left of -0.050 is shaded.]
- We need to know the area of the shaded tail region.

b. Calculate the standard normal deviate (Z) associated with the value we
are interested in here (-0.050).
Z = (Y - µ)/σ = (-0.050 - 0.037)/0.385
Z = -0.22597

c. We are interested in the probability of getting a value less than this Z
score. Use Statistical Table B to determine this probability.
- Because the normal distribution is symmetric, the areas in the two tails are the same:
Pr[Z < -0.22597] = Pr[Z > 0.22597]
= 0.40905 (from Table B, rounding Z to 0.23)
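As a cross-check on the table lookup, here is a sketch of steps b and c in Python, assuming the standard-library `math.erf` to compute the standard normal cumulative probability. Because it does not round Z to two decimals, it gives about 0.4106 rather than the table's 0.40905.

```python
import math

def normal_cdf(z):
    """Pr[Z < z] for the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 0.037, 0.385  # log growth of Engelmann spruce
y = -0.050                # threshold for a "bad year"

z = (y - mu) / sigma
p_bad_year = normal_cdf(z)
# By symmetry, the lower tail Pr[Z < z] equals the upper tail Pr[Z > -z].
p_upper = 1.0 - normal_cdf(-z)

print(round(z, 5))           # -0.22597
print(round(p_bad_year, 4))  # about 0.4106
```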
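d. What is the log growth rate that demarcates the highest 5% of growth
years?
The Z value that cuts off the upper 5% (p = 0.05) is Z ≈ 1.65 (more precisely 1.645).
Rearranging Z = (Y - µ)/σ gives Y = Zσ + µ:
Y = 1.65(0.385) + 0.037 = 0.67225 ≈ 0.672 log units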
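Going the other way (from a probability to a value) needs the inverse of the cumulative distribution. This sketch assumes Python 3.8+, whose standard-library `statistics.NormalDist` provides `inv_cdf`; with the unrounded Z = 1.645 the cut-off comes out at about 0.670, slightly below the 0.672 obtained with Z = 1.65.

```python
from statistics import NormalDist

mu, sigma = 0.037, 0.385
growth = NormalDist(mu, sigma)

# Value with 95% of the distribution below it, i.e. the top 5% above it.
cutoff = growth.inv_cdf(0.95)
z = NormalDist().inv_cdf(0.95)

print(round(z, 3))       # 1.645
print(round(cutoff, 3))  # 0.67
# Same answer by rearranging Z = (Y - mu) / sigma:
print(round(z * sigma + mu, 3))
```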
2. The following table lists the means and standard deviations of several different
normal distributions. For each, a sample of 10 individuals and a sample of 30
individuals were taken. For each sample, calculate the probability that the
sample mean was greater than the given value of Y.
| Mean (µ) | Standard deviation (σ) | Y | n = 10: Pr(Ȳ > Y) | n = 30: Pr(Ȳ > Y) |
|---|---|---|---|---|
| 14 | 5 | 15 | 0.2643 | 0.13786 |
| 15 | 3 | 15.5 | | |
| -23 | 4 | -22 | | |
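Worked example for the first row:
Z = (Ȳ - µ)/σ_Ȳ = (Ȳ - µ)/(σ/√n)
n = 10: Z = (15 - 14)/(5/√10) = 0.632, so Pr(Ȳ > 15) = Pr[Z > 0.63] = 0.2643
n = 30: Z = (15 - 14)/(5/√30) = 1.095, so Pr(Ȳ > 15) ≈ Pr[Z > 1.09] = 0.13786 (Table B)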
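The table calculation can be sketched in Python, assuming Python 3.8+ and the standard-library `statistics.NormalDist`. The helper name `prob_mean_greater` is hypothetical. Because Z is not rounded, the results (about 0.264 and 0.137 for the first row) differ slightly from the Table B values.

```python
from statistics import NormalDist

def prob_mean_greater(mu, sigma, y, n):
    """Pr(sample mean > y) for samples of size n from a normal population."""
    sem = sigma / n ** 0.5  # standard error of the mean
    return 1.0 - NormalDist(mu, sem).cdf(y)

# First row of the table: mu = 14, sigma = 5, Y = 15.
for n in (10, 30):
    print(n, round(prob_mean_greater(14, 5, 15, n), 3))
```

The same function can be applied to the remaining rows by changing `mu`, `sigma` and `y`.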

What do you notice about Pr(Ȳ > Y) as the sample size increases? Why is
this?
(Sketch: the distribution of sample means has broad tails for a small sample size and is narrower for a large sample size.)

IV. Central Limit Theorem -- The most amazing Central Limit Theorem!
Central Limit Theorem = ‘the mean of a large number of measurements
randomly sampled from a non-normal (or normal) population is approximately
normally distributed’ p. 286
Let’s look at an example: button-pushing times, measuring the response time after a stimulus.
[Figure: "Button pushing times"; x-axis: Time (ms), y-axis: Frequency. Annotations mark where most responses fall, and the 2nd and 3rd tries.]
5
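Now randomly sample from this distribution and calculate the mean of each sample:
[Figure: distributions of 1000 sample means for samples of different sizes.]
- Smallest sample size: the distribution of means repeats the distribution we started with and does not look normal.
- Larger samples: the distribution of means starts to look normal.
- Largest samples: the distribution of means is normally distributed.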
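The resampling experiment can be imitated with a short simulation. This is a sketch only: the actual button-pushing data are not available, so it assumes, hypothetically, that response times follow a right-skewed exponential distribution with a mean of 300 ms. It uses skewness (about 0 for a symmetric distribution such as the normal) to track how normal the distribution of 1000 sample means looks as the sample size grows.

```python
import random
import statistics

def sample_means(sample_size, n_means=1000, mean_ms=300.0, seed=1):
    """Means of n_means random samples of the given size.

    Hypothetical stand-in for the button-pushing data: response times drawn
    from a right-skewed exponential distribution with mean mean_ms.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0 / mean_ms) for _ in range(sample_size))
        for _ in range(n_means)
    ]

def skewness(xs):
    """Sample skewness: about 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

for n in (1, 5, 30):
    print(n, round(skewness(sample_means(n)), 2))
```

With a sample size of 1 the "means" are just the raw times, so the skew of the original distribution remains; as the sample size increases, the skewness of the means falls toward 0.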

DISCUSSION: Turn to your neighbour and explain WHY we see what we do
above. Why, as the sample size increases, does the distribution of means
get increasingly normal? Why doesn’t it just mimic the distribution from
which the samples were drawn?
