
Quantitative Data Analysis

Lesson 14: Sampling Distribution of x̄ and the Central Limit Theorem

1 / 36
Review

• population – the entire group of individuals that is the target of our interest
Example: All BYU full-time students
• sample – a subgroup of the population from which we obtain information
Example: 170 BYU full-time students
• parameter – a numerical fact about the population
Example: µ = average GPA of all full-time BYU students
• statistic – a numerical fact about the sample
Example: x̄ = average GPA of the 170 students in our sample

2 / 36
Notation:

µ = population mean
σ = population standard deviation
X̄ = sample mean
s = sample standard deviation
Mean(X̄) = mean of the sampling distribution of X̄
SD(X̄) = standard deviation of the sampling distribution of X̄

3 / 36
The Sampling Distribution of X̄

The sampling distribution of a sample mean (X̄) is a theoretical probability distribution.

It describes the distribution of:


• all sample means
• from all possible random samples
• of the same size
• taken from the same population
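To make "all possible random samples" concrete, here is a minimal Python sketch that lists every possible sample of size 2 from a tiny hypothetical population {1, 2, 3, 4} and computes one sample mean per sample. The population and sample size are illustrative only. One assumption to flag: draws are made with replacement (independent draws), which is what makes the spread come out to exactly σ/√n; for an SRS without replacement from a population this small, the spread would be somewhat smaller.

```python
from itertools import product
from statistics import mean, pstdev

# Hypothetical tiny population, chosen so every possible sample can be listed.
population = [1, 2, 3, 4]
n = 2  # sample size

# Every possible ordered sample of size n, drawn with replacement.
all_samples = list(product(population, repeat=n))
# The sampling distribution of x-bar: one sample mean per possible sample.
sample_means = [mean(s) for s in all_samples]

mu = mean(population)       # population mean µ
sigma = pstdev(population)  # population SD σ

print("number of possible samples:", len(all_samples))
print("mean of all x-bars:", mean(sample_means), " µ:", mu)
print("SD of all x-bars:", pstdev(sample_means), " σ/√n:", sigma / n ** 0.5)
```

Because every possible sample is enumerated, the center of this sampling distribution equals µ exactly rather than approximately.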

4 / 36
Example 1: Population distribution of blood pressures (BP)

[Figure: population distribution of systolic blood pressure (mm Hg) for men, shown as percentage of men in the population; µ = 125 mm Hg, σ = 14 mm Hg]

5 / 36
Let’s do a simulation for n = 20

• Take 500 separate random samples from this population of men, each with n = 20 subjects
• For each of the 500 samples, we will
  • plot the sample BP values
  • calculate the sample mean, X̄
  • calculate the sample standard deviation, s
• We will plot the 500 sample means using a histogram

Ready, set, go...
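The simulation steps above can be sketched in Python as follows (the plotting step is left out). Assumption: the BP population is modeled as normal with µ = 125 and σ = 14; the slides show the population curve but do not state its exact form. The seed is an arbitrary choice for reproducibility.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is reproducible

MU, SIGMA = 125, 14  # population mean and SD of systolic BP (mm Hg)
N, REPS = 20, 500    # subjects per sample, number of samples

sample_means = []
sample_sds = []
for _ in range(REPS):
    # Assumption: BP values drawn from a normal model with µ = 125, σ = 14.
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(mean(sample))  # x-bar for this sample
    sample_sds.append(stdev(sample))   # s for this sample (n - 1 divisor)

print(f"average of the 500 x-bars: {mean(sample_means):.2f}")
print(f"SD of the 500 x-bars:      {stdev(sample_means):.2f}")
print(f"average of the 500 s values: {mean(sample_sds):.2f}")
```

Two different spreads show up here: each sample's s stays close to σ = 14, while the 500 sample means vary much less, roughly 14/√20 ≈ 3.1 mm Hg.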

6 / 36
7 / 36
So we did this 500 times...
Let’s look at a histogram of the 500 sample means
Each based on a sample of size 20

8 / 36
Let’s do ANOTHER simulation for n = 50

• Take 500 separate random samples from this population of men, each with n = 50 subjects
• For each of the 500 samples, we will
  • plot the sample BP values
  • calculate the sample mean
  • calculate the sample standard deviation
What do you predict is the mean of these X̄s?
What do you predict is the shape of the histogram of these X̄s?

9 / 36
Summary of simulation results for BP

10 / 36
Example 2: Population distribution of hospital length of stay (LOS)

[Figure: population distribution of hospital length of stay (in days), shown as percentage; µ = 4 days, σ = 3 days]
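To simulate from this population we need a concrete model for it, and the slides give only µ and σ. The Python sketch below assumes a gamma distribution whose mean and SD match µ = 4 and σ = 3 days. This is an illustrative stand-in (real LOS data are recorded in whole days), chosen because it is non-negative with a long right tail.

```python
import random
from statistics import fmean, median, pstdev

random.seed(2)  # fixed seed so the run is reproducible

MU, SIGMA = 4.0, 3.0       # population mean and SD of LOS (days), from the slide
# Assumption: LOS is modeled as a gamma distribution matched to µ and σ.
SHAPE = (MU / SIGMA) ** 2  # k = µ²/σ² = 16/9
SCALE = SIGMA ** 2 / MU    # θ = σ²/µ = 2.25 days

def draw_los():
    """Draw one hypothetical length of stay (in days) from the gamma model."""
    return random.gammavariate(SHAPE, SCALE)

stays = [draw_los() for _ in range(100_000)]
print(f"mean   ≈ {fmean(stays):.2f} days")
print(f"SD     ≈ {pstdev(stays):.2f} days")
print(f"median ≈ {median(stays):.2f} days")
```

The median lands below the mean, which is what a long right tail does to a distribution.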

11 / 36
Let’s do a simulation for n = 16

• Take 500 separate random samples from this population of hospital admissions, each with n = 16 patients
• For each of the 500 samples, we will
  • plot the sample LOS values
  • calculate the sample mean
  • calculate the sample standard deviation

Ready, set, go...

12 / 36
13 / 36
So we did this 500 times...
Let’s look at a histogram of the 500 sample means
Each based on a sample of size 16

14 / 36
Let’s do ANOTHER simulation for n = 64

• Take 500 separate random samples from this population of hospital admissions, each with n = 64 patients
• For each of the 500 samples, we will
  • plot the sample LOS values
  • calculate the sample mean
  • calculate the sample standard deviation

Ready, set, go...

15 / 36
16 / 36
So we did this 500 times...
Let’s look at a histogram of the 500 sample means
Each based on a sample of size 64

17 / 36
Let’s do ANOTHER simulation for n = 256

• Take 500 separate random samples from this population of hospital admissions, each with n = 256 patients
• For each of the 500 samples, we will
  • plot the sample LOS values
  • calculate the sample mean
  • calculate the sample standard deviation
What do you predict is the mean of these X̄s?
What do you predict is the shape of the histogram of these X̄s?

Ready, set, go...
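For the prediction questions, here is a Python sketch that repeats the LOS simulation for n = 16, 64, and 256 and summarizes each set of 500 sample means. It assumes the same illustrative gamma model for LOS (not given in the slides); the seed and the `skewness` helper are hypothetical choices made for this example.

```python
import random
from statistics import fmean, pstdev, stdev

random.seed(3)  # fixed seed so the run is reproducible

MU, SIGMA = 4.0, 3.0                               # LOS population mean and SD (days)
SHAPE, SCALE = (MU / SIGMA) ** 2, SIGMA ** 2 / MU  # assumed gamma model for LOS
REPS = 500                                         # simulated samples per sample size

def skewness(values):
    """Skewness as the average cubed z-score (near 0 for a symmetric shape)."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

results = {}
for n in (16, 64, 256):
    # One x-bar per simulated sample of n hospital stays.
    means = [fmean(random.gammavariate(SHAPE, SCALE) for _ in range(n))
             for _ in range(REPS)]
    results[n] = means
    print(f"n = {n:3d}: mean of x-bars = {fmean(means):.3f}, "
          f"SD of x-bars = {stdev(means):.3f} (σ/√n = {SIGMA / n ** 0.5:.4f}), "
          f"skewness = {skewness(means):.2f}")
```

Under this model the population skewness is 2/√k = 1.5, and the skewness of x̄ shrinks like 1.5/√n (about 0.38, 0.19, and 0.09 in theory), so the histograms should look increasingly symmetric; with only 500 samples the estimated skewness values are noisy.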

18 / 36
19 / 36
So we did this 500 times...
Let’s look at a histogram of the 500 sample means
Each based on a sample of size 256

20 / 36
Simulation results for Hospital LOS

21 / 36
Variation in sample mean values tied to size of each sample, NOT the number of samples

[Figure: simulated sample means for n = 16, n = 64, and n = 256, each shown for 500 and 5000 simulations]
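The point of this slide can be checked with a short Python sketch: keep n = 64 fixed and change only the number of simulated samples. It assumes the same illustrative gamma model for LOS used in the earlier sketches (not part of the slides); `simulate_means` is a hypothetical helper.

```python
import random
from statistics import fmean, stdev

random.seed(4)  # fixed seed so the run is reproducible

MU, SIGMA, N = 4.0, 3.0, 64
SHAPE, SCALE = (MU / SIGMA) ** 2, SIGMA ** 2 / MU  # assumed gamma model for LOS

def simulate_means(reps):
    """Return `reps` sample means, each from a fresh sample of size N."""
    return [fmean(random.gammavariate(SHAPE, SCALE) for _ in range(N))
            for _ in range(reps)]

spreads = {}
for reps in (500, 5000):
    means = simulate_means(reps)
    spreads[reps] = stdev(means)
    print(f"{reps:4d} simulations: SD of x-bars = {spreads[reps]:.3f}, "
          f"range = {min(means):.2f} to {max(means):.2f}")
print(f"σ/√n for n = {N}: {SIGMA / N ** 0.5:.3f}")
```

Both SDs should land near 0.375. More simulations fill in the histogram (and the range may stretch a little as rarer values appear), but the typical spread is set by n.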

22 / 36
Sampling Distribution of X̄

Distribution of x̄ for all possible SRSs of size n from a population with mean µ and SD σ

Center: mean of sampling distribution of x̄ is µ

Spread: standard deviation of sampling distribution of x̄ is σ/√n

Shape: ? The answer to this question will be given after the simulations
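The Center and Spread results can be derived in a few lines. This sketch assumes the n observations X₁, …, Xₙ are independent draws from the population (sampling with replacement, or a population much larger than the sample); independence is what allows the variances to be added.

```latex
\begin{aligned}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad E(X_i) = \mu, \quad \operatorname{Var}(X_i) = \sigma^2 \\
\operatorname{Mean}(\bar{X}) = E(\bar{X}) &= \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\mu = \mu \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \\
\operatorname{SD}(\bar{X}) &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
\end{aligned}
```

Neither step uses the shape of the population, which is why the center and spread hold for any population with a finite σ; only the Shape row depends on the population and on n.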

23 / 36
CLT – R Shiny App example

1. Open R Shiny app: http://shinyserver.byu.edu/users/stat121res/shinyApp/
2. Sampling Distribution of X-bar

24 / 36
• In real research it is impractical to estimate the sampling distribution of a sample mean by actually taking many random samples from the same population
• Research would never get done if every study had to be repeated many times just to understand this sampling behavior
• Simulations are useful for illustrating a concept, not as a practical approach!
• Luckily, there is mathematical machinery that generalizes the patterns we saw in the simulation results

25 / 36
Amazing Result

Mathematical statisticians have figured out how to predict what the sampling distribution will look like without actually repeating the study numerous times and having to choose a sample each time.

Often, the sampling distribution will look “normal”

Central Limit Theorem

26 / 36
Central Limit Theorem (CLT)

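If you take a large SRS of size n from any population with a finite standard deviation σ, then the shape of the sampling distribution of x̄ is approximately normal
• the shape gets closer to normal as n increases
• n > 30 is commonly considered large (a rule of thumb; strongly skewed populations may need a larger n)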
• CLT allows us to use the standard normal table to compute
approximate probabilities associated with x̄
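As an illustration of using the normal model for x̄, here is a Python sketch for a hypothetical question built from Example 2: with µ = 4 days, σ = 3 days, and n = 64, what is the approximate probability that x̄ exceeds 4.5 days? Python's `statistics.NormalDist` (Python 3.8+) stands in for the standard normal table; relying on the normal model here assumes the CLT applies, which n = 64 > 30 supports.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical question: P(x-bar > 4.5) for n = 64 stays from the LOS population.
mu, sigma, n = 4.0, 3.0, 64
cutoff = 4.5

se = sigma / sqrt(n)                           # SD of the sampling distribution: 0.375
z = (cutoff - mu) / se                         # standardize the cutoff
p_table = 1 - NormalDist().cdf(z)              # same lookup as the standard normal table
p_direct = 1 - NormalDist(mu, se).cdf(cutoff)  # normal model for x-bar, no z needed

print(f"SD(x-bar) = {se:.3f}, z = {z:.2f}")
print(f"P(x-bar > {cutoff}) ≈ {p_table:.4f} (table route), {p_direct:.4f} (direct)")
```

Both routes give about 0.09: standardizing to z and using N(0, 1) is equivalent to using a normal curve with mean µ and SD σ/√n directly.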

27 / 36
Central Limit Theorem

[Figure: four panels on a common 0 to 10 scale: the population, then sample means based on n = 16, n = 64, and n = 256]

28 / 36
Sampling Distribution of X̄

Distribution of x̄ for all possible SRSs of size n from a population with mean µ and SD σ

Center: mean of sampling distribution of x̄ is µ

Spread: standard deviation of sampling distribution of x̄ is σ/√n

Shape:
· Case 1: Population normal. The shape of the distribution of x̄ is normal
· Case 2: Population non-normal. The shape of the distribution of x̄ is approximately normal when n is large (n > 30) by the CLT

29 / 36
Why do we care about the sampling distribution?

• Sampling distributions allow us to assess the uncertainty of sample results
• If we knew the spread of the sampling distribution, we would know how far our x̄ is likely to be from the true µ (either below or above µ)

30 / 36
Why is the sampling distribution so important?

• If a sampling distribution has a lot of variability, then if you took another sample, you would likely get a very different result
• When the sampling distribution is approximately normal, about 95% of the time the sample mean will be within 2σ/√n of the population mean
• This tells us how “close” the sample statistic should be to
the population parameter
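The "about 95%" claim can be checked by simulation. This Python sketch draws 2000 samples of n = 64 from the illustrative gamma model for LOS (an assumption, not from the slides) and counts how often x̄ lands within 2σ/√n of µ.

```python
import random
from statistics import fmean

random.seed(5)  # fixed seed so the run is reproducible

MU, SIGMA, N = 4.0, 3.0, 64
SHAPE, SCALE = (MU / SIGMA) ** 2, SIGMA ** 2 / MU  # assumed gamma model for LOS
REPS = 2000

margin = 2 * SIGMA / N ** 0.5  # 2σ/√n = 0.75 days
hits = 0
for _ in range(REPS):
    xbar = fmean(random.gammavariate(SHAPE, SCALE) for _ in range(N))
    if abs(xbar - MU) <= margin:  # x-bar landed within 2σ/√n of µ
        hits += 1

coverage = hits / REPS
print(f"share of x-bars within ±{margin:.2f} days of µ: {coverage:.3f}")
```

The share should come out near 0.95 even though this population is skewed, because by the CLT the sampling distribution of x̄ for n = 64 is close to normal.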

31 / 36
Why is the normal distribution so important in the
study of statistics?

It’s not because things in nature are always normally distributed (although sometimes they are).

It’s because of the Central Limit Theorem:

The sampling distribution of a statistic (like a sample mean) often approximately follows a normal distribution when the sample size is large

32 / 36
Self-check

Consider taking a random sample of size 49 from a left-skewed population with mean 80 and standard deviation 7. What is the mean of the sampling distribution of sample means for n = 49 from this population?
(a) 7
(b) 80
(c) Close to 80
(d) Cannot be calculated because the population is left-skewed

33 / 36
Self-check

Consider taking a random sample of size 49 from a left-skewed population with mean 80 and standard deviation 7. What is the standard deviation of the sampling distribution of sample means for n = 49 from this population?
(a) 7
(b) 1
(c) 0.1429
(d) Cannot be calculated because the population is left-skewed
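Working the self-check numbers through the Center and Spread formulas (µ = 80, σ = 7, n = 49): these formulas use only µ, σ, and n, so the left skew of the population does not block the calculation.

```latex
\begin{aligned}
\operatorname{Mean}(\bar{X}) &= \mu = 80 \\
\operatorname{SD}(\bar{X}) &= \frac{\sigma}{\sqrt{n}} = \frac{7}{\sqrt{49}} = \frac{7}{7} = 1
\end{aligned}
```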

34 / 36
Self-check

Consider taking a random sample of size 49 from a left-skewed population with mean 80 and standard deviation 7. Suppose we’re planning to take a random sample of size 49 from this population. What sample mean value are we expecting to get?

(a) x̄ = 7
(b) x̄ = 49
(c) x̄ = 80
(d) An x̄ close to 80
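To see what a single x̄ looks like in practice, this Python sketch builds a hypothetical left-skewed population with mean 80 and SD 7 and draws 2000 samples of size 49. The population model (94 minus a gamma variable with shape 4 and scale 3.5, whose mean is 14 and SD is 7) is an illustrative assumption; the self-check does not specify one.

```python
import random
from statistics import fmean, stdev

random.seed(6)  # fixed seed so the run is reproducible

def draw_score():
    """One value from a hypothetical left-skewed population (mean 80, SD 7).

    Assumption: 94 minus a gamma(shape 4, scale 3.5) variable; subtracting
    the right-skewed gamma flips it into a left-skewed distribution.
    """
    return 94 - random.gammavariate(4, 3.5)

N, REPS = 49, 2000
xbars = [fmean(draw_score() for _ in range(N)) for _ in range(REPS)]

print("first five sample means:", [round(x, 2) for x in xbars[:5]])
print(f"mean of x-bars = {fmean(xbars):.2f}, SD of x-bars = {stdev(xbars):.2f}")
```

Individual sample means scatter around 80 rather than hitting it exactly, typically within about 2σ/√n = 2 of it, while their long-run average is 80 and their SD is about 1.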

35 / 36
Vocabulary

• simple random sample
• population distribution
• sampling distribution of x̄
• standard deviation of sampling distribution of x̄
• mean of sampling distribution of x̄
• shape of sampling distribution of x̄
• central limit theorem

36 / 36
