Probability and Statistics

LECTURE 5
SAMPLING METHODS
&
SAMPLING DISTRIBUTIONS

Adapted from http://www.prenhall.com/mcclave


OUTLINE
• Sampling methods
• The importance of sampling distributions
• Sampling distribution of sample means
• Using simulation to understand sampling distributions
• Central limit theorem

SAMPLING

▪ Some terms:
• Census: conducting a survey to collect data for the entire population
• Sampling: selecting a sample from the population
▪ Why is sampling necessary?
• Cost
• Practicality
▪ Statistical inference allows us to draw conclusions about the population based on sample data

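SAMPLING WITH/WITHOUT REPLACEMENT

▪ Sampling with replacement: each selected item is returned to the population before the next draw, so it can be selected more than once.

▪ Sampling without replacement: a selected item is not returned to the population, so it can appear in the sample at most once.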
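The difference between the two schemes can be seen in a short R sketch. The population of ID numbers 1 to 10 and the variable names are assumptions made for illustration; `sample` and `set.seed` are base R functions.

```r
# Hypothetical population: ID numbers 1 to 10 (an assumption for illustration).
population <- 1:10

set.seed(1)  # fix the random seed so the draws are reproducible

# With replacement: each selected item is returned before the next draw,
# so the same item can appear more than once.
with_repl <- sample(population, size = 5, replace = TRUE)

# Without replacement: a selected item is set aside,
# so every item appears at most once.
without_repl <- sample(population, size = 5, replace = FALSE)

print(with_repl)
print(without_repl)

# Drawing more items than the population holds only works with replacement.
big_draw <- sample(population, size = 15, replace = TRUE)
```

Asking for 15 items with `replace = FALSE` would raise an error, because the population has only 10 members.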

SAMPLING METHODS

▪ Probability sampling methods
• Probability sample: a sample selected in such a way that each item or person in the population has a known (nonzero) likelihood of being included in the sample.
• E.g. simple random sampling, systematic random sampling, and stratified random sampling.

SAMPLING METHODS

▪ Nonprobability sampling methods
• Not all items or people have a chance of being included in the sample.
• E.g. an online survey, or surveying customers by going around a shopping mall and asking them to answer a questionnaire.
• Results may be biased.

SOME IMPORTANT PROBABILITY SAMPLING METHODS

Let’s now discuss two of the most important probability sampling methods: simple random sampling and stratified random sampling.

SIMPLE RANDOM SAMPLING

Selecting a sample in such a way that every possible sample of the same size is equally likely to be chosen.

Example: Suppose our lecture class is the population. We are going to select a simple random sample (SRS) of 20 students from the population.

▪ We first have to obtain a list of population members.
▪ Number the list from 1 to N (the population size).
▪ Then, use computer software to select a random sample.

TAKE AN SRS IN R

sample(x, size, replace = FALSE)

▪ x: a vector of one or more elements from which to choose.
▪ size: the number of items to be chosen.
▪ replace: specifies whether we want sampling with replacement (TRUE) or without replacement (FALSE).
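As a minimal sketch, the class example could be carried out as follows. The class size N = 120 is an assumption made for illustration; in practice N comes from the actual class list.

```r
# Hypothetical class size; the real N comes from the class list.
N <- 120
student_ids <- 1:N          # number the list from 1 to N

set.seed(2024)              # optional: makes the selection reproducible
srs <- sample(student_ids, size = 20, replace = FALSE)

sort(srs)                   # the 20 selected list positions
```

The selected numbers are then matched back to the names on the numbered list.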

STRATIFIED RANDOM SAMPLING

▪ To select a stratified random sample, first divide the population into mutually exclusive groups of similar individuals called strata. Then choose a separate SRS from each stratum and combine these SRSs to form the full sample.

▪ Example:
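One possible illustration in R, not taken from the original slides: the population of 100 students, the three strata by year of study, and the 10% sampling fraction are all assumptions made for this sketch.

```r
# Hypothetical population: 100 students in three strata by year of study.
students <- data.frame(
  id   = 1:100,
  year = rep(c("first", "second", "third"), times = c(50, 30, 20))
)

# Assumed design: sample 10% of each stratum (proportional allocation).
set.seed(7)
strata <- split(students, students$year)      # one data frame per stratum
picked <- lapply(strata, function(s) {
  # separate SRS (without replacement) of row positions within the stratum
  s[sample(nrow(s), size = round(0.10 * nrow(s))), , drop = FALSE]
})
stratified_sample <- do.call(rbind, picked)   # combine the SRSs

table(stratified_sample$year)                 # sample counts per stratum
```

With these assumed stratum sizes, the full sample contains 5, 3, and 2 students from the first, second, and third year.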

OTHER TYPES OF SAMPLING

Please refer to your textbook for other types of sampling methods.

STATISTICAL METHODS

Statistical methods
• Descriptive statistics
• Inferential statistics

IMPORTANCE OF SAMPLING DISTRIBUTION

▪ The basis of statistical inference
▪ Basis for understanding hypothesis testing, estimation, etc.

REPEATED SAMPLING

▪ Example
• We wish to estimate the population mean.
• We select a random sample.
• We find the sample mean (e.g. $\bar{X} = 20$) and use it as an estimate.
• Suppose other people select different samples and find markedly different sample means.
• Would we trust our estimate?

REPEATED SAMPLING

▪ The same problem, but:
• If everyone else selects different samples and their results are close to ours, we can be more confident in our estimate.
▪ The sampling distribution of sample means describes how sample means vary from sample to sample.
▪ The sample mean is just one particular sample statistic.

EXAMPLE OF SAMPLING DISTRIBUTION

▪ Given a population of salaries of 5 employees: 2, 5, 7, 8, 10 (in hundred dollars/month)
▪ Imagine the population mean is unknown; we wish to estimate the population mean salary.
▪ We select a random sample of 3 salaries.

EXAMPLE OF SAMPLING DISTRIBUTION

▪ Denote the mean of a random sample by $\bar{X}$.
▪ Before sample selection: does $\bar{X}$ represent a fixed value or a random variable?
▪ If $\bar{X}$ represents a variable whose value can change:
• How many possible values can it take?
• What is the probability of each value?
▪ What if we use a sample size of n = 4?
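These questions can be explored by enumerating every possible sample. A minimal R sketch, assuming sampling without replacement and treating samples as unordered; `combn` and `colMeans` are base R functions, and the variable names are illustrative.

```r
salaries <- c(2, 5, 7, 8, 10)        # population, in hundred dollars/month

# All possible samples of size 3 (without replacement); one sample per column.
samples <- combn(salaries, 3)
xbar    <- colMeans(samples)         # the mean of each possible sample

ncol(samples)                        # number of possible samples: choose(5, 3)

# Sampling distribution: each distinct value of the sample mean
# with its probability (every sample is equally likely under SRS).
prop.table(table(round(xbar, 2)))

mean(salaries)                       # population mean
mean(xbar)                           # mean of all possible sample means
```

With these five salaries there are 10 possible samples of size 3, and the mean of the 10 sample means equals the population mean, 6.4. Replacing 3 with 4 in `combn` gives the n = 4 case.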

SAMPLING DISTRIBUTION OF SAMPLE MEANS

▪ The probability distribution of all of the possible values of the sample mean for samples of a given size selected from a population
▪ What if we change the sample size?

QUESTIONS

▪ Is there a sampling distribution of the median?
▪ Is there a sampling distribution of the variance?

EXAMPLE OF SAMPLING DISTRIBUTION OF VARIANCE
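The slide's own example is not recoverable from the text, so the following is only a sketch under stated assumptions: it reuses the salary population 2, 5, 7, 8, 10 from the earlier example, takes all samples of size 3 without replacement, and uses R's `var`, which divides by n - 1. The median is included for comparison.

```r
salaries <- c(2, 5, 7, 8, 10)
samples  <- combn(salaries, 3)          # all 10 samples of size 3, one per column

s2  <- apply(samples, 2, var)           # sample variance (divisor n - 1)
med <- apply(samples, 2, median)        # sample median

prop.table(table(round(s2, 2)))         # sampling distribution of the variance
prop.table(table(med))                  # sampling distribution of the median
```

Under these assumptions the sample median takes the values 5, 7, and 8 with probabilities 0.3, 0.4, and 0.3, so any sample statistic, not only the mean, has its own sampling distribution.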

IN GENERAL

A sampling distribution is the probability distribution of all of the possible values of a sample statistic for samples of a given size selected from a population.

ACTIVITY: EXPLORING SAMPLING DISTRIBUTIONS VIA SIMULATION

▪ Use the applet on the webpage: http://www.rossmanchance.com/applets/OneSample.html
▪ 1st population: math scores of 15892 high school students
▪ Let’s observe:
• Histogram of the population
• Mean of the population
• SD of the population

ACTIVITY

▪ Now we will develop the sampling distribution of sample means (for example, by selecting 10000 samples or more) for n =
• 2
• 10
• 30
• 100

OBSERVATIONS

▪ Let’s write down our observations:
• Many sampling distributions (one for each n)
• Shape of the sampling distribution
• Mean of the sampling distribution (compare it with the mean of the population)
• SD of the sampling distribution (compare it with the SD of the population)
• The difference between the sampling distribution and the population

ACTIVITY

▪ Now let’s choose a different population (a non-normal population) provided by the website.
▪ Repeat what we have done.
▪ Write down our observations.
▪ When does the sampling distribution become approximately normal?

ACTIVITY

▪ Now we should
• Clearly distinguish between the population and the sampling distribution.
▪ Homework: experiment with other populations on the website to deepen your understanding of sampling distributions.
▪ Question: Is there a sampling distribution of another statistic?

THEOREM I

▪ If a random sample is selected from a normal population, the sampling distribution of the sample mean is normal.
▪ Demonstrated by the applet with the population of math scores.

THEOREM II: CENTRAL LIMIT THEOREM

▪ If a random sample is selected from a non-normal population, the sampling distribution of the sample mean is approximately normal for large sample sizes.
▪ Demonstrated by the applet with a skewed population.
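A rough way to watch this happen in R, offered as a sketch rather than a proof: the skewed population here is an assumed exponential distribution (not the applet's data), and `skewness` is a simple helper defined in the code, not a base R function.

```r
set.seed(123)
# Assumed skewed population: exponential with mean 1 (not the applet's data).
population <- rexp(20000, rate = 1)

# Simple moment-based skewness; close to 0 for a symmetric distribution.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

for (n in c(2, 10, 30, 100)) {
  # 5000 samples of size n; keep the mean of each sample
  xbar <- replicate(5000, mean(sample(population, size = n)))
  cat(sprintf("n = %3d  skewness of sample means = %5.2f\n", n, skewness(xbar)))
}
```

The skewness of the sample means should move toward 0 as n grows, which is one sign that the sampling distribution is becoming more nearly normal.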

THEOREM II: CENTRAL LIMIT THEOREM

▪ Practical guideline:
• If the population is nearly normal, then a sample of size n = 5 will probably be large enough to assure that $\bar{X}$ is approximately normal.
• If the population is symmetric, then a sample of size n = 20 to 25 is enough for the Central Limit Theorem (CLT) to hold.
• For most moderately skewed distributions, a sample size of around 30 is traditionally thought to be sufficiently large for the CLT to hold. This is a rule of thumb, not a definitive number.
• For very skewed distributions or distributions with outliers, the sample size required for the CLT to hold may be much larger than 30.

PROPERTIES OF SAMPLING DISTRIBUTION OF MEAN

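▪ The relationship between the mean of the population and the mean of all sample means: $\mu_{\bar{X}} = \mu$

▪ The relationship between the SD of the population and the SD of all sample means: $\sigma_{\bar{X}} = \dfrac{\sigma_X}{\sqrt{n}}$ (if n/N ≤ 0.05)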

SAMPLING ERROR

▪ The difference between a sample statistic and the corresponding population parameter
▪ Important when making inferences about the population

STANDARD ERROR OF MEAN

▪ The SD of the sample means
• Represents (approx.) the average deviation of the sample means from the center
• The center = the population mean
• Represents (approx.) the average error when using the sample mean to estimate the population mean
▪ This is called the standard error of the mean:
$\sigma_{\bar{X}} = \dfrac{\sigma_X}{\sqrt{n}}$ (if n/N ≤ 0.05)

FINITE POPULATION CORRECTION FACTOR

▪ In cases where n/N > 0.05, the standard error of the mean is:
$\sigma_{\bar{X}} = \dfrac{\sigma_X}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$
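A minimal R sketch of this correction, assuming the usual finite population correction factor sqrt((N - n) / (N - 1)) and checking it against the salary population 2, 5, 7, 8, 10 used earlier (N = 5, n = 3). The function name `se_mean` is hypothetical.

```r
# Standard error of the mean, with an optional finite population correction.
se_mean <- function(sigma, n, N = Inf) {
  se <- sigma / sqrt(n)
  if (is.finite(N) && n / N > 0.05) {
    se <- se * sqrt((N - n) / (N - 1))   # finite population correction factor
  }
  se
}

salaries <- c(2, 5, 7, 8, 10)                       # N = 5
sigma <- sqrt(mean((salaries - mean(salaries))^2))  # population SD (divisor N)

se_formula <- se_mean(sigma, n = 3, N = 5)

# Exact SD of the 10 possible sample means, for comparison.
xbar <- colMeans(combn(salaries, 3))
se_exact <- sqrt(mean((xbar - mean(xbar))^2))

c(formula = se_formula, exact = se_exact)
```

Both values equal sqrt(1.24), about 1.11, whereas the uncorrected formula would give about 1.57 for this very small population.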

FINDING PROBABILITY OF SAMPLE MEAN

▪ First, check that the sampling distribution of the sample mean is normal or nearly so.
▪ If so, convert to Z to find the probability:
$Z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

EXERCISE 1

You’re an operations analyst for AT&T. Long-distance telephone calls are normally distributed with µ = 8 min. & σ = 2 min. If you select a random sample of 25 calls, what is the probability that the sample mean would be between 7.8 & 8.2 minutes?

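SOLUTION

$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{7.8 - 8}{2 / \sqrt{25}} = -0.50$
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{8.2 - 8}{2 / \sqrt{25}} = 0.50$

Sampling distribution: $\bar{X}$ between 7.8 and 8.2, centered at 8, with $\sigma_{\bar{X}} = 0.4$
Standard normal distribution: Z between -0.50 and 0.50, centered at 0, with $\sigma = 1$
$P(7.8 < \bar{X} < 8.2) = P(-0.50 < Z < 0.50) = 0.3830$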
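The table lookup can be cross-checked in R with `pnorm`, the base R normal cumulative distribution function. This is a sketch; the variable names are illustrative.

```r
mu    <- 8      # population mean (minutes)
sigma <- 2      # population SD (minutes)
n     <- 25     # sample size

se <- sigma / sqrt(n)                       # standard error of the mean: 0.4

# P(7.8 < sample mean < 8.2) under a normal sampling distribution
p <- pnorm(8.2, mean = mu, sd = se) - pnorm(7.8, mean = mu, sd = se)
round(p, 4)
```

R returns 0.3829; the .3830 on the slide comes from a Z table whose entries are rounded to four decimals before doubling.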


EXERCISE 2

You’re an operations analyst for company A. The distribution of long-distance telephone calls is symmetric but non-normal with µ = 8 min. & σ = 2 min. If you select a random sample of 30 calls, what is the probability that the sample mean lies between 7.8 & 8.2 minutes?

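SOLUTION

$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{7.8 - 8}{2 / \sqrt{30}} = -0.55$
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{8.2 - 8}{2 / \sqrt{30}} = 0.55$

Sampling distribution: $\bar{X}$ between 7.8 and 8.2, centered at 8, with $\sigma_{\bar{X}} = 0.365$
Standard normal distribution: Z between -0.55 and 0.55, centered at 0, with $\sigma = 1$
$P(7.8 < \bar{X} < 8.2) = P(-0.55 < Z < 0.55) = 0.4176$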
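As a hedged cross-check in R: because the population is symmetric and n = 30, this sketch assumes the sampling distribution of the mean is approximately normal by the CLT, so the result is an approximation rather than an exact probability.

```r
mu    <- 8      # population mean (minutes)
sigma <- 2      # population SD (minutes)
n     <- 30     # sample size

se <- sigma / sqrt(n)          # standard error of the mean, about 0.365
z  <- (8.2 - mu) / se          # about 0.548 before rounding to 0.55

# Approximate P(7.8 < sample mean < 8.2) using the normal distribution
p <- pnorm(8.2, mean = mu, sd = se) - pnorm(7.8, mean = mu, sd = se)
round(c(se = se, z = z, p = p), 4)
```

The unrounded Z gives a probability of about 0.416; the slide's .4176 results from rounding Z to 0.55 before using the table.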

CONCLUSION

▪ Sampling methods
▪ The importance of sampling distributions
▪ Sampling distribution of sample means
▪ Using simulation to understand sampling distributions
▪ Central limit theorem
