Sampling MM 2022

Sampling Approaches and Considerations
Statistics is a Science of Inference
• Statistical Inference: on the basis of sample statistics derived from limited and incomplete sample information,
– Predict and forecast values of population parameters...
– Test hypotheses about values of population parameters...
– Make decisions...
• Make generalizations about the characteristics of a population... on the basis of observations of a sample, a part of a population.
The Literary Digest Poll (1936)
• Unbiased sample: an unbiased, representative sample drawn at random from the entire population of Democrats and Republicans.
• Biased sample: a biased, unrepresentative sample drawn from people who have cars and/or telephones and/or read the Digest.
Sampling vs. Census?
Go On-Line: www.surveysampling.com
A census involves collecting data from all members of a population.
A sample is a relatively small subset of the population that is selected to be representative of the population’s characteristics.
Sampling Design Process
The sampling design process involves answering three questions:
1. Should a sample or a census be used?
2. If a sample, then which sampling approach is best?
3. How large a sample is necessary?
To obtain a representative sample . . .
Steps to follow:
1. Define the target population.
2. Choose the sampling frame.
3. Select the sampling method.
4. Determine the sample size.
5. Implement the sampling plan.
Representative Sample
A representative sample mirrors the characteristics of the population and minimizes the errors associated with sampling.
▪ Population inferences can be made by selecting a representative sample from the population.
Target Population
. . . the complete group of objects or elements relevant to the research project. They are relevant because they possess the information the research project is designed to collect.
Sampling Unit
. . . elements or objects available for selection during the sampling process are known as the sampling unit.
Sampling Frame
. . . as complete a list as possible of all the elements in the population from which the sample is drawn.
Sampling Methods
Go On-Line: www.svys.com
▪ Probability
▪ Non-Probability
Probability vs. Non-Probability Sampling
Probability = each element of the population has a known, but not necessarily equal, probability of being selected in a sample.
Non-Probability = not every element of the target population has a chance of being selected because the inclusion or exclusion of elements in a sample is left to the discretion of the researcher.
Convenience Sampling
. . . involves selecting sample elements that are most readily available to participate in the study and who can provide the required information.
Judgment Sampling
. . . a form of convenience sampling, sometimes referred to as a purposive sample, in which the researcher’s judgment is used to select the sample elements.
Simple Random Sampling
. . . a sampling method in which each element of the population has an equal probability of being selected.
Random Sampling Techniques
▪ Simple Random Sample – basis for other random sampling techniques
• Each unit in the sampling frame is numbered from 1 to N
• A random number generator can be used to select n items from the population
Random Sampling Techniques
▪ Stratified Random Sample
• Proportionate (the % of the sample taken from each stratum is proportionate to the % that each stratum is within the whole population)
• Disproportionate (when the % of the sample taken from each stratum is not proportionate to the % that each stratum is within the whole population)
▪ Systematic Random Sample
▪ Cluster (or Area) Sampling
Simple Random Sample: Sample Members
01 Alaska Airlines | 11 DuPont | 21 Lucent
02 Alcoa | 12 Exxon Mobil | 22 Mattel
03 Ashland | 13 General Dynamics | 23 Mead
04 Bank of America | 14 General Electric | 24 Microsoft
05 BellSouth | 15 General Mills | 25 Occidental Petroleum
06 Chevron | 16 Halliburton | 26 JCPenney
07 Citigroup | 17 IBM | 27 Procter & Gamble
08 Clorox | 18 Kellogg | 28 Ryder
09 Delta Air Lines | 19 KMart | 29 Sears
10 Disney | 20 Lowe’s | 30 Time Warner
N = 30, n = 6
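Here is a minimal Python sketch of the draw above, using only the standard `random` module. The helper name `simple_random_sample` and the seed are illustrative assumptions. The seed only makes the draw reproducible. Any seed, or none, gives a valid simple random sample of n = 6 from the N = 30 companies.

```python
import random

# The sampling frame: the N = 30 companies listed above, numbered 01-30.
FRAME = [
    "Alaska Airlines", "Alcoa", "Ashland", "Bank of America", "BellSouth",
    "Chevron", "Citigroup", "Clorox", "Delta Air Lines", "Disney",
    "DuPont", "Exxon Mobil", "General Dynamics", "General Electric", "General Mills",
    "Halliburton", "IBM", "Kellogg", "KMart", "Lowe's",
    "Lucent", "Mattel", "Mead", "Microsoft", "Occidental Petroleum",
    "JCPenney", "Procter & Gamble", "Ryder", "Sears", "Time Warner",
]


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    # rng.sample draws without replacement; each unit has probability n / N.
    numbers = sorted(rng.sample(range(1, len(frame) + 1), n))
    return [(k, frame[k - 1]) for k in numbers]


if __name__ == "__main__":
    for number, company in simple_random_sample(FRAME, n=6, seed=2022):
        print(f"{number:02d} {company}")
```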
Simple Random Sampling: Random Number Table
99437 87961 45737 37552 97969 39094 34475 316…
50656 00127 68367 66882 08156 80016 78224 583…
80880 63171 42877 66835 60515 70296 50026 455…
86420 40853 53798 89454 68130 91253 88104 743…
60097 86436 01869 47758 89535 99400 48268 306…
52587 71965 85453 46834 00991 99729 76948 159…
89155 90553 90689 48637 07955 47062 71182 644…
N = 30, n = 6
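The slide does not say how the table is read. The sketch below assumes one common convention. It reads the digits in consecutive two-digit groups. It skips 00, numbers above N = 30, and repeats. It stops once n unit numbers have been found. The function name and the digit string in the usage line are hypothetical and are not taken from the table.

```python
def select_from_table(digits, population_size, n, width=2):
    """Pick up to n distinct unit numbers by reading a random number table.

    Digits are read left to right in consecutive groups of `width`. A group
    is kept only if it lies in 1..population_size and was not already
    chosen. Fewer than n numbers are returned if the digits run out.
    """
    clean = "".join(ch for ch in digits if ch.isdigit())  # drop spaces
    chosen = []
    for start in range(0, len(clean) - width + 1, width):
        number = int(clean[start:start + width])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == n:
                break
    return chosen


# Hypothetical digit string (not a row of the table above).
print(select_from_table("31 05 12 05 00 29 30 17 44 08", population_size=30, n=6))
# prints [5, 12, 29, 30, 17, 8]
```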
Systematic Sampling
. . . a process that involves randomly selecting an initial starting point on a list, and thereafter every kth element in the sampling frame.
$$k = \frac{N}{n}$$
where:
n = sample size
N = population size
k = size of selection interval

To take a systematic sample of n = 40 from the population of N = 800 full-time employees:
1. Partition the frame of 800 into 40 groups, each of which contains 20 employees (k = 800/40 = 20).
2. Select a random number from the first 20 individuals and include every twentieth individual after the first selection in the sample.
3. For example, if the first random number you select is 008, your subsequent selections are 028, 048, 068, 088, 108, …, 768, and 788.
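The same procedure can be sketched in Python. `systematic_sample` is a hypothetical helper. It computes k = N // n and takes the supplied start, or a random one in 1..k. It then takes every kth position, which reproduces the 008, 028, …, 788 sequence when the start is 8.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Return 1-based positions of a systematic sample of size n from N units."""
    k = N // n  # selection interval; integer division keeps every position within 1..N
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start among the first k units
    if not 1 <= start <= k:
        raise ValueError("start must lie in 1..k")
    return [start + i * k for i in range(n)]


positions = systematic_sample(N=800, n=40, start=8)
print(positions[:6], "...", positions[-2:])
# prints [8, 28, 48, 68, 88, 108] ... [768, 788]
```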
Stratified Sampling
. . . requires the researcher to partition the target population into relatively homogeneous subgroups that are distinct and non-overlapping.
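The following sketch shows proportionate stratified sampling on a hypothetical frame of 1,000 employees in three strata. Each stratum receives n·N_h/N of the sample. The shares are rounded with a largest-remainder rule, which is an assumption made here so that the allocations add up to n. A simple random sample is then drawn inside each stratum. A disproportionate design would replace the allocation step with stratum sample sizes chosen by the researcher.

```python
import random


def proportionate_allocation(stratum_sizes, n):
    """Allocate n across strata in proportion to N_h / N (largest-remainder rounding)."""
    N = sum(stratum_sizes.values())
    exact = {h: n * size / N for h, size in stratum_sizes.items()}
    alloc = {h: int(q) for h, q in exact.items()}  # floor of each exact share
    shortfall = n - sum(alloc.values())
    # Give any leftover units to the strata with the largest fractional parts.
    for h in sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)[:shortfall]:
        alloc[h] += 1
    return alloc


def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample within each stratum (proportionate design)."""
    rng = random.Random(seed)
    alloc = proportionate_allocation({h: len(units) for h, units in strata.items()}, n)
    return {h: rng.sample(units, alloc[h]) for h, units in strata.items()}


# Hypothetical frame: 1,000 employees in three non-overlapping strata.
strata = {
    "hourly": [f"H{i}" for i in range(600)],
    "salaried": [f"S{i}" for i in range(300)],
    "management": [f"M{i}" for i in range(100)],
}
print(proportionate_allocation({h: len(u) for h, u in strata.items()}, n=50))
# prints {'hourly': 30, 'salaried': 15, 'management': 5}
```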
Cluster Sampling
. . . a form of probability sampling in which the relatively homogeneous individual clusters where sampling occurs are chosen randomly and not all clusters are sampled.
Cluster Sampling
▪ Cluster sampling – involves dividing the population into non-overlapping areas
• Identifies clusters that tend to be internally heterogeneous
• Each cluster is a microcosm of the population
▪ If a cluster is too large, a second set of clusters is taken from each original cluster
• This is two-stage sampling
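The following sketch shows one-stage and two-stage cluster sampling on a hypothetical frame of 20 city blocks with 25 households each. The names and numbers are illustrative. Only m randomly chosen clusters are sampled. In the two-stage variant, a simple random sample is then drawn within each chosen cluster.

```python
import random


def cluster_sample(clusters, m, second_stage=None, seed=None):
    """Randomly choose m clusters; not all clusters are sampled.

    One-stage (second_stage=None): every element of a chosen cluster is included.
    Two-stage (second_stage=int): a simple random sample of that many elements
    is drawn from within each chosen cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    sample = {}
    for c in chosen:
        members = clusters[c]
        if second_stage is None:
            sample[c] = list(members)
        else:
            sample[c] = rng.sample(members, min(second_stage, len(members)))
    return sample


# Hypothetical frame: 20 city blocks (clusters), each with 25 households.
blocks = {f"block{b:02d}": [f"block{b:02d}-hh{h:02d}" for h in range(25)] for b in range(20)}
one_stage = cluster_sample(blocks, m=4, seed=7)
two_stage = cluster_sample(blocks, m=4, second_stage=5, seed=7)
print(sum(len(v) for v in one_stage.values()), sum(len(v) for v in two_stage.values()))
# prints 100 20
```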
Cluster Sampling
▪ Advantages
• More convenient for geographically dispersed populations
• Reduced travel costs to contact sample elements
• Simplified administration of the survey
• Usable when the unavailability of a sampling frame prohibits using other random sampling methods
Cluster Sampling
▪ Disadvantages
• Statistically less efficient when the cluster elements are similar
• Costs and problems of statistical analysis are greater than for simple random sampling
Determining sample size involves achieving a balance between several factors:
• The variability of elements in the target population.
• The type of sample required.
• Time available.
• Budget.
• Required estimation precision.
• Whether findings will be generalized.
Three decisions to make when statistical formulas are used to determine sample size:
1. The degree of confidence (often 95%).
2. The specified level of precision (amount of acceptable error).
3. The amount of variability (population homogeneity).
SAMPLING DISTRIBUTION
=====================
Sample Statistics as Estimators of Population Parameters
• A sample statistic is a numerical measure of a summary characteristic of a sample.
• A population parameter is a numerical measure of a summary characteristic of a population.
• An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.
• An estimate of a parameter is a particular numerical value of a sample statistic obtained through sampling.
• A point estimate is a single value used as an estimate of a population parameter.
Estimators
• The sample mean, $\bar{X}$, is the most common estimator of the population mean, $\mu$.
• The sample variance, $s^2$, is the most common estimator of the population variance, $\sigma^2$.
• The sample standard deviation, $s$, is the most common estimator of the population standard deviation, $\sigma$.
• The sample proportion, $\hat{p}$, is the most common estimator of the population proportion, $p$.
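The following minimal sketch computes these four point estimates with Python's `statistics` module on a small hypothetical sample. `statistics.variance` and `statistics.stdev` use the n − 1 divisor of the sample variance. The attribute counted for the sample proportion (values above 4) and the helper name `point_estimates` are arbitrary illustrations.

```python
import statistics


def point_estimates(values, is_success):
    """Return the usual point estimates computed from one sample."""
    return {
        "mean": statistics.mean(values),          # X-bar, estimates mu
        "variance": statistics.variance(values),  # s^2 with n - 1 divisor, estimates sigma^2
        "stdev": statistics.stdev(values),        # s, estimates sigma
        # p-hat: share of sample elements with the attribute, estimates p
        "proportion": sum(1 for v in values if is_success(v)) / len(values),
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
print(point_estimates(sample, is_success=lambda v: v > 4))
```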
Sampling Distributions (1)
• The sampling distribution of a statistic is the probability distribution of all possible values the statistic may assume, when computed from random samples of the same size, drawn from a specified population.
• The sampling distribution of $\bar{X}$ is the probability distribution of all possible values the random variable $\bar{X}$ may assume when a sample of size n is taken from a specified population.
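To make the definition concrete, the sketch below enumerates every possible sample of size n = 2 from a tiny hypothetical population {2, 4, 6, 8}. Samples are drawn with replacement, so all 16 ordered samples are equally likely. It then tabulates the resulting sampling distribution of the sample mean. Exact `Fraction` arithmetic is used so that the checks (mean of the sample means equals μ, variance equals σ²/n) hold exactly. Without replacement from a finite population, the variance would need a finite population correction.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
import statistics

population = [2, 4, 6, 8]  # hypothetical population, N = 4
n = 2                      # sample size

# All ordered samples of size n drawn with replacement: 4**2 = 16, equally likely.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

# Sampling distribution of the sample mean: each possible value and its probability.
counts = Counter(sample_means)
distribution = {m: Fraction(c, len(sample_means)) for m, c in sorted(counts.items())}
for value, prob in distribution.items():
    print(f"sample mean {float(value):.1f}: probability {prob}")

mu = statistics.mean(population)           # population mean
sigma2 = statistics.pvariance(population)  # population variance (divisor N)
print("mean of sample means:", statistics.mean(sample_means), "mu:", mu)
print("variance of sample means:", statistics.pvariance(sample_means),
      "sigma^2 / n:", Fraction(sigma2) / n)
```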
Sampling from a Normal Population
When sampling from a normal population with mean $\mu$ and standard deviation $\sigma$, the sample mean, $\bar{X}$, has a normal sampling distribution:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
This means that, as the sample size increases, the sampling distribution of the sample mean remains centered on the population mean, but becomes more compactly distributed around that population mean.
[Figure: Sampling Distribution of the Sample Mean; density f(X) of the normal population compared with the sampling distribution for n = 16]
The Central Limit Theorem
When sampling from a population with mean $\mu$ and finite standard deviation $\sigma$, the sampling distribution of the sample mean will tend to a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ as the sample size becomes large (n > 30).
For “large enough” n:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{(approximately)}$$
[Figure: sampling distributions of $\bar{X}$ for n = 5, n = 20, and large n]
The Central Limit Theorem Applies to Sampling Distributions from Any Population
[Figure: Normal, Uniform, Skewed, and General populations, each shown with the sampling distribution of $\bar{X}$ for n = 2 and n = 30]
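The following small simulation uses an exponential population with μ = σ = 1 as the "skewed" case. That choice is an assumption. For each n, it draws 5,000 samples. It reports the mean and standard deviation of the sample means next to σ/√n. The seed, the repetition count, and the helper names are illustrative choices.

```python
import random
import statistics


def simulate_sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from a sample of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def skewed(rng):
    """One draw from a hypothetical skewed population: exponential, mu = sigma = 1."""
    return rng.expovariate(1.0)


for n in (2, 5, 30):
    means = simulate_sample_means(skewed, n, reps=5000)
    print(f"n={n:2d}  mean of X-bar={statistics.fmean(means):.3f}  "
          f"sd of X-bar={statistics.stdev(means):.3f}  sigma/sqrt(n)={1 / n ** 0.5:.3f}")
```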
Confidence Intervals
• How much uncertainty is associated with a point estimate of a population parameter?
• An interval estimate provides more information about a population characteristic than does a point estimate
• Such interval estimates are called confidence intervals
Confidence Interval Estimate
• An interval gives a range of values:
• Takes into consideration variation in sample statistics from sample to sample
• Based on observation from 1 sample
• Gives information about closeness to unknown population parameters
• Stated in terms of level of confidence
• Can never be 100% confident
Estimation Process
[Figure: a random sample is drawn from a population whose mean, μ, is unknown. The sample gives $\bar{X}$ = 50, leading to the statement "I am 95% confident that μ is between 40 & 60."]
General Formula
• The general formula for all confidence intervals is:
Point Estimate ± (Critical Value)(Standard Error)
Confidence Level
• Confidence Level
• Confidence that the interval will contain the unknown population parameter
• A percentage (less than 100%)
Confidence Level, (1-) (continued)
• Suppose confidence level = 95%

• Also written (1 - ) = .95
• A relative frequency interpretation:
• In the long run, 95% of all the confidence intervals
that can be constructed will contain the unknown
true parameter
• A specific interval either will contain or will not
contain the true parameter
• No probability involved in a specific interval
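The relative frequency interpretation can be checked by simulation. The sketch assumes a hypothetical normal population (μ = 50, σ = 10) and samples of n = 25. For each of 10,000 samples, it builds the σ-known interval X̄ ± Z·σ/√n and reports the fraction of intervals that contain μ. At a 95% level, that fraction should land near .95, even though any single interval either contains μ or does not. The helper name `coverage` and the seed are illustrative.

```python
import random
import statistics
from statistics import NormalDist


def coverage(mu, sigma, n, confidence, reps, seed=0):
    """Fraction of intervals X-bar +/- Z * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


# Hypothetical normal population: mu = 50, sigma = 10; samples of n = 25.
print(coverage(mu=50, sigma=10, n=25, confidence=0.95, reps=10_000))
```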
Confidence Intervals
▪ Population Mean
• σ Known
• σ Unknown
▪ Population Proportion
Confidence Interval for μ (σ Known)
• Assumptions
• Population standard deviation σ is known
• Population is normally distributed
• If population is not normal, use large sample
• Confidence interval estimate:
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$$
(where Z is the normal distribution critical value for a probability of α/2 in each tail)
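Finding the Critical Value, Z
• Consider a 95% confidence interval: 1 − α = .95, so α/2 = .025 in each tail and Z = ±1.96
[Figure: standard normal curve with central area .95 and area .025 in each tail. In Z units the cutoffs are Z = −1.96 and Z = 1.96 around 0; in X units they are the lower and upper confidence limits around the point estimate]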
Common Levels of Confidence
• Commonly used confidence levels are 90%, 95%, and 99%

Confidence Level | Confidence Coefficient, 1 − α | Z value
80% | .80 | 1.28
90% | .90 | 1.645
95% | .95 | 1.96
98% | .98 | 2.33
99% | .99 | 2.58
99.8% | .998 | 3.09
99.9% | .999 | 3.29
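The Z values in the table can be reproduced with `statistics.NormalDist` from Python's standard library. The critical value leaves α/2 in the upper tail, so it is the (1 − α/2) quantile of the standard normal. The helper name `z_critical` is illustrative.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value: the Z with area alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  Z = {z_critical(level):.3f}")
```

Rounding these values to two decimals gives the table column. The 90% value is conventionally quoted to three decimals (1.645).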
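Intervals and Level of Confidence
[Figure: sampling distribution of the mean, centered at $\mu_{\bar{X}} = \mu$, with area α/2 in each tail and 1 − α between them, together with intervals built from sample means $\bar{x}_1$, $\bar{x}_2$, …]
Intervals extend from $\bar{X} - Z\frac{\sigma}{\sqrt{n}}$ to $\bar{X} + Z\frac{\sigma}{\sqrt{n}}$. (1 − α) × 100% of the intervals constructed contain μ; α × 100% do not.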
Example
▪ A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is .35 ohms.
▪ Determine a 95% confidence interval for the true mean resistance of the population.
Example (continued)
▪ Solution:
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(.35/\sqrt{11}) = 2.20 \pm .2068$$
$$(1.9932,\ 2.4068)$$
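Here is a sketch of the σ-known interval as a Python function, applied to the circuit example. It uses the unrounded critical value from `NormalDist().inv_cdf` rather than 1.96. That difference does not change the result at four decimals. `z_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known: X-bar +/- Z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


low, high = z_interval(xbar=2.20, sigma=0.35, n=11)
print(f"({low:.4f}, {high:.4f})")  # prints (1.9932, 2.4068)
```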
Confidence Interval for μ (σ Unknown)
▪ Assumptions
▪ Population standard deviation is unknown
▪ Population is normally distributed
▪ If population is not normal, use large sample
▪ Use Student’s t Distribution
▪ Confidence Interval Estimate:
$$\bar{X} \pm t_{n-1}\frac{S}{\sqrt{n}}$$
(where t is the critical value of the t distribution with n − 1 d.f. and an area of α/2 in each tail)
Student’s t Distribution
▪ The t is a family of distributions
▪ The t value depends on degrees of freedom (d.f.)
▪ Number of observations that are free to vary after the sample mean has been calculated
d.f. = n − 1
Degrees of Freedom (df)
Idea: Number of observations that are free to vary after the sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0.
Let X1 = 7 and X2 = 8. What is X3?
If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary).
Here, n = 3, so degrees of freedom = n − 1 = 3 − 1 = 2
(2 values can be any numbers, but the third is not free to vary for a given mean)
Student’s t Distribution
Note: t → Z as n increases
t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal.
[Figure: standard normal (t with df = ∞), t (df = 13), and t (df = 5) curves centered at 0]
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 7-54
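Student’s t Table
Let: n = 3, so df = n − 1 = 2
α = .10, so α/2 = .05

Upper Tail Area
df | .25 | .10 | .05
1 | 1.000 | 3.078 | 6.314
2 | 0.816 | 1.886 | 2.920
3 | 0.765 | 1.638 | 2.353

The body of the table contains t values, not probabilities.
[Figure: t distribution with df = 2; area α/2 = .05 in each tail, with the upper cutoff at t = 2.920]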
t distribution values, with comparison to the Z value
Confidence Level | t (10 d.f.) | t (20 d.f.) | t (30 d.f.) | Z
.80 | 1.372 | 1.325 | 1.310 | 1.28
.90 | 1.812 | 1.725 | 1.697 | 1.64
.95 | 2.228 | 2.086 | 2.042 | 1.96
.99 | 3.169 | 2.845 | 2.750 | 2.58
Note: t → Z as n increases
Example
A random sample of n = 25 has $\bar{X}$ = 50 and S = 8. Form a 95% confidence interval for μ.
▪ d.f. = n − 1 = 24, so $t_{\alpha/2,\,n-1} = t_{.025,\,24} = 2.0639$
The confidence interval is
$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} = 50 \pm (2.0639)\frac{8}{\sqrt{25}}$$
$$(46.698,\ 53.302)$$
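Python's standard library has no t-distribution quantile function. This sketch therefore takes the critical value t_{α/2, n−1} as an argument, looked up from a t table as on the slide. SciPy users could compute it with `scipy.stats.t.ppf(0.975, 24)`. The function name `t_interval` is illustrative.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Confidence interval for mu when sigma is unknown: X-bar +/- t * S / sqrt(n).

    t_crit is t_{alpha/2, n-1}, looked up in a t table (or computed elsewhere).
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


low, high = t_interval(xbar=50, s=8, n=25, t_crit=2.0639)  # t_{.025, 24} from the slide
print(f"({low:.3f}, {high:.3f})")  # prints (46.698, 53.302)
```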
DETERMINING SAMPLE SIZES
Sampling Error
▪ The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 − α)
▪ The margin of error is also called sampling error
▪ the amount of imprecision in the estimate of the population parameter
▪ the amount added and subtracted to the point estimate to form the confidence interval
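Determining Sample Size for the Mean
Confidence interval and sampling error (margin of error):
$$\bar{X} \pm Z\frac{\sigma}{\sqrt{n}}, \qquad e = Z\frac{\sigma}{\sqrt{n}}$$
Determining Sample Size for the Mean (continued)
Now solve $e = Z\frac{\sigma}{\sqrt{n}}$ for n to get
$$n = \frac{Z^2\sigma^2}{e^2}$$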
Determining Sample Size (continued)
▪ To determine the required sample size for the mean, you must know:
▪ The desired level of confidence (1 − α), which determines the critical Z value
▪ The acceptable sampling error (margin of error), e
▪ The standard deviation, σ
Required Sample Size Example
If σ = 45, what sample size is needed to estimate the mean within ± 5 with 90% confidence?
$$n = \frac{Z^2\sigma^2}{e^2} = \frac{(1.645)^2(45)^2}{5^2} = 219.19$$
So the required sample size is n = 220
(Always round up)
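Here is a sketch of the sample-size rule in Python, with a companion check of the resulting margin of error. The helper names are illustrative. With the unrounded Z (about 1.6449), the intermediate value is about 219.15 rather than 219.19. Rounding up still gives n = 220, and n = 219 would leave the margin of error just above 5.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, e, confidence):
    """Smallest n with Z * sigma / sqrt(n) <= e, i.e. (Z * sigma / e)^2 rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / e) ** 2)


def margin_of_error(sigma, n, confidence):
    """Sampling error e = Z * sigma / sqrt(n) for a given sample size n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)


n = sample_size_for_mean(sigma=45, e=5, confidence=0.90)
print(n, round(margin_of_error(45, n, 0.90), 3))
```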
