
Sampling Approaches and Considerations

Statistics is a Science of Inference
• Statistical Inference: on the basis of sample statistics derived from limited and incomplete sample information...
– Predict and forecast values of population parameters...
– Test hypotheses about values of population parameters...
– Make decisions...
• Make generalizations about the characteristics of a population... on the basis of observations of a sample, a part of a population.
The Literary Digest Poll (1936)
• Unbiased sample: an unbiased, representative sample drawn at random from the entire population of Democrats and Republicans.
• Biased sample: a biased, unrepresentative sample drawn only from people who have cars and/or telephones and/or read the Digest.
Sampling vs. Census?
Go On-Line: www.surveysampling.com

A census involves collecting data from all members of a population.

A sample is a relatively small subset of the population that is selected to be representative of the population’s characteristics.

Sampling Design Process

The sampling design process involves answering three questions:
1. Should a sample or a census be used?
2. If a sample, which sampling approach is best?
3. How large a sample is necessary?

To obtain a representative sample, follow these steps:

1. Define the target population.
2. Choose the sampling frame.
3. Select the sampling method.
4. Determine the sample size.
5. Implement the sampling plan.

Representative Sample

A representative sample mirrors the characteristics of the population and minimizes the errors associated with sampling.

▪ Population inferences can be made by selecting a representative sample from the population.
Target Population

. . . the complete group of objects or elements relevant to the research project. They are relevant because they possess the information the research project is designed to collect.

Sampling Unit

. . . the elements or objects available for selection during the sampling process are known as sampling units.

Sampling Frame

. . . as complete a list as possible of all the elements in the population from which the sample is drawn.

Sampling Methods

Go On-Line: www.svys.com

▪ Probability
▪ Non-Probability

Probability vs. Non-Probability Sampling

Probability = each element of the population has a known, but not necessarily equal, probability of being selected in a sample.

Non-Probability = not every element of the target population has a chance of being selected, because the inclusion or exclusion of elements in a sample is left to the discretion of the researcher.

Convenience Sampling

. . . involves selecting the sample elements that are most readily available to participate in the study and that can provide the required information.

Judgment Sampling

. . . a form of convenience sampling, sometimes referred to as a purposive sample, in which the researcher’s judgment is used to select the sample elements.

Simple Random Sampling

. . . a sampling method in which each element of the population has an equal probability of being selected.

Random Sampling Techniques

▪ Simple Random Sample – the basis for other random sampling techniques
• Each unit of the population is numbered from 1 to N
• A random number generator can be used to select n items from the population
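A minimal Python sketch of these two steps, assuming the N units are already numbered 1 to N. It uses the standard library's `random.Random.sample`, which draws without replacement, so every unit has the same chance of selection; the seed value is arbitrary and only makes the draw reproducible.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size,
    each unit having the same probability of selection."""
    rng = random.Random(seed)              # seeded generator for a reproducible draw
    units = range(1, population_size + 1)  # units numbered 1 to N
    return sorted(rng.sample(units, sample_size))

# N = 30 companies, n = 6, as in the example that follows
print(simple_random_sample(30, 6, seed=2024))
```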
Random Sampling Techniques

▪ Stratified Random Sample


• Proportionate (% of the sample taken from
each stratum is proportionate to the % that
each stratum is within the whole
population)
• Disproportionate (when the % of the sample
taken from each stratum is not
proportionate to the % that each stratum is
within the whole population)
▪ Systematic Random Sample
▪ Cluster (or Area) Sampling
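The proportionate rule can be sketched in Python as below. Assumptions, not from the source: strata are given as a dictionary of member lists (the three strata are hypothetical), and the exact allocations n·N_h/N are rounded by the largest-remainder method so they add up to n; that rounding rule is one choice among several. Units are then drawn by simple random sampling within each stratum.

```python
import math
import random

def proportionate_allocation(stratum_sizes, sample_size):
    """Allocate sample_size across strata in proportion to stratum size,
    using largest-remainder rounding so the allocations sum to sample_size."""
    total = sum(stratum_sizes.values())
    exact = {h: sample_size * size / total for h, size in stratum_sizes.items()}
    alloc = {h: math.floor(x) for h, x in exact.items()}
    leftover = sample_size - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    for h in sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc

def stratified_sample(strata, sample_size, seed=None):
    """Simple random sample within each stratum, using proportionate allocation."""
    rng = random.Random(seed)
    alloc = proportionate_allocation({h: len(m) for h, m in strata.items()}, sample_size)
    return {h: rng.sample(strata[h], alloc[h]) for h in strata}

# Hypothetical strata: 500 + 300 + 200 = 1,000 employees, sample of n = 50
strata = {
    "A": [f"A{i}" for i in range(500)],
    "B": [f"B{i}" for i in range(300)],
    "C": [f"C{i}" for i in range(200)],
}
print(proportionate_allocation({h: len(m) for h, m in strata.items()}, 50))
# {'A': 25, 'B': 15, 'C': 10}
```

Each stratum makes up the same share of the sample (50%, 30%, 20%) as it does of the population, which is what the proportionate design requires.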
Simple Random Sample: Sample Members

01 Alaska Airlines | 11 DuPont | 21 Lucent
02 Alcoa | 12 Exxon Mobil | 22 Mattel
03 Ashland | 13 General Dynamics | 23 Mead
04 Bank of America | 14 General Electric | 24 Microsoft
05 BellSouth | 15 General Mills | 25 Occidental Petroleum
06 Chevron | 16 Halliburton | 26 JCPenney
07 Citigroup | 17 IBM | 27 Procter & Gamble
08 Clorox | 18 Kellogg | 28 Ryder
09 Delta Air Lines | 19 Kmart | 29 Sears
10 Disney | 20 Lowe’s | 30 Time Warner

▪ N = 30
▪ n = 6
Simple Random Sampling: Random Number Table

99437 87961 45737 37552 97969 39094 34475 316…
50656 00127 68367 66882 08156 80016 78224 583…
80880 63171 42877 66835 60515 70296 50026 455…
86420 40853 53798 89454 68130 91253 88104 743…
60097 86436 01869 47758 89535 99400 48268 306…
52587 71965 85453 46834 00991 99729 76948 159…
89155 90553 90689 48637 07955 47062 71182 644…

N = 30
n = 6
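One common way to use such a table is to read the digits as a stream of two-digit numbers, keep values from 01 to 30, and skip repeats until n = 6 companies are chosen. That reading rule and the starting point are assumptions; the slides do not specify them. The sketch below applies the rule to the complete five-digit groups of the first three rows, leaving out each row's truncated final group.

```python
# Complete five-digit groups from the first three rows of the table above.
TABLE_ROWS = [
    "99437 87961 45737 37552 97969 39094 34475",
    "50656 00127 68367 66882 08156 80016 78224",
    "80880 63171 42877 66835 60515 70296 50026",
]

COMPANIES = [
    "Alaska Airlines", "Alcoa", "Ashland", "Bank of America", "BellSouth",
    "Chevron", "Citigroup", "Clorox", "Delta Air Lines", "Disney",
    "DuPont", "Exxon Mobil", "General Dynamics", "General Electric", "General Mills",
    "Halliburton", "IBM", "Kellogg", "Kmart", "Lowe's",
    "Lucent", "Mattel", "Mead", "Microsoft", "Occidental Petroleum",
    "JCPenney", "Procter & Gamble", "Ryder", "Sears", "Time Warner",
]

def select_from_table(rows, population_size, sample_size):
    """Read the digits as one stream of two-digit numbers; keep values in
    1..population_size, skip repeats, and stop after sample_size selections."""
    digits = "".join(rows).replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        value = int(digits[i:i + 2])
        if 1 <= value <= population_size and value not in chosen:
            chosen.append(value)
            if len(chosen) == sample_size:
                break
    return chosen

picked = select_from_table(TABLE_ROWS, population_size=30, sample_size=6)
print(picked)                              # [6, 12, 20, 1, 24, 2]
print([COMPANIES[u - 1] for u in picked])  # company numbers are 1-based
```

Numbers such as 00, 34 or 99 fall outside 01 to 30 and are simply skipped, and a repeated 06 is ignored, so every company keeps the same chance of selection.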
Systematic Sampling

. . . a process that involves randomly selecting an initial starting point on a list, and thereafter every kth element in the sampling frame, where

$k = \frac{N}{n}$

▪ n = sample size
▪ N = population size
▪ k = size of selection interval

To take a systematic sample of n = 40 from the population of N = 800 full-time employees:
1. Partition the frame of 800 into 40 groups, each of which contains 20 employees (k = 800/40 = 20).
2. Select a random number from the first 20 individuals and include every twentieth individual after the first selection in the sample.
3. For example, if the first random number you select is 008, your subsequent selections are 028, 048, 068, 088, 108, …, 768, and 788.
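A minimal sketch of the procedure above, assuming N is an exact multiple of n (as with N = 800 and n = 40). The random start is drawn from 1 to k, and passing `start=8` reproduces the example; positions are 1-based positions in the sampling frame.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return 1-based frame positions: a random start in 1..k, then every kth."""
    k = population_size // sample_size  # selection interval k = N / n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start within the first interval
    return [start + j * k for j in range(sample_size)]

picks = systematic_sample(800, 40, start=8)
print(picks[:5], picks[-2:])  # [8, 28, 48, 68, 88] [768, 788]
```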
Stratified Sampling

. . . requires the researcher to partition the target population into relatively homogeneous subgroups that are distinct and non-overlapping.

Cluster Sampling

. . . a form of probability
sampling in which the
relatively homogeneous
individual clusters where
sampling occurs are chosen
randomly and not all
clusters are sampled.

Cluster Sampling

▪ Cluster sampling – involves dividing the population into non-overlapping areas (clusters)
• Identifies clusters that tend to be internally heterogeneous
• Each cluster is a microcosm of the population
▪ If a cluster is too large, a second set of clusters is taken from each original cluster
• This is two-stage sampling
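A small sketch of one- and two-stage cluster sampling, under stated assumptions: clusters are supplied as a dictionary of non-overlapping member lists (the city blocks below are hypothetical), whole clusters are picked at random in the first stage, and in the optional second stage a simple random sample is drawn within each chosen cluster.

```python
import random

def cluster_sample(clusters, num_clusters, per_cluster=None, seed=None):
    """Stage 1: choose num_clusters clusters at random (not all are sampled).
    Stage 2 (optional): draw per_cluster elements at random within each chosen
    cluster; if per_cluster is None, every element of a chosen cluster is kept."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    sample = {}
    for cid in chosen:
        members = clusters[cid]
        if per_cluster is None:
            sample[cid] = list(members)  # one-stage: take the whole cluster
        else:
            sample[cid] = rng.sample(members, min(per_cluster, len(members)))  # two-stage
    return sample

# Hypothetical frame: 8 city blocks with 10 households each
blocks = {f"block{b}": [f"b{b}-h{h}" for h in range(10)] for b in range(8)}
one_stage = cluster_sample(blocks, 3, seed=7)
two_stage = cluster_sample(blocks, 3, per_cluster=4, seed=7)
print(sorted(one_stage), {c: len(m) for c, m in two_stage.items()})
```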
Cluster Sampling

▪ Advantages
• More convenient for geographically
dispersed populations
• Reduced travel costs to contact sample
elements
• Simplified administration of the survey
• Usable when the unavailability of a sampling frame prohibits using other random sampling methods
Cluster Sampling

▪ Disadvantages
• Statistically less efficient when the cluster elements are similar
• Costs and problems of statistical analysis are greater than for simple random sampling
Determining sample size involves achieving a balance between several factors:

• The variability of elements in the target population.
• The type of sample required.
• Time available.
• Budget.
• Required estimation precision.
• Whether findings will be generalized.

Three decisions to make when statistical formulas are used to determine sample size:

1. The degree of confidence (often 95%).
2. The specified level of precision (amount of acceptable error).
3. The amount of variability (population homogeneity).

SAMPLING DISTRIBUTION
=====================
Sample Statistics as Estimators of Population Parameters

• A sample statistic is a numerical measure of a summary characteristic of a sample. A population parameter is a numerical measure of a summary characteristic of a population.
• An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.
• An estimate of a parameter is a particular numerical value of a sample statistic obtained through sampling.
• A point estimate is a single value used as an estimate of a population parameter.
Estimators

• The sample mean, X̄, is the most common estimator of the population mean, μ.
• The sample variance, s², is the most common estimator of the population variance, σ².
• The sample standard deviation, s, is the most common estimator of the population standard deviation, σ.
• The sample proportion, p̂, is the most common estimator of the population proportion, p.
Sampling Distributions (1)

• The sampling distribution of a statistic is the probability distribution of all possible values the statistic may assume, when computed from random samples of the same size, drawn from a specified population.
• The sampling distribution of X̄ is the probability distribution of all possible values the random variable X̄ may assume when a sample of size n is taken from a specified population.
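To make "all possible values" concrete, the sketch below enumerates every sample of size n = 2 drawn from a tiny hypothetical population {2, 4, 6, 8} and tabulates the exact sampling distribution of X̄. Sampling with replacement is an assumption chosen so that each of the 4² = 16 ordered samples is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]  # hypothetical population, mean 5
n = 2

# Every ordered sample of size n (with replacement) is equally likely.
samples = list(product(population, repeat=n))
means = Counter(Fraction(sum(s), n) for s in samples)

# Print each possible value of the sample mean with its probability.
for value in sorted(means):
    print(value, Fraction(means[value], len(samples)))

# The mean of the sampling distribution equals the population mean.
print(sum(v * c for v, c in means.items()) / len(samples))  # 5
```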
Sampling from a Normal Population

When sampling from a normal population with mean μ and standard deviation σ, the sample mean, X̄, has a normal sampling distribution:

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$

This means that, as the sample size increases, the sampling distribution of the sample mean remains centered on the population mean, but becomes more compactly distributed around that population mean.

[Figure: Sampling Distribution of the Sample Mean, f(X̄), for a normal population and for n = 2, n = 4, and n = 16]
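A quick simulation sketch of this claim, assuming a normal population with μ = 100 and σ = 15 (hypothetical values). For each n it draws many samples, records each sample mean, and compares the spread of those means with σ/√n.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps=20_000, seed=1):
    """Means of `reps` random samples of size n from a N(mu, sigma^2) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

mu, sigma = 100, 15  # hypothetical population parameters
for n in (2, 4, 16):
    means = simulate_sample_means(mu, sigma, n)
    # columns: n, mean of the sample means, their standard deviation, sigma / sqrt(n)
    print(n, round(statistics.fmean(means), 2),
          round(statistics.stdev(means), 2), round(sigma / math.sqrt(n), 2))
```

The centre stays near 100 for every n, while the standard deviation of the sample means shrinks roughly in line with σ/√n.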
The Central Limit Theorem

When sampling from a population with mean μ and finite standard deviation σ, the sampling distribution of the sample mean will tend to a normal distribution with mean μ and standard deviation σ/√n as the sample size becomes large (n > 30).

For “large enough” n: $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ (approximately)

[Figure: sampling distributions of X̄ for n = 5, n = 20, and large n, becoming increasingly normal in shape]
The Central Limit Theorem Applies to Sampling Distributions from Any Population

[Figure: normal, uniform, skewed, and general populations, each shown with the sampling distribution of X̄ for n = 2 and n = 30]
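The same kind of simulation can illustrate the theorem for a skewed population. This sketch assumes an exponential population with mean 1 and standard deviation 1 (a hypothetical choice). With n = 30, the sample means should centre near 1 with a spread near 1/√30 ≈ 0.183, and roughly 95% of them should fall within 1.96 standard errors of μ if the normal approximation holds.

```python
import math
import random
import statistics

rng = random.Random(42)
mu, sigma, n, reps = 1.0, 1.0, 30, 20_000  # exponential with rate 1: mean 1, sd 1

# Sample means from a strongly right-skewed population.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
se = sigma / math.sqrt(n)
share_within = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3), round(se, 3))
print(round(share_within, 3))  # near 0.95 when the normal approximation is good
```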
Confidence Intervals

• How much uncertainty is associated with a point estimate of a population parameter?
• An interval estimate provides more information about a population characteristic than does a point estimate.
• Such interval estimates are called confidence intervals.
Confidence Interval Estimate
• An interval gives a range of values:
• Takes into consideration variation in sample statistics from sample to sample
• Based on observations from 1 sample
• Gives information about closeness to unknown population parameters
• Stated in terms of level of confidence
• Can never be 100% confident
Estimation Process

A random sample is drawn from a population whose mean, μ, is unknown. The sample mean is X̄ = 50, which supports a statement such as: “I am 95% confident that μ is between 40 & 60.”
General Formula

• The general formula for all confidence intervals is:
Point Estimate ± (Critical Value)(Standard Error)
Confidence Level

• Confidence Level
• Confidence that the interval will contain the unknown population parameter
• A percentage (less than 100%)
Confidence Level, (1 − α) (continued)

• Suppose confidence level = 95%
• Also written (1 − α) = .95
• A relative frequency interpretation:
• In the long run, 95% of all the confidence intervals that can be constructed will contain the unknown true parameter
• A specific interval either will contain or will not contain the true parameter
• No probability involved in a specific interval
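The relative frequency interpretation can be checked with a short simulation sketch. Assumptions (hypothetical, not from the source): a normal population with μ = 50 and known σ = 10, samples of size n = 25, and 10,000 repetitions. Each repetition builds X̄ ± 1.96·σ/√n and records whether that particular interval contains μ.

```python
import math
import random
import statistics

def coverage(mu=50.0, sigma=10.0, n=25, z=1.96, reps=10_000, seed=3):
    """Fraction of intervals xbar ± z*sigma/sqrt(n) that contain the true mean."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        hits += (xbar - half_width <= mu <= xbar + half_width)  # each interval: yes or no
    return hits / reps

print(coverage())  # close to 0.95 over many repetitions
```

Each individual interval either contains μ or it does not; only the long-run proportion is near 95%.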
Confidence Intervals

Confidence intervals are constructed for:
• Population Mean
– σ Known
– σ Unknown
• Population Proportion
Confidence Interval for μ (σ Known)
• Assumptions
• Population standard deviation σ is known
• Population is normally distributed
• If population is not normal, use large sample

• Confidence interval estimate:
$\bar{X} \pm Z \frac{\sigma}{\sqrt{n}}$
(where Z is the normal distribution critical value for a probability of α/2 in each tail)
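Finding the Critical Value, Z

• Consider a 95% confidence interval: 1 − α = .95, so α/2 = .025 in each tail, which gives Z = ±1.96.

[Figure: standard normal curve with area .025 in each tail. On the Z scale the interval runs from Z = −1.96 through 0 to Z = 1.96; on the X̄ scale it runs from the lower confidence limit through the point estimate to the upper confidence limit.]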
Common Levels of Confidence

• Commonly used confidence levels are 90%, 95%, and 99%

Confidence Level | Confidence Coefficient, 1 − α | Z Value
80% | .80 | 1.28
90% | .90 | 1.645
95% | .95 | 1.96
98% | .98 | 2.33
99% | .99 | 2.58
99.8% | .998 | 3.09
99.9% | .999 | 3.29
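The Z values in the table can be recomputed with Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method returns a normal quantile. For confidence level 1 − α, the two-sided critical value is the standard normal quantile at 1 − α/2. This is a sketch for checking the table, not part of the original material.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: the standard normal quantile at 1 - alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  {z_critical(level):.3f}")
```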
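Intervals and Level of Confidence

[Figure: sampling distribution of the mean, centered at $\mu_{\bar{X}} = \mu$, with area α/2 in each tail and 1 − α in the middle; confidence intervals built around sample means x̄₁, x̄₂, … are drawn beneath it.]
• Intervals extend from $\bar{X} - Z \frac{\sigma}{\sqrt{n}}$ to $\bar{X} + Z \frac{\sigma}{\sqrt{n}}$
• (1 − α) × 100% of intervals constructed contain μ; (α) × 100% do not.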
Example

▪ A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is .35 ohms.

▪ Determine a 95% confidence interval for the true mean resistance of the population.
Example (continued)

▪ Solution:
$\bar{X} \pm Z \frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(.35/\sqrt{11}) = 2.20 \pm .2068$
giving the interval (1.9932, 2.4068)
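The calculation can be reproduced in a few lines of Python. `z_interval` is a hypothetical helper name, and the critical value comes from the standard library's `NormalDist` rather than a table, so it uses 1.95996 instead of the rounded 1.96; the interval agrees to four decimals.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known: xbar ± Z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=2.20, sigma=0.35, n=11)
print(f"({low:.4f}, {high:.4f})")  # (1.9932, 2.4068)
```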
Confidence Interval for μ (σ Unknown)

▪ Assumptions
▪ Population standard deviation is unknown
▪ Population is normally distributed
▪ If population is not normal, use large sample
▪ Use Student’s t Distribution
▪ Confidence Interval Estimate:
$\bar{X} \pm t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}}$
(where t is the critical value of the t distribution with n − 1 d.f. and an area of α/2 in each tail)
Student’s t Distribution

▪ The t is a family of distributions
▪ The t value depends on degrees of freedom (d.f.)
▪ Number of observations that are free to vary after sample mean has been calculated

d.f. = n − 1
Degrees of Freedom (df)
Idea: Number of observations that are free to vary after sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0
Let X1 = 7 and X2 = 8. What is X3?
If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary).
Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary for a given mean)
Student’s t Distribution
Note: t → Z as n increases

[Figure: standard normal (t with df = ∞), t (df = 13), and t (df = 5) densities, all centered at 0]
t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 7-54
Student’s t Table

Upper Tail Area
df | .25 | .10 | .05
1 | 1.000 | 3.078 | 6.314
2 | 0.816 | 1.886 | 2.920
3 | 0.765 | 1.638 | 2.353

Let n = 3, so df = n − 1 = 2. For α = .10, α/2 = .05 in each tail, and the table gives t = 2.920.
The body of the table contains t values, not probabilities.
t distribution values
With comparison to the Z value

Confidence Level | t (10 d.f.) | t (20 d.f.) | t (30 d.f.) | Z
.80 | 1.372 | 1.325 | 1.310 | 1.28
.90 | 1.812 | 1.725 | 1.697 | 1.64
.95 | 2.228 | 2.086 | 2.042 | 1.96
.99 | 3.169 | 2.845 | 2.750 | 2.58

Note: t → Z as n increases
Example
A random sample of n = 25 has X̄ = 50 and S = 8. Form a 95% confidence interval for μ.

▪ d.f. = n – 1 = 24, so $t_{\alpha/2,\,n-1} = t_{.025,\,24} = 2.0639$

The confidence interval is
$\bar{X} \pm t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} = 50 \pm (2.0639)\frac{8}{\sqrt{25}}$
= (46.698, 53.302)
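Python's standard library has no Student's t quantile function, so this sketch computes one from the t density using Simpson's-rule integration and bisection. That numerical method is chosen here only to keep the example self-contained; in practice a statistics package or a t table would normally be used. The helper names are hypothetical. The sketch reproduces t.025,24 ≈ 2.0639 and the interval above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the integral of the density from 0 to t
    (Simpson's rule with an even number of steps)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(upper_area, df):
    """t value with the given upper-tail area (0 < upper_area < 0.5), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > upper_area:
            lo = mid  # tail area still too large: the critical value lies further out
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, s, n, confidence=0.95):
    """Confidence interval for mu when sigma is unknown: xbar ± t * s / sqrt(n)."""
    t = t_critical((1 - confidence) / 2, n - 1)
    margin = t * s / math.sqrt(n)
    return xbar - margin, xbar + margin

print(round(t_critical(0.025, 24), 4))  # 2.0639
low, high = t_interval(xbar=50, s=8, n=25)
print(f"({low:.3f}, {high:.3f})")  # (46.698, 53.302)
```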
DETERMINING SAMPLE SIZES
Sampling Error
▪ The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 − α)

▪ The margin of error is also called sampling error
▪ the amount of imprecision in the estimate of the population parameter
▪ the amount added to and subtracted from the point estimate to form the confidence interval
Determining Sample Size

Determining sample size for the mean:
Confidence interval: $\bar{X} \pm Z \frac{\sigma}{\sqrt{n}}$
Sampling error (margin of error): $e = Z \frac{\sigma}{\sqrt{n}}$
Determining Sample Size (continued)

For the mean: starting from $e = Z \frac{\sigma}{\sqrt{n}}$, solve for n to get
$n = \frac{Z^2 \sigma^2}{e^2}$
Determining Sample Size (continued)

▪ To determine the required sample size for the mean, you must know:
▪ The desired level of confidence (1 − α), which determines the critical Z value
▪ The acceptable sampling error (margin of error), e
▪ The standard deviation, σ
Required Sample Size Example
If σ = 45, what sample size is needed to estimate the mean within ± 5 with 90% confidence?

$n = \frac{Z^2 \sigma^2}{e^2} = \frac{(1.645)^2 (45)^2}{5^2} = 219.19$

So the required sample size is n = 220 (always round up).
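The sample-size formula translates directly into code. This sketch takes the critical Z value as an input (1.645 for 90% confidence, as in the example) and rounds up with `math.ceil`; `required_sample_size` is a hypothetical helper name.

```python
import math

def required_sample_size(z, sigma, margin_of_error):
    """n = Z^2 * sigma^2 / e^2, always rounded up to the next whole unit."""
    n_exact = (z * sigma / margin_of_error) ** 2
    return n_exact, math.ceil(n_exact)

n_exact, n = required_sample_size(z=1.645, sigma=45, margin_of_error=5)
print(round(n_exact, 2), n)  # 219.19 220
```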

