
PART III

Some Useful Designs


CHAPTER 11

Stratified Sampling

In stratified sampling, the population is partitioned into regions or strata, and a
sample is selected by some design within each stratum. Because the selections in
different strata are made independently, the variances of estimators for individual
strata can be added together to obtain variances of estimators for the whole popula-
tion. Since only the within-stratum variances enter into the variances of estimators,
the principle of stratification is to partition the population in such a way that the
units within a stratum are as similar as possible. Then, even though one stratum
may differ markedly from another, a stratified sample with the desired number of
units from each stratum in the population will tend to be “representative” of the
population as a whole.
A geographical region may be stratified into similar areas by means of some
known variable such as habitat type, elevation, or soil type. Even if a large geo-
graphic study area appears to be homogeneous, stratification into blocks can help
ensure that the sample is spread out over the whole study area. Human populations
may be stratified on the basis of geographic region, city size, sex, or socioeconomic
factors.
In the following, it is assumed that a sample is selected by some probability
design from each of the L strata in the population, with selections in different strata
independent of each other. The variable of interest associated with the ith unit of
stratum h will be denoted $y_{hi}$. Let $N_h$ represent the number of units in stratum h
and $n_h$ the number of units in the sample from that stratum. The total number of
units in the population is $N = \sum_{h=1}^{L} N_h$ and the total sample size is
$n = \sum_{h=1}^{L} n_h$. The total of the y-values in stratum h is
$\tau_h = \sum_{i=1}^{N_h} y_{hi}$ and the mean for that stratum is $\mu_h = \tau_h / N_h$.
The total for the whole population is $\tau = \sum_{h=1}^{L} \tau_h$. The
overall population mean is $\mu = \tau / N$.
The design is called stratified random sampling if the design within each stra-
tum is simple random sampling. Figure 11.1 shows a stratified random sample
from a population of N = 400 units. The sizes of the L = 4 strata are N1 = 200,
N2 = 100, and N3 = N4 = 50. Within each stratum, a random sample without
replacement has been selected independently. The total sample size of n = 40
has been allocated proportional to stratum size, so that n1 = 20, n2 = 10, and
n3 = n4 = 5.
The results in the next section are written to allow for the possibility of any
design within a given stratum, provided that the selections are independent between
strata; then specific results for stratified random sampling are given.

Figure 11.1. Stratified random sample.

Sampling, Third Edition. Steven K. Thompson.
© 2012 John Wiley & Sons, Inc. Published 2012 by John Wiley & Sons, Inc.

11.1. ESTIMATING THE POPULATION TOTAL

With Any Stratified Design


Suppose that within stratum h any specified design is used to select the sample $s_h$
of $n_h$ units, and suppose that one has an estimator $\hat{\tau}_h$ which is unbiased for $\tau_h$ with
respect to that design. Let $\mathrm{var}(\hat{\tau}_h)$ denote the variance of $\hat{\tau}_h$, and suppose that one
has an unbiased estimator $\widehat{\mathrm{var}}(\hat{\tau}_h)$ of that variance.
Then an unbiased estimator of the overall population total $\tau$ is obtained by
adding together the stratum estimators:

$$\hat{\tau}_{st} = \sum_{h=1}^{L} \hat{\tau}_h$$

Because of the independence of the selections in different strata, the variance
of the stratified estimator is the sum of the individual stratum variances:

$$\mathrm{var}(\hat{\tau}_{st}) = \sum_{h=1}^{L} \mathrm{var}(\hat{\tau}_h)$$

An unbiased estimator of that variance is the sum of the individual stratum
estimators:

$$\widehat{\mathrm{var}}(\hat{\tau}_{st}) = \sum_{h=1}^{L} \widehat{\mathrm{var}}(\hat{\tau}_h)$$

With Stratified Random Sampling


If the sample is selected by simple random sampling without replacement in each
stratum, then

$$\hat{\tau}_h = N_h \bar{y}_h$$

is an unbiased estimator of $\tau_h$, where

$$\bar{y}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi}$$

is the sample mean for stratum h.


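An unbiased estimator for the population total $\tau$ is

$$\hat{\tau}_{st} = \sum_{h=1}^{L} N_h \bar{y}_h$$

having variance

$$\mathrm{var}(\hat{\tau}_{st}) = \sum_{h=1}^{L} N_h (N_h - n_h) \frac{\sigma_h^2}{n_h}$$

where

$$\sigma_h^2 = \frac{1}{N_h - 1} \sum_{i=1}^{N_h} (y_{hi} - \mu_h)^2$$

is the finite-population variance from stratum h.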


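An unbiased estimator of the variance of $\hat{\tau}_{st}$ is

$$\widehat{\mathrm{var}}(\hat{\tau}_{st}) = \sum_{h=1}^{L} N_h (N_h - n_h) \frac{s_h^2}{n_h}$$

where

$$s_h^2 = \frac{1}{n_h - 1} \sum_{i=1}^{n_h} (y_{hi} - \bar{y}_h)^2$$

is the sample variance from stratum h.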
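
The formulas above can be sketched in R as follows. This is only an illustration: the function name `stratified_total` and the small data set are made up for the example and do not come from the text.

```r
# Stratified random sampling estimate of the population total and its
# estimated variance, computed from raw sample data.
# y: observed values; stratum: stratum label of each observation;
# N_h: stratum sizes, named by stratum label (all hypothetical inputs).
stratified_total <- function(y, stratum, N_h) {
  n_h    <- tapply(y, stratum, length)   # stratum sample sizes
  ybar_h <- tapply(y, stratum, mean)     # stratum sample means
  s2_h   <- tapply(y, stratum, var)      # stratum sample variances
  N_h    <- N_h[names(n_h)]              # align stratum order with the tapply output
  tau_hat <- sum(N_h * ybar_h)
  var_hat <- sum(N_h * (N_h - n_h) * s2_h / n_h)
  c(tau_hat = tau_hat, var_hat = var_hat)
}

# Made-up data: two strata of sizes 10 and 20, with 3 and 4 sampled units.
y       <- c(1, 2, 3, 4, 6, 8, 10)
stratum <- c("a", "a", "a", "b", "b", "b", "b")
est <- stratified_total(y, stratum, N_h = c(a = 10, b = 20))
est
```

Dividing `tau_hat` by N and `var_hat` by N^2 gives the corresponding quantities for the population mean.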



11.2. ESTIMATING THE POPULATION MEAN

With Any Stratified Design


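Since $\mu = \tau / N$, the stratified estimator for $\mu$ is

$$\hat{\mu}_{st} = \frac{\hat{\tau}_{st}}{N}$$

Assuming that the selections in different strata have been made independently,
the variance of the estimator is

$$\mathrm{var}(\hat{\mu}_{st}) = \frac{1}{N^2} \mathrm{var}(\hat{\tau}_{st})$$

with unbiased estimator of variance

$$\widehat{\mathrm{var}}(\hat{\mu}_{st}) = \frac{1}{N^2} \widehat{\mathrm{var}}(\hat{\tau}_{st})$$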

With Stratified Random Sampling


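With stratified random sampling, an unbiased estimator of the population mean $\mu$
is the stratified sample mean:

$$\bar{y}_{st} = \frac{1}{N} \sum_{h=1}^{L} N_h \bar{y}_h \tag{11.1}$$

Its variance is

$$\mathrm{var}(\bar{y}_{st}) = \sum_{h=1}^{L} \left(\frac{N_h}{N}\right)^2 \left(\frac{N_h - n_h}{N_h}\right) \frac{\sigma_h^2}{n_h} \tag{11.2}$$

An unbiased estimator of this variance is

$$\widehat{\mathrm{var}}(\bar{y}_{st}) = \sum_{h=1}^{L} \left(\frac{N_h}{N}\right)^2 \left(\frac{N_h - n_h}{N_h}\right) \frac{s_h^2}{n_h} \tag{11.3}$$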

Example 1: The results of a stratified random sample are summarized in Table 11.1.
Substituting in Equation (11.1), the estimate of the population mean is

$$\bar{y}_{st} = \frac{1}{41}[20(1.6) + 9(2.8) + 12(0.6)] = \frac{1}{41}(32 + 25.2 + 7.2) = \frac{64.4}{41} = 1.57$$

Table 11.1: Results of a Stratified Random Sample

Stratum h    N_h    n_h    ȳ_h    s_h^2
1            20     5      1.6    3.3
2            9      3      2.8    4.0
3            12     4      0.6    2.2

The estimate $\hat{\tau}_{st}$ of the population total, obtained by multiplying by 41, is 64.4.

The estimated variance of $\bar{y}_{st}$ [from Equation (11.3)] is

$$\widehat{\mathrm{var}}(\bar{y}_{st}) = \frac{1}{41^2}\left[20(20 - 5)\frac{3.3}{5} + 9(9 - 3)\frac{4.0}{3} + 12(12 - 4)\frac{2.2}{4}\right] = \frac{322.8}{41^2} = 0.192$$

The estimated variance for the estimator of the population total, obtained by
multiplying by $41^2$, is 322.8.
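
The arithmetic of Example 1 can be reproduced with a few lines of R. The variable names below are chosen for this sketch; the numbers are the summary statistics of Table 11.1.

```r
# Summary statistics from Table 11.1
N_h    <- c(20, 9, 12)      # stratum sizes
n_h    <- c(5, 3, 4)        # stratum sample sizes
ybar_h <- c(1.6, 2.8, 0.6)  # stratum sample means
s2_h   <- c(3.3, 4.0, 2.2)  # stratum sample variances
N <- sum(N_h)

# Equation (11.1): stratified sample mean
ybar_st <- sum(N_h * ybar_h) / N
# Equation (11.3): estimated variance of the stratified sample mean
var_ybar_st <- sum((N_h / N)^2 * ((N_h - n_h) / N_h) * s2_h / n_h)

# Corresponding estimates for the population total
tau_hat     <- N * ybar_st
var_tau_hat <- N^2 * var_ybar_st

round(c(ybar_st = ybar_st, var_ybar_st = var_ybar_st,
        tau_hat = tau_hat, var_tau_hat = var_tau_hat), 3)
```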

11.3. CONFIDENCE INTERVALS

When all the stratum sample sizes are sufficiently large, an approximate
100(1 − α)% confidence interval for the population total is provided by

$$\hat{\tau}_{st} \pm t \sqrt{\widehat{\mathrm{var}}(\hat{\tau}_{st})}$$

where t is the upper α/2 point of the normal distribution. For the mean, the confidence
interval is $\hat{\mu}_{st} \pm t \sqrt{\widehat{\mathrm{var}}(\hat{\mu}_{st})}$. As a rule of thumb, the normal approximation
may be used if all the sample sizes are at least 30. With small sample sizes, the t
distribution with approximate degrees of freedom may be used. The Satterthwaite
(1946) approximation for the degrees of freedom d to be used is

$$d = \left(\sum_{h=1}^{L} a_h s_h^2\right)^2 \bigg/ \sum_{h=1}^{L} \frac{(a_h s_h^2)^2}{n_h - 1} \tag{11.4}$$

where $a_h = N_h (N_h - n_h)/n_h$.


Satterthwaite’s formula is based on approximating the distribution of a lin-
ear combination of sample variances with a chi-square distribution. Some pos-
sible refinements to Satterthwaite’s formula are discussed in Ames and Webster
(1991).

Example 2: Confidence Interval. For Example 1, the variance coefficients are
$a_1 = 20(20 - 5)/5 = 60$, $a_2 = 9(9 - 3)/3 = 18$, and $a_3 = 12(12 - 4)/4 = 24$.
The estimated degrees of freedom [from Equation (11.4)] are

$$d = \frac{[60(3.3) + 18(4.0) + 24(2.2)]^2}{[60(3.3)]^2/4 + [18(4.0)]^2/2 + [24(2.2)]^2/3} = \frac{322.8^2}{13{,}322.28} = 7.82$$

or about 8 degrees of freedom. The approximate 95% confidence interval for the
mean is $1.57 \pm 2.306\sqrt{0.192} = 1.57 \pm 1.01 = (0.56, 2.58)$.
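
As a sketch, the Satterthwaite degrees of freedom and the interval of Example 2 can be computed in R as below. Rounding d to a whole number before calling `qt` mirrors the "about 8 degrees of freedom" in the example; it is a choice made here, not a requirement of Equation (11.4).

```r
# Summary statistics from Table 11.1
N_h    <- c(20, 9, 12)
n_h    <- c(5, 3, 4)
ybar_h <- c(1.6, 2.8, 0.6)
s2_h   <- c(3.3, 4.0, 2.2)
N <- sum(N_h)

# Variance coefficients a_h = N_h (N_h - n_h) / n_h
a_h <- N_h * (N_h - n_h) / n_h

# Equation (11.4): Satterthwaite approximate degrees of freedom
d <- sum(a_h * s2_h)^2 / sum((a_h * s2_h)^2 / (n_h - 1))

# Stratified mean and its estimated standard error
ybar_st <- sum(N_h * ybar_h) / N
se      <- sqrt(sum(a_h * s2_h)) / N

# Approximate 95% confidence interval using the t distribution
t_crit <- qt(0.975, df = round(d))
ci <- ybar_st + c(-1, 1) * t_crit * se

d
round(ci, 2)
```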

11.4. THE STRATIFICATION PRINCIPLE

Since the formula for the variance of the estimator of the population mean or total
with stratified sampling contains only within-stratum population variance terms, the
estimators will be more precise the smaller the σh2 . Equivalently, estimation of the
population mean or total will be most precise if the population is partitioned into
strata in such a way that within each stratum, the units are as similar as possible.
Thus, in a survey of a plant or animal population, the study area might be stratified
into regions of similar habitat or elevation, with the idea that within strata, abun-
dances will be more similar than between strata. In a survey of a human population,
stratification may be based on socioeconomic factors or geographic region.

Example 3: Comparison with Simple Random Sampling. Consider a small population
of N = 6 units, divided into two strata of $N_h = 3$ units each, in order to
examine the effectiveness of stratified sampling in comparison to simple random
sampling. The population y-values in stratum 1 are 2, 0, and 1. In stratum 2, the
y-values are 5, 9, and 4. Thus, the overall population mean is $\mu = 3.5$, and the overall
population variance is $\sigma^2 = 10.7$. Within stratum 1, the population mean is $\mu_1 = 1$
and the population variance is $\sigma_1^2 = 1.0$. Within stratum 2, $\mu_2 = 6$ and $\sigma_2^2 = 7.0$.
For simple random sampling with sample size n = 4, the sample mean is an
unbiased estimator of the population mean and has variance
$\mathrm{var}(\bar{y}) = [(6 - 4)/6](10.7/4) = 0.89$. For stratified random sampling with sample sizes
$n_1 = n_2 = 2$, so that the total sample size is still 4, the stratified sample mean $\bar{y}_{st}$ is an
unbiased estimator of the population mean having variance
$\mathrm{var}(\bar{y}_{st}) = (3/6)^2[(3 - 2)/3](1/2) + (3/6)^2[(3 - 2)/3](7/2) = 0.33$. For this population,
stratification has been effective because the units within each stratum are relatively similar.
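
Because the population of Example 3 is so small, the two variances can also be checked by listing every possible sample. The R sketch below does this with `combn`; the enumeration approach is an addition for illustration and is not part of the example itself.

```r
# Population of Example 3
y1 <- c(2, 0, 1)   # stratum 1
y2 <- c(5, 9, 4)   # stratum 2
y  <- c(y1, y2)
mu <- mean(y)

# Simple random sampling, n = 4: all 15 equally likely samples
srs_means <- combn(y, 4, mean)
var_srs   <- mean((srs_means - mu)^2)

# Stratified random sampling, n1 = n2 = 2: all 3 x 3 = 9 equally likely samples
m1 <- combn(y1, 2, mean)
m2 <- combn(y2, 2, mean)
st_means <- as.vector(outer(m1, m2, function(a, b) (3 / 6) * a + (3 / 6) * b))
var_st   <- mean((st_means - mu)^2)

# Both estimators are unbiased; the stratified one has the smaller variance
c(mean_srs = mean(srs_means), mean_st = mean(st_means))
round(c(var_srs = var_srs, var_st = var_st), 2)
```

The enumerated variances agree with the values 0.89 and 0.33 obtained from the formulas.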

11.5. ALLOCATION IN STRATIFIED RANDOM SAMPLING

Given a total sample size n, one may choose how to allocate it among the L strata. If
each stratum is the same size and one has no prior information about the population,
a reasonable choice would be to assign equal sample sizes to the strata, so that
for stratum h the sample size would be

$$n_h = \frac{n}{L}$$

If the strata differ in size, proportional allocation could be used to maintain a
steady sampling fraction throughout the population. If stratum h has $N_h$ units, the
sample size allocated to it will be

$$n_h = \frac{n N_h}{N}$$

The allocation scheme that estimates the population mean or total with the
lowest variance for a fixed total sample size n under stratified random sampling is
optimum allocation:

$$n_h = \frac{n N_h \sigma_h}{\sum_{k=1}^{L} N_k \sigma_k}$$

The stratum population standard deviations $\sigma_h$ may in practice be estimated with
sample standard deviations from past data.
In some sampling situations, the cost of sampling, measured in terms of time
or money, differs from stratum to stratum, and total cost may be described by the
linear relationship

$$c = c_0 + c_1 n_1 + c_2 n_2 + \cdots + c_L n_L$$

where c is the total cost of the survey, $c_0$ is an “overhead” cost, and $c_h$ is the cost
per unit observed in stratum h. Then for a fixed total cost c, the lowest variance is
achieved with sample size in stratum h proportional to $N_h \sigma_h / \sqrt{c_h}$, that is,

$$n_h = \frac{(c - c_0) N_h \sigma_h / \sqrt{c_h}}{\sum_{k=1}^{L} N_k \sigma_k \sqrt{c_k}}$$

Thus, the optimum scheme allocates larger sample size to the larger or more
variable strata and smaller sample size to the more expensive or difficult-to-sample
strata.

Example 4: Allocation. A population consists of three strata of sizes
$N_1 = 150$, $N_2 = 90$, and $N_3 = 120$, so that the total population size is N = 360.
Based on sample standard deviations from previous surveys, the standard deviations
within each stratum are estimated to be approximately $\sigma_1 \approx 100$, $\sigma_2 \approx 200$,
and $\sigma_3 \approx 300$.
Proportional allocation of a sample of total size n = 12 is given by
$n_1 = 12(150)/360 = 5$, $n_2 = 12(90)/360 = 3$, and $n_3 = 12(120)/360 = 4$.
Assuming equal cost per unit of sampling in each stratum, the optimal allocation
of a total sample size of n = 12 among the three strata is

$$n_1 = \frac{12(150)(100)}{150(100) + 90(200) + 120(300)} = 2.6$$

$$n_2 = \frac{12(90)(200)}{150(100) + 90(200) + 120(300)} = 3.1$$

$$n_3 = \frac{12(120)(300)}{150(100) + 90(200) + 120(300)} = 6.3$$

Rounding to whole numbers gives n1 = 3, n2 = 3, and n3 = 6.
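
The allocations of Example 4 can be computed in R as sketched below. The last part illustrates the cost-based formula; the per-unit costs `cost_h` and the available budget `budget` (standing for c − c0) are invented for the illustration and do not appear in the example.

```r
N_h     <- c(150, 90, 120)    # stratum sizes
sigma_h <- c(100, 200, 300)   # approximate stratum standard deviations
n <- 12                       # total sample size

# Proportional allocation: n_h = n N_h / N
n_prop <- n * N_h / sum(N_h)

# Optimum allocation with equal costs: n_h = n N_h sigma_h / sum(N_k sigma_k)
n_opt <- n * N_h * sigma_h / sum(N_h * sigma_h)

n_prop
round(n_opt, 1)
round(n_opt)

# Cost-based optimum allocation (hypothetical costs and budget)
cost_h <- c(1, 4, 9)   # assumed cost per unit in each stratum
budget <- 60           # assumed value of c - c0
n_cost <- budget * (N_h * sigma_h / sqrt(cost_h)) / sum(N_h * sigma_h * sqrt(cost_h))
round(n_cost, 1)
sum(cost_h * n_cost)   # the allocation spends exactly the available budget
```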

11.6. POSTSTRATIFICATION

In some situations it may be desired to classify the units of a sample into strata
and to use a stratified estimate, even though the sample was selected by simple
random, rather than stratified, sampling. For example, a simple random sample of
a human population may be stratified by sex after selection of the sample, or a
simple random sample of sites in a fishery survey may be poststratified on depth.
In contrast to conventional stratified sampling, with poststratification, the stratum
sample sizes $n_1, n_2, \ldots, n_L$ are random variables.
With proportional allocation in conventional stratified random sampling, the
sample size in stratum h is fixed at $n_h = nN_h/N$ and the variance (Eq. (11.2))
simplifies to $\mathrm{var}(\bar{y}_{st}) = [(N - n)/(nN)] \sum_{h=1}^{L} (N_h/N)\sigma_h^2$. With poststratification of
a simple random sample of n units from the whole population, the sample size
$n_h$ in stratum h has expected value $nN_h/N$, so that the resulting sample tends
to approximate proportional allocation. With poststratification the variance of the
stratified estimator $\bar{y}_{st} = \sum_{h=1}^{L} (N_h/N)\bar{y}_h$ is approximately

$$\mathrm{var}(\bar{y}_{st}) \approx \frac{N - n}{nN} \sum_{h=1}^{L} \left(\frac{N_h}{N}\right) \sigma_h^2 + \frac{1}{n^2}\left(\frac{N - n}{N - 1}\right) \sum_{h=1}^{L} \left(\frac{N - N_h}{N}\right) \sigma_h^2 \tag{11.5}$$

and the variance of $\hat{\tau}_{st} = N\bar{y}_{st}$ is $\mathrm{var}(\hat{\tau}_{st}) = N^2 \mathrm{var}(\bar{y}_{st})$. The first term is the variance
that would be obtained using a stratified random sampling design with proportional
allocation. An additional term is added to the variance with poststratification,
due to the random sample sizes.
For a variance estimate with which to construct a confidence interval for the
population mean with poststratified data from a simple random sample, it is rec-
ommended to use the standard stratified sampling method (Eq. (11.3)) rather than
substituting the sample variances directly into Equation (11.5). With poststratifica-
tion, the standard formula (Eq. (11.3)) estimates the conditional variance (given by
Eq. (11.2)) of y st given the sample sizes n1 , . . . , nL , while Equation (11.5) is the
unconditional variance [and see the comments of J. N. K. Rao (1988, p. 440)].
To use poststratification, the relative size Nh /N of each stratum must be known.
If the relative stratum sizes are not known, they may be estimated using double
sampling (see Chapter 14). Further discussion of poststratification may be found
in Cochran (1977), Hansen et al. (1953), Hedayat and Sinha (1991), Kish (1965),
Levy and Lemeshow (1991), Singh and Chaudhary (1986), and Sukhatme and
Sukhatme (1970). Variance approximations for poststratification vary among the
sampling texts. The derivation for the expression given here is given in Section
11.8 under the heading “Poststratification Variance.”
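
To see how the two terms of Equation (11.5) compare, the approximation can be evaluated directly. The R function below is a sketch with a made-up name, `poststrat_var`; it is applied, purely as an illustration, to the stratum sizes and variances of the small population in Example 3 with n = 4. The approximation is intended for larger samples, so the numbers only show the mechanics.

```r
# Approximate variance of the poststratified mean, Equation (11.5).
# N_h: stratum sizes; sigma2_h: stratum population variances;
# n: size of the simple random sample.
poststrat_var <- function(N_h, sigma2_h, n) {
  N <- sum(N_h)
  # First term: variance under proportional allocation
  term_prop <- (N - n) / (n * N) * sum((N_h / N) * sigma2_h)
  # Second term: extra variance due to the random stratum sample sizes
  term_extra <- (1 / n^2) * ((N - n) / (N - 1)) * sum(((N - N_h) / N) * sigma2_h)
  c(proportional = term_prop, extra = term_extra, total = term_prop + term_extra)
}

v <- poststrat_var(N_h = c(3, 3), sigma2_h = c(1, 7), n = 4)
v
```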

11.7. POPULATION MODEL FOR A STRATIFIED POPULATION

A simple model for a stratified population assumes that the population Y -values are
independent random variables, each having a normal distribution, and with means
and variances depending on stratum membership. Under this model, the value Yhi
for the ith unit in stratum h has a normal distribution with mean μh and variance σh2 ,
for h = 1, . . . , L, i = 1, . . . , Nh , and the Yhi are independent. A stratified sample
s is selected using any conventional design within each stratum. Since for each
unit $Y_{hi}$ is a random variable, the population total $T = \sum_{h=1}^{L} \sum_{i=1}^{N_h} Y_{hi}$ is also a
random variable. Since the Y-values are observed only for units in the sample, we
wish to predict T using a predictor $\hat{T}$ computed from the sample data. Desirable
properties to have in a predictor $\hat{T}$ include model unbiasedness,

$$E(\hat{T} \mid s) = E(T)$$

where expectation is taken with respect to the model. In addition, we would like
the mean square prediction error $E(\hat{T} - T)^2$ to be as low as possible.
For a given sample the best unbiased predictor of the population total T is

$$\hat{T} = \sum_{h=1}^{L} N_h \bar{y}_h$$

which is the standard stratified sampling estimator. Without the assumption of
normality, the predictor $\hat{T}$ is best linear unbiased. This result is a special case of
prediction results about the general linear regression model.
In addition, a model-unbiased estimator of the mean square prediction error is
the standard stratified variance estimator for the total

$$\hat{E}(\hat{T} - T)^2 = \sum_{h=1}^{L} N_h^2 \left(\frac{N_h - n_h}{N_h}\right) \frac{S_h^2}{n_h}$$

in which $S_h^2$ is the sample variance within stratum h.

11.8. DERIVATIONS FOR STRATIFIED SAMPLING

Optimum Allocation
Consider the variance of the estimator $\hat{\tau}_{st}$ as a function f of the sample sizes, with
the total sample size n given. The object is to choose $n_1, n_2, \ldots, n_L$ to minimize

$$f(n_1, \ldots, n_L) = \mathrm{var}(\hat{\tau}_{st}) = \sum_{h=1}^{L} N_h \sigma_h^2 \left(\frac{N_h}{n_h} - 1\right)$$

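subject to the constraint

$$\sum_{h=1}^{L} n_h = n$$

The Lagrange multiplier method may be used to solve such a problem. Write
$g(n_1, \ldots, n_L) = \sum_{h=1}^{L} n_h - n$. The solution is obtained by differentiating the function
$H = f - \lambda g$ with respect to each $n_h$ and $\lambda$, where $\lambda$ is the Lagrange multiplier,
and setting the partial derivatives equal to zero. The partial derivatives are

$$\frac{\partial H}{\partial n_h} = -\frac{N_h^2 \sigma_h^2}{n_h^2} - \lambda = 0$$

for h = 1, . . . , L. Differentiating with respect to $\lambda$ reproduces the constraint
$\sum_{h=1}^{L} n_h - n = 0$. Each of the first L equations gives $n_h = N_h \sigma_h / \sqrt{-\lambda}$, so that
$n_h$ is proportional to $N_h \sigma_h$, and the constraint fixes the constant of proportionality.
Solving for $n_h$ gives

$$n_h = \frac{n N_h \sigma_h}{\sum_{k=1}^{L} N_k \sigma_k}$$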

To verify that the solution gives a minimum of the variance function, as opposed
to a maximum or saddle point, the second derivatives are examined. Writing $H_{hk}$
for the second partial derivative $\partial^2 H / \partial n_h \partial n_k$ gives

$$H_{hh} = \frac{2 N_h^2 \sigma_h^2}{n_h^3}, \quad h = 1, \ldots, L$$
$$H_{hk} = 0, \quad h \neq k$$
$$H_{\lambda h} = -1$$
$$H_{\lambda\lambda} = 0$$

A sufficient condition for the solution to be a minimum is that for
any set of numbers $a_1, \ldots, a_L$, not all zero, satisfying $\sum_{h=1}^{L} H_{\lambda h} a_h = 0$, the double sum
$\sum_{h=1}^{L} \sum_{k=1}^{L} H_{hk} a_h a_k$ is invariably positive (Hancock 1960, p. 115). Since $H_{hk} = 0$
for $h \neq k$, $\sum_{h=1}^{L} \sum_{k=1}^{L} H_{hk} a_h a_k = \sum_{h=1}^{L} 2 a_h^2 N_h^2 \sigma_h^2 / n_h^3$, which is invariably positive.
The derivation proceeds similarly when the constraint depends on cost. An
alternative derivation uses the Cauchy–Schwarz inequality (see, e.g., Cochran
1977, p. 97).
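
As a numerical sanity check on the derivation, one can search over all whole-number allocations and confirm that the minimizer sits next to the Lagrange solution. The R sketch below does this for the stratum sizes and standard deviations of Example 4 with n = 12; the brute-force search is an illustration added here, not part of the derivation.

```r
N_h     <- c(150, 90, 120)
sigma_h <- c(100, 200, 300)
n <- 12

# Variance of the stratified total as a function of the allocation
f <- function(n_h) sum(N_h * sigma_h^2 * (N_h / n_h - 1))

# All allocations (n1, n2, n3) of positive integers with n1 + n2 + n3 = n
grid <- expand.grid(n1 = 1:(n - 2), n2 = 1:(n - 2))
grid$n3 <- n - grid$n1 - grid$n2
grid <- grid[grid$n3 >= 1, ]
grid$f <- mapply(function(a, b, c) f(c(a, b, c)), grid$n1, grid$n2, grid$n3)

# Best whole-number allocation and the continuous Lagrange solution
best <- grid[which.min(grid$f), ]
lagrange <- n * N_h * sigma_h / sum(N_h * sigma_h)

best
round(lagrange, 1)
```

The variance at the continuous solution is no larger than at any whole-number allocation, which is what the second-derivative argument guarantees.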

Poststratification Variance
With simple random sampling, the number nh of sample units in stratum h has a
hypergeometric distribution with E(nh ) = nNh /N and

var(nh ) = n(Nh /N )(1 − Nh /N)[(N − n)/(N − 1)]



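The poststratification estimator $\bar{y}_{st}$ is unbiased for the population mean $\mu$
provided that samples in which any of the $n_h$ are zero are excluded. Then

$$\mathrm{var}(\bar{y}_{st}) = E[\mathrm{var}(\bar{y}_{st} \mid n_1, \ldots, n_L)]$$

$$= E\left[\sum_{h=1}^{L} \left(\frac{N_h}{N}\right)^2 \left(\frac{N_h - n_h}{N_h}\right) \frac{\sigma_h^2}{n_h}\right]$$

$$= \sum_{h=1}^{L} \left(\frac{N_h}{N}\right)^2 \sigma_h^2 \left[E\left(\frac{1}{n_h}\right) - \frac{1}{N_h}\right]$$

Using a Taylor series approximation for $1/n_h$ about $E(n_h)$, whose first derivative is $-n_h^{-2}$ and
second derivative is $2n_h^{-3}$, and taking expectation gives the approximation

$$E\left(\frac{1}{n_h}\right) \approx \frac{1}{E(n_h)} + \frac{1}{[E(n_h)]^3} \mathrm{var}(n_h) = \frac{N}{nN_h} + \left(\frac{N}{nN_h}\right)^2 \left(\frac{N - N_h}{N}\right) \left(\frac{N - n}{N - 1}\right)$$

Substituting this approximation into the variance expression gives

$$\mathrm{var}(\bar{y}_{st}) \approx \sum_{h=1}^{L} \left(\frac{N_h}{N}\right)^2 \sigma_h^2 \left[\frac{N}{nN_h} + \left(\frac{N}{nN_h}\right)^2 \left(\frac{N - N_h}{N}\right) \left(\frac{N - n}{N - 1}\right) - \frac{1}{N_h}\right]$$

$$= \frac{N - n}{nN} \sum_{h=1}^{L} \left(\frac{N_h}{N}\right) \sigma_h^2 + \frac{1}{n^2} \left(\frac{N - n}{N - 1}\right) \sum_{h=1}^{L} \left(\frac{N - N_h}{N}\right) \sigma_h^2$$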

which completes the derivation.
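
The quality of the Taylor approximation for $E(1/n_h)$ can be checked numerically against the exact hypergeometric distribution. The R sketch below uses N = 201, a stratum of 57 units, and n = 98 purely as illustrative values (they match sizes used in Section 11.9, where the actual design was stratified rather than poststratified). The exact expectation is taken conditionally on $n_h > 0$, an assumption made here so that $1/n_h$ is defined.

```r
N   <- 201   # population size (illustrative)
N_h <- 57    # size of one stratum (illustrative)
n   <- 98    # simple random sample size (illustrative)

# Exact E(1/n_h | n_h > 0) from the hypergeometric distribution of n_h.
# In dhyper, m is the number of stratum units, n the number of other units,
# and k the number of draws.
x_vals <- 1:min(n, N_h)
p <- dhyper(x_vals, m = N_h, n = N - N_h, k = n)
exact <- sum(p / x_vals) / sum(p)

# Taylor approximation: 1 / E(n_h) + var(n_h) / E(n_h)^3
E_nh   <- n * N_h / N
var_nh <- n * (N_h / N) * (1 - N_h / N) * ((N - n) / (N - 1))
taylor <- 1 / E_nh + var_nh / E_nh^3

c(exact = exact, taylor = taylor, rel_error = (taylor - exact) / exact)
```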

11.9. COMPUTING NOTES

Calculations and a simulation for stratification will be illustrated using data from the
report "1997 aerial moose survey along the Yukon River corridor, Yukon-Charley Rivers
National Preserve, Alaska: Project report, November, 1997" (Burch and Demma,
1997). The survey area was divided into three strata and a stratified random sampling
design was used to estimate the number of moose in the study area. For units in
the sample, moose were counted from the air. For the purposes of our example, we
will assume that every moose in a sample plot is detected. In the actual survey,
a factor for sightability (detectability) was estimated and an additional adjustment
was made.
The population has L = 3 strata based on habitat type. The numbers of units in
each stratum are N1 = 122, N2 = 57, and N3 = 22. The sample sizes used were
n1 = 39, n2 = 38, and n3 = 21.
First, estimates are made using the stratified sample data.

# Read in the data or enter them from the printout below.
moosedat <- read.table(file="http://www.stat.sfu.ca/~thompson/data/moosedata")
# You can see it by printing out the whole data structure:
moosedat
# Note there are two columns called "str" for stratum and "moose" for the
# total count of moose in each sample plot. The strata are labeled
# 1 = "low", 2 = "medium", and 3 = "high", representing habitat favorability.
# You can rename these for accessibility as follows:
stratum <- moosedat$str
y <- moosedat$moose
# The total sample size:
length(y)
# The stratum sample sizes:
table(stratum)
# Two simple ways to get stratum sample means and sample variances
# to use in calculations. First, with tapply (see ?tapply for help):
?tapply
tapply(y,stratum,mean)
tapply(y,stratum,var)
# Second, by extracting the values for one stratum at a time:
y1 <- y[stratum==1]
y1
length(y1)
mean(y1)
var(y1)
# With the second method, repeat for strata 2 and 3:
y2 <- y[stratum==2]
y3 <- y[stratum==3]

# Here are the moose data:

> moosedat
str moose
1 3 0
2 3 0
3 3 1
4 3 7
5 3 5
6 3 7
7 3 7
8 2 13
9 3 17
10 3 1
11 2 7
12 3 10
13 2 1
14 2 0

15 2 1
16 3 4
17 2 8
18 1 2
19 2 3
20 2 0
21 2 2
22 1 2
23 1 2
24 2 4
25 1 1
26 1 1
27 1 0
28 3 10
29 2 4
30 3 8
31 2 23
32 1 1
33 1 0
34 1 0
35 1 0
36 1 0
37 1 4
38 3 2
39 1 0
40 1 12
41 2 3
42 2 3
43 2 0
44 1 1
45 2 18
46 2 3
47 2 2
48 2 2
49 2 13
50 2 1
51 2 0
52 2 8
53 2 10
54 3 17
55 1 3
56 1 3
57 1 0
58 1 0
59 1 0
60 1 0
61 1 2
62 1 0
63 3 3

64 1 0
65 1 1
66 2 11
67 1 5
68 1 17
69 3 33
70 2 2
71 3 8
72 3 10
73 1 1
74 2 0
75 3 2
76 3 9
77 2 2
78 1 0
79 2 0
80 1 0
81 1 0
82 1 2
83 2 1
84 2 0
85 2 0
86 1 3
87 2 0
88 2 4
89 1 0
90 1 0
91 1 0
92 2 10
93 2 1
94 2 0
95 1 0
96 1 0
97 2 0
98 1 0
>

To investigate the properties of the sampling strategy for this population or to
consider alternative strategies, simulation is an invaluable tool. The problem of
course is that the existing data are themselves only a sample from the population,
whereas we would like data on the whole population to achieve the most realistic
simulation. One way of constructing a simulation population for a population like
this is to augment the data by repeating data values in each stratum to impute the
missing values and create an artificial, but arguably realistic, population of N = 201
units, the same number as in the actual study region. Stratum sample values are
used to fill in each complete stratum with repeated sample data. Since the stratum
sizes are not whole-number multiples of the stratum sample sizes, the remaining
needed values are selected at random from the respective stratum sample.

A simulation like this can be used to compare a stratified design with a different
design such as simple random sampling from the whole population, or to study
the effect of changing sample size or allocation scheme. The simulation procedure
follows.

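# A simulation for evaluating the stratified random sampling
# design for the moose population, with the artificially
# augmented population made by repeating or "bootstrapping"
# the data values in each stratum to get an augmented
# population as large as the real population and having
# values as realistic as possible.

# Stratum sample values (y and stratum as defined when the data were read in):
y1 <- y[stratum==1]
y2 <- y[stratum==2]
y3 <- y[stratum==3]

# Augment each stratum sample to the full stratum size:
# 3*39 + 5 = 122, 38 + 19 = 57, 21 + 1 = 22.
y1aug <- c(rep(y1,3), sample(y1,5))
y2aug <- c(y2, sample(y2,19))
y3aug <- c(y3, sample(y3, 1))
N1 <- 122
N2 <- 57
N3 <- 22
n1 <- 39
n2 <- 38
n3 <- 21

# Number of simulation runs. The listing does not define b;
# the value 1000 is an assumption and can be changed.
b <- 1000
tauhat <- numeric(b)

for (k in 1:b){
s1 <- sample(N1, n1)
tauhat1 <- N1 * mean(y1aug[s1])
s2 <- sample(N2, n2)
tauhat2 <- N2 * mean(y2aug[s2])
s3 <- sample(N3, n3)
tauhat3 <- N3 * mean(y3aug[s3])
tauhat[k] <- tauhat1 + tauhat2 + tauhat3
}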

hist(tauhat)
mean(tauhat)
var(tauhat)
tau <- sum(y1aug) + sum(y2aug) + sum(y3aug)
tau
mean((tauhat-tau)^2)
sqrt(mean((tauhat-tau)^2))

EXERCISES

1. The following results were obtained from a stratified random sample:

Stratum 1: $N_1 = 100$, $n_1 = 50$, $\bar{y}_1 = 10$, $s_1^2 = 2800$
Stratum 2: $N_2 = 50$, $n_2 = 50$, $\bar{y}_2 = 20$, $s_2^2 = 700$
Stratum 3: $N_3 = 300$, $n_3 = 50$, $\bar{y}_3 = 30$, $s_3^2 = 600$

(a) Estimate the mean for the whole population.
(b) Give a 95% confidence interval for the mean.

2. Allocate a total sample size of n = 100 between two strata having sizes $N_1 = 200$
and $N_2 = 300$ and variances $\sigma_1^2 = 81$ and $\sigma_2^2 = 16$ (a) using proportional
allocation and (b) using optimal allocation (assume equal costs).

3. Use stratified random sampling to estimate the mean or total of a population of
your choice. In the process of carrying out the survey and making the estimate,
think about or discuss with others the following:

(a) What practical problems arise in establishing a frame, such as a map or
list of units, from which to select the sample?
(b) How is the sample selection actually carried out?
(c) What special problems arise in observing the units selected?
(d) Estimate the population mean or total.
(e) Estimate the variance of the estimator above.
(f) Give a 95% confidence interval for the population mean or total.
(g) Using the stratum sample variances from your data, give the proportional
and the optimum allocations of a sample of size 200 in a future survey.
(h) How would you improve the survey procedure if you were to do it again?
