Professional Documents
Culture Documents
Journal of The American Statistical Association
Journal of The American Statistical Association
Minimum Variance
Stratification
a b
Tore Dalenius & Joseph L. Hodges Jr
a
University of Stockholm
b
University of California , Berkeley
Published online: 11 Apr 2012.
To cite this article: Tore Dalenius & Joseph L. Hodges Jr (1959) Minimum Variance
Stratification, Journal of the American Statistical Association, 54:285, 88-101
Taylor & Francis makes every effort to ensure the accuracy of all the
information (the “Content”) contained in the publications on our platform.
However, Taylor & Francis, our agents, and our licensors make no
representations or warranties whatsoever as to the accuracy, completeness,
or suitability for any purpose of the Content. Any opinions and views
expressed in this publication are the opinions and views of the authors, and
are not the views of or endorsed by Taylor & Francis. The accuracy of the
Content should not be relied upon and should be independently verified with
primary sources of information. Taylor and Francis shall not be liable for any
losses, actions, claims, proceedings, demands, costs, expenses, damages,
and other liabilities whatsoever or howsoever caused arising directly or
indirectly in connection with, in relation to or arising out of the use of the
Content.
This article may be used for research, teaching, and private study purposes.
Any substantial or systematic reproduction, redistribution, reselling, loan,
sub-licensing, systematic supply, or distribution in any form to anyone is
expressly forbidden. Terms & Conditions of access and use can be found at
http://www.tandfonline.com/page/terms-and-conditions
Downloaded by [Northwestern University] at 22:07 20 December 2014
MINIMUM VARIANCE STRATIFICATION
TORE DALENIUS
University of Stockholm
AND
JOSEPHL. HODGES, JR.
University of California (Berkeley)
1. Introduction..... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2. An approximation to the exact solution... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3. Application I: f(x) =e-S. . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . 94
4. Application II:f(x)=xe-z • . . . . . . . . . • . . . • . . . . . . . . . . . . . . . . . . . . . . . . . • • . . . . • 97
Downloaded by [Northwestern University] at 22:07 20 December 2014
1. INTRODUCTION
(1)
The range Xo, XL of the estimation variable x is cut up into L parts at points
MINIMUM VARIANCE STRATIFICATION 89
Xl < ... <Xh < ... <XL-l. Each such part corresponds to a stratum; i.e.
the estimation variable X is used as stratification variable as well.' For the hth
stratum
Xh
Wh =
f Xh-l
f(t)dt (2)
Xh
Wh,uh = f Xh-l
tf(t)dt (3)
and
"h
f t 2f(t)dt
Downloaded by [Northwestern University] at 22:07 20 December 2014
l) Xh-l
(4)
Uh" = - - W - - - ,uh 2
h
Obviously
(5)
(7)
U
2
min(i ) = ~ ( ~ WhUhY (8)
The solution given by Eq. (9) presents computational difficulties when ap-
plied in a general case. For the special case where f(x) is the normal density
ip(x), Zindler [10] gives a useful computational procedure. In par. 2 of this
paper, a simple, generally applicable method of approximating the exact set
[x,,] is presented.
1.2.2 A conjecture. Dalenius and Gurney [4] conjectured that
TV"u" = constant (10)
would serve as an approximation to Eq. (9), when Lis "large.">
1.2.3 A rule of thumb. The observation that for many populations, and for
reasonable locations of the stratum boundaries, the relative variance does not
vary much from stratum to stratum, leads from Eq. (10) to
Downloaded by [Northwestern University] at 22:07 20 December 2014
(l1a)
where JL'" refers to a measure variable which is assumed to be highly correlated
with the estimation variable, as discussed by Hansen et al. [6, p. 219]. For the
special case when these variables are identical, the rule is
TV"JL" = constant (l1b)
Mahalanobis [8, p. 4, footnote] proposes a similar rule: " ... an optimum
or nearly optimum solution would be obtained when the expected contribution
of each stratum to the total aggregate value of x is made equal for all strata"
(p. 4, footnote).
Kitagawa [7, pp. 27-30] analyzes this "principle of equipartition" in order to
justify it and presents some related results.
1.2.4 Equidistant stratification. The normal kind of first step to try when
setting up strata is often to make
X"+l - x" = constant (12)
Aoyama [1] derived this result by applying the mean value theorem to
Eq. (9), and assuming that the variation of the density is small in each stratum.
Dalenius and Hodges [5] analyze this rule in some detail.
2. AN APPROXIMATION TO THE EXACT SOLUTION
(13)
• In practical sample design, it is important to realize that the big gain from the use of one-stage stratified
sampling is obtained in going from L = 1 to L =2. This point is illustrated by Dalenius [3, Ch, 8].
MINIMUM VARIANCE STRATIFICATION 91
When u-+ 00 , y(U) approaches an upper bound H. The roots Xl' ••• Xh' ••• X'L-1
of the following equations
h
y(u) = -H, h = 1 ... L - 1 (14)
L
are taken as the (first) approximations, for large L, to the points Xl ••• Xh •••
XL-1 satisfying Eq. (9). ..
2.3 Justification
This approximation may be derived by the following heuristic argument.
When L is large, the strata will be narrow, and each will have an approximately
Downloaded by [Northwestern University] at 22:07 20 December 2014
rectangular distribution, so that v'12 rTh == Xh - Xh-1. Then, by the mean value
theorem there exists a value fh of f in the hth stratum, such that
The last sum is minimized by making Yh - Yh-l = constant. A rigorous proof has
been given by Dalenius and Hodges [5].
Ip(u) = f-'"
U tPj(t)dt (16)
2
The conditional means !th, !ti and variances rTh2, rTi of the two strata are easily
expressed in terms of Ip(u)
f xh
xg
tj(t)dt
II (Xh) - I 1(x g )
!th = - - - - - (17)
Io(Xh) - 1 0(x o)
J:hj(t)dt
(18)
Moreover, we define
(19)
92 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1959
Thus
J 1h
fJ.h=- (20)
i:
J 2h J 1h2 J ohJ2h - J 1h2
rTh2 = - - - = ----- (21)
i: J oh2 J oh2
The condition given by Eq. (9) may now be expressed in terms of J ph and
J pi as follows
(24)
A
J 2 - 2bJ1 + b2Jo (25)
.JJoJ2 - J 12
which is A h for h=l, we get
2
A = -= b "-' 1.15b (26)
yl3
i.e. A changes at about 1.15 times the rate at which the interval length (0, b)
changes.
As the next step, consider that this rectangular distribution is divided into
£=2 strata, at a point Xl', 0<X1' <b, arbitrarily chosen. This point Xl' specifies
A l ' and Bl. Applying the result quoted above, we realize that changing Xl' by
one unit will change A l ' and B l ' by approximately 1.15 units each and the
MINIMUM VARIANCE STRATIFICATION 93
difference AI' by approximately 2.3 units. It seems reasonable to determine a
second point Xl" of stratification by setting !:ll" equal to zero, where
4
!:ll" = !:l/ + (xI" - Xl') ----= (27)
v'3
The solution of !:ll"=0 is given by
We will now generalize this to any number L of strata. For three consecutive
strata, with indices g, hand i, we have (in the rectangular case)
aA,,' aB,,' 2
--=-- =
axa' ax,,' -v!3
(29)
aA,,' aB,,' 2
-- = -- = -----:d
ax/ -v!3
while all expressions of the type aAh'/ ax;', aBh ' / axa' etc. are equal to zero.
From !:l,,' =A h' -Bh' we derive analogous expressions for a!:lh/aXh' etc.
These values will now be used in the following way. We have a set [Xh']
with corresponding !:lh'-values. We want to find a set [Xh"] with !:lh"l < !:lh'l. I I
By the mean value theorem we have
where we put xo"=xo' and X/'=XL'. Solving this system for x,," gives the set
wanted.
The matrix M of a!:l,,'/ ax;' is under the approximated rectangular distribu-
tion
2 -1 0 0 0 0
-1 2 -1 0 0 0
2
M =--= 0 -1 2 -1 0 0 (31)
v'3
0 0 0 ... -1 2
L-1 L-2 2 1
L-2 2(L - 2) 4 2
M-1 = - -
va 6 3
(32)
2·L 3 6
2 4 2(L - 2) L-2
1 2 (L - 2) L-1
Thus, the new set [x,,"] is found by computing
Xl" = xt' - 2L
va [(L - l),M + (L - 2).62' + ... +.6' L-d
Downloaded by [Northwestern University] at 22:07 20 December 2014
(35)
Io(u)
Il(u)
= 1 - e-
= 1 - e-
U
uer"
U
-
! (36)
12 (u ) = 2 - ru (u 2 + 2u + 2)
3.2 Numerical calculations
3.2.1 L=2 strata. The first approximation Xl' is given by the root of the
following equation
MINIMUM VARIANCE STRATIFICATION 95
or in this case
1
2(1 - e- u / 2 ) = - 2 = 1 (38)
2
TABLE 95
Downloaded by [Northwestern University] at 22:07 20 December 2014
We compute
A 1' = 2.2761
B/ = 2.0000
11/ = 0.2761
As 111' is considerably larger than zero, we have to adjust xl. We thus com-
pute
Xl
" = Xl -
, vS
-"'1
A ,
(39)
4
giving Xl" = 1.27.
We now repeat the computations contained in Table 95. From the resulting
I p (x1" )- and J"-functions, we compute
A/' = 2.0162
B 1" = 2.0007
/11" = 0.0155
• The computations have been carried out at the Institute of Statistics, University of Stockholm.
96 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1969
Thus, Ill" is much closer to 0 than is Ill'. Therefore, Xl" should be closer to
the best point Xl than is Xl'. In fact, as shown in par. 3.3, the variance corre-
sponding to Xl" is less than the variance corresponding to Xl'.
3.2.2 L=3 strata. The first approximations Xl' and X2' are given by the
roots of the following equations (cf par. 3.2.1.)
1
2(1 - e- u ) =- 2
3
(40)
2
2(1 - e- u ) = - 2
3
giving xl' =0.81 and xl =2.20. Using these values, we carry through the com-
Downloaded by [Northwestern University] at 22:07 20 December 2014
putations necessary to obtain x-", X2", and the variances for these values. The
results are shown in Table 96.
3.2.3 L>3 strata. Similar calculations have been carried out for L=4 and
L = 5 strata. As they are entirely analogous to those reported above, no details
will be given. The results are summarized in Table 96.
TABLE 96
FIRST AND SECOND APPROXIMATION TO OPTIMUM POINTS OF
STRATIFICATION OF !(x)=e-tll FOR £=2· . ·5 STRATA AND
THE ASSOCIATED VARIANCES
Points of stratification
Number
of First appro = Xl' X2' Xa' x.' Variances
strata
Second appro = Xl" X2" :Ca" X,"
TABLE 97
FIRST AND SECOND APPROXIMATION TO OPTIMUM POINTS OF
STRATIFICATION OF f(x) =X e-': FOR L=2 .. ·5 STRATA
AND THE ASSOCIATED VARIANCES
Points of stratification
Number
of First appro = Xl' X2' xa' xl Variances
strata
Second appro = Xl" X2" X3" x,/1
Io(u) = ¢(u) )
ll(u) = - feu) (42)
12(u) = Io(u) + uI1(u)
98 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1959
TABLE 98
FIRST AND SECOND APPROXIMATION TO OPTIMUM POINTS OF
STRATIFICATION OF f(x) =2",(x), 0 :S:x, FOR L =2' .. 5
STRATA AND THE ASSOCIATED VARIANCES
Points of stratification
Number
of First appro = x,' X2' xa' x.' Variances
strata
Second appro = XI" $2" Xa" X/'
Downloaded by [Northwestern University] at 22:07 20 December 2014
For the research reported by Dalenius [3, Ch. 7], a very sizable amount of
computation was performed in order to find approximations to MVS. These
computations, performed by a professional computer of outstanding capacity,
were of a "trial-and-error" nature.
l\!INIMUM VARIANCE STRATIFICATION 99
TABLE 99
FIRST AND SECOND APPROXIMATION TO OPTIMUM POINTS OF
STRATIFICATION OF f(x) =2(1-x) FOR L=2 . . . 5 STRATA
AND THE ASSOCIATED VARIANCES
Points of stratification
Number
of First appro = Xl' X2' Xa' x/ Variances
strata
Second appro = Xl" X2" Xa" X4"
Il(x'L) - Il(x'L_l) = C
Il(x'L-l) - I l(x'L_2) = c
(44)
Il(x'l) - Il(x' 0) = C
where Il(Xh') = Il(u) for U=Xh'. Adding the L equations we easily get
1 1
C = -Il(x'L) = -·H (45)
L L
as the value of C. From this, we derive the following result
Ilex'L-l) = L ~ 1. H)
(46)
L-2
I l(x'L_2) = ---·H
L
100 AMERICAN STATISTICAL ASSOCIATION JOURNAL, MARCH 1959
etc., from which it is easy to compute the set [Xh'] satisfying this rule of thum b
Using this set [Xh'], the variances may be computed and compared with
those arrived at by the approximate rule presented in this paper.
Such comparisons have been made for f(x) =e-'" for L=2, 3, 4 and 5 strata.
Thus, for e.g. L=5, we get the set [Xh'] by solving the equations
The solution Xl' • • • X4' to these equations is given below together with the
corresponding first approximations derived by means of the technique pre-
sented in this paper.
TABLE 100a
COMPARISON OF POINTS OF STRATIFICATION OF f(x) =e-'" FOR L=2· .. 5
STRATA AS COMPUTED BY THE RULES GIVEN IN PAR. 1.2.3 AND PAR. 2.2
0.00 0.00
0.82 0.45
1.38 1.02
2.02 1.83
2.99 3.22
00
TABLE 100b
COMPARISON OF VARIANCES IN STRATIFIED SAMPLING FROM f(x) =e-':
FOR L=2··· 5 STRATA AS COMPUTED BY THE
RULES GIVEN IN PAR. 1.2.3 AND PAR. 2.2
2 0.3079 0.2877
3 0.1556 0.1345
4 0.0950 0.0777
5 0.0639 0.0503
MINIMUM VARIANCE STRATIFICATION 101
8.2 Comparisons with equidistant stratification
Equidistant stratification is applicable only to a density with a finite range.
In order to throw some light on its efficiency, it has been applied to f(x)
=2(I-x); the resulting variances are compared in Table 101 with those result-
ing from using the first approximations derived by means of the technique given
in par. 2.2..
TABLE 101
COMPARISON OF VARIANCES IN STRATIFIED SAMPLING FROM
1(:1;) =2(1-:1;) FOR £=2· .. 5 STRATA AS COMPUTED BY
THE RULES GIVEN IN PAR. 1.2.4 AND PAR. 2.2
Downloaded by [Northwestern University] at 22:07 20 December 2014
2 0.0181 0.0152
3 0.0091 0.0069
4 0.0046 0.0044
5 0.0038 0.0029
9. ACKNOWLEDGMENTS
The authors are indebted to the referees for valuable comments on a previous
version of this paper.
REFERENCES
[1] Aoyama, H., "A study of the stratified random sampling." Annals 01 Mathematical
Statistice, Vol. VI, No.1 (1954) pp. 1-36.
[2] Dalenius, T., "The problem of optimum stratification." Skandinavisk Aktuarietid-
skrift, 3-4 (1950) pp, 203-13.
[3] Dalenius, T., Sampling in Sweden. Contributions to the Methods and Theories of Sample
Survey Practice, Stockholm: Almqvist och Wiksell, 1957.
[4] Dalenius, T. and Gurney, M., "The problem of optimum stratification II." Skandi-
navisk Aktuarietidskrift, 3-4 (1951) pp. 133-48.
[5] Dalenius, T. and Hodges, J. L., Jr., "The choice of stratification points." Skandinavisk
Aktuarietidskrift, 3-4 (1957) pp. 198-203.
[6] Hansen, M. H., Hurwitz, W. G. and Madow, W. G., Sample Survey Methods and
Theory, Vol. 1. New York: John Wiley & Sons, Inc. 1953.
[7] Kitagawa, T., "Some contributions to the design of sample survey." Sankhyi'i, Vol. 17,
Part 1 (1956) pp. 1-36.
[8] Mahalanobis, P. C., "Some aspects of the design of sample surveys." Sankhyi'i, Vol.
12, Parts 1 and 2 (1952) pp. 1-7.
[9] Strecker, H., Moderne Methoden in der Agrarstatistik. Wiirzburg: Physica-Verlag,
1957.
[10] Zindler, H.-J., "fiber Faustregeln zur optimalen Schichtung bei Normalverteilung."
Allgemeines Statistisches Archiv, 40. Band (1956) pp. 168-73.