
# Variance Estimation in Complex Surveys
Drew Hardin
Kinfemichael Gedif
So far...

Variance of the estimated mean and total under SRS, stratified, and cluster (single- and multi-stage) sampling, etc.
Variance of an estimated ratio of two means under SRS (we used the linearization method)

What about other cases?
Variance for estimators that are not linear combinations of means and totals, such as ratios
Variance for other statistics estimated from complex surveys: medians, quantiles, functions of the empirical distribution function (EDF), etc.

Other approaches are necessary.

Outline
Variance Estimation Methods
Linearization
Random Group Methods
Balanced Repeated Replication (BRR)
Resampling techniques
Jackknife, Bootstrap
Adapting to complex surveys
Hot research areas
References
Linearization (Taylor Series Methods)
We have seen this before (the ratio estimator, and in other courses).
Suppose our statistic is nonlinear. It can often be approximated using Taylor's theorem.
We know how to calculate variances of linear functions of means and totals.
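Linearization (Taylor Series Methods)
Linearize:
$$h(\hat t_1, \hat t_2, \ldots, \hat t_k) \approx h(t_1, t_2, \ldots, t_k) + \sum_{j=1}^{k} \frac{\partial h}{\partial t_j}\,(\hat t_j - t_j)$$
Calculate the variance:
$$V\bigl[h(\hat t_1, \ldots, \hat t_k)\bigr] \approx V\Bigl[\sum_{j=1}^{k} \frac{\partial h}{\partial t_j}\,(\hat t_j - t_j)\Bigr] = \sum_{j=1}^{k} \Bigl(\frac{\partial h}{\partial t_j}\Bigr)^2 V(\hat t_j) + 2\sum_{i<j} \frac{\partial h}{\partial t_i}\,\frac{\partial h}{\partial t_j}\,\operatorname{Cov}(\hat t_i, \hat t_j)$$
where all partial derivatives are evaluated at $(t_1, t_2, \ldots, t_k)$.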
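Here is a minimal Python sketch of the two steps. It assumes SRS and uses the ratio of means $\hat B = \bar y/\bar x$ as the nonlinear statistic. Here $h(a, b) = a/b$, so $\partial h/\partial a = 1/b$ and $\partial h/\partial b = -a/b^2$. The variance is the quadratic form $\sum_i\sum_j (\partial h/\partial t_i)(\partial h/\partial t_j)\operatorname{Cov}(\hat t_i, \hat t_j)$, with the derivatives evaluated at the estimates. The function names are illustrative assumptions, as is the optional population size `N` used for the finite population correction; neither comes from the slides.

```python
from statistics import mean


def linearized_variance(grad, cov):
    """Taylor-series variance: sum over i, j of grad[i] * grad[j] * cov[i][j]."""
    k = len(grad)
    return sum(grad[i] * grad[j] * cov[i][j] for i in range(k) for j in range(k))


def sample_cov(a, b):
    """Sample covariance of two equal-length sequences (divisor n - 1)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ratio_variance_srs(y, x, N=None):
    """Linearization variance of B_hat = ybar / xbar under SRS.

    N is an assumed, known population size; if given, the finite
    population correction 1 - n/N is applied.
    """
    n = len(y)
    fpc = 1.0 if N is None else 1.0 - n / N
    ybar, xbar = mean(y), mean(x)
    # Estimated covariance matrix of (ybar, xbar): fpc * s_uv / n
    cov = [[fpc * sample_cov(u, v) / n for v in (y, x)] for u in (y, x)]
    # Partial derivatives of h(a, b) = a / b, evaluated at (ybar, xbar)
    grad = [1.0 / xbar, -ybar / xbar ** 2]
    return ybar / xbar, linearized_variance(grad, cov)


y = [12.0, 15.0, 9.0, 20.0, 14.0]
x = [10.0, 13.0, 8.0, 18.0, 11.0]
b_hat, v_hat = ratio_variance_srs(y, x, N=100)
print(b_hat, v_hat)
```

For the ratio, the quadratic form simplifies to the familiar residual form $(1 - n/N)\,s_e^2/(n\bar x^2)$, where $e_i = y_i - \hat B x_i$. The general `linearized_variance` helper is kept separate so the same code works for any smooth $h$ once its gradient is supplied.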
Linearization (Taylor Series Methods)
Pro:
Can be applied in general sampling designs
Theory is well developed
Software is available
Con:
Finding partial derivatives may be difficult
A different derivation is needed for each statistic
The function of interest may not be expressible as a smooth function of population totals or means
The linearization approximation may be inaccurate

Random Group Methods
Based on the concept of replicating the survey design
It is not usually possible to simply go out and replicate the survey
However, the sample can often be divided into R groups so that each group forms a miniature version of the survey
Random Group Methods
[Figure: strata 1 to 5, each containing units numbered 1 to 8; each random group is treated as a miniature sample]
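Unbiased estimator (average over the random groups), where $\hat\theta_r$ is the estimate from random group $r$:
$$\tilde\theta = \frac{1}{R}\sum_{r=1}^{R}\hat\theta_r, \qquad \hat V_1(\tilde\theta) = \frac{1}{R}\,\frac{1}{R-1}\sum_{r=1}^{R}\bigl(\hat\theta_r - \tilde\theta\bigr)^2$$

Slightly biased estimator (using $\hat\theta$ computed from all the data):
$$\hat V_2(\hat\theta) = \frac{1}{R}\,\frac{1}{R-1}\sum_{r=1}^{R}\bigl(\hat\theta_r - \hat\theta\bigr)^2$$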
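The two random group estimators take only a few lines of Python. This sketch assumes a simple random sample split at random into $R = 8$ equal groups, so that each group is itself a miniature SRS. With strata or clusters, each group would have to reproduce that structure. The data and function names are hypothetical.

```python
import random
from statistics import mean, median


def random_group_variances(theta_r, theta_all):
    """Random group variance estimators.

    theta_r   : estimates computed separately from each of the R random groups
    theta_all : the same estimator computed from the full sample
    Returns (theta_tilde, V1(theta_tilde), V2(theta_all)).
    """
    R = len(theta_r)
    theta_tilde = mean(theta_r)
    v1 = sum((t - theta_tilde) ** 2 for t in theta_r) / (R * (R - 1))
    v2 = sum((t - theta_all) ** 2 for t in theta_r) / (R * (R - 1))
    return theta_tilde, v1, v2


# Toy SRS of 40 values, assigned at random to R = 8 groups of 5
rng = random.Random(2024)
sample = [rng.gauss(50, 10) for _ in range(40)]
rng.shuffle(sample)
R = 8
groups = [sample[r::R] for r in range(R)]

# A non-smooth statistic (the median) is handled the same way as a smooth one
med_tilde, v1_med, v2_med = random_group_variances(
    [median(g) for g in groups], median(sample))
# For the mean with equal-sized groups, theta_tilde equals the full-sample mean
mean_tilde, v1_mean, v2_mean = random_group_variances(
    [mean(g) for g in groups], mean(sample))
```

$\sum_r(\hat\theta_r - a)^2$ is smallest at $a = \tilde\theta$, so $\hat V_2$ is never smaller than $\hat V_1$. The two coincide for the mean with equal group sizes. The median example shows that the method does not require a smooth statistic.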
Random Group Methods
Pro:
Easy to calculate
General method (can also be used for non-smooth functions)
Con:
Assumes independent groups (a problem when N is small)
Small number of groups (particularly if one stratum has only a few sampled units)
The survey design must be replicated in each random group (the same strata and cluster structure in every group)

Resampling and Replication Methods
Balanced Repeated Replication (BRR)
Special case when $n_h = 2$
Jackknife (Quenouille (1949); Tukey (1958))
Bootstrap (Efron (1979); Shao and Tu (1995))
These methods
Extend the idea of the random group method
Allow replicate groups to overlap
Are all-purpose methods
Asymptotic properties?
Balanced Repeated Replication
Suppose we had sampled 2 PSUs per stratum.
There are $2^H$ ways to pick 1 PSU from each of the $H$ strata.
Each combination could be treated as a sample.
Pick $R$ samples.
Balanced Repeated Replication
Which samples should we include?
Assign each of the two units in a stratum the value 1 or -1
Select half-samples that are orthogonal to one another to create balance
You can use the design matrix of a fractional factorial design
Specify a vector $\boldsymbol\alpha_r$ of 1, -1 values, one entry per stratum
Estimator:
$$\hat V_{BRR}(\hat\theta) = \frac{1}{R}\sum_{r=1}^{R}\bigl(\hat\theta(\boldsymbol\alpha_r) - \hat\theta\bigr)^2$$
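Here is a sketch of BRR in Python. It assumes exactly two PSUs per stratum and a statistic of the form $\theta = g(\sum_h W_h\bar y_h)$. Balance comes from a Sylvester-type Hadamard matrix, which is one common way to obtain mutually orthogonal columns of $\pm 1$; a fractional factorial design matrix serves the same purpose. The all-ones column is skipped, and column $h$ then supplies the sign for stratum $h$. Function names, weights, and data are illustrative.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of the given order (a power of 2), by Sylvester's construction."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of 2")
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def brr_variance(pairs, weights, g=lambda m: m):
    """BRR variance for theta = g(stratified mean) with 2 PSUs per stratum.

    pairs   : [(y_h1, y_h2), ...], one pair per stratum
    weights : stratum weights W_h
    g       : smooth function of the stratified mean (identity = linear case)
    """
    H = len(pairs)
    R = 1
    while R <= H:  # need R > H so that column 0 (all ones) can be skipped
        R *= 2
    full = g(sum(w * (a + b) / 2 for w, (a, b) in zip(weights, pairs)))
    total = 0.0
    for row in sylvester_hadamard(R):
        alpha = row[1:H + 1]  # alpha_r: +1 keeps PSU 1, -1 keeps PSU 2, per stratum
        half = sum(w * (a if s == 1 else b)
                   for w, (a, b), s in zip(weights, pairs, alpha))
        total += (g(half) - full) ** 2
    return total / R


pairs = [(10.0, 14.0), (20.0, 18.0), (5.0, 9.0)]
weights = [0.5, 0.3, 0.2]
v_brr = brr_variance(pairs, weights)
```

In the linear case ($g$ the identity), the columns are orthogonal, so the cross terms cancel. The BRR estimate then equals $\sum_h W_h^2 (y_{h1} - y_{h2})^2/4$, which is the usual stratified variance estimator for $n_h = 2$. For the toy data this is $1.25$.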
Balanced Repeated Replication

Pro
Relatively few computations
Asymptotically equivalent to linearization methods for
smooth functions of population totals and quantiles
Can be extended to use weights
Con
Requires 2 PSUs per stratum
(can be extended with more complex schemes)
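The Jackknife
SRS with replacement
Quenouille (1949); Tukey (1958); Shao and Tu (1995)
Let $\hat\theta_{(i)}$ be the estimator of $\theta$ after omitting the $i$th observation.
Jackknife estimate, using the pseudovalues $\tilde\theta_i$:
$$\tilde\theta_J = \frac{1}{n}\sum_{i=1}^{n}\tilde\theta_i, \qquad \text{where } \tilde\theta_i = n\hat\theta - (n-1)\,\hat\theta_{(i)}$$
Jackknife estimator of the variance:
$$\hat V_J(\hat\theta) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\bigl(\tilde\theta_i - \tilde\theta_J\bigr)^2 = \frac{n-1}{n}\sum_{i=1}^{n}\bigl(\hat\theta_{(i)} - \hat\theta_{(\cdot)}\bigr)^2, \qquad \text{where } \hat\theta_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_{(i)}$$
For stratified SRS without replacement, see Jones (1974).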
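Here is a minimal delete-one jackknife in Python. It follows the pseudovalue definition for an SRS with replacement, so there is no finite population correction. The data and function names are hypothetical. Any function of a list of observations can be passed as `estimator`.

```python
from statistics import mean, variance


def jackknife(data, estimator):
    """Delete-one jackknife for an SRS (with replacement) of size n.

    Returns (theta_J, V_J), where theta_J is the mean of the pseudovalues.
    """
    n = len(data)
    theta_hat = estimator(data)
    # theta_(i): the estimator recomputed with observation i omitted
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    # Pseudovalues: n * theta_hat - (n - 1) * theta_(i)
    pseudo = [n * theta_hat - (n - 1) * t for t in loo]
    theta_j = mean(pseudo)
    v_j = sum((p - theta_j) ** 2 for p in pseudo) / (n * (n - 1))
    return theta_j, v_j


ys = [4.0, 7.0, 1.0, 9.0, 6.0, 3.0]
xs = [2.0, 3.0, 1.0, 5.0, 3.0, 2.0]
theta_mean, v_mean = jackknife(ys, mean)
print(v_mean, variance(ys) / len(ys))  # for the mean, V_J equals s^2 / n

# A nonlinear statistic: ratio of means, applied to (y, x) pairs
pairs = list(zip(ys, xs))
ratio = lambda d: sum(p[0] for p in d) / sum(p[1] for p in d)
theta_ratio, v_ratio = jackknife(pairs, ratio)
```

For the sample mean the pseudovalues are the observations themselves, so $\hat V_J = s^2/n$. Computing $\frac{n-1}{n}\sum_i(\hat\theta_{(i)} - \hat\theta_{(\cdot)})^2$ directly from the leave-one-out values gives the same number as the pseudovalue form.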
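The Jackknife
stratified multistage design
In stratum $h$, delete one PSU at a time.
Let $\hat\theta_{(hi)}$ be the estimator of the same form as $\hat\theta$ when PSU $i$ of stratum $h$ is omitted. For example, if $\hat\theta = g(\bar y)$ with $\bar y = \sum_{h} W_h\bar y_h$, then
$$\hat\theta_{(hi)} = g\bigl(\bar y_{(hi)}\bigr), \qquad \bar y_{(hi)} = \sum_{h' \neq h} W_{h'}\bar y_{h'} + W_h\,\frac{n_h\bar y_h - y_{hi}}{n_h - 1}$$
Jackknife estimates, using the pseudovalues $\tilde\theta_{hi} = n_h\hat\theta - (n_h - 1)\,\hat\theta_{(hi)}$:
$$\tilde\theta_J^{\,I} = \frac{1}{L}\sum_{h=1}^{L}\frac{1}{n_h}\sum_{i=1}^{n_h}\tilde\theta_{hi}, \qquad \tilde\theta_J^{\,II} = \frac{1}{n}\sum_{h=1}^{L}\sum_{i=1}^{n_h}\tilde\theta_{hi}, \qquad n = \sum_{h=1}^{L} n_h$$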
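The Jackknife
stratified multistage design
Different formulae for $\hat V(\hat\theta)$:
$$\hat V_{\text{method}}(\hat\theta) = \sum_{h=1}^{L}\frac{n_h - 1}{n_h}\sum_{i=1}^{n_h}\bigl(\hat\theta_{(hi)} - \hat\theta_{\text{method}}\bigr)^2$$
where $\hat\theta_{\text{method}}$ can be $\hat\theta_{(h\cdot)} = \frac{1}{n_h}\sum_{i=1}^{n_h}\hat\theta_{(hi)}$, $\hat\theta_{(\cdot\cdot)} = \frac{1}{L}\sum_{h=1}^{L}\hat\theta_{(h\cdot)}$, or $\hat\theta$.
Using the pseudovalues $\tilde\theta_{hi}$ and the pseudovalue jackknife estimates $\tilde\theta_J^{\,j}$:
$$\hat V_J^{\,j}(\hat\theta) = \sum_{h=1}^{L}\frac{1}{n_h(n_h - 1)}\sum_{i=1}^{n_h}\bigl(\tilde\theta_{hi} - \tilde\theta_J^{\,j}\bigr)^2, \qquad j = I, II$$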
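Here is a Python sketch of the delete-one-PSU jackknife. It assumes $\theta = g(\bar y)$ with $\bar y = \sum_h W_h\bar y_h$, treats each PSU's value as a single number (for example, a PSU mean), and requires $n_h \ge 2$ in every stratum. The `center` argument selects the centering value:

- `"full"` gives $\hat\theta$.
- `"stratum"` gives $\hat\theta_{(h\cdot)} = n_h^{-1}\sum_i \hat\theta_{(hi)}$.
- `"grand"` gives $\hat\theta_{(\cdot\cdot)} = L^{-1}\sum_h\hat\theta_{(h\cdot)}$.

Names and data are illustrative.

```python
from statistics import mean, variance


def stratified_jackknife(strata, weights, g=lambda m: m, center="full"):
    """Delete-one-PSU jackknife variance for theta = g(sum_h W_h * ybar_h).

    strata  : list of lists of PSU values, one list per stratum (n_h >= 2)
    weights : stratum weights W_h
    center  : "full" (theta_hat), "stratum" (theta_(h.)), or "grand" (theta_(..))
    """
    means = [mean(ys) for ys in strata]
    ybar = sum(w * m for w, m in zip(weights, means))
    theta_hat = g(ybar)
    deleted = []  # deleted[h][i] = theta_(hi)
    for h, ys in enumerate(strata):
        n_h = len(ys)
        row = []
        for y_hi in ys:
            mean_without_i = (n_h * means[h] - y_hi) / (n_h - 1)
            # Only stratum h changes when PSU i of stratum h is omitted
            row.append(g(ybar + weights[h] * (mean_without_i - means[h])))
        deleted.append(row)
    stratum_avgs = [mean(row) for row in deleted]  # theta_(h.)
    grand_avg = mean(stratum_avgs)                 # theta_(..)
    v = 0.0
    for h, row in enumerate(deleted):
        n_h = len(row)
        c = {"full": theta_hat, "stratum": stratum_avgs[h], "grand": grand_avg}[center]
        v += (n_h - 1) / n_h * sum((t - c) ** 2 for t in row)
    return theta_hat, v


strata = [[10.0, 14.0], [20.0, 18.0, 25.0, 22.0], [5.0, 9.0, 7.0]]
weights = [0.5, 0.3, 0.2]
customary = sum(w ** 2 * variance(ys) / len(ys) for w, ys in zip(weights, strata))
_, v_full = stratified_jackknife(strata, weights)
```

For linear $g$, every choice of center reproduces the customary estimator $\sum_h W_h^2 s_h^2/n_h$. For nonlinear $g$ the three versions generally differ.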
The Jackknife
Asymptotics
Krewski and Rao (1981)
Based on the concept of a sequence of finite populations $\{\Pi_L\}$ with $L$ strata in $\Pi_L$, as $L \to \infty$
Under conditions C1-C6 given in the paper:
(i) $n^{1/2}(\hat\theta - \theta) \xrightarrow{d} N(0, \sigma^2)$
(ii) $n\hat V_{\text{method}} - \sigma^2 \xrightarrow{p} 0$
(iii) $(\hat\theta - \theta)/\hat V_{\text{method}}^{1/2} \xrightarrow{d} N(0, 1)$
where "method" is the variance estimator used (linearization, BRR, or jackknife).
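The Bootstrap
Naïve bootstrap
Efron (1979); Rao and Wu (1988); Shao and Tu (1995)
Resample with replacement in stratum $h$: $\{y_{hi}^*\}_{i=1}^{n_h}$
Estimate, for $b = 1, 2, \ldots, B$:
$$\hat\theta^{*(b)} = g\bigl(\bar y^{*(b)}\bigr), \qquad \bar y^{*(b)} = \sum_{h} W_h\,\bar y_h^{*(b)}, \qquad \bar y_h^{*(b)} = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}^{*(b)}$$
Variance:
$$V_{NBS} = E_*\bigl(\hat\theta^* - E_*\hat\theta^*\bigr)^2$$
Or approximate by
$$\hat V_{NBS} = \frac{1}{B-1}\sum_{b=1}^{B}\bigl(\hat\theta^{*(b)} - \hat\theta^{*(\cdot)}\bigr)^2, \qquad \hat\theta^{*(\cdot)} = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^{*(b)}$$
This estimator is not a consistent estimator of the variance of a general nonlinear statistic.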
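Here is a Monte Carlo sketch of the naïve bootstrap in Python, assuming $\theta = g(\sum_h W_h\bar y_h)$. In each replicate, $n_h$ values are drawn with replacement within each stratum, and the function returns the variance of the $B$ replicate estimates. `B`, the seed, the data, and the names are illustrative choices.

```python
import random
from statistics import mean


def naive_bootstrap_variance(strata, weights, g=lambda m: m, B=4000, seed=1):
    """Monte Carlo naive bootstrap: resample n_h PSUs with replacement in each stratum."""
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        ybar_star = sum(w * mean(rng.choices(ys, k=len(ys)))
                        for w, ys in zip(weights, strata))
        reps.append(g(ybar_star))
    center = mean(reps)  # theta*(.)
    return sum((t - center) ** 2 for t in reps) / (B - 1)


strata = [[10.0, 14.0], [20.0, 18.0, 25.0, 22.0], [5.0, 9.0, 7.0]]
weights = [0.5, 0.3, 0.2]
v_nbs = naive_bootstrap_variance(strata, weights)
```

In the linear case the returned value settles near $\sum_h W_h^2\frac{n_h-1}{n_h}\frac{s_h^2}{n_h}$, not the customary $\sum_h W_h^2 s_h^2/n_h$. A stratum with $n_h = 2$ has its contribution halved.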
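The Bootstrap
Naïve bootstrap
For $\hat\theta^* = \bar y^* = \sum_h W_h\bar y_h^*$:
$$\operatorname{Var}_*(\bar y^*) = \sum_{h=1}^{L} W_h^2\,\frac{n_h - 1}{n_h}\,\frac{s_h^2}{n_h}$$
Comparing with the customary estimator
$$\operatorname{var}(\bar y) = \sum_{h=1}^{L} W_h^2\,\frac{s_h^2}{n_h},$$
the ratio $\operatorname{Var}_*(\bar y^*)/\operatorname{var}(\bar y)$ does not converge to 1 for bounded $n_h$.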
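One step is implicit in this comparison: the bootstrap variance of a resampled stratum mean uses the divisor $n_h$ rather than $n_h - 1$. The derivation below covers the linear case $\bar y^* = \sum_h W_h\bar y_h^*$, with resampling done independently across strata.

```latex
\begin{aligned}
\operatorname{Var}_*(\bar y_h^*) &= \frac{1}{n_h}\cdot\frac{1}{n_h}\sum_{i=1}^{n_h}(y_{hi}-\bar y_h)^2
  = \frac{n_h-1}{n_h}\cdot\frac{s_h^2}{n_h},\\
\operatorname{Var}_*(\bar y^*) &= \sum_{h=1}^{L} W_h^2\,\operatorname{Var}_*(\bar y_h^*)
  = \sum_{h=1}^{L} W_h^2\,\frac{n_h-1}{n_h}\,\frac{s_h^2}{n_h},\\
\text{if } n_h = n_0 \text{ for all } h:\quad
\frac{\operatorname{Var}_*(\bar y^*)}{\sum_h W_h^2 s_h^2/n_h} &= \frac{n_0-1}{n_0}
  \qquad (\text{e.g. } 1/2 \text{ when } n_0 = 2).
\end{aligned}
```

The shortfall factor stays fixed unless the $n_h$ grow, which is the source of the inconsistency.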
The Bootstrap
Modified bootstrap
Resample with replacement in stratum $h$: $\{y_{hi}^*\}_{i=1}^{m_h}$, with $1 \le m_h < n_h$
Calculate:
$$\tilde y_{hi} = \bar y_h + \Bigl(\frac{m_h}{n_h - 1}\Bigr)^{1/2}\bigl(y_{hi}^* - \bar y_h\bigr), \qquad \tilde y_h = \frac{1}{m_h}\sum_{i=1}^{m_h}\tilde y_{hi}, \qquad \tilde y = \sum_{h=1}^{L} W_h\,\tilde y_h, \qquad \tilde\theta = g(\tilde y)$$
Variance:
$$V_{MBS} = E_*\bigl(\tilde\theta - E_*\tilde\theta\bigr)^2$$
It can be approximated by Monte Carlo.
For the linear case, it reduces to the customary unbiased variance estimator.
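Here is a Monte Carlo sketch of the modified (rescaled) bootstrap in Python, assuming $\theta = g(\sum_h W_h\bar y_h)$. It also accepts optional sampling fractions `f`, which apply the without-replacement scaling $\bigl(m_h(1-f_h)/(n_h-1)\bigr)^{1/2}$ described under "More on bootstrap". When `f` is omitted, $f_h = 0$. Because $\tilde y_{hi}$ is linear in $y_{hi}^*$, its average over $i$ can be formed directly from the resampled mean. Names, `B`, the resample sizes, and the data are illustrative.

```python
import random
from math import sqrt
from statistics import mean


def rescaled_bootstrap_variance(strata, weights, m, g=lambda v: v, f=None, B=4000, seed=1):
    """Monte Carlo version of the modified (rescaled) bootstrap variance.

    m : resample sizes m_h, one per stratum
    f : optional sampling fractions f_h (without-replacement version); default 0
    """
    rng = random.Random(seed)
    f = f if f is not None else [0.0] * len(strata)
    reps = []
    for _ in range(B):
        y_tilde = 0.0
        for w, ys, m_h, f_h in zip(weights, strata, m, f):
            n_h, ybar_h = len(ys), mean(ys)
            scale = sqrt(m_h * (1.0 - f_h) / (n_h - 1))
            draws = rng.choices(ys, k=m_h)
            # Mean over i of ytilde_hi = ybar_h + scale * (y*_hi - ybar_h)
            y_tilde += w * (ybar_h + scale * (mean(draws) - ybar_h))
        reps.append(g(y_tilde))
    center = mean(reps)
    return sum((t - center) ** 2 for t in reps) / (B - 1)


strata = [[10.0, 14.0], [20.0, 18.0, 25.0, 22.0], [5.0, 9.0, 7.0]]
weights = [0.5, 0.3, 0.2]
v_mbs = rescaled_bootstrap_variance(strata, weights, m=[1, 3, 2])
```

For linear $g$ with $f_h = 0$, the result is close to the customary $\sum_h W_h^2 s_h^2/n_h$, unlike the naïve bootstrap. In the first stratum $m_h = 1$ and $n_h = 2$, so each replicate keeps one of the two PSUs, which is the random half-sample case.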
=
More on bootstrap
The method can be extended to stratified SRS without replacement by simply changing $\tilde y_{hi}$ to
$$\tilde y_{hi} = \bar y_h + \Bigl(\frac{m_h(1 - f_h)}{n_h - 1}\Bigr)^{1/2}\bigl(y_{hi}^* - \bar y_h\bigr)$$
where $f_h = n_h/N_h$ is the sampling fraction in stratum $h$
For $m_h = n_h - 1$, this method reduces to the naïve bootstrap with resample size $n_h - 1$
For $n_h = 2$, $m_h = 1$, the method reduces to the random half-sample replication method
For $n_h > 3$, on the choice of $m_h$ see Rao and Wu (1988)
Simulation
Rao and Wu (1988)
The jackknife and linearization methods gave substantially biased one-sided intervals for nonlinear statistics
The bootstrap performs best for one-sided intervals (especially when $m_h = n_h - 1$)
For two-sided intervals, the three methods have similar coverage probabilities
The jackknife and linearization methods are more stable than the bootstrap
B = 200 is sufficient
Hot topics
Jackknife with non-smooth functions (Rao and Sitter 1996)
Two-phase variance estimation (Graubard and Korn 2002; Rubin-Bleuer and Schiopu-Kratina 2005)
Estimating function (EF) bootstrap method (Rao and Tausi 2004)
Software
OSIRIS: BRR, jackknife
SAS: linearization
Stata: linearization
SUDAAN: linearization, bootstrap, jackknife
WesVar: BRR, jackknife, bootstrap
References:
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26.
Graubard, B. I., and Korn, E. L. (2002). Inference for superpopulation parameters using sample surveys. Statistical Science, 17, 73-96.
Krewski, D., and Rao, J. N. K. (1981). Inference from stratified samples: Properties of the linearization, jackknife and balanced repeated replication methods. Annals of Statistics, 9, 1010-1019.
Quenouille, M. H. (1949). Problems in plane sampling. Annals of Mathematical Statistics, 20, 355-375.
Rao, J. N. K., and Wu, C. F. J. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83, 231-241.
Rao, J. N. K., and Tausi, M. (2004). Estimating function variance estimation under stratified multistage sampling. Communications in Statistics, 33, 2087-2095.
Rao, J. N. K., and Sitter, R. R. (1996). Discussion of Shao's paper. Statistics, 27, 246-247.
Rubin-Bleuer, S., and Schiopu-Kratina, I. (2005). On the two-phase framework for joint model and design-based framework. Annals of Statistics (to appear).
Shao, J., and Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag.
Tukey, J. W. (1958). Bias and confidence in not-quite large samples. Annals of Mathematical Statistics, 29, 614.
Not referred to in the presentation:
Wolter, K. M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.
Shao, J. (1996). Resampling methods in sample surveys (invited paper, with discussion). Statistics, 27, 203-237; discussion, 237-254.