
5

APPLICATION OF MULTI-PHASE SAMPLING AND SUCCESSIVE SAMPLING IN SAMPLE SURVEYS
Kaustav Aditya
Indian Agricultural Statistics Research Institute, New Delhi-110012

5.1 MULTI-PHASE SAMPLING


The procedure called double sampling or two-phase sampling is typically employed in
the following situation. There exists a procedure, relatively cheap to implement, that
produces a vector of observations denoted by x. The vector x is correlated with the
characteristics of interest, denoted by the vector y, and it is very expensive to make
determinations on y. In the most popular form of two-phase sampling, a relatively
large sample is selected and x is determined on this sample. This sample is called the
first phase sample or phase 1 sample. Determinations for the vector y are made on a
subsample of the original sample. The subsample is called the second phase sample or
phase 2 sample. In the form originally suggested by Neyman (1938), the original
sample was stratified on the basis of x, and the stratified estimator for y was
constructed using the stratum sizes estimated from the phase 1 sample.
We first describe this particular, and important, case of two-phase sampling. We
simplify the discussion by considering scalar y. Double sampling can be used with
ratio or regression estimation as well as with stratified sampling to improve
precision.
The general procedure is identical for double sampling with the ratio estimator and
for double sampling with the regression estimator. In contrast to double sampling for
stratification, where a categorical variable is observed in the first phase, it is
usually metric variables that serve as ancillary variables when double sampling with
the ratio or regression estimator is used. In the first phase, a sample of size n' is
taken to estimate the mean or total of the auxiliary variable X. This sample is
usually large because measurement of X is cheap, fast and easy. In the second phase, a
sample of size n is selected on which both the target and the ancillary variable are
observed; from these pairs of observations, a relationship between the two variables
can be established, either a ratio or a regression. The second phase sample is usually
small because the observation of Y is usually more expensive, difficult and time
consuming. This relationship is then applied to the first phase observations to
estimate the total and mean of the target variable for the entire area of interest.
In both approaches, dependent or independent phases are possible and the
corresponding estimators need to be used. Double sampling is also of interest in the
context of sampling with partial replacement (SPR), which is a very efficient
technique for estimating change.


NOTATIONS

$N$  Total number of sampling units in the entire area of interest;
$n'$  Number of samples in the first phase;
$n$  Number of samples in the second phase;
$\bar{y}_{mdr}$  Estimated mean of target variable Y from the ratio estimator for the entire area;
$\bar{y}_{mdreg}$  Estimated mean of target variable Y from the regression estimator for the entire area;
$\bar{x}'$  Estimated mean of ancillary variable X in the first phase;
$\bar{x}$  Estimated mean of ancillary variable X in the second phase;
$\bar{y}$  Estimated mean of target variable Y in the second phase;
$y_i$  i-th observed value of target variable Y;
$r$  Estimated ratio of the ratio estimator;
$b$  Estimated slope coefficient of the regression estimator;
$s_y^2$  Estimated variance of the target variable Y;
$s_x^2$  Estimated variance of ancillary variable X in the first phase;
$s_{xy}$  Estimated covariance of Y and X in the second phase;
$\hat{\rho}$  Estimated coefficient of correlation of Y and X.

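For the ratio estimator, the mean of the target variable is estimated as

\[ \bar{y}_{mdr} = \frac{\bar{y}}{\bar{x}}\,\bar{x}' = r\,\bar{x}' \]

with an estimated variance of the estimated mean of

\[ V(\bar{y}_{mdr}) = \frac{s_y^2 + r^2 s_x^2 - 2 r s_{xy}}{n} + \frac{2 r s_{xy} - r^2 s_x^2}{n'} - \frac{s_y^2}{N}, \]

where $n'$ and $n$ are the first and second phase sample sizes and $\bar{x}'$ is the
first phase mean of X. For the regression estimator, the mean is estimated as

\[ \bar{y}_{mdreg} = \bar{y} + b\,(\bar{x}' - \bar{x}) \]

with an estimated variance of the estimated mean of

\[ V(\bar{y}_{mdreg}) = \frac{s_y^2}{n}\left[ 1 - \frac{n' - n}{n'}\,\hat{\rho}^2 \right]. \]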
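The two estimators can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the author's implementation: the function name `double_sampling` and the small data set are hypothetical, the slope is taken as the ordinary least-squares slope b = s_xy / s_x^2, and all three moments (s_y^2, s_x^2, s_xy) are computed from the second phase pairs, which keeps the first term of the ratio variance non-negative.

```python
def _mean(values):
    return sum(values) / len(values)


def _cov(u, v):
    """Sample covariance with divisor (n - 1); _cov(u, u) is the sample variance."""
    mu, mv = _mean(u), _mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def double_sampling(x_phase1, x_phase2, y_phase2, N):
    """Ratio and regression estimates of the mean of Y under double sampling.

    x_phase1 : X observed on the large first phase sample (size n')
    x_phase2, y_phase2 : paired X and Y on the second phase sample (size n)
    N : population size
    Assumption of this sketch: s_y^2, s_x^2 and s_xy all come from phase 2.
    """
    n_prime, n = len(x_phase1), len(y_phase2)
    xbar_prime = _mean(x_phase1)
    xbar, ybar = _mean(x_phase2), _mean(y_phase2)
    s2y = _cov(y_phase2, y_phase2)
    s2x = _cov(x_phase2, x_phase2)
    sxy = _cov(x_phase2, y_phase2)

    # Ratio estimator: r * xbar', with r = ybar / xbar
    r = ybar / xbar
    ratio_mean = r * xbar_prime
    ratio_var = ((s2y + r * r * s2x - 2 * r * sxy) / n
                 + (2 * r * sxy - r * r * s2x) / n_prime
                 - s2y / N)

    # Regression estimator: ybar + b * (xbar' - xbar)
    b = sxy / s2x                      # least-squares slope (assumed form)
    rho2 = sxy ** 2 / (s2x * s2y)      # squared correlation
    reg_mean = ybar + b * (xbar_prime - xbar)
    reg_var = s2y / n * (1 - (n_prime - n) / n_prime * rho2)

    return {"ratio_mean": ratio_mean, "ratio_var": ratio_var,
            "reg_mean": reg_mean, "reg_var": reg_var}


# Hypothetical data: 8 cheap X measurements, 4 of the units also measured for Y.
result = double_sampling(
    x_phase1=[2, 4, 6, 8, 10, 12, 14, 16],
    x_phase2=[4, 8, 12, 16],
    y_phase2=[9, 15, 25, 31],
    N=100,
)
print(result)
```

In this made-up example the second phase mean of X (10) lies above the first phase mean (9), so both estimators pull the second phase mean of Y (20) downwards.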


Examples:
1. Aerial photographs or satellite images are used in the first phase to measure
the ancillary variable, for example percentage crown cover. In the second phase,
field plots are selected to measure the target variable, such as volume or
biomass per ha, together with the ancillary variable. A regression can then be
established that predicts the target variable once the ancillary variable is
known. In many cases, however, this regression is not very strong, so that the
overall precision that can be achieved is moderate. One of the main issues and
sources of error in this example is the accuracy of co-registration
between the remote sensing imagery and the field plots.
2. This example concerns the estimation of the leaf area of a tree, as needed,
for example, to determine the leaf area index. Leaf area is difficult to measure;
it is much easier to observe leaf weight. Therefore, a sample of leaves is taken
in the second phase on which both leaf area and leaf weight are determined,
and a regression is established that predicts leaf area from leaf weight. In
order to apply this regression, the mean (or total) leaf weight needs to be
determined; for this purpose, a large sample is taken in the first phase. In
this example, a major issue is the sampling frame for the first phase sample,
which needs to be carefully defined (or a sampling technique is applied that
does not require the a-priori definition of the sampling frame, such as
randomized branch sampling).

5.2 SUCCESSIVE SAMPLING


Surveys are often repeated on many occasions (over years or seasons) to estimate the
same characteristics at different points of time. The information collected on previous
occasions can be used to study the change in, or the total value of, the character over
occasions, and also to study the average value for the most recent occasion.
For example, in a milk yield survey one may be interested in estimating
1. the average milk yield for the current season,
2. the change in milk yield between two different seasons, and
3. the total milk production for the year.
The successive method of sampling consists of selecting sample units on different
occasions such that some units are common with samples selected on previous
occasions. If sampling on successive occasions is done according to a specific rule,
with partial replacement of sampling units, it is known as successive sampling. The
method of successive sampling was developed by Jessen (1942) and extended by
Patterson (1950) and by Tikkiwal (1950, 53, 56, 64, 65, 67) and also Eckler (1955).
Singh and Kathuria (1969) investigated the application of this sampling technique in
the agricultural field. Hansen et al. (1955) and Rao and Graham (1964) have
discussed rotation designs for successive sampling. Singh and Singh (1965), Singh
(1968), Singh and Kathuria (1969) have extended successive sampling for many other
sampling designs.


Generally, the main objective of successive surveys is to estimate change, with a
view to studying the effects of the forces acting upon the population. For this, it is
better to retain the same sample from occasion to occasion. For populations where the
basic objective is to study the overall average or the total, it is better to select a
fresh sample for every occasion. If the objective is to estimate the average value for
the most recent occasion, retaining a part of the sample over occasions provides more
efficient estimates than the other alternatives. An important question that arises in
devising efficient sampling strategies for repetitive surveys is therefore whether the
same sample is to be surveyed on all occasions or fresh samples are to be chosen on
each occasion, and, more generally, in what manner the composition of the sample
should be changed from occasion to occasion.
The answer depends, apart from field difficulties, on the specific estimation problem
at hand. For instance, if the aim is to estimate only the difference between the item
mean on the current occasion ($\bar{y}$) and on the previous occasion ($\bar{x}$), then
using the same sample on both occasions gives a better estimate than independent
samples, since the variance of the estimate in the former case is

\[ V(\bar{y} - \bar{x}) = V(\bar{y}) + V(\bar{x}) - 2\,\mathrm{Cov}(\bar{y}, \bar{x}) < V(\bar{y}) + V(\bar{x}), \]

as $\bar{y}$ and $\bar{x}$ are highly correlated so that $\mathrm{Cov}(\bar{y}, \bar{x}) > 0$.
The right-hand side is the variance obtained from independent samples.
On the contrary, for estimating the average of the means, which is proportional to
the sum $\bar{y} + \bar{x}$, independent samples are better than a common sample,
because with a common sample

\[ V(\bar{y} + \bar{x}) = V(\bar{y}) + V(\bar{x}) + 2\,\mathrm{Cov}(\bar{y}, \bar{x}) > V(\bar{y}) + V(\bar{x}). \]

But if the difference between the means and their average are to be estimated
simultaneously, neither of these alternatives is desirable. Hence arises the idea
of retaining a part (say Sc) of the previous sample (say S1) and supplementing it with
a set (say Sf) of fresh units on the current occasion. The data on x from S1, on x and
y from Sc, and on y from Sf are used to build the optimum estimator of $\bar{Y}$, which,
together with the estimate of $\bar{X}$, gives efficient results for the difference
between $\bar{Y}$ and $\bar{X}$ and also for their average. The questions then are how
large the sets of common and fresh units should be on the current occasion, how
these samples should be chosen, and what procedure should be employed for working out
the estimates. These questions are interrelated and depend ultimately on the regression
of y on x. If the regression of y on x is known to be linear with a significant
intercept, Sc may be chosen from S1 by SRS without replacement and the regression
estimator employed; when the intercept is not significant, the sample may be chosen
by SRS and the ratio estimator employed.
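The trade-off between a common sample and independent samples can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variances and the correlation are hypothetical numbers chosen for the example, and the function names are not from the text.

```python
from math import sqrt


def var_difference(var_y, var_x, cov_yx):
    """V(ybar - xbar) = V(ybar) + V(xbar) - 2 Cov(ybar, xbar)."""
    return var_y + var_x - 2 * cov_yx


def var_sum(var_y, var_x, cov_yx):
    """V(ybar + xbar) = V(ybar) + V(xbar) + 2 Cov(ybar, xbar)."""
    return var_y + var_x + 2 * cov_yx


# Hypothetical values: equal variances on both occasions, correlation 0.8
var_y, var_x, rho = 4.0, 4.0, 0.8
cov_matched = rho * sqrt(var_y * var_x)   # same sample on both occasions
cov_independent = 0.0                     # fresh, independent samples

matched_diff = var_difference(var_y, var_x, cov_matched)
independent_diff = var_difference(var_y, var_x, cov_independent)
matched_sum = var_sum(var_y, var_x, cov_matched)
independent_sum = var_sum(var_y, var_x, cov_independent)

print("difference:", matched_diff, "(matched) vs", independent_diff, "(independent)")
print("sum:       ", matched_sum, "(matched) vs", independent_sum, "(independent)")
```

With these numbers the matched sample reduces the variance of the difference from 8.0 to 1.6 but raises the variance of the sum from 8.0 to 14.4, which is the tension that partial replacement of the sample is meant to resolve.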

5.3 SAMPLING ON TWO SUCCESSIVE OCCASIONS


It is assumed that the survey population remains unaltered from occasion to occasion.
For generality, let the sample size on the first occasion be n1 and on the second
occasion be n2 = n12 + n22, where n12 is the number of units common to the first and
second occasions and n22 is the number of units drawn afresh on the second occasion.
The data obtained on the current (here, the second) occasion are denoted by y and
those on the previous (here, the first) occasion by x. The sampling procedure
consists of the following steps:
1. From the given survey population, choose a sample S1 of n1 units by SRS
without replacement for the survey on the first occasion.
2. On the second occasion, choose a set Sc of n12 units from the sample taken at
step (1), either by SRS or by PPS sampling depending on the situation at hand,
and supplement it with another set Sf of n22 units taken independently from the
unsurveyed (N - n1) units of the population by SRS without replacement, so
that the total sample S2 on the second occasion comprises n2 units. S1 thus
acts as a preliminary sample.
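3. When Sc is selected with probability proportional to x, the unbiased estimator
of $\bar{Y}$ based on the y and x values of Sc and the x values of S1 is given by

\[ t_c = \frac{1}{n_1 n_{12}} \sum_{j=1}^{n_{12}} \frac{y_j}{p_j}, \qquad p_j = \frac{x_j}{\sum_{j=1}^{n_1} x_j}, \]

with variance

\[ V(t_c) = \frac{S_y^2}{n_1} + \frac{1}{n_{12}} \sum_{j=1}^{N} P_j \left( \frac{y_j}{N P_j} - \bar{Y} \right)^2 - \frac{S_y^2}{N}, \]

where $P_j = x_j / \sum_{j=1}^{N} x_j$ and $S_y^2$ is the population variance of y.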

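Also, in view of the selection of Sf as noted in step (2), an unbiased estimator of
$\bar{Y}$ is

\[ \bar{y}_f = \frac{1}{n_{22}} \sum_{j=1}^{n_{22}} y_j, \]

with variance

\[ V(\bar{y}_f) = \left( \frac{1}{n_{22}} - \frac{1}{N} \right) S_y^2. \]

Further, $t_c$ and $\bar{y}_f$ are correlated, with

\[ \mathrm{Cov}(t_c, \bar{y}_f) = -\frac{1}{N}\, S_y^2. \]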

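So, in sampling on two successive occasions, the minimum-variance linear combination
of $t_c$ and $\bar{y}_f$ is

\[ \bar{y}_{ss} = a\,t_c + (1 - a)\,\bar{y}_f, \qquad \text{where } a = \frac{V_f}{V_c + V_f}, \]

with variance

\[ V(\bar{y}_{ss}) = \frac{V_c V_f}{V_c + V_f} + \mathrm{Cov}(t_c, \bar{y}_f), \]

where $V_f = V(\bar{y}_f) - \mathrm{Cov}(t_c, \bar{y}_f)$ and $V_c = V(t_c) - \mathrm{Cov}(t_c, \bar{y}_f)$.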
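The combination step can be sketched in Python as follows. This is a minimal sketch, not the author's code: the function name `combine_estimates` and all numbers are hypothetical. To keep the example checkable, it assumes the special case in which Sc is drawn from S1 by SRS without replacement, so that V(t_c) takes the standard SRS form (1/n12 - 1/N) S_y^2 instead of the PPS variance; the covariance -S_y^2 / N is the one given in the text.

```python
def combine_estimates(t_c, y_f, var_tc, var_yf, cov):
    """Minimum-variance combination a*t_c + (1 - a)*y_f of two correlated estimators.

    Returns the weight a, the combined estimate and its variance, using
    V_c = V(t_c) - Cov and V_f = V(y_f) - Cov.
    """
    v_c = var_tc - cov
    v_f = var_yf - cov
    a = v_f / (v_c + v_f)
    estimate = a * t_c + (1 - a) * y_f
    variance = v_c * v_f / (v_c + v_f) + cov
    return a, estimate, variance


# Hypothetical inputs (assumed SRS special case, see the paragraph above)
N = 1000          # population size
S2_y = 25.0       # population variance of y
n12, n22 = 50, 50 # matched and fresh units on the second occasion

var_tc = (1 / n12 - 1 / N) * S2_y   # SRS variance of the matched-part mean (assumption)
var_yf = (1 / n22 - 1 / N) * S2_y   # variance of the fresh-part mean
cov = -S2_y / N                     # covariance between the two estimators

a, estimate, variance = combine_estimates(12.0, 10.0, var_tc, var_yf, cov)
print(a, estimate, variance)
```

In this special case the weight reduces to n12 / (n12 + n22) and the variance equals that of a single SRS of n12 + n22 units, which is a convenient sanity check. The gain from successive sampling comes from replacing the matched-part mean by an estimator that exploits the x values of S1.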

REFERENCES
Eckler, A. R. (1955). Rotation sampling. Annals of Mathematical Statistics, 26, 664-685.
Jessen, R. J. (1942). Statistical investigation of a sample survey for obtaining farm
facts. Iowa Agricultural Experiment Station Research Bulletin No. 304.
Neyman, J. (1938). Contribution to the theory of sampling human populations.
Journal of the American Statistical Association, 33, 101-116.
Parzen, E. (1959). Statistical inference on time series by Hilbert space methods I.
Technical Report No. 23, Department of Statistics, Stanford University.
Parzen, E. (1961). An approach to time series analysis. Annals of Mathematical
Statistics, 32, 951-989.
Rao, C. R. (1952). Some theorems on minimum variance unbiased estimation.
Sankhya, Series A, 12, 27-42.
Rao, J. N. K. and Graham, J. E. (1964). Rotation designs for sampling on repeated
occasions. Journal of the American Statistical Association, 59, 492-509.
Tikkiwal, B. D. (1951). Theory of successive sampling. Unpublished Thesis for
Diploma, I.C.A.R., New Delhi.
Yates, F. (1949). Sampling Methods for Censuses and Surveys. Charles Griffin &
Company Ltd., London.
