25 pages from unknown statistical book

Oct 23, 2012

© Attribution Non-Commercial (BY-NC)

In general, we want the target and study populations to be the same. When they are not the same, the researcher must be careful to ensure that conclusions based on the sample results can be applied to the target population.

Because of restrictions such as cost or scheduling conflicts, it is often impossible to collect a simple random sample or a stratified simple random sample. In many cases, however, it may be possible to define a sampling frame that does not correspond with the target population or the study population and still obtain statistically valid estimates.

Cluster sampling and, specifically, systematic sampling are examples in which a difference between the target population and the sampling frame occurs. Despite the difference, if executed properly, conclusions based on the sample results from these sampling designs can be applied to the target population.

Situation: A population contains M population units. The set of M units is partitioned into N disjoint groups of population units called primary units. The population units contained in the primary units are called secondary units.

The primary units may be of different sizes. That is, the numbers of secondary units in the primary units are not all the same.

Think of these disjoint groups of population units as strata. Suppose we define the sampling frame as a set of strata. Then the sampling units in this sampling frame are not individual units in the population. The sampling units are clusters of population units. In this case, the sampling frame does not correspond with the units of the target population or the study population.

Thus, whenever any secondary unit of a primary unit is included in the sample, all the other secondary units in that primary unit will also be included in the sample.

That is, the primary units in the sampling frame are strata. The number of strata is usually large while each stratum contains only a small number of population (secondary) units. Note: the population has M individual units but the sampling frame has only N primary sampling units, corresponding to the number of clusters (or strata) formed.

The responses from the secondary population units are not analyzed individually, but are combined with all other secondary units that are in the same cluster. Therefore, there are only N possible y values (not M). The researcher hopes that reducing the population of size M to a sampling frame containing only N sampling units is offset by the practical conveniences (such as reduced cost) that this type of sampling frame can offer.

6.1 One-Stage Cluster Sampling

What is the difference between this type of cluster sampling and stratified sampling?

- In stratified sampling, we take a subset of population sampling units within each stratum to form the sample.

- In cluster sampling, we take a subset of strata as the primary sampling units.

When the strata themselves are the primary sampling units, the strata are called clusters. The selection of a sample of clusters to provide a sample of population units is called cluster sampling.

If all of the population units in every selected cluster are in the sample, then this is known as one-stage cluster sampling.

When a cluster is defined as a group of population units, the clusters are called the primary units. Subgroups within primary units are called secondary units. For one-stage cluster sampling, the secondary units are the individual population units.

A one-stage cluster sample with N = 50 primary units, each having 8 secondary units, in a population containing M = 400 secondary units.

If the selection of the population units within every selected cluster is restricted a second time, then this technique is known as subsampling or two-stage cluster sampling. For example, we may take a SRS of secondary units within each primary unit. This will be discussed later in Section 7 of the course notes.

If a sample of primary units (Stage 1) is selected, followed by a selection of secondary units (Stage 2) within the sample of primary units, followed by a selection of tertiary units (Stage 3) within the sample of secondary units, and so on, then the sampling procedure is known as multistage cluster sampling.

In cluster sampling, the size of the cluster can also be used as an auxiliary variable to select clusters with unequal sampling probabilities or used in a ratio estimator.

Stratified sampling vs cluster sampling:

- A researcher will use a stratified sampling design because of its potential to produce an efficient (less variable) estimator of a population characteristic. It will, in general, be more expensive to collect data for a stratified sample than for a cluster sample.

- A researcher will use cluster sampling because of its administrative convenience. That is, cluster sampling can significantly reduce sampling costs at the expense of a less efficient estimator of a population characteristic.

Notation used in one-stage cluster sampling:

$N$ = the number of clusters (primary units) in the population

$M_i$ = the number of secondary units in cluster $i$

$M = \sum_{i=1}^{N} M_i$ = the number of secondary units in the population

$y_{ij}$ = the y-value associated with secondary unit $j$ in cluster $i$

$y_i = \sum_{j=1}^{M_i} y_{ij}$ = cluster $i$ total

$\bar{y}_i = y_i / M_i$ = cluster $i$ mean

$s_u^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ = the sample variance of cluster totals (for $n$ sampled clusters, with $\bar{y}$ the sample mean of the cluster totals)

$\tau = \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij} = \sum_{i=1}^{N} y_i$ = population total

$\mu = \dfrac{1}{M} \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij} = \dfrac{\tau}{M}$ = population mean of the secondary units

$\mu_1 = \dfrac{1}{N} \sum_{i=1}^{N} y_i$ = population mean of the cluster totals (mean of the primary units)

$\sigma_u^2 = \dfrac{\sum_{i=1}^{N} (y_i - \mu_1)^2}{N-1}$ = the population variance of cluster totals

6.2 Equal Sized Clusters

Suppose that each of the N clusters has the same number L of secondary units ($M_1 = M_2 = \cdots = M_N = L$). Then $M = NL$.

Suppose a SRS of n clusters (primary units) is taken. Then the total number of secondary units selected is $m = nL$.

There is a total of $\binom{N}{n}$ possible one-stage cluster samples, and each one has the same probability of being selected.

Thus, the probability of selecting any particular one-stage cluster sample is $1 / \binom{N}{n}$.
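The selection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `draw_cluster_sample`, the cluster labels 1 to N, and the seed are hypothetical, and the sizes are those of the N = 100, L = 4 layout with n = 8 used in the figures of this section.

```python
import math
import random

def draw_cluster_sample(num_clusters, n, seed=None):
    """Draw a simple random sample (without replacement) of n cluster labels."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, num_clusters + 1), n))

N, L, n = 100, 4, 8             # clusters in population, units per cluster, clusters sampled
M = N * L                       # secondary units in the population
m = n * L                       # secondary units that end up in the sample
num_samples = math.comb(N, n)   # number of equally likely one-stage cluster samples
sample = draw_cluster_sample(N, n, seed=1)

print(M, m, num_samples)
print(sample)
```

Every one of the `num_samples` possible sets of labels is equally likely, so each has selection probability `1 / num_samples`.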

Figure 7a: Cluster Sampling Example for the Longleaf Pine Data

The total abundance = 584. There are M = 400 secondary units and N = 100 primary units (clusters) of size $M_i = 4$.

1 1 1 1 1 2 1 0 0 0 4 5 0 1 0 1 2 1 0 1

3 2 1 0 1 0 0 0 1 2 2 2 0 2 2 2 0 2 0 1

7 4 1 1 1 1 0 0 0 2 2 0 4 3 2 4 2 1 2 2

0 1 2 0 0 0 0 0 4 6 5 1 5 0 0 0 2 1 2 0

1 1 0 2 3 2 0 0 2 1 3 1 4 1 1 1 2 2 1 1

2 0 0 0 4 3 3 0 1 16 5 0 1 3 8 0 0 1 3 3

0 0 1 14 3 3 1 2 0 8 0 2 0 3 9 0 4 2 1 0

0 0 5 1 8 7 6 6 6 1 0 4 0 0 1 2 2 0 1 2

0 0 2 2 3 2 2 3 1 1 1 3 0 0 2 2 0 3 4 0

0 0 0 0 1 0 3 1 1 1 2 0 2 0 2 0 2 1 1 0

1 8 7 7 8 0 5 0 1 0 1 2 0 0 2 4 2 2 2 4

0 9 1 0 0 1 1 1 0 0 0 1 2 4 0 2 1 3 3 1

0 0 0 1 0 2 4 3 1 2 2 0 0 1 1 2 2 0 2 4

0 1 0 0 1 2 0 2 3 5 2 0 0 2 1 1 2 0 1 3

1 0 0 1 1 0 0 0 2 2 2 1 1 1 0 0 2 0 0 0

0 2 0 2 2 0 1 1 0 2 0 0 1 0 0 1 1 1 5 3

0 0 0 3 2 1 0 0 0 0 0 2 1 0 1 1 1 3 1 2

1 0 0 1 0 3 0 1 0 0 2 1 2 0 0 0 1 1 1 0

0 0 0 0 0 0 0 1 1 1 0 1 0 3 0 2 0 1 1 0

2 0 0 0 0 0 0 0 1 2 0 1 3 0 0 1 0 1 2 4

Figure 7b: Cluster Sampling Example for a Spatially Correlated Population

The abundance counts show a strong diagonal spatial correlation. The total abundance = 13354. There are M = 400 secondary units and N = 50 primary units (clusters) of size $M_i = 8$.

18 20 15 20 20 15 19 18 24 23 20 26 29 28 28 31 31 34 28 32

13 20 16 20 15 23 19 26 21 21 24 30 23 26 25 33 31 28 32 38

16 18 20 24 25 26 22 23 26 26 22 27 25 25 34 28 37 36 38 31

17 17 16 22 21 23 22 27 27 24 28 32 29 33 27 37 37 38 35 33

15 19 23 17 21 23 21 23 24 25 31 26 32 34 32 33 31 31 36 37

21 24 20 21 28 26 30 22 31 25 29 29 27 30 29 37 35 32 38 43

23 17 24 25 24 27 31 29 31 34 27 36 29 29 34 39 37 37 40 36

18 24 21 25 27 22 32 32 31 26 28 34 34 37 35 34 38 38 37 40

22 26 28 26 24 29 33 26 27 27 34 31 39 32 36 38 37 40 44 43

23 27 28 29 26 32 25 31 35 34 32 33 37 32 42 40 40 37 42 44

23 21 31 23 30 27 31 30 32 35 30 40 32 37 37 36 40 44 44 40

26 29 31 26 30 31 34 36 30 38 36 32 38 38 37 42 42 41 40 49

28 24 28 27 26 31 32 29 32 33 38 34 39 38 40 37 41 43 42 43

32 25 31 32 29 29 35 38 38 32 36 35 39 42 39 40 44 42 41 45

27 29 35 28 35 35 31 40 35 37 38 44 40 40 47 39 49 48 51 49

30 29 32 32 33 30 36 38 42 36 35 38 44 47 45 49 41 43 44 51

28 35 35 34 34 33 41 33 34 35 39 44 44 48 44 50 49 48 53 54

29 33 32 36 39 33 33 34 35 42 46 47 48 47 46 45 44 52 54 55

28 37 38 37 33 33 34 37 45 40 39 42 42 46 47 48 52 47 46 53

38 39 39 37 34 38 39 45 39 42 45 41 44 51 46 50 52 51 51 53

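6.2.1 Estimation of $\tau$, $\mu$, and $\mu_1$

The unbiased estimators of $\tau$ and $\mu$ are

$$\hat{\tau}_{cl} = \frac{M}{nL} \sum_{i=1}^{n} \sum_{j=1}^{L} y_{ij} = \frac{N}{n} \sum_{i=1}^{n} \sum_{j=1}^{L} y_{ij} = \frac{N}{n} \sum_{i=1}^{n} y_i = N\bar{y} \tag{77}$$

$$\hat{\mu}_{cl} = \frac{1}{nL} \sum_{i=1}^{n} \sum_{j=1}^{L} y_{ij} = \frac{1}{nL} \sum_{i=1}^{n} y_i = \frac{\bar{y}}{L} = \frac{\hat{\tau}_{cl}}{M} \tag{78}$$

where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i = \hat{\tau}_{cl} / N$ is the sample mean of the cluster totals.

Next, we want to study the variances of these estimators:

$$\mathrm{var}(\hat{\tau}_{cl}) = N(N-n)\frac{\sigma_u^2}{n} \qquad \mathrm{var}(\hat{\mu}_{cl}) = \frac{N(N-n)}{M^2}\frac{\sigma_u^2}{n} \tag{79}$$

where $\sigma_u^2 = \dfrac{\sum_{i=1}^{N} (y_i - \mu_1)^2}{N-1}$ is the variance of the N cluster totals $y_i$. Taking the square root of the true variances in (79) yields the standard deviations of the estimators.

Because $\sigma_u^2$ is unknown, we use the sample variance of the cluster totals, $s_u^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$, to get unbiased estimators of the variances:

$$\widehat{\mathrm{var}}(\hat{\tau}_{cl}) = N(N-n)\frac{s_u^2}{n} \qquad \widehat{\mathrm{var}}(\hat{\mu}_{cl}) = \frac{N(N-n)}{M^2}\frac{s_u^2}{n} \tag{80}$$

Taking the square root of the estimated variances in (80) yields the standard errors of the estimators.

An unbiased estimator of the mean per primary unit $\mu_1$ is $\hat{\mu}_1 = \bar{y} = \hat{\tau}_{cl} / N$.

The variance of $\hat{\mu}_1$ is $\mathrm{var}(\hat{\mu}_1) = \frac{1}{N^2}\mathrm{var}(\hat{\tau}_{cl})$, with the estimated variance obtained by dividing the estimated variance of $\hat{\tau}_{cl}$ in (80) by $N^2$. That is, $\widehat{\mathrm{var}}(\hat{\mu}_1) = \dfrac{N-n}{N}\dfrac{s_u^2}{n}$.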

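6.2.2 Confidence Intervals for $\tau$ and $\mu$

The confidence intervals for $\tau$ and $\mu$ are:

$$\hat{\tau}_{cl} \pm t^{*}\sqrt{\widehat{\mathrm{var}}(\hat{\tau}_{cl})} \qquad \hat{\mu}_{cl} \pm t^{*}\sqrt{\widehat{\mathrm{var}}(\hat{\mu}_{cl})} \tag{81}$$

where $t^{*}$ is the upper $\alpha/2$ critical value from the $t(n-1)$ distribution. Note that the degrees of freedom are based on n, the number of primary units or sampled clusters (and not on the total number of secondary units $m = nL$).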
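As a worked illustration of (77) to (81), the following Python sketch computes the estimates, standard errors, and confidence intervals from cluster totals alone. The function name `cluster_estimates` and its argument names are hypothetical. The ten totals are the sums of the four parenthesized counts in each sampled cluster of Figure 8b, and the critical value 2.262157 for t(9) is taken from a standard t-table and passed in by hand, because the Python standard library has no t quantile function.

```python
import math

def cluster_estimates(totals, N, L, t_crit):
    """Estimates for a one-stage cluster sample with equal cluster sizes.

    totals : cluster totals y_i of the n sampled clusters
    N      : number of clusters (primary units) in the population
    L      : number of secondary units per cluster
    t_crit : upper alpha/2 critical value of t(n - 1), supplied by the caller
    """
    n = len(totals)
    M = N * L
    y_bar = sum(totals) / n                                  # mean of the cluster totals
    s2_u = sum((y - y_bar) ** 2 for y in totals) / (n - 1)   # sample variance of the totals
    tau_hat = N * y_bar                                      # equation (77)
    mu_hat = tau_hat / M                                     # equation (78)
    se_tau = math.sqrt(N * (N - n) * s2_u / n)               # square root of (80)
    se_mu = se_tau / M
    return {
        "tau_hat": tau_hat,
        "se_tau": se_tau,
        "ci_tau": (tau_hat - t_crit * se_tau, tau_hat + t_crit * se_tau),  # (81)
        "mu_hat": mu_hat,
        "se_mu": se_mu,
        "ci_mu": (mu_hat - t_crit * se_mu, mu_hat + t_crit * se_mu),      # (81)
    }

# Cluster totals of the n = 10 sampled clusters in Figure 8b (N = 100, L = 4).
totals_8b = [84, 81, 140, 123, 145, 139, 193, 136, 175, 197]
T_CRIT_9DF = 2.262157   # upper 0.025 critical value of t(9), from a t-table

result = cluster_estimates(totals_8b, N=100, L=4, t_crit=T_CRIT_9DF)
for name, value in result.items():
    print(name, value)
```

Up to rounding, these values agree with the Proc Surveymeans listing for Figure 8b in Section 6.4 (mean 35.325 with standard error 2.980573, sum 14130 with standard deviation 1192.229005).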

Figure 8a: Cluster Sampling Example for the Longleaf Pine Data

The total abundance = 584. There are M = 400 secondary units and N = 100 primary units

(clusters) of size L = 4. The sample contains n = 8 clusters. The secondary units sampled are in ( )

1 (1) 1 1 1 2 1 0 0 0 4 5 0 1 0 1 2 1 0 1

3 (2) 1 0 1 0 0 0 1 2 2 2 0 2 2 2 0 2 0 1

7 (4) 1 1 1 1 0 0 0 2 2 0 4 3 2 4 2 1 2 2

0 (1) 2 0 0 0 0 0 4 6 5 1 5 0 0 0 2 1 2 0

1 1 0 2 3 2 0 (0) 2 (1) 3 1 4 1 1 1 2 2 1 1

2 0 0 0 4 3 3 (0) 1 (16) 5 0 1 3 8 0 0 1 3 3

0 0 1 14 3 3 1 (2) 0 (8) 0 2 0 3 9 0 4 2 1 0

0 0 5 1 8 7 6 (6) 6 (1) 0 4 0 0 1 2 2 0 1 2

0 0 2 (2) 3 2 2 3 1 1 1 3 0 0 (2) 2 0 3 4 0

0 0 0 (0) 1 0 3 1 1 1 2 0 2 0 (2) 0 2 1 1 0

1 8 7 (7) 8 0 5 0 1 0 1 2 0 0 (2) 4 2 2 2 4

0 9 1 (0) 0 1 1 1 0 0 0 1 2 4 (0) 2 1 3 3 1

0 0 0 1 0 (2) 4 3 1 2 2 0 0 1 1 (2) 2 (0) 2 4

0 1 0 0 1 (2) 0 2 3 5 2 0 0 2 1 (1) 2 (0) 1 3

1 0 0 1 1 (0) 0 0 2 2 2 1 1 1 0 (0) 2 (0) 0 0

0 2 0 2 2 (0) 1 1 0 2 0 0 1 0 0 (1) 1 (1) 5 3

0 0 0 3 2 1 0 0 0 0 0 2 1 0 1 1 1 3 1 2

1 0 0 1 0 3 0 1 0 0 2 1 2 0 0 0 1 1 1 0

0 0 0 0 0 0 0 1 1 1 0 1 0 3 0 2 0 1 1 0

2 0 0 0 0 0 0 0 1 2 0 1 3 0 0 1 0 1 2 4

Figure 8b: Cluster Sampling Example for a Spatially Correlated Population

The abundance counts show a strong diagonal spatial correlation. The total abundance = 13354. There are

M = 400 secondary units and N = 100 primary units (clusters) of size L = 4. The sample contains n = 10

clusters. The secondary units sampled are in ( )

18 20 15 20 (20) 15 19 18 24 23 20 26 29 28 28 31 (31) 34 28 32

13 20 16 20 (15) 23 19 26 21 21 24 30 23 26 25 33 (31) 28 32 38

16 18 20 24 (25) 26 22 23 26 26 22 27 25 25 34 28 (37) 36 38 31

17 17 16 22 (21) 23 22 27 27 24 28 32 29 33 27 37 (37) 38 35 33

15 (19) 23 17 21 23 21 23 24 25 31 26 32 34 32 33 31 31 36 37

21 (24) 20 21 28 26 30 22 31 25 29 29 27 30 29 37 35 32 38 43

23 (17) 24 25 24 27 31 29 31 34 27 36 29 29 34 39 37 37 40 36

18 (24) 21 25 27 22 32 32 31 26 28 34 34 37 35 34 38 38 37 40

22 26 28 26 24 29 (33) 26 27 27 34 31 39 (32) 36 38 37 40 44 43

23 27 28 29 26 32 (25) 31 35 34 32 33 37 (32) 42 40 40 37 42 44

23 21 31 23 30 27 (31) 30 32 35 30 40 32 (37) 37 36 40 44 44 40

26 29 31 26 30 31 (34) 36 30 38 36 32 38 (38) 37 42 42 41 40 49

28 24 28 27 26 31 32 (29) 32 33 38 34 39 38 40 37 (41) 43 42 43

32 25 31 32 29 29 35 (38) 38 32 36 35 39 42 39 40 (44) 42 41 45

27 29 35 28 35 35 31 (40) 35 37 38 44 40 40 47 39 (49) 48 51 49

30 29 32 32 33 30 36 (38) 42 36 35 38 44 47 45 49 (41) 43 44 51

28 35 35 34 (34) 33 41 33 34 35 39 44 44 48 44 (50) (49) 48 53 54

29 33 32 36 (39) 33 33 34 35 42 46 47 48 47 46 (45) (44) 52 54 55

28 37 38 37 (33) 33 34 37 45 40 39 42 42 46 47 (48) (52) 47 46 53

38 39 39 37 (34) 38 39 45 39 42 45 41 44 51 46 (50) (52) 51 51 53

6.2.3 Comparison to Simple Random Sampling

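Because the variance formulas for $\hat{\tau}_{cl}$ and $\hat{\mu}_{cl}$ in (79) are determined only from the cluster-to-cluster variability, the precision of the estimators can be improved by forming clusters with small cluster-to-cluster variability.

Equivalently, we want to form clusters such that the y-values within each cluster are as variable as possible but the $y_i$ values across clusters are as similar as possible.

We will compare $\mathrm{var}(\hat{\tau})$ from a SRS to $\mathrm{var}(\hat{\tau}_{cl})$ from a one-stage cluster sample.

Because $\sigma^2 = \dfrac{1}{NL-1} \sum_{i=1}^{N} \sum_{j=1}^{L} (y_{ij} - \mu)^2$, we have

$$
\begin{aligned}
(NL-1)\,\sigma^2 &= \sum_{i=1}^{N} \sum_{j=1}^{L} (y_{ij} - \mu)^2 = \sum_{i=1}^{N} \sum_{j=1}^{L} (y_{ij} - \bar{y}_i + \bar{y}_i - \mu)^2 \\
&= \sum_{i=1}^{N} \sum_{j=1}^{L} (y_{ij} - \bar{y}_i)^2 + L \sum_{i=1}^{N} (\bar{y}_i - \mu)^2 \\
&= N(L-1)\,\bar{\sigma}^2 + L \sum_{i=1}^{N} (\bar{y}_i - \mu)^2
\end{aligned}
\tag{82}
$$

where $\bar{\sigma}^2 = \dfrac{1}{N} \sum_{i=1}^{N} \sigma_i^2$ is the average within-cluster variance and $\sigma_i^2 = \dfrac{1}{L-1} \sum_{j=1}^{L} (y_{ij} - \bar{y}_i)^2$ is the variance within cluster $i$. The cross-product term vanishes because the deviations $y_{ij} - \bar{y}_i$ sum to zero within each cluster.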

The sum in (82) is a weighted sum of within-cluster and cluster-to-cluster variabilities.

Let $\hat{\tau}$ be the estimator of $\tau$ from a SRS of $nL$ secondary units (see Section 2 of the course notes). We use (82) to compare the variance $\mathrm{var}(\hat{\tau})$ of the SRS estimator and the variance $\mathrm{var}(\hat{\tau}_{cl})$ of the one-stage cluster sample estimator. After simplification, we get:

$$\mathrm{var}(\hat{\tau}) - \mathrm{var}(\hat{\tau}_{cl}) = \frac{N^2 L (N-n)(L-1)}{n(N-1)} \left[ \bar{\sigma}^2 - \sigma^2 \right] \tag{83}$$

If $\mathrm{var}(\hat{\tau}) - \mathrm{var}(\hat{\tau}_{cl}) > 0$ (or, if $\bar{\sigma}^2 > \sigma^2$), then we say that $\hat{\tau}_{cl}$ is more efficient than $\hat{\tau}$ for estimating $\tau$.

This result is also true for estimation of $\mu$. That is, if $\mathrm{var}(\hat{\mu}) - \mathrm{var}(\hat{\mu}_{cl}) > 0$, then the one-stage cluster sample estimator $\hat{\mu}_{cl}$ would be more efficient than the SRS estimator $\hat{\mu}$ for estimating $\mu$.

Practically speaking, the one-stage cluster sample estimator will be more efficient than the SRS estimator of $\tau$ or $\mu$ if the average within-cluster variability ($\bar{\sigma}^2$) is larger than the population variance ($\sigma^2$).
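The comparison can be checked by brute force on a small population. The sketch below makes three assumptions: both populations are made up for illustration (N = 4 clusters of L = 3 units), the SRS is taken to contain nL secondary units so that both designs observe the same number of units, and the SRS variance is the usual $M(M - nL)\sigma^2/(nL)$. The variance of the cluster estimator is obtained by enumerating every possible sample of n = 2 clusters, and the gap is then compared with the coefficient $N^2 L (N-n)(L-1) / (n(N-1))$ times $\bar{\sigma}^2 - \sigma^2$.

```python
from fractions import Fraction
from itertools import combinations

def variance_gap(clusters, n):
    """Return (var_srs, var_cl, rhs) for equal-sized clusters, using exact fractions.

    var_cl is found by enumerating all C(N, n) cluster samples;
    rhs is coefficient * (average within-cluster variance - population variance).
    """
    N, L = len(clusters), len(clusters[0])
    M = N * L
    units = [y for c in clusters for y in c]
    tau = sum(units)
    mu = Fraction(tau, M)
    sigma2 = sum((y - mu) ** 2 for y in units) / (M - 1)       # population variance

    def within_var(c):
        mean = Fraction(sum(c), L)
        return sum((y - mean) ** 2 for y in c) / (L - 1)

    sigma2_bar = sum(within_var(c) for c in clusters) / N      # average within-cluster variance

    totals = [sum(c) for c in clusters]
    estimates = [Fraction(N, n) * sum(s) for s in combinations(totals, n)]
    var_cl = sum((e - tau) ** 2 for e in estimates) / len(estimates)

    var_srs = Fraction(M * (M - n * L), n * L) * sigma2        # SRS of nL secondary units
    coef = Fraction(N**2 * L * (N - n) * (L - 1), n * (N - 1))
    return var_srs, var_cl, coef * (sigma2_bar - sigma2)

# Hypothetical population 1: contiguous clusters along a trend (clusters differ a lot).
contiguous = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
# Hypothetical population 2: each cluster spans the full range (identical totals).
spread = [[1, 5, 9], [2, 6, 7], [3, 4, 8], [0, 5, 10]]

for name, pop in [("contiguous", contiguous), ("spread", spread)]:
    var_srs, var_cl, rhs = variance_gap(pop, n=2)
    print(name, var_srs, var_cl, var_srs - var_cl, rhs)
```

For the contiguous clusters the gap is negative (SRS is better), while for the spread-out clusters the cluster estimator has zero variance. In both cases the enumerated gap equals the right-hand side exactly.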

6.3 Relationship between Cluster Sampling and Systematic Sampling

Systematic sampling is a sampling plan in which the sample population units are collected systematically throughout the population. More specifically, a single primary unit consists of secondary units that are spaced in some systematic pattern throughout the population.

Suppose the study area is partitioned into a 20 × 20 grid of 400 population units. A systematic sample primary unit could consist of all population units that form a lattice whose points are 5 units apart horizontally and vertically. In Figure 9a, N = 25 and L = 16. In Figure 9b, each of the N = 50 primary units contains L = 8 secondary units.

Initially, systematic sampling and cluster sampling appear to be opposites because systematic samples contain secondary units that are spread throughout the population (good global coverage of the study area) while cluster samples are collected in groups of close proximity (good coverage locally within the study area).

Systematic and cluster sampling are similar, however, because whenever a primary unit is selected from the sampling frame, all secondary units of that primary unit will be included in the sample. Thus, random selection occurs at the primary unit level and not the secondary unit level.

For estimation purposes, you could ignore the secondary unit $y_{ij}$-values and only retain the primary unit $y_i$-values. This is what we did with one-stage cluster sampling.

The systematic and cluster sampling principle: To obtain estimators of low variance, the population must be partitioned into primary unit clusters in such a way that the clusters are similar to each other with respect to the $y_i$-values (small cluster-to-cluster variability).

This is equivalent to saying that the within-cluster variability should be as large as possible to obtain the most precise estimators. Thus, the ideal primary unit is representative of the full diversity of $y_{ij}$-values within the population.

With natural populations of spatially distributed plants, animals, minerals, etc., these conditions are typically satisfied by systematic primary units (and are not satisfied by primary units with spatially clustered secondary units).
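A small sketch of the principle, using a made-up one-dimensional population with a pure linear trend (the values 1 to 16 and both partitions are hypothetical). It compares the variance of the primary unit totals when the same 16 units are grouped into four contiguous clusters and into four systematic primary units.

```python
from fractions import Fraction

def variance_of_totals(primary_units):
    """Population variance of the primary unit totals (divisor N - 1)."""
    totals = [sum(unit) for unit in primary_units]
    N = len(totals)
    mean = Fraction(sum(totals), N)
    return sum((t - mean) ** 2 for t in totals) / (N - 1)

y = list(range(1, 17))                              # hypothetical trend population
k = 4                                               # number of primary units

contiguous = [y[i:i + k] for i in range(0, 16, k)]  # units 1-4, 5-8, 9-12, 13-16
systematic = [y[r::k] for r in range(k)]            # every 4th unit from each start

print("contiguous totals:", [sum(u) for u in contiguous], variance_of_totals(contiguous))
print("systematic totals:", [sum(u) for u in systematic], variance_of_totals(systematic))
```

The systematic primary units each span the whole trend, so their totals are close together and the variance of the totals is much smaller than for the contiguous clusters.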

6.4 Using Proc Surveymeans for One-Stage Cluster Samples

To use Proc Surveymeans to analyze data from a one-stage cluster sample with the goal of estimating $\mu$ or $\tau$, we need to include a Cluster statement followed by a cluster label. In the first example, the clusters are labeled _cluster.

The value following total= is the number of primary units in the population.

The appropriate weight to use in the weight statement to get the correct estimates for $\tau$ is M/(nL).
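To see why M/(nL) is the right weight, the following Python sketch applies it to the unit-level counts of the Figure 8a sample analyzed in this section. It is a hand check, not a substitute for Proc Surveymeans; the dictionary layout and variable names are illustrative.

```python
# Tree counts for the 8 sampled clusters of Figure 8a, keyed by cluster label.
sample = {
    1: [1, 2, 4, 1], 2: [2, 0, 7, 0], 3: [2, 2, 0, 0], 4: [0, 0, 2, 6],
    5: [1, 16, 8, 1], 6: [2, 2, 2, 0], 7: [2, 1, 0, 1], 8: [0, 0, 0, 1],
}
M, N, L = 400, 100, 4
n = len(sample)

wgt = M / (n * L)                         # same weight for every sampled unit; equals N / n
observations = [y for counts in sample.values() for y in counts]

sum_of_weights = wgt * len(observations)  # should recover M
tau_hat = sum(wgt * y for y in observations)   # weighted sum estimates the total
mu_hat = tau_hat / sum_of_weights              # weighted mean estimates the mean per unit

print(wgt, sum_of_weights, tau_hat, mu_hat)
```

Each sampled secondary unit stands in for M/(nL) = 12.5 population units, which is why the weights sum to M = 400 and the weighted sum reproduces $\hat{\tau}_{cl}$.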

Analysis of the One-Stage Cluster Sample in Figure 8a

    data Clus_8a;
      wgt = 400/(8*4);   * wgt = M/(n*L) ;
      input _cluster trees @@;
      datalines;
    1 1 1 2 1 4 1 1 2 2 2 0 2 7 2 0
    3 2 3 2 3 0 3 0 4 0 4 0 4 2 4 6
    5 1 5 16 5 8 5 1 6 2 6 2 6 2 6 0
    7 2 7 1 7 0 7 1 8 0 8 0 8 0 8 1
    ;
    proc surveymeans data=Clus_8a total=100 mean clm sum clsum;
      var trees;
      cluster _cluster;
      weight wgt;
      title1 'One-Stage Cluster Sample in Figure 8a --- Estimating mu and tau';
    run;

    =========================================================================
    One-Stage Cluster Sample in Figure 8a -- Estimating mu and tau

    The SURVEYMEANS Procedure

    Data Summary
    Number of Clusters            8
    Number of Observations       32
    Sum of Weights              400

    Statistics
                            Std Error
    Variable        Mean      of Mean        95% CL for Mean
    -----------------------------------------------------------------
    trees       2.062500     0.648436    0.52919341    3.59580659
    -----------------------------------------------------------------
    Variable         Sum      Std Dev        95% CL for Sum
    -----------------------------------------------------------------
    trees     825.000000   259.374247    211.677365    1438.32263
    -----------------------------------------------------------------

Analysis of the One-Stage Cluster Sample in Figure 8b

    data Clus_8b;
      wgt = 400/(10*4);   * wgt = M/(nL) ;
      do _cluster = 1 to 10;
        do sec_unit = 1 to 4;
          input count @@; output;
        end;
      end;
      datalines;
    19 24 17 24 20 15 25 21 34 39 33 34 33 25 31 34 29 38 40 38
    32 32 37 38 50 45 48 50 31 31 37 37 41 44 49 41 49 44 52 52
    ;
    proc surveymeans data=Clus_8b total=100 mean clm sum clsum;
      var count;
      cluster _cluster;
      weight wgt;
      title1 'One-Stage Cluster Sample in Figure 8b -- Estimating mu and tau';
    run;

    ========================================================================
    One-Stage Cluster Sample in Figure 8b -- Estimating mu and tau

    The SURVEYMEANS Procedure

    Data Summary
    Number of Clusters           10
    Number of Observations       40
    Sum of Weights              400

    Statistics
                            Std Error
    Variable        Mean      of Mean        95% CL for Mean
    -----------------------------------------------------------------
    count      35.325000     2.980573    28.5824765    42.0675235
    -----------------------------------------------------------------
    Variable         Sum      Std Dev        95% CL for Sum
    -----------------------------------------------------------------
    count          14130  1192.229005    11432.9906    16827.0094
    -----------------------------------------------------------------

6.5 Systematic Sampling

If a systematic sample is selected using simple random sampling to select the systematic primary units, we can apply the estimation results for cluster sampling to define (i) estimators, (ii) the variance of each estimator, and (iii) the estimated variance of each estimator.

The formulas we are about to introduce will be the same as those used for one-stage cluster sampling. The subscript sys denotes the fact that data were collected under systematic sampling.

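6.5.1 Estimation of $\tau$ and $\mu$

The unbiased estimators of $\tau$ and $\mu$ are:

$$\hat{\tau}_{sys} = \frac{N}{n} \sum_{i=1}^{n} y_i = N\bar{y} \qquad \hat{\mu}_{sys} = \frac{1}{nL} \sum_{i=1}^{n} y_i = \frac{\bar{y}}{L} = \frac{\hat{\tau}_{sys}}{M} \tag{84}$$

with variances

$$\mathrm{var}(\hat{\tau}_{sys}) = N(N-n)\frac{\sigma_u^2}{n} \qquad \mathrm{var}(\hat{\mu}_{sys}) = \frac{N(N-n)}{M^2}\frac{\sigma_u^2}{n} \tag{85}$$

where $\sigma_u^2 = \dfrac{\sum_{i=1}^{N} (y_i - \mu_1)^2}{N-1}$.

Recall that $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ is the sample mean and that $s_u^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ is the sample variance of the primary units.

Because $\sigma_u^2$ is unknown, we use $s_u^2$ to get unbiased estimators of the variances:

$$\widehat{\mathrm{var}}(\hat{\tau}_{sys}) = N(N-n)\frac{s_u^2}{n} \qquad \widehat{\mathrm{var}}(\hat{\mu}_{sys}) = \frac{N(N-n)}{M^2}\frac{s_u^2}{n} \tag{86}$$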

6.5.2 Confidence Intervals for $\tau$ and $\mu$

For a relatively small number n of sampled primary units, the following confidence intervals are recommended:

$$\hat{\tau}_{sys} \pm t^{*}\sqrt{\widehat{\mathrm{var}}(\hat{\tau}_{sys})} \qquad \hat{\mu}_{sys} \pm t^{*}\sqrt{\widehat{\mathrm{var}}(\hat{\mu}_{sys})} \tag{87}$$

where $t^{*}$ is the upper $\alpha/2$ critical value from the $t(n-1)$ distribution. Note that the degrees of freedom are based on n, the number of sampled primary units, and not on the total number of secondary units nL.

Systematic Sampling Examples

In Figure 9a, each of the 25 primary units contains the 16 secondary units corresponding to the

same location within the 16 5x5 subregions. n = 3 primary units were sampled. In Figure 9b,

each of the 50 primary units contains the 8 secondary units corresponding to the same location

within the 8 10x5 subregions. n = 6 primary units were sampled.

Figure 9a

1 1 (1) 1 1 2 1 (0) 0 0 4 5 (0) 1 0 1 2 (1) 0 1

3 2 1 0 1 0 0 0 1 2 2 2 0 2 2 2 0 2 0 1

7 (4) 1 1 1 1 (0) 0 0 2 2 (0) 4 3 2 4 (2) 1 2 2

0 1 2 0 0 0 0 0 4 6 5 1 5 0 0 0 2 1 2 0

1 1 0 (2) 3 2 0 0 (2) 1 3 1 4 (1) 1 1 2 2 (1) 1

2 0 (0) 0 4 3 3 (0) 1 16 5 0 (1) 3 8 0 0 (1) 3 3

0 0 1 14 3 3 1 2 0 8 0 2 0 3 9 0 4 2 1 0

0 (0) 5 1 8 7 (6) 6 6 1 0 (4) 0 0 1 2 (2) 0 1 2

0 0 2 2 3 2 2 3 1 1 1 3 0 0 2 2 0 3 4 0

0 0 0 (0) 1 0 3 1 (1) 1 2 0 2 (0) 2 0 2 1 (1) 0

1 8 (7) 7 8 0 5 (0) 1 0 1 2 (0) 0 2 4 2 (2) 2 4

0 9 1 0 0 1 1 1 0 0 0 1 2 4 0 2 1 3 3 1

0 (0) 0 1 0 2 (4) 3 1 2 2 (0) 0 1 1 2 (2) 0 2 4

0 1 0 0 1 2 0 2 3 5 2 0 0 2 1 1 2 0 1 3

1 0 0 (1) 1 0 0 0 (2) 2 2 1 1 (1) 0 0 2 0 (0) 0

0 2 (0) 2 2 0 1 (1) 0 2 0 0 (1) 0 0 1 1 (1) 5 3

0 0 0 3 2 1 0 0 0 0 0 2 1 0 1 1 1 3 1 2

1 (0) 0 1 0 3 (0) 1 0 0 2 (1) 2 0 0 0 (1) 1 1 0

0 0 0 0 0 0 0 1 1 1 0 1 0 3 0 2 0 1 1 0

2 0 0 (0) 0 0 0 0 (1) 2 0 1 3 (0) 0 1 0 1 (2) 4

Figure 9b

18 (20) 15 20 20 15 (19) 18 24 23 20 (26) 29 28 28 31 (31) 34 28 32

13 20 16 20 15 23 19 26 21 21 24 30 23 26 25 33 31 28 32 38

(16) 18 20 24 (25) (26) 22 23 26 (26) (22) 27 25 25 (34) (28) 37 36 38 (31)

17 17 16 22 21 23 22 27 27 24 28 32 29 33 27 37 37 38 35 33

15 19 23 17 21 23 21 23 24 25 31 26 32 34 32 33 31 31 36 37

21 (24) 20 21 28 26 (30) 22 31 25 29 (29) 27 30 29 37 (35) 32 38 43

23 17 24 25 24 27 31 29 31 34 27 36 29 29 34 39 37 37 40 36

(18) 24 21 25 27 (22) 32 32 31 26 (28) 34 34 37 35 (34) 38 38 37 40

22 26 28 (26) 24 29 33 26 (27) 27 34 31 39 (32) 36 38 37 40 (44) 43

23 27 28 29 26 32 25 31 35 34 32 33 37 32 42 40 40 37 42 44

23 (21) 31 23 30 27 (31) 30 32 35 30 (40) 32 37 37 36 (40) 44 44 40

26 29 31 26 30 31 34 36 30 38 36 32 38 38 37 42 42 41 40 49

(28) 24 28 27 (26) (31) 32 29 32 (33) (38) 34 39 38 (40) (37) 41 43 42 (43)

32 25 31 32 29 29 35 38 38 32 36 35 39 42 39 40 44 42 41 45

27 29 35 28 35 35 31 40 35 37 38 44 40 40 47 39 49 48 51 49

30 (29) 32 32 33 30 (36) 38 42 36 35 (38) 44 47 45 49 (41) 43 44 51

28 35 35 34 34 33 41 33 34 35 39 44 44 48 44 50 49 48 53 54

(29) 33 32 36 39 (33) 33 34 35 42 (46) 47 48 47 46 (45) 44 52 54 55

28 37 38 (37) 33 33 34 37 (45) 40 39 42 42 (46) 47 48 52 47 (46) 53

38 39 39 37 34 38 39 45 39 42 45 41 44 51 46 50 52 51 51 53

6.5.3 Comments from W.G. Cochran

Cochran (from Sampling Techniques (1953)) makes the following comments about advantages of systematic sampling:

Intuitively, systematic sampling seems likely to be more precise than simple random sampling. In effect, it stratifies the population into [N] strata, which consist of the first [L] units, the second [L] units, and so on. We might therefore expect the systematic sample to be about as precise as the corresponding stratified random sample with one unit per stratum. The difference is that with the systematic sample the units all occur at the same relative position in the stratum, whereas with the stratified random sample the position in the stratum is determined separately by randomization within each stratum. The systematic sample is spread more evenly over the population, and this fact has sometimes made systematic sampling considerably more precise than stratified random sampling.

Cochran also warns us that:

The performance of systematic sampling relative to that of stratified or simple random sampling is greatly dependent on the properties of the population. There are populations for which systematic sampling is extremely precise and others for which it is less precise than simple random sampling. For some populations and values of [L], [$\mathrm{var}(\hat{\mu}_{sys})$] may even increase when a larger sample is taken, a startling departure from good behavior. Thus it is difficult to give general advice about the situations in which systematic sampling is to be recommended. A knowledge of the structure of the population is necessary for its most effective use.

If a population contains a linear trend:

1. The variances of the estimators from systematic and stratified sampling will be smaller than the variance of the estimator from simple random sampling.

2. The variance of the estimator from systematic sampling will be larger than the variance of the estimator from stratified sampling. Why? If the starting point of the systematic sample is selected too low or too high, it will be too low or too high across the whole population of units. By contrast, stratified sampling gives an opportunity for within-stratum errors to cancel.

For example: Suppose a population has 16 secondary units ($\tau = 130$) and is ordered as follows:

| Sampling unit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| y-value | 1 | 2 | 2 | 3 | 3 | 4 | 5 | 6 | 8 | 9 | 12 | 13 | 14 | 15 | 16 | 17 |

Note there is a linearly increasing trend in the y-values with the order of the sampling units. Suppose we take a 1-in-4 systematic sample. The following table summarizes the four possible 1-in-4 systematic samples.

| Sample | Sampling units | y-values | $\hat{\tau}_{sys}$ |
|---|---|---|---|
| Sample 1 | 1, 5, 9, 13 | 1, 3, 8, 14 | 104 |
| Sample 2 | 2, 6, 10, 14 | 2, 4, 9, 15 | 120 |
| Sample 3 | 3, 7, 11, 15 | 2, 5, 12, 16 | 140 |
| Sample 4 | 4, 8, 12, 16 | 3, 6, 13, 17 | 156 |
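The two claims about linear trends can be checked on this 16-unit population by enumeration. The sketch below is an illustration under stated assumptions: the stratified design is taken to be four strata of four consecutive units with one unit drawn at random per stratum, and the SRS design draws 4 of the 16 units, so all three designs observe four units. Exact fractions are used so the variances are not affected by rounding.

```python
from fractions import Fraction
from itertools import product

y = [1, 2, 2, 3, 3, 4, 5, 6, 8, 9, 12, 13, 14, 15, 16, 17]
M, k = len(y), 4                 # 16 units, 1-in-4 systematic sampling
m = M // k                       # 4 units observed under each design
tau = sum(y)

# Systematic: one estimate per starting point, each start equally likely.
sys_estimates = [k * sum(y[r::k]) for r in range(k)]
var_sys = Fraction(sum((t - tau) ** 2 for t in sys_estimates), k)

# SRS of m units: var = M (M - m) sigma^2 / m.
mu = Fraction(tau, M)
sigma2 = sum((v - mu) ** 2 for v in y) / (M - 1)
var_srs = Fraction(M * (M - m), m) * sigma2

# Stratified (assumed design): enumerate every choice of one unit per stratum.
strata = [y[i:i + k] for i in range(0, M, k)]
strat_estimates = [k * sum(pick) for pick in product(*strata)]
var_strat = Fraction(sum((t - tau) ** 2 for t in strat_estimates), len(strat_estimates))

print("systematic estimates:", sys_estimates)
print("var stratified:", var_strat, " var systematic:", var_sys, " var SRS:", var_srs)
```

For this population the ordering is stratified < systematic < SRS, in line with points 1 and 2 above.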

If a population has periodic trends, the effectiveness of the systematic sample depends on the relationship between the periodic interval and the systematic sampling interval or pattern. The following idealized curve was given by Cochran to show this. The height of the curve represents the population y-value.

- The sample points A represent the least favorable systematic sample because whenever L is equal to the period, every observation in the systematic sample will be the same, so the sample is no more precise than a single observation taken at random from the population.

- The sample points B represent the most favorable systematic sample because L is equal to a half-period. Every systematic sample has mean exactly equal to the true population mean because successive y-value deviations above and below the mean cancel. Thus, the variance of the estimator is zero.

- For other values of L, the sample has varying degrees of effectiveness that depend on the relation between L and the period.
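The two extreme cases can be reproduced with a toy periodic population. Everything in this sketch is hypothetical: the population repeats the pattern 0, 1, 2, 1 (period 4, mean 1), and `k` plays the role of the sampling interval that the bullets above call L.

```python
from fractions import Fraction

pattern = [0, 1, 2, 1]            # one period of the hypothetical population
population = pattern * 6          # 24 units, population mean 1

def systematic_means(pop, k):
    """Sample mean of each of the k possible 1-in-k systematic samples."""
    return [Fraction(sum(pop[r::k]), len(pop[r::k])) for r in range(k)]

# Interval equal to the period: every unit in a sample is identical,
# so the sample mean just reproduces one point of the pattern.
means_full_period = systematic_means(population, 4)

# Interval equal to a half-period: deviations above and below the mean cancel,
# so every possible sample mean equals the population mean.
means_half_period = systematic_means(population, 2)

print(means_full_period)
print(means_half_period)
```

With the interval equal to the period the four possible sample means are 0, 1, 2, and 1, whereas with the half-period interval both possible sample means equal the population mean of 1, so the estimator has zero variance.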

6.6 Using a Single Systematic Sample

Many studies generate data from a systematic sample based on a single randomly selected starting unit (i.e., there is only one randomly selected primary unit).

When there is only one primary unit, it is possible to get unbiased estimators $\hat{\mu}_{sys}$ and $\hat{\tau}_{sys}$ of $\mu$ and $\tau$. It is not possible, however, to get an unbiased estimator of the variances $\mathrm{var}(\hat{\mu}_{sys})$ and $\mathrm{var}(\hat{\tau}_{sys})$.

If we ignore the fact that the $y_{ij}$-values were collected systematically and treat the L secondary units in the single primary unit as a SRS, then the SRS variance estimator is a reasonable substitute only if the units of the population can reasonably be conceived as being randomly ordered (i.e., there is no systematic pattern in the population such as a linear trend or a periodic pattern).

If this assumption is reasonable, then

$$\widehat{\mathrm{Var}}(\hat{\mu}_{sys}) \approx \widehat{\mathrm{Var}}(\hat{\mu}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n}$$

With natural populations in which nearby units are similar to each other (spatial correlation), this procedure tends to provide overestimates of the variances of $\hat{\mu}_{sys}$ and $\hat{\tau}_{sys}$.

Procedures for estimating variances from a single systematic sample are discussed in Bellhouse (1988), Murthy and Rao (1988), and Wolter (1984).
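A minimal sketch of the SRS substitute, under two assumptions that are not stated in the notes: the N and n in the formula are read in the SRS sense (number of population units and number of sampled units, so 400 and 16 here), and the 16 counts are the first data line of the Figure 9a analysis in Section 6.7, treated purely for illustration as if that primary unit were the only one sampled. The function name is hypothetical.

```python
from fractions import Fraction

def srs_variance_approx(sample, pop_size):
    """SRS-style estimated variance of the sample mean: ((N - n) / N) * s^2 / n,
    with N = pop_size (population units) and n = len(sample) (sampled units)."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)   # sample variance of the units
    return Fraction(pop_size - n, pop_size) * s2 / n

single_primary_unit = [1, 0, 0, 1, 0, 0, 1, 1, 7, 0, 0, 2, 0, 1, 1, 1]
mu_hat_sys = Fraction(sum(single_primary_unit), len(single_primary_unit))
var_hat = srs_variance_approx(single_primary_unit, pop_size=400)

print(mu_hat_sys, var_hat, float(var_hat) ** 0.5)
```

The result is only as trustworthy as the random-ordering assumption. With spatially correlated populations it will tend to overstate the true variance, as noted above.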

6.7 Using Proc Surveymeans for Systematic Samples

To use Proc Surveymeans to analyze data from a systematic sample with the goal of estimating $\mu$ or $\tau$, we need to include a Cluster statement followed by a label representing the units in the systematic sample. In the first example, the label is start_pt.

The value following total= is the number of starting points for a primary unit in a systematic sample. That is, it is the number of primary units in the population.

The appropriate weight to use in the weight statement to get the correct estimates for $\tau$ is M/(nL).

Analysis of the Systematic Sample in Figure 9a

    data Sys_9a;
      wgt = 400/(3*16);   * wgt = M/(nL) ;
      do start_pt = 1 to 3;
        do sec_unit = 1 to 16;
          input count @@; output;
        end;
      end;
      datalines;
    1 0 0 1 0 0 1 1 7 0 0 2 0 1 1 1
    4 0 0 2 0 6 4 2 0 4 0 2 0 0 1 1
    2 2 1 1 0 1 0 1 1 2 1 0 0 1 0 2
    ;
    proc surveymeans data=Sys_9a total=25 mean clm sum clsum;
      var count;
      cluster start_pt;
      weight wgt;
      title1 'Systematic Sample in Figure 9a --- Estimating mu and tau';
    run;

    ==================================================================
    Systematic Sample in Figure 9a --- Estimating mu and tau

    The SURVEYMEANS Procedure

    Data Summary
    Number of Clusters            3
    Number of Observations       48
    Sum of Weights              400

    Statistics
                            Std Error
    Variable        Mean      of Mean        95% CL for Mean
    -----------------------------------------------------------------
    count       1.187500     0.205902    0.30157311    2.07342689
    -----------------------------------------------------------------
    Variable         Sum      Std Dev        95% CL for Sum
    -----------------------------------------------------------------
    count     475.000000    82.360994    120.629244    829.370756
    -----------------------------------------------------------------


Analysis of the Systematic Sample in Figure 9b

data Sys_9b;
  wgt = 400/(6*8);            * wgt = M/(nL);
  do start_pt = 1 to 6;       * n = 6 sampled primary units;
    do sec_unit = 1 to 8;     * L = 8 secondary units per primary unit;
      input count @@; output;
    end;
  end;
datalines;
20 19 26 31 21 31 40 40
16 26 22 28 28 31 38 37
25 26 34 31 26 33 40 43
24 30 29 35 29 36 38 41
18 22 28 34 29 33 46 45
26 27 32 44 37 45 46 46
;
proc surveymeans data=Sys_9b total=50 mean clm sum clsum;
  var count;
  cluster start_pt;
  weight wgt;
  title1 'Systematic Sample in Figure 9b --- Estimating mu and tau';
run;

==================================================================

Systematic Sample in Figure 9b --- Estimating mu and tau

The SURVEYMEANS Procedure

Data Summary

Number of Clusters           6
Number of Observations      48
Sum of Weights             400

Statistics

                          Std Error
Variable          Mean      of Mean         95% CL for Mean
-----------------------------------------------------------------
count        31.916667     1.342334    28.4660867    35.3672466
-----------------------------------------------------------------

Variable           Sum      Std Dev         95% CL for Sum
-----------------------------------------------------------------
count            12767   536.933681    11386.4347    14146.8986
-----------------------------------------------------------------
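The two listings above can be cross-checked by hand. The following is a minimal sketch in Python (used here instead of SAS only so it can be run without a SAS installation). It assumes that Proc Surveymeans, given these Cluster, Weight, and total= settings, computes the usual expansion estimator based on the cluster totals, with finite population correction $1 - n/N$ and $n - 1$ degrees of freedom; that assumption is consistent with the printed output. The function name `expansion_estimates` and the hard-coded $t$ critical values (read from a $t$ table) are illustrative and not part of the original notes.

```python
import math

def expansion_estimates(cluster_rows, N, M, t_crit):
    """Estimate tau and mu from a one-stage sample of equal-sized clusters.

    cluster_rows: one list of secondary-unit values per sampled primary unit
    N: number of primary units in the population (the total= value)
    M: number of secondary units in the population
    t_crit: t critical value with n - 1 degrees of freedom (from a t table)
    """
    n = len(cluster_rows)
    totals = [sum(row) for row in cluster_rows]      # y_i, one per start_pt
    ybar = sum(totals) / n                           # mean cluster total
    s2 = sum((y - ybar) ** 2 for y in totals) / (n - 1)
    tau_hat = N * ybar                               # estimated population total
    se_tau = math.sqrt(N * (N - n) * s2 / n)         # includes the fpc (N - n)/N
    return {
        "mean": tau_hat / M,
        "se_mean": se_tau / M,
        "sum": tau_hat,
        "se_sum": se_tau,
        "cl_sum": (tau_hat - t_crit * se_tau, tau_hat + t_crit * se_tau),
    }

# One row per start_pt, copied from the two datalines blocks.
rows_9a = [
    [1, 0, 0, 1, 0, 0, 1, 1, 7, 0, 0, 2, 0, 1, 1, 1],
    [4, 0, 0, 2, 0, 6, 4, 2, 0, 4, 0, 2, 0, 0, 1, 1],
    [2, 2, 1, 1, 0, 1, 0, 1, 1, 2, 1, 0, 0, 1, 0, 2],
]
rows_9b = [
    [20, 19, 26, 31, 21, 31, 40, 40],
    [16, 26, 22, 28, 28, 31, 38, 37],
    [25, 26, 34, 31, 26, 33, 40, 43],
    [24, 30, 29, 35, 29, 36, 38, 41],
    [18, 22, 28, 34, 29, 33, 46, 45],
    [26, 27, 32, 44, 37, 45, 46, 46],
]

# 95% two-sided t critical values for 2 and 5 degrees of freedom.
res_9a = expansion_estimates(rows_9a, N=25, M=400, t_crit=4.302653)
res_9b = expansion_estimates(rows_9b, N=50, M=400, t_crit=2.570582)
print(res_9a)
print(res_9b)
```

Under these assumptions the sketch agrees with the Sum, Std Dev, and confidence limits printed in both listings to the displayed precision.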


6.8 Cluster Sampling with Unequal Cluster Sizes

Suppose the $N$ cluster sizes $M_1, M_2, \ldots, M_N$ are not all equal and that a one-stage cluster sample of $n$ primary units is taken with the goal of estimating $\mu$ or $\tau$.

Let $M_i$ and $y_i$ ($i = 1, 2, \ldots, n$) be the sizes and totals of the $n$ sampled primary units.

Let $m = \sum_{i=1}^{n} M_i$ be the total number of secondary units in the sample.

We will discuss three methods of calculating estimates of $\mu$ and $\tau$ given the unequal cluster sizes. These methods are based on two different representations of $\mu$.

(i) $\mu$ as a population ratio:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} M_i} = \frac{\sum_{i=1}^{N} y_i}{M} \tag{88}$$

expresses $\mu$ as the ratio of the total of the primary unit values to the total number of secondary units.

(ii) $\mu$ as a mean cluster total:

$$\mu = \left(\frac{N}{M}\right) \frac{\sum_{i=1}^{N} y_i}{N} \tag{89}$$

expresses $\mu$ as a multiple of the mean of the cluster $y_i$ values.

Method 1: The sample cluster ratio: Substitution of sample values into (88) provides the following ratio estimator for $\mu$:

$$\hat{\mu}_{c(a)} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} M_i} = \frac{\sum_{i=1}^{n} y_i}{m}$$

which is the ratio of the sum of the sampled cluster totals to the sum of the sampled cluster sizes.

$\hat{\mu}_{c(a)}$ is a special case of the SRS ratio estimator (see Section 4 of the course notes). Thus, $\hat{\mu}_{c(a)}$ is biased, with the bias $\to 0$ as $n$ increases.

Because $\hat{\mu}_{c(a)}$ is a ratio estimator, there is no closed form for the true variance of $\hat{\mu}_{c(a)}$. However, an approximation is given in Thompson (2002). A sample-based estimate of this approximate variance is given by

$$\widehat{\mathrm{var}}(\hat{\mu}_{c(a)}) = \frac{(N-n)N}{n(n-1)M^2} \sum_{i=1}^{n} M_i^2 \left(\bar{y}_i - \hat{\mu}_{c(a)}\right)^2 \tag{90}$$

where $\bar{y}_i = y_i/M_i$ is the mean of the secondary units in sampled cluster $i$.

If $M$ is not known, it can be estimated from the sample as $M \approx Nm/n$. Substitution into (90) yields:

$$\widehat{\mathrm{var}}(\hat{\mu}_{c(a)}) = \frac{(N-n)n}{N(n-1)} \sum_{i=1}^{n} \left(\frac{M_i}{m}\right)^2 \left(\bar{y}_i - \hat{\mu}_{c(a)}\right)^2 \tag{91}$$

To estimate $\tau$, multiply $\hat{\mu}_{c(a)}$ by $M$. To get the estimated variance, multiply $\widehat{\mathrm{var}}(\hat{\mu}_{c(a)})$ by $M^2$.


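Method 2: The cluster sample total: Substitution of sample values into (89) provides the following unbiased estimator for $\mu$:

$$\hat{\mu}_{c(b)} = \frac{N}{M} \cdot \frac{\sum_{i=1}^{n} y_i}{n} = \frac{N}{nM} \sum_{i=1}^{n} y_i$$

The variance is

$$\mathrm{var}(\hat{\mu}_{c(b)}) = \frac{(N-n)N}{n(N-1)M^2} \sum_{i=1}^{N} (y_i - \mu_1)^2 = \frac{(N-n)N}{nM^2}\, \sigma_u^2 .$$

An estimate of this variance is given by

$$\widehat{\mathrm{var}}(\hat{\mu}_{c(b)}) = \frac{(N-n)N}{n(n-1)M^2} \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{(N-n)N}{nM^2}\, s_u^2 . \tag{92}$$

If $M$ is not known, we can substitute $M \approx Nm/n$ into (92) and get:

$$\widehat{\mathrm{var}}(\hat{\mu}_{c(b)}) = \frac{(N-n)n}{(n-1)Nm^2} \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{(N-n)n}{Nm^2}\, s_u^2 . \tag{93}$$

To estimate $\tau$ using Method 1 or Method 2, multiply $\hat{\mu}_{c(a)}$ or $\hat{\mu}_{c(b)}$ by $M$. To get the estimated variances, multiply $\widehat{\mathrm{var}}(\hat{\mu}_{c(a)})$ and $\widehat{\mathrm{var}}(\hat{\mu}_{c(b)})$ by $M^2$.

A confidence interval for $\mu$ using either Method 1 (subscript a) or Method 2 (subscript b) is:

$$\hat{\mu}_{c(k)} \pm t^* \sqrt{\widehat{\mathrm{var}}(\hat{\mu}_{c(k)})} \quad \text{for } k = a, b \tag{94}$$

where $t^*$ is the critical value from the $t$ distribution with $n - 1$ degrees of freedom.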

Method 3: Primary units selected with pps: Suppose that the primary units are selected with replacement with draw-by-draw selection probabilities ($p_i$) proportional to the sizes of the primary units, $p_i = M_i/M$.

One way to construct the sampling design is to

1. Select $n$ secondary units (say, $u_1, u_2, \ldots, u_n$) from the $M$ in the population using simple random sampling with replacement.

2. Then for each $u_i$ ($i = 1, 2, \ldots, n$), sample all secondary units in the cluster containing $u_i$.

Thus, a primary unit is selected every time any of its secondary units is selected.

Now that we have defined $p_i$, we simply use either the Hansen-Hurwitz estimator or the Horvitz-Thompson estimator (and the associated variance estimators) discussed in Section 3 of the course notes.
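The two-step construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pps_cluster_draws`, the 0-based cluster indices, and the seed are hypothetical, and the cluster sizes are those of the 49-cluster population used in the figures that follow (9 clusters of size 16, 24 of size 8, and 16 of size 4).

```python
import random

def pps_cluster_draws(sizes, n_draws, rng):
    """Draw clusters with replacement, each draw with probability M_i / M."""
    # owner[u] is the (0-based) cluster containing secondary unit u, u = 0..M-1
    owner = [i for i, size in enumerate(sizes) for _ in range(size)]
    M = len(owner)
    # Step 1: simple random sample of secondary units, with replacement.
    # Step 2: each selected secondary unit brings in its whole cluster.
    return [owner[rng.randrange(M)] for _ in range(n_draws)]

sizes = [16] * 9 + [8] * 24 + [4] * 16   # M_i for the 49 clusters, M = 400
rng = random.Random(2024)                # fixed seed so the sketch is repeatable
draws = pps_cluster_draws(sizes, 5, rng)
print(draws)                             # a cluster index may appear more than once

# Rough check of the selection probabilities: the nine size-16 clusters
# together should be drawn about 9 * 16 / 400 = 36% of the time.
many = pps_cluster_draws(sizes, 40000, rng)
share_big = sum(1 for c in many if c < 9) / len(many)
print(round(share_big, 3))
```

Because every secondary unit is equally likely on each draw, a cluster with $M_i$ secondary units is picked with probability $M_i/M$, which is exactly the $p_i$ required by Method 3.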


Figure 10: Cluster Sampling with Unequal-Sized Clusters

The mean $\mu$ = 33.385. There are $M$ = 400 secondary units and $N$ = 49 primary units (clusters). There are 9 clusters of size $M_i = 16$, 24 clusters of size $M_i = 8$, and 16 clusters of size $M_i = 4$.

The boldfaced values represent the units in the sample.

18 20 15 20 20 15 19 18 24 23 20 26 29 28 28 31 31 34 28 32

13 20 16 20 15 23 19 26 21 21 24 30 23 26 25 33 31 28 32 38

16 18 20 24 25 26 22 23 26 26 22 27 25 25 34 28 37 36 38 31

17 17 16 22 21 23 22 27 27 24 28 32 29 33 27 37 37 38 35 33

15 19 23 17 21 23 21 23 24 25 31 26 32 34 32 33 31 31 36 37

21 24 20 21 28 26 30 22 31 25 29 29 27 30 29 37 35 32 38 43

23 17 24 25 24 27 31 29 31 34 27 36 29 29 34 39 37 37 40 36

18 24 21 25 27 22 32 32 31 26 28 34 34 37 35 34 38 38 37 40

22 26 28 26 24 29 33 26 27 27 34 31 39 32 36 38 37 40 44 43

23 27 28 29 26 32 25 31 35 34 32 33 37 32 42 40 40 37 42 44

23 21 31 23 30 27 31 30 32 35 30 40 32 37 37 36 40 44 44 40

26 29 31 26 30 31 34 36 30 38 36 32 38 38 37 42 42 41 40 49

28 24 28 27 26 31 32 29 32 33 38 34 39 38 40 37 41 43 42 43

32 25 31 32 29 29 35 38 38 32 36 35 39 42 39 40 44 42 41 45

27 29 35 28 35 35 31 40 35 37 38 44 40 40 47 39 49 48 51 49

30 29 32 32 33 30 36 38 42 36 35 38 44 47 45 49 41 43 44 51

28 35 35 34 34 33 41 33 34 35 39 44 44 48 44 50 49 48 53 54

29 33 32 36 39 33 33 34 35 42 46 47 48 47 46 45 44 52 54 55

28 37 38 37 33 33 34 37 45 40 39 42 42 46 47 48 52 47 46 53

38 39 39 37 34 38 39 45 39 42 45 41 44 51 46 50 52 51 51 53

  $M_i$    $y_i$    $\bar{y}_i = y_i/M_i$
   16      401      25.0625
   16      337      21.0625
    8      279      34.8750
    8      321      40.1250
    8      280      35.0000
    4      171      42.7500
    4      187      46.7500
    4      216      54.0000

$m = 68$, $\sum y_i = 2192$, $\bar{y} = 274$
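As a worked illustration, Methods 1 and 2 can be applied to the Figure 10 sample. The sketch below is a Python cross-check, not part of the original notes. It assumes the third cluster total is 279, which is the value consistent with the tabulated cluster mean 34.8750 and the sample sum 2192, and it takes the $t$ critical value for $n - 1 = 7$ degrees of freedom from a $t$ table. Variable names are illustrative.

```python
import math

N, M = 49, 400                                      # clusters and secondary units in the population
sizes = [16, 16, 8, 8, 8, 4, 4, 4]                  # M_i for the sampled clusters
totals = [401, 337, 279, 321, 280, 171, 187, 216]   # y_i (third value assumed to be 279)
n = len(sizes)
m = sum(sizes)                                      # secondary units in the sample
T_CRIT = 2.364624                                   # 95% two-sided t value, 7 df (t table)

# Method 1: ratio estimator and its variance estimate, equation (90).
# Note that M_i^2 (ybar_i - mu)^2 equals (y_i - mu * M_i)^2.
mu_a = sum(totals) / m
ss_a = sum((y - mu_a * Mi) ** 2 for y, Mi in zip(totals, sizes))
var_a = (N - n) * N / (n * (n - 1) * M ** 2) * ss_a

# Method 2: estimator based on the mean cluster total, equation (92).
ybar = sum(totals) / n
s2_u = sum((y - ybar) ** 2 for y in totals) / (n - 1)
mu_b = N * ybar / M
var_b = (N - n) * N / (n * M ** 2) * s2_u

for label, mu_hat, var_hat in (("a", mu_a, var_a), ("b", mu_b, var_b)):
    half = T_CRIT * math.sqrt(var_hat)              # half-width of interval (94)
    print(f"Method {label}: mu_hat = {mu_hat:.4f}, var = {var_hat:.4f}, "
          f"CI = ({mu_hat - half:.3f}, {mu_hat + half:.3f}), "
          f"tau_hat = {M * mu_hat:.1f}")
```

Both point estimates land near the true mean $\mu$ = 33.385 stated in the figure caption; multiplying by $M$ = 400 gives the corresponding estimates of $\tau$.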


Figure 11: Horvitz-Thompson Estimation with Selection Probabilities Proportional to Cluster Size

The mean $\mu$ = 33.385. There are $M$ = 400 secondary units and $N$ = 49 primary units (clusters). There are 9 clusters with $M_i = 16$, 24 clusters with $M_i = 8$, and 16 clusters with $M_i = 4$. Five clusters were sampled with replacement. One cluster was sampled twice (it is labeled "Sampled twice" in the figure). The boldfaced values are in the sample.

18 20 15 20 20 15 19 18 24 23 20 26 29 28 28 31 31 34 28 32

13 20 16 20 15 23 19 26 21 21 24 30 23 26 25 33 31 28 32 38

16 18 20 24 25 26 22 23 26 26 22 27 25 25 34 28 37 36 38 31

17 17 16 22 21 23 22 27 27 24 28 32 29 33 27 37 37 38 35 33

15 19 23 17 21 23 21 23 24 25 31 26 32 34 32 33 31 31 36 37

21 24 20 21 28 26 30 22 31 25 29 29 27 30 29 37 35 32 38 43

23 17 24 25 24 27 31 29 31 34 27 36 29 29 34 39 37 37 40 36

18 24 21 25 27 22 32 32 31 26 28 34 34 37 35 34 38 38 37 40

22 26 28 26 24 29 33 26 27 27 34 31 39 32 36 38 37 40 44 43

23 27 28 29 26 32 25 31 35 34 32 33 37 32 42 40 40 37 42 44

23 21 31 23 30 27 31 30 32 35 30 40 32 37 37 36 40 44 44 40

26 29 31 26 30 31 34 36 30 38 36 32 38 38 37 42 42 41 40 49

28 24 28 27 26 31 32 29 32 33 38 34 39 38 40 37 41 43 42 43

32 25 31 32 29 29 35 38 38 32 36 35 39 42 39 40 44 42 41 45

27 29 35 28 35 35 31 40 35 37 38 44 40 40 47 39 49 48 51 49

30 29 32 32 33 30 36 38 42 36 35 38 44 47 45 49 41 43 44 51

28 35 35 34 34 33 41 33 34 35 39 44 44 48 44 50 49 48 53 54

29 33 32 36 39 33 33 34 35 42 46 47 48 47 46 45 44 52 54 55

28 37 38 37 33 33 34 37 45 40 39 42 42 46 47 48 52 47 46 53

38 39 39 37 34 38 39 45 39 42 45 41 44 51 46 50 52 51 51 53

  $i$   $y_i$   $M_i$   $p_i = M_i/M$     $\pi_i = 1 - (1 - p_i)^5$
   1    344     16     16/400 = .04     $1 - .96^5 = .184627302$
   2    252      8      8/400 = .02     $1 - .98^5 = .096079203$
   3    278      8      8/400 = .02     $1 - .98^5 = .096079203$
   4    181      4      4/400 = .01     $1 - .99^5 = .049009950$

$\pi_{12} = \pi_{13} = [1 - (.96^5)] + [1 - (.98^5)] - [1 - (.94^5)] = .0146105270$

$\pi_{14} = [1 - (.96^5)] + [1 - (.99^5)] - [1 - (.95^5)] = .00741819$

$\pi_{24} = \pi_{34} = [1 - (.98^5)] + [1 - (.99^5)] - [1 - (.97^5)] = .003823179$

$\pi_{23} = [1 - (.98^5)] + [1 - (.98^5)] - [1 - (.96^5)] = .007531104$
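The inclusion probabilities above feed directly into the Horvitz-Thompson estimator. The following Python sketch recomputes $\pi_i$ and $\pi_{ij}$ for the four distinct sampled clusters and then evaluates $\hat{\tau}_{HT} = \sum y_i/\pi_i$. The variance line uses the standard Horvitz-Thompson variance estimator as given in common sampling texts; it is assumed, not confirmed, to match the form in Section 3 of the course notes. Names such as `joint` and `var_ht` are illustrative.

```python
y = [344, 252, 278, 181]      # totals of the four distinct sampled clusters
sizes = [16, 8, 8, 4]         # their sizes M_i
M, n_draws = 400, 5           # population size and number of draws

p = [Mi / M for Mi in sizes]                     # draw-by-draw probabilities
pi = [1 - (1 - pk) ** n_draws for pk in p]       # inclusion probabilities

def joint(i, j):
    """Joint inclusion probability pi_ij for sampling with replacement."""
    return pi[i] + pi[j] - (1 - (1 - p[i] - p[j]) ** n_draws)

# Horvitz-Thompson estimate of tau: each distinct cluster is used once.
tau_ht = sum(yk / pik for yk, pik in zip(y, pi))

# Standard HT variance estimator (assumed form, see lead-in).
k = len(y)
var_ht = sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in range(k))
for i in range(k):
    for j in range(k):
        if i != j:
            pij = joint(i, j)
            var_ht += (pij - pi[i] * pi[j]) / (pi[i] * pi[j]) * y[i] * y[j] / pij

print([round(v, 9) for v in pi])
print(round(joint(0, 1), 10), round(joint(0, 3), 8))
print(round(tau_ht, 2), round(tau_ht / M, 3))
print(round(var_ht, 1))
```

Dividing $\hat{\tau}_{HT}$ by $M$ = 400 gives the corresponding estimate of $\mu$. Note that this variance estimator is not guaranteed to be positive for every sample, although it is for this one.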


Figure 12: Hansen-Hurwitz Estimation with Selection Probabilities Proportional to Cluster Size

In Figure 11, the total abundance is $\tau$ = 13354. There are $M$ = 400 secondary units and $N$ = 49 primary units (clusters). There are 9 clusters with $M_i = 16$, 24 clusters with $M_i = 8$, and 16 clusters with $M_i = 4$. The cluster totals $y_i$ for the clusters in Figure 11 are summarized in the figure below. Also included is a cluster label (1 to 49). Eight clusters were sampled with replacement. The sampled units are 2, 6, 6, 16, 25, 30, 32, and 44. Note that cluster 6 was sampled twice. The boldfaced values (shown here in parentheses) are in the sample.

Label      1       2       3        10      11      12      13
$y_i$     292    (344)    401       218     243     272     267

Label      4       5       6        14      15      16      17
$y_i$     337     418   ((467))     252     273    (279)    307

Label      7       8       9        18      19      20      21
$y_i$     419     475     526       285     308     321     346

Label     22      26      30        34      35      36      37
$y_i$     227     249    (278)      158     156     170     171

Label     23      27      31        38      39      40      41
$y_i$     242     278     305       171     180     181     195

Label     24      28      32        42      43      44      45
$y_i$     262     280    (322)      187     185    (193)    216

Label     25      29      33        46      47      48      49
$y_i$    (293)    293     333       183     191     202     203

Unit $i$    $y_i$    $p_i$    $y_i/p_i$
    2       344     .04       8600
    6       467     .04      11675
    6       467     .04      11675
   16       279     .02      13950
   25       293     .02      14650
   30       278     .02      13900
   32       322     .02      16100
   44       193     .01      19300
                           -------
Total                       109850
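The table supplies everything needed for the Hansen-Hurwitz estimate. The Python sketch below computes $\hat{\tau}_{HH} = \frac{1}{n}\sum y_i/p_i$ and the usual variance estimate $\frac{1}{n(n-1)}\sum (y_i/p_i - \hat{\tau}_{HH})^2$; these standard forms are assumed to be the ones referred to in Section 3 of the course notes. The total for cluster 30 is taken as 278, the value consistent with the grid of cluster totals and with $y_i/p_i = 13900$. Variable names are illustrative.

```python
import math

M = 400
# (cluster label, y_i, M_i) for the eight draws; cluster 6 appears twice
# because it was selected on two separate draws.
sample = [(2, 344, 16), (6, 467, 16), (6, 467, 16), (16, 279, 8),
          (25, 293, 8), (30, 278, 8), (32, 322, 8), (44, 193, 4)]
n = len(sample)

# y_i / p_i with p_i = M_i / M, one value per draw
ratios = [y / (Mi / M) for _, y, Mi in sample]

tau_hh = sum(ratios) / n                                        # estimate of tau
var_hh = sum((r - tau_hh) ** 2 for r in ratios) / (n * (n - 1))
se_hh = math.sqrt(var_hh)
mu_hh = tau_hh / M                                              # estimate of mu

print(round(sum(ratios), 2), round(tau_hh, 2), round(mu_hh, 4))
print(round(var_hh, 2), round(se_hh, 2))
```

Unlike the Horvitz-Thompson estimator, the Hansen-Hurwitz estimator uses a repeated cluster once for every time it is drawn, which is why cluster 6 contributes two terms.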


6.9 Attribute Proportion Estimation using Cluster Sampling

Instead of studying a quantitative measure associated with sampling units, we often are

interested in an attribute (a qualitative characteristic). Statistically, the goal is to estimate

a proportion. The population proportion p is the proportion of population units having

that attribute.

Examples: the proportion of females (or males) in an animal population, the proportion of

consumers who own motorcycles, the proportion of married couples with at least 1 child. . .

If a one-stage cluster sample is taken, then how do we estimate p?

6.9.1 Estimating p with Equal Cluster Sizes

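Statistically, we use an indicator function that assigns a $y_{ij}$ value to secondary unit $j$ in primary unit (cluster) $i$ as follows:

$$y_{ij} = \begin{cases} 1 & \text{if unit } j \text{ in cluster } i \text{ possesses the attribute} \\ 0 & \text{otherwise} \end{cases}$$

Then $\tau = \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij}$ and $p = \dfrac{\tau}{LN} = \dfrac{\tau}{M}$, where $M_i = L$ for each cluster.

The proportion for cluster $i$ is defined as $p_i = \dfrac{1}{L} \sum_{j=1}^{L} y_{ij}$.

By taking a one-stage cluster sample of $n$ equal-sized clusters, we can estimate $p$ as the average of the sampled cluster proportions (each cluster receives equal weight):

$$\hat{p}_c = \frac{\sum_{i=1}^{n} p_i}{n} .$$

$\hat{p}_c$ is an unbiased estimator of $p$.

The variance of $\hat{p}_c$ is

$$\mathrm{var}(\hat{p}_c) = \left(\frac{N-n}{nN}\right) \frac{\sum_{i=1}^{N} (p_i - p)^2}{N-1} = \left(\frac{1-f}{n}\right) \frac{\sum_{i=1}^{N} (p_i - p)^2}{N-1} \tag{95}$$

where $f = n/N$ = the proportion of clusters sampled.

Because $p$ is unknown, we use $\hat{p}_c$ as an estimate of $p$ to get the unbiased estimator of $\mathrm{var}(\hat{p}_c)$:

$$\widehat{\mathrm{var}}(\hat{p}_c) = \left(\frac{N-n}{nN}\right) \frac{\sum_{i=1}^{n} (p_i - \hat{p}_c)^2}{n-1} = \left(\frac{1-f}{n}\right) \frac{\sum_{i=1}^{n} (p_i - \hat{p}_c)^2}{n-1} \tag{96}$$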
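To make the equal-cluster-size estimator and equation (96) concrete, here is a small Python sketch. The data are hypothetical and chosen only for illustration: a population of $N = 50$ clusters of $L = 10$ units each, from which $n = 5$ clusters are sampled, with the listed counts of units having the attribute. Nothing here comes from the original notes except the formulas.

```python
import math

N, L = 50, 10                        # hypothetical population: 50 clusters of 10 units
counts = [3, 5, 2, 4, 6]             # hypothetical number of units with the attribute
n = len(counts)

p_i = [c / L for c in counts]        # cluster proportions
p_hat = sum(p_i) / n                 # estimator of p: average of cluster proportions

f = n / N                            # sampling fraction of clusters
s2_p = sum((pk - p_hat) ** 2 for pk in p_i) / (n - 1)
var_p = (1 - f) / n * s2_p           # equation (96)
se_p = math.sqrt(var_p)

print(round(p_hat, 4), round(var_p, 6), round(se_p, 4))
```

For these made-up counts the cluster proportions are .3, .5, .2, .4, and .6, so $\hat{p}_c = .4$ and the variance estimate is $(.9/5)(.10/4) = .0045$.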


6.9.2 Estimating p with Unequal Cluster Sizes

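Suppose the cluster sizes are not all equal. Let $M_i$ be the number of secondary units in cluster $i$ and $y_i = \sum_{j=1}^{M_i} y_{ij}$ = the cluster $i$ total.

By taking a one-stage cluster sample of $n$ clusters from a population with unequal-sized clusters, we can estimate $p$ as:

$$\hat{p}_c = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} M_i} .$$

Note that $\hat{p}_c$ is a ratio estimator. Therefore, it is a biased estimator. The bias, however, tends to be small for large $\sum_{i=1}^{n} M_i$.

The variance $\mathrm{var}(\hat{p}_c)$ is approximated by:

$$\mathrm{var}(\hat{p}_c) \approx \left(\frac{1-f}{n\bar{M}^2}\right) \frac{\sum_{i=1}^{N} (y_i - pM_i)^2}{N-1} \tag{97}$$

where $\bar{M} = \sum_{i=1}^{N} M_i / N$ = the average number of elements per cluster in the population.

Because $p$ and $\bar{M}$ are unknown, we use $\hat{p}_c$ and the sample average cluster size $\bar{m}$ in their place to get the following estimator of the approximate variance of $\hat{p}_c$:

$$\widehat{\mathrm{var}}(\hat{p}_c) \approx \left(\frac{1-f}{n\bar{m}^2}\right) \frac{\sum_{i=1}^{n} (y_i - \hat{p}_c M_i)^2}{n-1} = \left(\frac{1-f}{n\bar{m}^2}\right) \frac{\sum_{i=1}^{n} y_i^2 - 2\hat{p}_c \sum_{i=1}^{n} y_i M_i + \hat{p}_c^2 \sum_{i=1}^{n} M_i^2}{n-1} \tag{98}$$

where $\bar{m} = \sum_{i=1}^{n} M_i / n$ = the average number of elements per cluster in the sample.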

Additional References

Bellhouse, D.R. (1988) Systematic sampling. Handbook of Statistics, Vol. 6 (Sampling). 125-145. Eds: Krishnaiah and Rao. Elsevier Science Publishers. Amsterdam.

Murthy, M.N. and Rao, T.J. (1988) Systematic sampling with illustrative examples. Handbook of Statistics, Vol. 6 (Sampling). 147-185. Eds: Krishnaiah and Rao. Elsevier Science Publishers. Amsterdam.

Wolter, K.M. (1984) An investigation of some estimators of variance for systematic sampling. J. of the American Statistical Association. 79, 781-790.


Example: A simple random sample of $n$ = 30 households (clusters) was drawn from a health district in Baltimore (USA) that contains $N$ = 15,000 households. Using the following data, estimate the proportion $p$ of people in this health district that visited a doctor last year.

Household    Household       Number who visited
Number       Size ($M_i$)    doctor last year ($y_i$)
    1            5               5
    2            6               0
    3            3               2
    4            3               3
    5            2               0
    6            3               0
    7            3               0
    8            3               0
    9            4               0
   10            4               0
   11            3               0
   12            2               0
   13            7               0
   14            4               4
   15            3               1
   16            5               2
   17            4               0
   18            4               0
   19            3               1
   20            3               3
   21            4               2
   22            3               0
   23            3               0
   24            1               0
   25            2               2
   26            4               2
   27            3               0
   28            4               2
   29            2               0
   30            4               1
Totals         104              30
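One way to work this example is to apply the ratio estimator for unequal cluster sizes together with the variance estimate in equation (98). The Python sketch below does that with the household data from the table; it is offered as a check on hand calculation rather than as the notes' own solution, and the variable names are illustrative.

```python
import math

N = 15000                            # households in the health district
# Household sizes M_i and number who visited a doctor y_i, households 1..30
sizes = [5, 6, 3, 3, 2, 3, 3, 3, 4, 4, 3, 2, 7, 4, 3,
         5, 4, 4, 3, 3, 4, 3, 3, 1, 2, 4, 3, 4, 2, 4]
visits = [5, 0, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 1,
          2, 0, 0, 1, 3, 2, 0, 0, 0, 2, 2, 0, 2, 0, 1]
n = len(sizes)

f = n / N                            # sampling fraction of clusters
m_bar = sum(sizes) / n               # average household size in the sample
p_hat = sum(visits) / sum(sizes)     # ratio estimator of p

# Equation (98): residuals of y_i about p_hat * M_i
ss = sum((y - p_hat * Mi) ** 2 for y, Mi in zip(visits, sizes))
var_p = (1 - f) / (n * m_bar ** 2) * ss / (n - 1)
se_p = math.sqrt(var_p)

print(round(p_hat, 4), round(var_p, 6), round(se_p, 4))
```

With these data $\hat{p}_c = 30/104 \approx .2885$, and the estimated variance is about .0052, giving a standard error of roughly .072.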
