
TWO STAGE CLUSTER SAMPLING
A TWO STAGE CLUSTER SAMPLE

A two-stage cluster sample is obtained by selecting a probability sample of clusters and then selecting a probability sample of elements from each sampled cluster.
ESTIMATION OF A POPULATION MEAN & TOTAL
• The following notation is used in this chapter:
$N$ = the number of clusters in the population
$n$ = the number of clusters selected in a simple random sample
$M_i$ = the number of elements in cluster $i$
$m_i$ = the number of elements selected in a simple random sample from cluster $i$
$M = \sum_{i=1}^{N} M_i$ = the number of elements in the population
$\bar{M} = M/N$ = the average cluster size for the population
$y_{ij}$ = the $j$th observation in the sample from the $i$th cluster
$\bar{y}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}$ = the sample mean for the $i$th cluster
ESTIMATION OF A POPULATION MEAN
M is known
• Estimator of the population mean:
$$\hat{\mu} = \frac{N}{M}\cdot\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n} = \frac{1}{\bar{M}}\cdot\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n}$$
• Estimated variance of the mean:
$$\hat{V}(\hat{\mu}) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_b^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$
where
$$s_b^2 = \frac{\sum_{i=1}^{n}\left(M_i\bar{y}_i - \bar{M}\hat{\mu}\right)^2}{n-1}$$
is the sample variance among the terms $M_i\bar{y}_i$, and
$$s_i^2 = \frac{\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2}{m_i-1}, \qquad i = 1, 2, \ldots, n$$
is the sample variance for the sample selected from cluster $i$.
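As a minimal sketch of how these formulas fit together, the Python below computes the estimated mean and its estimated variance from per-cluster summaries. The function name `two_stage_mean` and its argument names are illustrative assumptions, not part of the notes. The inputs are the plant summaries from Table 9.1 (Example 9.1), with the plant 2 mean taken as 52/13 = 4.00 from the listed downtimes.

```python
def two_stage_mean(N, M, sizes, sampled, means, variances):
    """Two-stage estimate of the population mean when M is known.

    sizes = M_i, sampled = m_i, means = ybar_i, variances = s_i^2
    (hypothetical argument names chosen for this sketch).
    """
    n = len(sizes)
    M_bar = M / N  # average cluster size
    totals = [Mi * yi for Mi, yi in zip(sizes, means)]  # M_i * ybar_i
    mu_hat = sum(totals) / (n * M_bar)
    # s_b^2: sample variance among the terms M_i * ybar_i
    s_b2 = sum((t - M_bar * mu_hat) ** 2 for t in totals) / (n - 1)
    # within-cluster contribution: sum of M_i^2 (1 - m_i/M_i) s_i^2 / m_i
    within = sum(Mi ** 2 * (1 - mi / Mi) * s2 / mi
                 for Mi, mi, s2 in zip(sizes, sampled, variances))
    var_hat = ((1 - n / N) * s_b2 / (n * M_bar ** 2)
               + within / (n * N * M_bar ** 2))
    return mu_hat, var_hat


sizes = [50, 65, 45, 48, 52, 58, 42, 66, 40, 56]
sampled = [10, 13, 9, 10, 10, 12, 8, 13, 8, 11]
means = [5.40, 4.00, 5.67, 4.80, 4.30, 3.83, 5.00, 3.85, 4.88, 5.00]
variances = [11.38, 10.67, 16.75, 13.29, 11.12, 14.88, 5.14, 4.31, 6.13, 11.80]

mu_hat, var_hat = two_stage_mean(90, 4500, sizes, sampled, means, variances)
print(round(mu_hat, 2), round(var_hat, 4))
```

With these inputs the results should land close to the 4.80 and 0.0371 reported in Example 9.1; small differences come from rounding in the tabulated summaries.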
ESTIMATION OF A POPULATION TOTAL
M is known
• Estimator of the population total:
$$\hat{T} = M\hat{\mu} = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i$$
• Estimated variance of the total:
$$\hat{V}(\hat{T}) = M^2\,\hat{V}(\hat{\mu}) = \left(1-\frac{n}{N}\right)\frac{N^2}{n}\,s_b^2 + \frac{N}{n}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$
EXAMPLES 9.1 & 9.2
RATIO ESTIMATION OF A POPULATION MEAN
M is unknown
• Ratio estimator of the population mean:
$$\hat{\mu}_r = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{\sum_{i=1}^{n} M_i}$$
• Estimated variance of the mean:
$$\hat{V}(\hat{\mu}_r) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_r^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$
where:
$$s_r^2 = \frac{\sum_{i=1}^{n}\left(M_i\bar{y}_i - M_i\hat{\mu}_r\right)^2}{n-1} = \frac{\sum_{i=1}^{n} M_i^2\left(\bar{y}_i - \hat{\mu}_r\right)^2}{n-1}$$
and
$$s_i^2 = \frac{\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2}{m_i-1}, \qquad i = 1, 2, \ldots, n$$
The estimator $\hat{\mu}_r$ is biased, but the bias is negligible when $n$ is large.
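The ratio estimator can be sketched in the same way. In the Python below, the function name `ratio_mean` and its arguments are assumptions made for illustration, and the unknown average cluster size is replaced by the average size of the sampled clusters, as done in Example 9.3. The inputs are the Table 9.1 summaries (plant 2 mean taken as 4.00).

```python
def ratio_mean(N, sizes, sampled, means, variances):
    """Ratio estimate of the population mean when M is unknown.

    sizes = M_i, sampled = m_i, means = ybar_i, variances = s_i^2
    (hypothetical argument names chosen for this sketch).
    """
    n = len(sizes)
    m_bar = sum(sizes) / n  # sample estimate of the average cluster size
    mu_r = sum(Mi * yi for Mi, yi in zip(sizes, means)) / sum(sizes)
    # s_r^2: spread of the cluster means about the ratio estimate
    s_r2 = sum(Mi ** 2 * (yi - mu_r) ** 2
               for Mi, yi in zip(sizes, means)) / (n - 1)
    within = sum(Mi ** 2 * (1 - mi / Mi) * s2 / mi
                 for Mi, mi, s2 in zip(sizes, sampled, variances))
    var_r = ((1 - n / N) * s_r2 / (n * m_bar ** 2)
             + within / (n * N * m_bar ** 2))
    return mu_r, var_r


sizes = [50, 65, 45, 48, 52, 58, 42, 66, 40, 56]
sampled = [10, 13, 9, 10, 10, 12, 8, 13, 8, 11]
means = [5.40, 4.00, 5.67, 4.80, 4.30, 3.83, 5.00, 3.85, 4.88, 5.00]
variances = [11.38, 10.67, 16.75, 13.29, 11.12, 14.88, 5.14, 4.31, 6.13, 11.80]

mu_r, var_r = ratio_mean(90, sizes, sampled, means, variances)
print(round(mu_r, 2), round(var_r, 4))
```

The output should be close to the 4.60 and 0.0492 of Example 9.3. Note that the total number of elements never enters the calculation, which is the point of the ratio form.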
EXAMPLE 9.3
ESTIMATION OF A POPULATION PROPORTION
M is unknown
• Ratio estimator of the population proportion:
$$\hat{p} = \frac{\sum_{i=1}^{n} M_i\hat{p}_i}{\sum_{i=1}^{n} M_i}$$
• Estimated variance of the proportion:
$$\hat{V}(\hat{p}) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_r^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{\hat{p}_i\hat{q}_i}{m_i-1}$$
where:
$$s_r^2 = \frac{\sum_{i=1}^{n}\left(M_i\hat{p}_i - M_i\hat{p}\right)^2}{n-1} = \frac{\sum_{i=1}^{n} M_i^2\left(\hat{p}_i - \hat{p}\right)^2}{n-1}$$
and
$$\hat{q}_i = 1 - \hat{p}_i$$
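A minimal sketch of the proportion estimator, again with hypothetical names (`ratio_proportion` and its arguments are not from the notes). It uses the plant sizes and sample proportions of Table 9.2 (Example 9.4) and estimates the average cluster size from the sampled clusters.

```python
def ratio_proportion(N, sizes, sampled, props):
    """Ratio estimate of a population proportion in two-stage sampling.

    sizes = M_i, sampled = m_i, props = p_hat_i
    (hypothetical argument names chosen for this sketch).
    """
    n = len(sizes)
    m_bar = sum(sizes) / n  # sample estimate of the average cluster size
    p_hat = sum(Mi * pi for Mi, pi in zip(sizes, props)) / sum(sizes)
    s_r2 = sum(Mi ** 2 * (pi - p_hat) ** 2
               for Mi, pi in zip(sizes, props)) / (n - 1)
    # within-cluster term uses p_i * q_i / (m_i - 1) in place of s_i^2 / m_i
    within = sum(Mi ** 2 * (1 - mi / Mi) * pi * (1 - pi) / (mi - 1)
                 for Mi, mi, pi in zip(sizes, sampled, props))
    var_hat = ((1 - n / N) * s_r2 / (n * m_bar ** 2)
               + within / (n * N * m_bar ** 2))
    return p_hat, var_hat


sizes = [50, 65, 45, 48, 52, 58, 42, 66, 40, 56]
sampled = [10, 13, 9, 10, 10, 12, 8, 13, 8, 11]
props = [0.40, 0.38, 0.22, 0.30, 0.50, 0.25, 0.38, 0.31, 0.25, 0.36]

p_hat, var_hat = ratio_proportion(90, sizes, sampled, props)
print(round(p_hat, 2), round(var_hat, 5))
```

The result should agree with Example 9.4 (about 0.34 with an estimated variance near 0.00081).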
EXAMPLE 9.4
TWO-STAGE CLUSTER SAMPLING WITH PROBABILITIES PROPORTIONAL TO SIZE (PPS)

• It is often advantageous to sample clusters with probabilities proportional to their sizes, because the number of elements in a cluster may vary greatly from cluster to cluster.
• Generally, pps sampling is used only at the first stage of a two-stage sampling procedure, because the elements within clusters tend to be somewhat similar in size.
• Hence, the estimators of the mean, total and proportion are the same as those for one-stage cluster sampling with pps.
PPS SAMPLING
ESTIMATION OF A POPULATION MEAN

• Estimator of the population mean:
$$\hat{\mu}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i$$
• Estimated variance of the mean:
$$\hat{V}(\hat{\mu}_{pps}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\bar{y}_i - \hat{\mu}_{pps}\right)^2$$
PPS SAMPLING
ESTIMATION OF A POPULATION TOTAL

• Estimator of the population total:
$$\hat{T}_{pps} = \frac{M}{n}\sum_{i=1}^{n}\bar{y}_i$$
where $\bar{y}_i$ is the mean for the $i$th cluster.
• Estimated variance of the total:
$$\hat{V}(\hat{T}_{pps}) = \frac{M^2}{n(n-1)}\sum_{i=1}^{n}\left(\bar{y}_i - \hat{\mu}_{pps}\right)^2$$
PPS SAMPLING
ESTIMATION OF A POPULATION PROPORTION

• Estimator of the population proportion:
$$\hat{p}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\hat{p}_i$$
• Estimated variance of the proportion:
$$\hat{V}(\hat{p}_{pps}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\hat{p}_i - \hat{p}_{pps}\right)^2$$
EXAMPLE 9.6 (pg 304)
Eg:
• To estimate the proportion of current patients who have been (or will be) in the hospital for more than two consecutive days.
• The hospitals vary in size, so they will be sampled with probabilities proportional to their numbers of patients.
• For the three selected hospitals, 10% of the records of current patients will be examined.
• Given the information on hospital sizes, select a sample of three hospitals with pps.
EXAMPLE 8.12 (pg 275 – 276): Solution
• Number of patients and cumulative range

Hospital   No. of patients   Cumulative range   Number staying more than two days
1          328               1 – 328
2          109               329 – 437
3          432               438 – 869          25
4          220               870 – 1089
5          280               1090 – 1369        15
6          190               1370 – 1559        8
Total      1559

• Three random numbers must be chosen between 0001 and 1559.
• Selected numbers: 1505, 1256, 0827
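The cumulative-range lookup can be sketched in a few lines of Python. This is an illustrative sketch, not the notes' own procedure: the helper name `hospital_for` is hypothetical, and the random numbers are the three quoted in the example rather than freshly drawn.

```python
from bisect import bisect_left
from itertools import accumulate

sizes = [328, 109, 432, 220, 280, 190]  # patients in hospitals 1..6
upper = list(accumulate(sizes))         # upper end of each cumulative range


def hospital_for(number):
    """Return the 1-based hospital whose cumulative range contains `number`."""
    return bisect_left(upper, number) + 1


selected = [hospital_for(r) for r in (1505, 1256, 827)]
print(upper)     # [328, 437, 869, 1089, 1369, 1559]
print(selected)  # [6, 5, 3]
```

A hospital with more patients owns a wider range of numbers, so it is proportionally more likely to be hit by a random number between 1 and 1559.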
EXAMPLE 9.7 (pg 304-305)
Eg:
• Estimate the proportion of patients staying more than two days, for all six hospitals, and place a bound on the error of estimation.
$$\hat{p}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\hat{p}_i = \frac{1}{3}\left(\frac{25}{43}+\frac{15}{28}+\frac{8}{19}\right) = 0.51$$
$$\hat{V}(\hat{p}_{pps}) = \frac{1}{3(2)}\left[(0.58-0.51)^2 + \cdots + (0.42-0.51)^2\right] = 0.0025$$
$$2\sqrt{0.0025} = 0.10$$
DISCUSSION ON SELECTION OF THE
CLUSTER ACCORDING TO VARIANCE

VARIANCE
DESCRIPTIONS
CONDITIONS

• Select few clusters and many


sb  si elements from within each sampled

sb  si cluster.

• great care should be taken in


s b  si planning the selection of clusters.
• refer the comments made from
sb  si previous chapter (page 277)
SAMPLING EQUAL-SIZE TWO-STAGE CLUSTERS
• Suppose that each cluster contains $\bar{M}$ elements; that is,
$$M_1 = M_2 = \cdots = M_N = \bar{M} = \frac{M}{N}$$
• In this case it is common to take samples of equal size from each cluster, so that
$$m_1 = m_2 = \cdots = m_n = m$$
• Estimator of the population mean:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}$$
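• Estimated variance of $\hat{\mu}$:
$$\hat{V}(\hat{\mu}) = \frac{1-f_1}{n}\,s_b^2 + f_1\,\frac{1-f_2}{nm}\,s_w^2$$
where
$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{y}_i - \bar{y}_{CL}\right)^2 \qquad \text{(between-cluster variance)}$$
$$s_w^2 = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(y_{ij} - \bar{y}_i\right)^2 = \frac{1}{n}\sum_{i=1}^{n} s_i^2 \qquad \text{(within-cluster variance)}$$
Here $\bar{y}_{CL}$ denotes the overall sample mean, that is, $\hat{\mu}$.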
where (contd.)

$n$ = the number of clusters selected in a simple random sample
$m$ = the number of elements selected in a simple random sample from each selected cluster
$f_1 = \dfrac{n}{N}$ = cluster sampling fraction
$f_2 = \dfrac{m}{\bar{M}}$ = within-cluster sampling fraction
EXAMPLE
A new bottling machine is being tested by a company. During a test run, the machine fills 24 cases, each containing 12 bottles. The company wishes to estimate the average number of ounces of fill per bottle. A two-stage cluster sample is employed using six cases (clusters), with four bottles (elements) randomly selected from each. The results are given in the accompanying table. Estimate the average number of ounces per bottle and place a bound on the error of estimation.
EXAMPLE

Case   Average ounces of fill for sample   Sample variance
1      7.9                                 0.15
2      8.0                                 0.12
3      7.8                                 0.09
4      7.9                                 0.11
5      8.1                                 0.10
6      7.9                                 0.12
SOLUTION:
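A minimal sketch of the calculation, applying the equal-size formulas with N = 24 cases, n = 6 sampled cases, 12 bottles per case and m = 4 sampled bottles per case. The variable names are illustrative, and the between-cluster variance is taken about the mean of the six case means.

```python
from math import sqrt

N, M_bar = 24, 12                       # cases filled, bottles per case
means = [7.9, 8.0, 7.8, 7.9, 8.1, 7.9]  # average fill per sampled case
variances = [0.15, 0.12, 0.09, 0.11, 0.10, 0.12]
n, m = len(means), 4                    # sampled cases, bottles per case

mu_hat = sum(means) / n
s_b2 = sum((y - mu_hat) ** 2 for y in means) / (n - 1)  # between clusters
s_w2 = sum(variances) / n                               # within clusters
f1, f2 = n / N, m / M_bar                               # sampling fractions

var_hat = (1 - f1) / n * s_b2 + f1 * (1 - f2) / (n * m) * s_w2
bound = 2 * sqrt(var_hat)  # bound on the error of estimation
print(round(mu_hat, 2), round(bound, 2))
```

Under these inputs the estimate is about 7.93 ounces per bottle with a bound on the error of estimation of about 0.09 ounce.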
SAMPLING METHODS/ SAMPLING TECHNIQUES
STA550/STA552

EXAMPLE 9.1 (PAGE 293)

A garment manufacturer has 90 plants located throughout the United States and wants to
estimate the average number of hours that the sewing machines were down for repairs in the
past month. Because the plants are widely scattered, she decides to use cluster sampling,
specifying each plant as a cluster of machines. Each plant contains many machines, and
checking the repair record for each machine would be time-consuming. Therefore, she uses
two-stage sampling. Enough time and money are available to sample n = 10 plants and
approximately 20% of the machines in each plant. Using the data in Table 9.1, estimate the
average downtime per machine and place a bound on the error of estimation. The manufacturer
knows she has a combined total of 4500 machines in all plants.

TABLE 9.1: DOWNTIME FOR SEWING MACHINES

Plant   $M_i$   $m_i$   Downtime (hours)                           $\bar{y}_i$   $s_i^2$
1       50      10      5, 7, 9, 0, 11, 2, 8, 4, 3, 5              5.40          11.38
2       65      13      4, 3, 7, 2, 11, 0, 1, 9, 4, 3, 2, 1, 5     4.00          10.67
3       45      9       5, 6, 4, 11, 12, 0, 1, 8, 4                5.67          16.75
4       48      10      6, 4, 0, 1, 0, 9, 8, 4, 6, 10              4.80          13.29
5       52      10      11, 4, 3, 1, 0, 2, 8, 6, 5, 3              4.30          11.12
6       58      12      12, 11, 3, 4, 2, 0, 0, 1, 4, 3, 2, 4       3.83          14.88
7       42      8       3, 7, 6, 7, 8, 4, 3, 2                     5.00          5.14
8       66      13      3, 6, 4, 3, 2, 2, 8, 4, 0, 4, 5, 6, 3      3.85          4.31
9       40      8       6, 4, 7, 3, 9, 1, 4, 5                     4.88          6.13
10      56      11      6, 7, 5, 10, 11, 2, 1, 4, 0, 5, 4          5.00          11.80

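SOLUTION

$$\hat{\mu} = \frac{N}{M}\cdot\frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n} = \frac{1}{\bar{M}}\cdot\frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n} = \frac{1}{50}(240.02) = 4.80$$

Because $M = 4500$ is known, $\bar{M} = M/N = 4500/90 = 50$, and

$$s_b^2 = \frac{\sum_{i=1}^{n}\left(M_i\bar{y}_i - \bar{M}\hat{\mu}\right)^2}{n-1} = (27.72)^2$$

$$\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i} = 21{,}985$$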

$$\hat{V}(\hat{\mu}) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_b^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$

$$= \left(1-\frac{10}{90}\right)\frac{1}{10(50)^2}(27.72)^2 + \frac{1}{10(90)(50)^2}(21{,}985) = 0.0371$$

$$\hat{\mu} \pm 2\sqrt{\hat{V}(\hat{\mu})} = 4.80 \pm 2\sqrt{0.0371} = 4.80 \pm 0.39$$

Thus, the average downtime is estimated to be 4.80 hours. The error of estimation should be
less than 0.39 hour with a probability of approximately 0.95.

EXAMPLE 9.2 (PAGE 295)

Estimate the total amount of downtime during the past month for all machines owned by the
manufacturer in Example 9.1. Place a bound on the error of estimation.

SOLUTION

$$\hat{T} = M\hat{\mu} = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i = 90(240.02) = 21{,}602$$

$$\hat{V}(\hat{T}) = M^2\,\hat{V}(\hat{\mu}) = (4500)^2(0.0371)$$

$$\hat{T} \pm 2\sqrt{\hat{V}(\hat{T})} = 21{,}602 \pm 2\sqrt{(4500)^2(0.0371)} = 21{,}602 \pm 1733$$

EXAMPLE 9.3 (PAGE 296)

Using the data in Table 9.1, estimate the average downtime per machine and place a bound on
the error of estimation. Assume the manufacturer does not know how many machines there are
in all plants combined.


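SOLUTION

$$\hat{\mu}_r = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{\sum_{i=1}^{n} M_i} = \frac{\frac{1}{n}\sum_{i=1}^{n} M_i\bar{y}_i}{\frac{1}{n}\sum_{i=1}^{n} M_i} = \frac{240.02}{52.2} = 4.60$$

$$s_r^2 = \frac{\sum_{i=1}^{n} M_i^2\left(\bar{y}_i - \hat{\mu}_r\right)^2}{n-1} = (35.1)^2$$

$$\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i} = 21{,}985$$

Because $M$ is not known, $\bar{M}$ must be estimated by the average size of the sampled clusters:

$$\bar{M} \approx \frac{\sum_{i=1}^{n} M_i}{n} = \frac{522}{10} = 52.2$$

$$\hat{V}(\hat{\mu}_r) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_r^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$

$$= \left(1-\frac{10}{90}\right)\frac{1}{10(52.2)^2}(35.1)^2 + \frac{1}{10(90)(52.2)^2}(21{,}985) = 0.0492$$

$$\hat{\mu}_r \pm 2\sqrt{\hat{V}(\hat{\mu}_r)} = 4.60 \pm 2\sqrt{0.0492} = 4.60 \pm 0.44$$

Thus, the estimated mean downtime per machine is 4.60 hours with a bound on the error of
estimation of 0.44 hour.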


EXAMPLE 9.4 (PAGE 298)

The manufacturer in Example 9.1 wants to estimate the proportion of machines that have been
shut down for major repairs (those requiring parts from stock outside the factory). The sample
proportions of machines requiring major repairs are given in Table 9.2. The data are for
machines sampled in Example 9.1. Estimate p, the proportion of machines involved in major
repairs for all plants combined, and place a bound on the error of estimation.

TABLE 9.2: PROPORTION OF SEWING MACHINES REQUIRING MAJOR REPAIRS

Plant   $M_i$   $m_i$   Proportion of machines requiring major repairs, $\hat{p}_i$
1       50      10      0.40
2       65      13      0.38
3       45      9       0.22
4       48      10      0.30
5       52      10      0.50
6       58      12      0.25
7       42      8       0.38
8       66      13      0.31
9       40      8       0.25
10      56      11      0.36

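SOLUTION

$$\hat{p} = \frac{\sum_{i=1}^{n} M_i\hat{p}_i}{\sum_{i=1}^{n} M_i} = \frac{\frac{1}{n}\sum_{i=1}^{n} M_i\hat{p}_i}{\frac{1}{n}\sum_{i=1}^{n} M_i} = \frac{17.61}{52.2} = 0.34$$

$$s_r^2 = \frac{\sum_{i=1}^{n} M_i^2\left(\hat{p}_i - \hat{p}\right)^2}{n-1} = (4.29)^2$$

$$\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{\hat{p}_i\hat{q}_i}{m_i-1} = 505.91$$

$$\bar{M} \approx \frac{\sum_{i=1}^{n} M_i}{n} = \frac{522}{10} = 52.2$$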

$$\hat{V}(\hat{p}) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_r^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{\hat{p}_i\hat{q}_i}{m_i-1}$$

$$= \left(1-\frac{10}{90}\right)\frac{1}{10(52.2)^2}(4.29)^2 + \frac{1}{10(90)(52.2)^2}(505.91) = 0.00081$$

$$\hat{p} \pm 2\sqrt{\hat{V}(\hat{p})} = 0.34 \pm 2\sqrt{0.00081} = 0.34 \pm 0.057$$

Thus, the estimated proportion of machines involved in major repairs is 0.34 with a bound on
the error of estimation of 0.057.

EXAMPLE 9.6 (PAGE 304)

From the six hospitals in a city, a researcher wants to sample three hospitals for the purpose of
estimating the proportion of current patients who have been (or will be) in the hospital for more
than two consecutive days. Because the hospitals vary in size, they will be sampled with
probabilities proportional to their number of patients. For the three hospitals, 10% of the records
of current patients will be examined to determine how many patients will stay in the hospital for
more than two days. Given the information on hospital sizes in the accompanying table, select a
sample of three hospitals with probabilities proportional to size.

Hospital   Number of patients   Cumulative range
1          328                  1 – 328
2          109                  329 – 437
3          432                  438 – 869
4          220                  870 – 1089
5          280                  1090 – 1369
6          190                  1370 – 1559
Total      1559

SOLUTION

Because three hospitals are to be selected, three random numbers between 0001 and 1559
must be chosen from the random number table. Our numbers turned out to be 1505, 1256 and
0827. Locating these numbers in the cumulative range column leads to the selection of
hospitals 3, 5, and 6.


EXAMPLE 9.7 (PAGE 304)

Suppose the sampled hospitals in Example 9.6 yielded the following data on number of patients
staying more than two days:

Hospital   Number of patients sampled   Number staying more than two days
3          43                           25
5          28                           15
6          19                           8

Estimate the proportion of patients staying more than two days, for all six hospitals, and place a
bound on the error of estimation.

SOLUTION

$$\hat{p}_{pps} = \frac{1}{3}\left(\frac{25}{43}+\frac{15}{28}+\frac{8}{19}\right) = 0.51$$

$$\hat{V}(\hat{p}_{pps}) = \frac{1}{3(2)}\left[(0.58-0.51)^2 + (0.54-0.51)^2 + (0.42-0.51)^2\right] = 0.0025$$

$$\hat{p}_{pps} \pm 2\sqrt{\hat{V}(\hat{p}_{pps})} = 0.51 \pm 2\sqrt{0.0025} = 0.51 \pm 0.10$$


QUESTION 1 (DEC 2019)

The new play toy, Classic Builder Toy (CBT) is being test-marketed. A market research firm
decided to sample four cities from 20 cities and then to sample supermarkets within the cities, in
order to obtain the number of CBT sold.

City   Number of supermarkets   Number of CBT sold               $M_i\bar{y}_i$   $s_i^2$
1      35                       199, 179, 98, 63, 126, 87, 62    4070.15          2974.5
2      10                       12, 23                           175              60.5
3      20                       99, 101, 52, 121                 1865             854.9
4      15                       87, 43, 59                       945              496

Based on the above data, construct a 95% confidence interval for the total number of CBT sold
and interpret the value obtained.

QUESTION 2 (JUN 2019)

A researcher constructed a sampling plan to estimate the water bill per month for 360 houses
from eight residential areas. He decided to sample four residential areas and then sample
houses within each selected area. The monthly water bills (RM) are recorded below.

Residential area   Number of houses   Number of houses sampled   $\bar{y}_i$   $s_i^2$
A                  24                 11                         19            3.44
C                  32                 16                         23            2.63
F                  46                 26                         20            2.05
G                  48                 23                         24            1.05

Estimate the total amount of water bill per month for all houses. Hence, obtain a 95%
confidence interval for the water bill and interpret the value obtained.


QUESTION 3 (DEC 2018)

A nurseryman wants to estimate the total height of seedlings in a large field that is divided into
25 plots that vary slightly in size. He decides to use a two-stage cluster sample and sampled
10% of the trees within each of the three selected plots. The data are given in the table below.

Plot   Number of seedlings   Number of seedlings sampled   Heights of seedlings (in inches)   $M_i\bar{y}_i$   $s_i^2$
1      52                    5                             12, 11, 11, 10, 13                 592.8            1.3
2      60                    6                             6, 5, 7, 5, 6, 4                   330              1.1
3      46                    5                             7, 8, 6, 7, 6                      312.8            0.7

a) Identify the element and observation in this study.

b) Estimate the total height of seedlings in the field and place a bound on the error of
estimation.

QUESTION 4 (JUN 2018)

A study was conducted to investigate the prevalence of smoking among female university
students in a state. A simple random sample without replacement was used to select three
universities from a population of 29 universities. From each selected sample university, simple
random sampling without replacement was used to select samples of secondary units. The
results are as in the following table.

University   Number of female students in university   Number of female students interviewed   Number of smokers   $M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{\hat{p}_i\hat{q}_i}{m_i-1}$
1            447                                       15                                      3                   2203.90
2            511                                       20                                      6                   2773.12
3            792                                       25                                      10                  6074.64

a) Identify the element and observation in this study.

b) Estimate the proportion of female university students who smoke and place a bound on the
error of estimation.


QUESTION 5 (JAN 2018)

A consumer survey was conducted to estimate satisfaction level of households towards the
facilities provided by the developer. The scale used to measure satisfaction is as follows:

1   Entirely satisfied
2   Mostly satisfied
3   Somewhat satisfied
4   Neither satisfied nor dissatisfied
5   Somewhat dissatisfied
6   Mostly dissatisfied
7   Entirely dissatisfied

A simple random sample of 10 condominium blocks was selected from 120 in the community.
The results of the survey are given below.

Condominium block   Number of households   Number of households sampled   Satisfaction                                 $\bar{y}_i$   $s_i^2$
1                   54                     10                             5, 7, 6, 5, 4, 7, 6, 6, 4, 5                 5.50          1.08
2                   48                     10                             7, 7, 7, 6, 5, 4, 7, 7, 6, 6                 6.20          1.03
3                   68                     14                             5, 6, 5, 6, 4, 5, 6, 5, 4, 5, 4, 6, 5, 6     5.14          0.77
4                   70                     14                             6, 5, 7, 6, 7, 6, 5, 7, 5, 7, 6, 5, 7, 6     6.07          0.83
5                   52                     10                             4, 5, 4, 5, 5, 6, 5, 4, 4, 4                 4.60          0.70
6                   62                     12                             5, 7, 6, 4, 3, 1, 5, 4, 6, 4, 5, 7           4.75          1.71
7                   41                     8                              7, 6, 7, 7, 6, 6, 5, 7                       6.38          0.74
8                   53                     11                             6, 6, 5, 4, 6, 7, 5, 5, 7, 6, 5              5.64          0.92
9                   64                     12                             7, 6, 5, 4, 6, 5, 7, 4, 3, 6, 5, 7           5.42          1.31
10                  43                     9                              7, 6, 6, 5, 7, 3, 5, 4, 5                    5.33          1.32

a) Briefly explain why a two stage cluster sampling is used in this study.

b) It is known that there are 6860 households in 120 condominium blocks. Obtain a 95%
confidence interval for the average satisfaction towards the facilities in the population
and interpret your answer.

QUESTION 6 (JUL 2017)

A researcher constructed a sampling plan to estimate the monthly usage of electricity for 2564
houses from 20 housing estates of a particular town. He decided to sample three housing
estates from the 20 housing estates and then sample houses within the housing estate
selected. The monthly usage of electricity is based on the electricity bill (nearest RM) for each
house. The results of the sample are listed as follows.


Housing estate selected   Number of houses   Number of houses sampled   $\bar{y}_i$   $s_i^2$
VII                       145                19                         181.33        1572.512
XI                        130                17                         229.86        1124.143
XV                        120                16                         169.00        1826.017

a) Estimate the standard error of the mean electricity usage for all the houses in that town.
Hence, obtain a 95% confidence interval for the mean electricity usage for all the houses
in that town.

b) Estimate the total amount of electricity usage in that town and place a bound on the error
of estimation. Hence, interpret the two values obtained.

QUESTION 7 (DEC 2016)

BCX Berhad is introducing a new internet plan package. The marketing manager wishes to
estimate the average number of families favoring the new internet package. Out of 20 cities, 5
cities were selected as the sample.

City   Number of families   Number of families sampled   Average number of families who favor the new plan   $s_i^2$
1      50                   10                           8.2                                                 12
2      65                   13                           10.0                                                20
3      45                   9                            7.6                                                 22
4      48                   10                           9.0                                                 16
5      52                   10                           9.4                                                 26

a) Suggest the appropriate sampling method used in this study.

b) Construct a 95% confidence interval for the average number of families who favor the
new internet package.

QUESTION 8 (JUN 2016)

A large firm has its equipment inventories listed separately by department. From 20
departments in the firm, FIVE were randomly sampled by an auditor. The proportion of inventory
items not properly identified is of interest to the auditor. The auditor selects approximately 10%
of the equipment due to time constraint. The data are given in the accompanying table.


Department   Number of equipment items   Number of items not properly identified   $M_i^2\left(\hat{p}_i - \hat{p}\right)^2$
1            150                         2                                         38.44
2            270                         3                                         26.63
3            90                          1                                         2.96
4            310                         1                                         342.99
5            160                         2                                         27.88

a) Identify the sampling method used in this study. Justify your answer.

b) Estimate the proportion of inventory items in the firm not properly identified. Hence,
calculate the standard error of estimation.

c) Calculate a 95% confidence interval for the proportion of inventory items in the firm not
properly identified. Interpret your answer.

QUESTION 9 (DEC 2015)

A researcher selects 5 out of 15 local health centers as a sample for the purpose of estimating
the total number of patients who are given new medicine as part of their therapeutic regimen.
The number of patients treated in each center is listed in the accompanying table.

Health Centre   Number of patients   Frequency patients prescribed the medicine   $\bar{y}_i$   $s_i^2$
1               45                   5, 6, 4, 11, 12, 0, 1, 8, 4                  5.67          16.75
2               52                   11, 4, 3, 1, 0, 2, 8, 6, 5, 3                4.30          11.12
3               58                   12, 11, 3, 4, 2, 0, 0, 1, 4, 3, 2, 4         3.83          14.88
4               42                   3, 7, 6, 7, 8, 4, 3, 2                       5.00          5.14
5               40                   6, 4, 7, 3, 9, 1, 4, 5                       4.88          6.13

a) Suggest the appropriate sampling method used in this study.

b) Construct a 95% confidence interval for the total number of patients who are to be given
the medicine as part of their therapeutic regimen.


QUESTION 10 (JUN 2015)

A survey is carried out to estimate the average time secondary school students spend in the
school library within a year. A researcher selected a simple random sample of ten secondary
schools from a total of 90 secondary schools in a particular state. The resulting data are given in
the table below.

School   Number of students per school, $M_i$   Number of students sampled, $m_i$   Average time spent in the school library (in hours)   $\bar{y}_i$   $s_i^2$
1        50                                     10                                  5, 7, 9, 0, 11, 2, 8, 4, 3, 5                         5.40          11.38
2        65                                     13                                  4, 3, 7, 2, 11, 0, 1, 9, 4, 3, 2, 1, 5                4.00          10.67
3        45                                     9                                   5, 6, 4, 11, 12, 0, 1, 8, 4                           5.67          16.75
4        48                                     10                                  6, 4, 0, 1, 0, 9, 8, 4, 6, 10                         4.80          13.29
5        52                                     10                                  11, 4, 3, 1, 0, 2, 8, 6, 5, 3                         4.30          11.12
6        58                                     12                                  12, 11, 3, 4, 2, 0, 0, 1, 4, 3, 2, 4                  3.83          14.88
7        42                                     8                                   3, 7, 6, 7, 8, 4, 3, 2                                5.00          5.14
8        66                                     13                                  3, 6, 4, 3, 2, 2, 8, 4, 0, 4, 5, 6, 3                 3.85          4.31
9        40                                     8                                   6, 4, 7, 3, 9, 1, 4, 5                                4.88          6.13
10       56                                     11                                  6, 7, 5, 10, 11, 2, 1, 4, 0, 5, 4                     5.00          11.80

a) Explain why this design may be considered as a two-stage cluster sampling.

b) Estimate and construct a 95% confidence interval for the average time to use the library
per student.
