CLUSTER
SAMPLING
A TWO STAGE CLUSTER SAMPLE
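Estimator of the population mean:
$$\hat{\mu} = \frac{N}{M}\cdot\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n} = \frac{1}{\bar{M}}\cdot\frac{\sum_{i=1}^{n} M_i \bar{y}_i}{n}$$
Estimated variance of mean:
$$\hat{V}(\hat{\mu}) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_b^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$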
ESTIMATION OF A POPULATION MEAN
M is known
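Estimated variance of mean:
$$\hat{V}(\hat{\mu}) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_b^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$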
where
$$s_b^2 = \frac{\sum_{i=1}^{n}\left(M_i\bar{y}_i - \bar{M}\hat{\mu}\right)^2}{n-1}$$
and the sample variance for the sample selected from cluster $i$ is
$$s_i^2 = \frac{\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2}{m_i-1}, \qquad i = 1, 2, \ldots, n$$
Notice that $s_b^2$ is simply the sample variance
among the terms $M_i\bar{y}_i$.
ESTIMATION OF A POPULATION TOTAL
M is known
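Estimator of the population total:
$$\hat{\tau} = M\hat{\mu} = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i$$
Estimated variance of total:
$$\hat{V}(\hat{\tau}) = M^2\,\hat{V}(\hat{\mu}) = \left(1-\frac{n}{N}\right)\frac{N^2}{n}\,s_b^2 + \frac{N}{n}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$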
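The step from the variance of the mean to the variance of the total is only a change of scale. A short derivation, assuming nothing beyond $M = N\bar{M}$ and the variance of the mean given earlier:

$$
\begin{aligned}
\hat{V}(\hat{\tau}) &= M^2\,\hat{V}(\hat{\mu}) = N^2\bar{M}^2\left[\left(1-\frac{n}{N}\right)\frac{s_b^2}{n\bar{M}^2} + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}\right] \\
&= \left(1-\frac{n}{N}\right)\frac{N^2}{n}\,s_b^2 + \frac{N}{n}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}
\end{aligned}
$$

The factor $\bar{M}^2$ cancels in both terms, so the total can be estimated, with its variance, even when only $N$ and the sampled $M_i$ are available.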
EXAMPLE
9.1 & 9.2
RATIO ESTIMATION OF A POPULATION MEAN
M is unknown
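Ratio estimator of the population mean:
$$\hat{\mu}_r = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{\sum_{i=1}^{n} M_i}$$
Estimated variance of the ratio estimator:
$$\hat{V}(\hat{\mu}_r) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_r^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$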
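where:
$$s_r^2 = \frac{\sum_{i=1}^{n}\left(M_i\bar{y}_i - M_i\hat{\mu}_r\right)^2}{n-1} = \frac{\sum_{i=1}^{n} M_i^2\left(\bar{y}_i - \hat{\mu}_r\right)^2}{n-1}$$
and
$$s_i^2 = \frac{\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2}{m_i-1}, \qquad i = 1, 2, \ldots, n$$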
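Ratio estimator of the population proportion:
$$\hat{p} = \frac{\sum_{i=1}^{n} M_i\hat{p}_i}{\sum_{i=1}^{n} M_i}$$
where
$$s_r^2 = \frac{\sum_{i=1}^{n}\left(M_i\hat{p}_i - M_i\hat{p}\right)^2}{n-1} = \frac{\sum_{i=1}^{n} M_i^2\left(\hat{p}_i - \hat{p}\right)^2}{n-1}$$
and
$$\hat{q}_i = 1 - \hat{p}_i$$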
EXAMPLE
9.4
TWO-STAGE CLUSTER SAMPLING WITH
PROBABILITIES PROPORTIONAL TO SIZE (PPS)
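Estimated variance of the mean:
$$\hat{V}(\hat{\mu}_{pps}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\bar{y}_i - \hat{\mu}_{pps}\right)^2$$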
PPS SAMPLING
ESTIMATION OF A POPULATION TOTAL
$$\hat{V}(\hat{\tau}_{pps}) = \frac{M^2}{n(n-1)}\sum_{i=1}^{n}\left(\bar{y}_i - \hat{\mu}_{pps}\right)^2$$
PPS SAMPLING
ESTIMATION OF A POPULATION PROPORTION
$$\hat{V}(\hat{p}_{pps}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\hat{p}_i - \hat{p}_{pps}\right)^2$$
EXAMPLE 9.6 (pg 304)
Eg:
• To estimate the proportion of current patients
who have been (or will be) in the hospital for
more than two consecutive days.
• The hospitals vary in size, so they will be
sampled with probabilities proportional to their numbers of patients (pps).
• For the three selected hospitals, 10% of the
records of current patients will be examined.
• Given the information on hospital sizes, select
a sample of three with pps.
EXAMPLE 8.12 (pg 275 – 276): Solution
Number of patients and cumulative range
Hospital | No. of patients | Cumulative range | Number staying more than two days
1 | 328 | 1 – 328 |
2 | 109 | 329 – 437 |
3 | 432 | 438 – 869 | 25
4 | 220 | 870 – 1089 |
5 | 280 | 1090 – 1369 | 15
6 | 190 | 1370 – 1559 | 8
Total | 1559 | |
$$\hat{V}(\hat{p}_{pps}) = \frac{1}{3(2)}\left[(0.58-0.51)^2 + \cdots + (0.42-0.51)^2\right] = 0.0025$$
$$2\sqrt{0.0025} = 0.10$$
DISCUSSION ON SELECTION OF THE
CLUSTER ACCORDING TO VARIANCE
[Table: Variance, Descriptions, Conditions; the surviving entries mention $s_b$, $s_i$ and cluster]
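SAMPLING EQUAL-SIZE
TWO STAGE CLUSTERS
$$m_1 = m_2 = \cdots = m_n = m$$
Estimator of population mean:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_i = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}$$
Estimated variance of $\hat{\mu}$:
$$\hat{V}(\hat{\mu}) = (1-f_1)\frac{s_b^2}{n} + f_1(1-f_2)\frac{s_w^2}{nm}$$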
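where
Variance between clusters:
$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{y}_i - \bar{y}_{CL}\right)^2$$
(here $\bar{y}_{CL}$ is the overall sample mean $\hat{\mu}$)
Variance within clusters:
$$s_w^2 = \frac{1}{n(m-1)}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(y_{ij} - \bar{y}_i\right)^2 = \frac{1}{n}\sum_{i=1}^{n} s_i^2$$
Cluster sampling fraction:
$$f_1 = \frac{n}{N}$$
Within-cluster sampling fraction:
$$f_2 = \frac{m}{M}$$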
EXAMPLE
A new bottling machine is being tested by a
company. During a test run, the machine
fills 24 cases, each containing 12 bottles.
The company wishes to estimate the
average number of ounces of fill per bottle.
A two-stage cluster sample is employed
using six cases (clusters), with four bottles
(elements) randomly selected from each. The
results are given in the accompanying table.
Estimate the average number of ounces per
bottle and place a bound on the error of
estimation.
EXAMPLE
Case | Average ounces of fill for sample | Sample variance
1 | 7.9 | 0.15
2 | 8.0 | 0.12
3 | 7.8 | 0.09
4 | 7.9 | 0.11
5 | 8.1 | 0.10
6 | 7.9 | 0.12
SOLUTION:
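The slides leave this solution blank. A minimal sketch of how the equal-size formulas could be applied to the table, assuming N = 24 cases, M = 12 bottles per case, n = 6 and m = 4 as stated in the example (variable names are illustrative, and the bound is taken as two standard errors, as elsewhere in these notes):

```python
from math import sqrt

# Figures from the bottling example: N cases, M bottles per case,
# n sampled cases, m sampled bottles per case.
N, M, n, m = 24, 12, 6, 4
case_means = [7.9, 8.0, 7.8, 7.9, 8.1, 7.9]       # ybar_i from the table
case_vars = [0.15, 0.12, 0.09, 0.11, 0.10, 0.12]  # s_i^2 from the table

# Estimator of the mean: average of the cluster means.
mu_hat = sum(case_means) / n

# Between-cluster variance of the cluster means.
s_b2 = sum((y - mu_hat) ** 2 for y in case_means) / (n - 1)
# Within-cluster variance: average of the s_i^2.
s_w2 = sum(case_vars) / n

f1, f2 = n / N, m / M  # cluster and within-cluster sampling fractions
var_mu = (1 - f1) * s_b2 / n + f1 * (1 - f2) * s_w2 / (n * m)
bound = 2 * sqrt(var_mu)

print(f"mean = {mu_hat:.3f} oz, variance = {var_mu:.5f}, bound = {bound:.3f}")
```

Under these assumptions the estimate is about 7.93 ounces per bottle with a bound on the error of estimation of roughly 0.09 ounce.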
SAMPLING METHODS/ SAMPLING TECHNIQUES
STA550/STA552
A garment manufacturer has 90 plants located throughout the United States and wants to
estimate the average number of hours that the sewing machines were down for repairs in the
past month. Because the plants are widely scattered, she decides to use cluster sampling,
specifying each plant as a cluster of machines. Each plant contains many machines, and
checking the repair record for each machine would be time-consuming. Therefore, she uses
two-stage sampling. Enough time and money are available to sample n=10 plants and
approximately 20% of the machines in each plant. Using the data in Table 9.1, estimate the
average downtime per machine and place a bound on the error of estimation. The manufacturer
knows she has a combined total of 4500 machines in all plants.
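SOLUTION
$$s_b^2 = \frac{\sum_{i=1}^{n}\left(M_i\bar{y}_i - \bar{M}\hat{\mu}\right)^2}{n-1} = (27.72)^2$$
$$\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i} = 21{,}985$$
$$\hat{V}(\hat{\mu}) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_b^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$
$$= \left(1-\frac{10}{90}\right)\frac{1}{10(50)^2}(27.72)^2 + \frac{1}{10(90)(50)^2}(21{,}985) = 0.0371$$
$$\hat{\mu} \pm 2\sqrt{\hat{V}(\hat{\mu})} = 4.80 \pm 2\sqrt{0.0371} = 4.80 \pm 0.39$$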
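The arithmetic in this solution can be checked from the summary figures alone. A minimal sketch, assuming the quoted values $s_b = 27.72$, a within-cluster sum of 21,985, and $\frac{1}{n}\sum M_i\bar{y}_i = 240.02$ (the last figure is the one used in the later solutions; Table 9.1 itself is not reproduced here):

```python
from math import sqrt

N, n, M = 90, 10, 4500
M_bar = M / N                    # average cluster size: 50 machines per plant

mean_of_cluster_totals = 240.02  # (1/n) * sum of M_i * ybar_i, as quoted
s_b2 = 27.72 ** 2                # between-cluster term s_b^2
within_sum = 21985               # sum of M_i^2 (1 - m_i/M_i) s_i^2 / m_i

mu_hat = mean_of_cluster_totals / M_bar
var_mu = (1 - n / N) * s_b2 / (n * M_bar ** 2) + within_sum / (n * N * M_bar ** 2)
bound = 2 * sqrt(var_mu)

print(f"mu_hat = {mu_hat:.2f}, V = {var_mu:.4f}, bound = {bound:.2f}")
```

The between-cluster term contributes about 0.027 of the 0.0371, so most of the uncertainty comes from differences between plants rather than from subsampling machines within them.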
Thus, the average downtime is estimated to be 4.80 hours. The error of estimation should be
less than 0.39 hour with a probability of approximately .95.
Estimate the total amount of downtime during the past month for all machines owned by the
manufacturer in Example 9.1. Place a bound on the error of estimation.
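SOLUTION
$$\hat{\tau} = M\hat{\mu} = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i = 90(240.02) = 21{,}602$$
$$\hat{V}(\hat{\tau}) = M^2\,\hat{V}(\hat{\mu}) = (4500)^2(0.0371)$$
$$\hat{\tau} \pm 2\sqrt{\hat{V}(\hat{\tau})} = 21{,}602 \pm 2\sqrt{(4500)^2(0.0371)} = 21{,}602 \pm 1733$$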
Using the data in Table 9.1, estimate the average downtime per machine and place a bound on
the error of estimation. Assume the manufacturer does not know how many machines there are
in all plants combined.
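SOLUTION
$$\hat{\mu}_r = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{\sum_{i=1}^{n} M_i} = \frac{\frac{1}{n}\sum_{i=1}^{n} M_i\bar{y}_i}{\frac{1}{n}\sum_{i=1}^{n} M_i} = \frac{240.02}{52.2} = 4.60$$
$$s_r^2 = \frac{\sum_{i=1}^{n} M_i^2\left(\bar{y}_i - \hat{\mu}_r\right)^2}{n-1} = (35.1)^2$$
$$\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i} = 21{,}985$$
$$\bar{M} = \frac{\sum_{i=1}^{n} M_i}{n} = \frac{522}{10} = 52.2$$
$$\hat{V}(\hat{\mu}_r) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_r^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$
$$= \left(1-\frac{10}{90}\right)\frac{1}{10(52.2)^2}(35.1)^2 + \frac{1}{10(90)(52.2)^2}(21{,}985) = 0.0492$$
$$\hat{\mu}_r \pm 2\sqrt{\hat{V}(\hat{\mu}_r)} = 4.60 \pm 2\sqrt{0.0492} = 4.60 \pm 0.44$$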
Thus, the estimated mean downtime per machine is 4.60 hours with a bound on the error of
estimation of 0.44 hour.
The manufacturer in Example 9.1 wants to estimate the proportion of machines that have been
shut down for major repairs (those requiring parts from stock outside the factory). The sample
proportions of machines requiring major repairs are given in Table 9.2. The data are for
machines sampled in Example 9.1. Estimate p, the proportion of machines involved in major
repairs for all plants combined, and place a bound on the error of estimation.
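SOLUTION
$$\hat{p} = \frac{\sum_{i=1}^{n} M_i\hat{p}_i}{\sum_{i=1}^{n} M_i} = \frac{\frac{1}{n}\sum_{i=1}^{n} M_i\hat{p}_i}{\frac{1}{n}\sum_{i=1}^{n} M_i} = \frac{17.61}{52.2} = 0.34$$
$$s_r^2 = \frac{\sum_{i=1}^{n} M_i^2\left(\hat{p}_i - \hat{p}\right)^2}{n-1} = (4.29)^2$$
$$\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{\hat{p}_i\hat{q}_i}{m_i-1} = 505.91$$
$$\bar{M} = \frac{\sum_{i=1}^{n} M_i}{n} = \frac{522}{10} = 52.2$$
$$\hat{V}(\hat{p}) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{M}^2}\,s_r^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{\hat{p}_i\hat{q}_i}{m_i-1}$$
$$= \left(1-\frac{10}{90}\right)\frac{1}{10(52.2)^2}(4.29)^2 + \frac{1}{10(90)(52.2)^2}(505.91) = 0.00081$$
$$\hat{p} \pm 2\sqrt{\hat{V}(\hat{p})} = 0.34 \pm 2\sqrt{0.00081} = 0.34 \pm 0.057$$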
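Both ratio-type variances (for the mean, with $s_r = 35.1$ and within-cluster sum 21,985, and for the proportion, with $s_r = 4.29$ and within-cluster sum 505.91) share the same structure. A minimal sketch that checks the two quoted results; the helper name `ratio_variance` is illustrative and not taken from the source:

```python
from math import sqrt

def ratio_variance(N, n, M_bar, s_r2, within_sum):
    """Estimated variance of a two-stage ratio estimator.

    s_r2       -- between-cluster term s_r^2
    within_sum -- sum over sampled clusters of the within-cluster term
    M_bar      -- average size of the sampled clusters (M unknown)
    """
    between = (1 - n / N) * s_r2 / (n * M_bar ** 2)
    within = within_sum / (n * N * M_bar ** 2)
    return between + within

N, n, M_bar = 90, 10, 52.2  # M_bar = 522 / 10 from the sampled plants

var_mean = ratio_variance(N, n, M_bar, 35.1 ** 2, 21985)   # mean downtime
var_prop = ratio_variance(N, n, M_bar, 4.29 ** 2, 505.91)  # major repairs

for label, estimate, var in [
    ("mean downtime", 240.02 / M_bar, var_mean),
    ("proportion", 17.61 / M_bar, var_prop),
]:
    print(f"{label}: {estimate:.2f} +/- {2 * sqrt(var):.3f} (V = {var:.5f})")
```

Only the within-cluster term changes between the two cases: $s_i^2/m_i$ for a mean, $\hat{p}_i\hat{q}_i/(m_i-1)$ for a proportion.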
Thus, the estimated proportion of machines involved in major repairs is 0.34 with a bound on
the error of estimation of 0.057.
From the six hospitals in a city, a researcher wants to sample three hospitals for the purpose of
estimating the proportion of current patients who have been (or will be) in the hospital for more
than two consecutive days. Because the hospitals vary in size, they will be sampled with
probabilities proportional to their number of patients. For the three hospitals, 10% of the records
of current patients will be examined to determine how many patients will stay in the hospital for
more than two days. Given the information on hospital sizes in the accompanying table, select a
sample of three hospitals with probabilities proportional to size.
SOLUTION
Because three hospitals are to be selected, three random numbers between 0001 and 1559
must be chosen from the random number table. Our numbers turned out to be 1505, 1256 and
0827. Locating these numbers in the cumulative range column leads to the selection of
hospitals 3, 5, and 6.
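The cumulative-range lookup can be sketched in a few lines. This assumes the hospital sizes from the table and the three random numbers quoted above; the function name `hospital_for` is illustrative:

```python
from bisect import bisect_left
from itertools import accumulate

patients = [328, 109, 432, 220, 280, 190]  # hospitals 1..6
upper = list(accumulate(patients))         # upper end of each cumulative range

def hospital_for(number):
    """Return the 1-based hospital whose cumulative range contains `number`."""
    return bisect_left(upper, number) + 1

draws = [1505, 1256, 827]                  # random numbers between 1 and 1559
selected = [hospital_for(d) for d in draws]
print(selected)                            # [6, 5, 3]
```

Because each draw is independent, the same hospital could in principle be hit more than once; the three draws here happen to land in three different ranges.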
Suppose the sampled hospitals in Example 9.6 yielded the following data on number of patients
staying more than two days:
Hospital | Number of patients sampled | Number staying more than two days
3 | 43 | 25
5 | 28 | 15
6 | 19 | 8
Estimate the proportion of patients staying more than two days, for all six hospitals, and place a
bound on the error of estimation.
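SOLUTION
$$\hat{p}_{pps} = \frac{1}{3}\left(\frac{25}{43} + \frac{15}{28} + \frac{8}{19}\right) = 0.51$$
$$\hat{V}(\hat{p}_{pps}) = \frac{1}{3(2)}\left[(0.58-0.51)^2 + (0.54-0.51)^2 + (0.42-0.51)^2\right] = 0.0025$$
$$\hat{p}_{pps} \pm 2\sqrt{\hat{V}(\hat{p}_{pps})} = 0.51 \pm 2\sqrt{0.0025} = 0.51 \pm 0.10$$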
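A minimal sketch of the same pps calculation, assuming the sampled counts 43, 28 and 19 (10% of each selected hospital's patients) and carrying the proportions unrounded:

```python
from math import sqrt

stays = [25, 15, 8]      # patients staying more than two days (hospitals 3, 5, 6)
examined = [43, 28, 19]  # records examined in each sampled hospital

p_i = [a / m for a, m in zip(stays, examined)]
n = len(p_i)

# Under pps sampling the estimator is the plain average of the cluster proportions.
p_pps = sum(p_i) / n
var_pps = sum((p - p_pps) ** 2 for p in p_i) / (n * (n - 1))
bound = 2 * sqrt(var_pps)

print(f"p_pps = {p_pps:.2f}, V = {var_pps:.4f}, bound = {bound:.2f}")
```

With unrounded proportions the variance comes out near 0.0023, a little below the 0.0025 quoted from the rounded terms; the bound still rounds to 0.10.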
The new play toy, Classic Builder Toy (CBT) is being test-marketed. A market research firm
decided to sample four cities from 20 cities and then to sample supermarkets within the cities, in
order to obtain the number of CBT sold.
City | Number of supermarkets | Number of CBT sold | $M_i\bar{y}_i$ | $s_i^2$
1 | 35 | 199, 179, 98, 63, 126, 87, 62 | 4070.15 | 2974.5
2 | 10 | 12, 23 | 175 | 60.5
3 | 20 | 99, 101, 52, 121 | 1865 | 854.9
4 | 15 | 87, 43, 59 | 945 | 496
Based on the above data, construct a 95% confidence interval for the total number of CBT sold
and interpret the value obtained.
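One way to approach this exercise is to apply the estimator of the population total directly to the raw sales figures. A minimal sketch, assuming N = 20 cities, the four sampled cities in the table, and an approximate 95% interval of plus or minus two standard errors (the convention used for the bounds in the worked examples):

```python
from math import sqrt
from statistics import mean, variance

N = 20  # cities in the population
# city -> (number of supermarkets M_i, CBT sold in each sampled supermarket)
cities = {
    1: (35, [199, 179, 98, 63, 126, 87, 62]),
    2: (10, [12, 23]),
    3: (20, [99, 101, 52, 121]),
    4: (15, [87, 43, 59]),
}
n = len(cities)

# Estimated cluster totals M_i * ybar_i.
cluster_totals = [M_i * mean(y) for M_i, y in cities.values()]
tau_hat = N / n * sum(cluster_totals)

# s_b^2 is the sample variance among the terms M_i * ybar_i.
s_b2 = variance(cluster_totals)
# Within-cluster term: M_i^2 (1 - m_i/M_i) s_i^2 / m_i, summed over cities.
within = sum(
    M_i ** 2 * (1 - len(y) / M_i) * variance(y) / len(y)
    for M_i, y in cities.values()
)

var_tau = (1 - n / N) * N ** 2 / n * s_b2 + N / n * within
half_width = 2 * sqrt(var_tau)

print(f"total = {tau_hat:.0f}, interval = "
      f"({tau_hat - half_width:.0f}, {tau_hat + half_width:.0f})")
```

The interval is very wide relative to the estimate because the between-city term dominates: with only four cities and estimated city totals ranging from 175 to about 4070, $s_b^2$ is large.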
A researcher constructed a sampling plan to estimate the water bill per month for 360 houses
from eight residential areas. He decided to sample four residential areas and then sample
houses within each selected area. The monthly water bills (RM) are recorded below.
Residential area | Number of houses | Number of houses sampled | $\bar{y}_i$ | $s_i^2$
A | 24 | 11 | 19 | 3.44
C | 32 | 16 | 23 | 2.63
F | 46 | 26 | 20 | 2.05
G | 48 | 23 | 24 | 1.05
Estimate the total amount of water bill per month for all houses. Hence, obtain a 95%
confidence interval for the water bill and interpret the value obtained.
A nurseryman wants to estimate the total height of seedlings in a large field that is divided into
25 plots that vary slightly in size. He decides to use a two-stage cluster sample and sampled
10% of the trees within each of the three selected plots. The data are given in the table below.
Plot | Number of seedlings | Number of seedlings sampled | Heights of seedlings (in inches) | $M_i\bar{y}_i$ | $s_i^2$
1 | 52 | 5 | 12, 11, 11, 10, 13 | 592.8 | 1.3
2 | 60 | 6 | 6, 5, 7, 5, 6, 4 | 330 | 1.1
3 | 46 | 5 | 7, 8, 6, 7, 6 | 312.8 | 0.7
b) Estimate the total height of seedlings in the field and place a bound on the error of
estimation.
A study was conducted to investigate the prevalence of smoking among female university
students in a state. A simple random sample without replacement was used to select three
universities from a population of 29 universities. From each selected sample university, simple
random sampling without replacement was used to select samples of secondary units. The
results are as in the following table.
University | Number of female students in university | Number of female students interviewed | Number of smokers | $M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{\hat{p}_i\hat{q}_i}{m_i-1}$
1 | 447 | 15 | 3 | 2203.90
2 | 511 | 20 | 6 | 2773.12
3 | 792 | 25 | 10 | 6074.64
A consumer survey was conducted to estimate the satisfaction level of households towards the
facilities provided by the developer. The scale used to measure satisfaction is as follows:
1 = Entirely Satisfied
2 = Mostly Satisfied
3 = Somewhat Satisfied
4 = Neither Satisfied nor Dissatisfied
5 = Somewhat Dissatisfied
6 = Mostly Dissatisfied
7 = Entirely Dissatisfied
A simple random sample of 10 condominium blocks was selected from 120 in the community.
The results of the survey are given below.
Condominium Block | Number of Households | Number of Households Sampled | Satisfaction | $\bar{y}_i$ | $s_i^2$
1 | 54 | 10 | 5, 7, 6, 5, 4, 7, 6, 6, 4, 5 | 5.50 | 1.08
2 | 48 | 10 | 7, 7, 7, 6, 5, 4, 7, 7, 6, 6 | 6.20 | 1.03
3 | 68 | 14 | 5, 6, 5, 6, 4, 5, 6, 5, 4, 5, 4, 6, 5, 6 | 5.14 | 0.77
4 | 70 | 14 | 6, 5, 7, 6, 7, 6, 5, 7, 5, 7, 6, 5, 7, 6 | 6.07 | 0.83
5 | 52 | 10 | 4, 5, 4, 5, 5, 6, 5, 4, 4, 4 | 4.60 | 0.70
6 | 62 | 12 | 5, 7, 6, 4, 3, 1, 5, 4, 6, 4, 5, 7 | 4.75 | 1.71
7 | 41 | 8 | 7, 6, 7, 7, 6, 6, 5, 7 | 6.38 | 0.74
8 | 53 | 11 | 6, 6, 5, 4, 6, 7, 5, 5, 7, 6, 5 | 5.64 | 0.92
9 | 64 | 12 | 7, 6, 5, 4, 6, 5, 7, 4, 3, 6, 5, 7 | 5.42 | 1.31
10 | 43 | 9 | 7, 6, 6, 5, 7, 3, 5, 4, 5 | 5.33 | 1.32
a) Briefly explain why two-stage cluster sampling is used in this study.
b) It is known that there are 6860 households in 120 condominium blocks. Obtain a 95%
confidence interval for the average satisfaction towards the facilities in the population
and interpret your answer.
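For part (b), M = 6860 is known, so the estimator of the population mean with known M applies. A minimal sketch working from the tabulated $\bar{y}_i$ and $s_i^2$ (rounded as printed), with the interval taken as plus or minus two standard errors, which is the convention used for the bounds in the worked examples:

```python
from math import sqrt

N, M = 120, 6860  # blocks and households in the population
# (M_i, m_i, ybar_i, s_i^2) for each sampled block, copied from the table
blocks = [
    (54, 10, 5.50, 1.08), (48, 10, 6.20, 1.03), (68, 14, 5.14, 0.77),
    (70, 14, 6.07, 0.83), (52, 10, 4.60, 0.70), (62, 12, 4.75, 1.71),
    (41, 8, 6.38, 0.74), (53, 11, 5.64, 0.92), (64, 12, 5.42, 1.31),
    (43, 9, 5.33, 1.32),
]
n = len(blocks)
M_bar = M / N  # average block size in the population

totals = [M_i * ybar for M_i, _, ybar, _ in blocks]
mu_hat = sum(totals) / (n * M_bar)

# Between-cluster term: deviations of M_i * ybar_i from M_bar * mu_hat.
s_b2 = sum((t - M_bar * mu_hat) ** 2 for t in totals) / (n - 1)
# Within-cluster term: M_i^2 (1 - m_i/M_i) s_i^2 / m_i, summed over blocks.
within = sum(M_i ** 2 * (1 - m_i / M_i) * s2 / m_i for M_i, m_i, _, s2 in blocks)

var_mu = (1 - n / N) * s_b2 / (n * M_bar ** 2) + within / (n * N * M_bar ** 2)
half_width = 2 * sqrt(var_mu)

print(f"mean = {mu_hat:.2f}, interval = "
      f"({mu_hat - half_width:.2f}, {mu_hat + half_width:.2f})")
```

Since higher scores mean less satisfaction on this scale, the interpretation should read the interval against the midpoint 4 ("Neither Satisfied nor Dissatisfied").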
A researcher constructed a sampling plan to estimate the monthly usage of electricity for 2564
houses from 20 housing estates of a particular town. He decided to sample three housing
estates from the 20 housing estates and then sample houses within each housing estate
selected. The monthly usage of electricity is based on the electricity bill (nearest RM) for each
house. The results of the sample are listed as follows.
Housing Estate Selected | Number of Houses | Number of Houses Sampled | $\bar{y}_i$ | $s_i^2$
VII | 145 | 19 | 181.33 | 1572.512
XI | 130 | 17 | 229.86 | 1124.143
XV | 120 | 16 | 169.00 | 1826.017
a) Estimate the standard error of the mean electricity usage for all the houses in that town.
Hence, obtain a 95% confidence interval for the mean electricity usage for all the houses
in that town.
b) Estimate the total amount of electricity usage in that town and place a bound on the error
of estimation. Hence, interpret the two values obtained.
BCX Berhad is introducing a new internet plan package. The marketing manager wishes to
estimate the average number of families favoring the new internet package. Out of 20 cities, 5
cities were selected as the sample.
b) Construct a 95% confidence interval for the average number of families who favor the
new internet package.
A large firm has its equipment inventories listed separately by department. From 20
departments in the firm, FIVE were randomly sampled by an auditor. The proportion of inventory
items not properly identified is of interest to the auditor. The auditor selects approximately 10%
of the equipment due to time constraint. The data are given in the accompanying table.
Department | Number of equipment items | Number of items not properly identified | $M_i^2\left(\hat{p}_i - \hat{p}\right)^2$
1 | 150 | 2 | 38.44
2 | 270 | 3 | 26.63
3 | 90 | 1 | 2.96
4 | 310 | 1 | 342.99
5 | 160 | 2 | 27.88
a) Identify the sampling method used in this study. Justify your answer.
b) Estimate the proportion of inventory items in the firm not properly identified. Hence,
calculate the standard error of estimation.
c) Calculate a 95% confidence interval for the proportion of inventory items in the firm not
properly identified. Interpret your answer.
A researcher selects 5 out of 15 local health centers as a sample for the purpose of estimating
the total number of patients who are given new medicine as part of their therapeutic regimen.
The number of patients treated in each center is listed in the accompanying table.
b) Construct a 95% confidence interval for the total number of patients who are to be given
the medicine as part of their therapeutic regimen.
A survey is carried out to estimate the average time secondary school students spend in the
school library within a year. A researcher selected a simple random sample of ten secondary
schools from a total of 90 secondary schools in a particular state. The resulting data are given in
the table below.
School | Number of students per school, $M_i$ | Number of students sampled, $m_i$ | Average time spent in the school library (in hours) | $\bar{y}_i$ | $s_i^2$
1 | 50 | 10 | 5, 7, 9, 0, 11, 2, 8, 4, 3, 5 | 5.40 | 11.38
2 | 65 | 13 | 4, 3, 7, 2, 11, 0, 1, 9, 4, 3, 2, 1, 5 | 4.00 | 10.67
3 | 45 | 9 | 5, 6, 4, 11, 12, 0, 1, 8, 4 | 5.67 | 16.75
4 | 48 | 10 | 6, 4, 0, 1, 0, 9, 8, 4, 6, 10 | 4.80 | 13.29
5 | 52 | 10 | 11, 4, 3, 1, 0, 2, 8, 6, 5, 3 | 4.30 | 11.12
6 | 58 | 12 | 12, 11, 3, 4, 2, 0, 0, 1, 4, 3, 2, 4 | 3.83 | 14.88
7 | 42 | 8 | 3, 7, 6, 7, 8, 4, 3, 2 | 5.00 | 5.14
8 | 66 | 13 | 3, 6, 4, 3, 2, 2, 8, 4, 0, 4, 5, 6, 3 | 3.85 | 4.31
9 | 40 | 8 | 6, 4, 7, 3, 9, 1, 4, 5 | 4.88 | 6.13
10 | 56 | 11 | 6, 7, 5, 10, 11, 2, 1, 4, 0, 5, 4 | 5.00 | 11.80
b) Estimate and construct a 95% confidence interval for the average time to use the library
per student.