CHAPTER ONE
1. SAMPLING THEORY AND SAMPLING DISTRIBUTIONS
1.1. Sampling Theory
Definition of sampling
Sampling is a technique that is used to select a sample out of a population.
It is a process of gathering information from part of the population.
Common terminologies of sampling
a) Population (universe): A collection of items or individuals chosen for a study.
b) Sample: A subset of a population. It is some representative group of the study population.
c) Census: A process of gathering information from all elements of a population.
d) Survey: A process of gathering information from a sample.
The significance of sampling (Advantages of sampling)
a) To save time, money and effort.
b) The destructive nature of some tests. If all food products of a company are tested for quality
---
c) It is impossible to check all items in the population. Populations of fish or birds, for
example, are very large and constantly moving.
d) The adequacy of sample results. Sample results may directly give complete information for the
population.
Limitations of sampling
It is not possible to get information about each individual in the population
Sampling gives rise to certain errors
Checking the omission is impossible in sampling survey
Since relative advantages of sample survey are decidedly more than the limitations, census survey
is seldom undertaken and the sample surveys are extremely common.
Types of sampling
Two types: random sampling and non-random sampling.
1) Random sampling (representative or probability sampling)
It is a sample selected in such a way that every element of the population has a known
(non-zero) chance of being included in the sample.
All samples of size n from a given population of size N have an equal probability of being chosen.
Example: A company has 15 lower-level, 8 middle-level and 5 top-level managers. If one of the 28
managers is selected at random, it is random sampling; otherwise it is non-random sampling.
There are four kinds of random samples:
a) Simple random samples
It is a sample selected in such a way that:
Every element in the population has the same chance of being chosen
Every sample of size n has the same chance of being chosen
Example: Abebe, Chaltu, Daniel and Eshetu are the office staff population of company X. All want
the same vacation period, but only two can be away at the same time. Consequently, chips lettered
A, C, D and E (for the name of each staff member, respectively) are shaken in a container and the
office manager (blindfolded) draws out two chips. The possible samples of size 2 from the given
population are: AC AD AE CD CE DE
P(A) = P(C) = P(D) = P(E) = 3/6 = 1/2. This indicates that each element has an equal chance
of being selected.
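The enumeration in this example can be checked with a short Python sketch. It assumes the four chips A, C, D and E from the example and that two are drawn without replacement; the variable names are illustrative only.

```python
from itertools import combinations

# Chips for Abebe, Chaltu, Daniel and Eshetu (taken from the example)
staff = ["A", "C", "D", "E"]

# All possible samples of size 2, drawn without replacement
samples = list(combinations(staff, 2))

# Every sample is equally likely, so the chance that a person is chosen is
# (number of samples containing that person) / (number of samples)
inclusion_prob = {
    person: sum(person in s for s in samples) / len(samples)
    for person in staff
}

print(["".join(s) for s in samples])
print(inclusion_prob)
```

Each of the four staff members appears in 3 of the 6 samples, which is where the 3/6 comes from.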
Statistics For Management II 2010 E.C
The expression $\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction factor (finite
population multiplier). In the calculation of the standard error of the mean, if the population
standard deviation $\sigma$ is unknown, the standard error of the mean, $\sigma_{\bar{x}}$, can be
estimated by using the sample standard error of the mean
Sample mean    Frequency    Probability
33.33          2            0.2
36.67          1            0.1
40.00          1            0.1
TOTAL          10           1.00
Columns 1 and 2 show frequency distribution of sample means.
Columns 1 and 3 show sampling distribution of the mean.
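As a rough check of how such a table is built, the sketch below lists every sample and tabulates the sample means. It assumes the population is the five observations 10, 20, 30, 40 and 50 used in this worked example and that samples of size n = 3 are drawn without replacement; both are read off the surrounding calculations rather than stated in this part of the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Assumption: population and sample size inferred from the worked example
population = [10, 20, 30, 40, 50]
n = 3

# Exact fractions keep equal means (e.g. 100/3) grouped together
means = [Fraction(sum(s), n) for s in combinations(population, n)]
freq = Counter(means)
total = len(means)

# Columns: sample mean, frequency, probability
for mean in sorted(freq):
    print(f"{float(mean):6.2f}  {freq[mean]:2d}  {freq[mean] / total:.1f}")

mean_of_means = sum(means) / total
print(float(mean_of_means))
```

Under these assumptions there are 10 possible samples, and the mean of the sample means equals the population mean.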
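$$\mu = \frac{\sum X}{N} = 30, \qquad \mu_{\bar{x}} = \frac{\sum \bar{x}}{\text{number of samples}} = 30$$
Regardless of the sample size, $\mu_{\bar{x}} = \mu$.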
x (Observation)    x − μ    (x − μ)²
10                 −20      400
20                 −10      100
30                   0        0
40                  10      100
50                  20      400
∑ (x − μ)²                  1,000
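Population standard deviation:
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{1000}{5}} = 14.142$$
Standard error of the mean, using the finite population correction factor:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{14.142}{\sqrt{3}} \sqrt{\frac{5-3}{5-1}} = 5.774$$
The same value is obtained directly from the 10 sample means:
$$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{10}} = \sqrt{\frac{333.4}{10}} = 5.774$$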
Since averaging reduces variability, $\sigma_{\bar{x}} < \sigma$, except in the cases where $\sigma = 0$ or $n = 1$.
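The agreement between the formula and the direct calculation can be reproduced numerically. The following sketch uses the five observations from the table above with n = 3, computes the standard error once from the finite population correction formula and once from the list of all possible sample means, and compares the two.

```python
import math
from itertools import combinations

population = [10, 20, 30, 40, 50]   # observations from the worked example
N, n = len(population), 3

mu = sum(population) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)

# Formula: sigma / sqrt(n), multiplied by the finite population correction
se_formula = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

# Direct: standard deviation of all possible sample means
means = [sum(s) / n for s in combinations(population, n)]
se_direct = math.sqrt(sum((m - mu) ** 2 for m in means) / len(means))

print(round(sigma, 3), round(se_formula, 3), round(se_direct, 3))
```

Both routes give about 5.774, which is smaller than the population standard deviation of about 14.142.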
Central Limit Theorem and the Sampling Distribution of the Mean
The Central Limit Theorem (CLT) states that:
1) If the population is normally distributed, the distribution of sample means is normal
regardless of the sample size.
2) If the population from which samples are taken is not normal, the distribution of sample
means will be approximately normal if the sample size (n) is sufficiently large (n ≥ 30).
The larger the sample size is used, the closer the sampling distribution is to the normal
curve.
The relationship between the shape of the population distribution and the shape of the sampling
distribution of the mean is called the Central Limit Theorem.
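A small simulation can illustrate the second statement of the theorem. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed distribution chosen for this example, not one used in the notes), the sample size is n = 30, and 5,000 samples are drawn.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # sample size
num_samples = 5000  # number of samples drawn

# Assumption: exponential population with rate 1 (mean 1, standard deviation 1)
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

se = 1 / n ** 0.5   # theoretical standard error, sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean
within = sum(abs(m - 1) <= 1.96 * se for m in sample_means) / num_samples

print(statistics.fmean(sample_means))   # close to the population mean, 1
print(statistics.stdev(sample_means))   # close to 1 / sqrt(30)
print(within)                           # close to 0.95 if the means are roughly normal
```

Although the population is far from normal, roughly 95% of the sample means fall within 1.96 standard errors of the population mean, as a normal curve would predict.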
The significance of the Central Limit Theorem is that it permits us to use sample statistics to
make inference about population parameters without knowing anything about the shape of the
frequency distribution of that population other than what we can get from the sample. It also
permits us to use the normal distribution curve for analyzing distributions whose shape is
unknown. It creates the potential for applying the normal distribution to many problems when the
sample is sufficiently large. Given these properties, a value of the sample mean $\bar{x}$ is first
converted into a value Z on the standard normal distribution, to show how far it deviates from the
mean of the sample means ($\mu_{\bar{x}}$), by using the formula:
$$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}, \quad \text{because } \mu_{\bar{x}} = \mu$$
If the population is finite and samples of fixed size n are drawn without replacement, the
standard error of the sampling distribution of the mean is modified to adjust for the continued
change in the size of the population N caused by the successive draws, as follows:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$
Example 1. The mean life of a certain tool is 41.5 hours with a standard deviation of 2.5 hours.
What is the probability that a simple random sample of size 50 drawn from this population will
have a mean between 40.5 hours and 42 hours?
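$\mu = 41.5$, $\sigma = 2.5$, $n = 50$
$P(40.5 \le \bar{x} \le 42.0) = ?$
$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.5}{\sqrt{50}} = \frac{2.5}{7.0711} = 0.3536$$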
The population distribution is unknown, but sample size n=50 is large enough to apply the central
limit theorem. Hence the normal distribution can be used to find the required probability.
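$$P(40.5 \le \bar{x} \le 42.0) = P\left(\frac{\bar{x}_1 - \mu}{\sigma_{\bar{x}}} \le Z \le \frac{\bar{x}_2 - \mu}{\sigma_{\bar{x}}}\right)$$
$$= P\left(\frac{40.5 - 41.5}{0.3536} \le Z \le \frac{42 - 41.5}{0.3536}\right)$$
$$= P(-2.8281 \le Z \le 1.4140)$$
$$= P(-2.8281 \le Z \le 0) + P(0 \le Z \le 1.4140)$$
$$= 0.4977 + 0.4207 = 0.9184$$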
Thus 0.9184 is the probability of the tool having mean life between the required hours.
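The same probability can be checked without a Z table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library, with the values of Example 1; it relies on the normal approximation justified above by the central limit theorem.

```python
import math
from statistics import NormalDist

mu, sigma, n = 41.5, 2.5, 50        # values from Example 1

se = sigma / math.sqrt(n)           # standard error of the mean
sampling_dist = NormalDist(mu, se)  # approximate distribution of the sample mean

prob = sampling_dist.cdf(42.0) - sampling_dist.cdf(40.5)

print(round(se, 4))
print(round(prob, 4))
```

The result is about 0.919. It differs slightly from 0.9184 because the table lookup rounds Z to two decimal places.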
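[Figure: normal curve with areas 0.4977 and 0.4207 on either side of the mean.]
$\sigma = 300$, $\mu_{\bar{x}} = 800$, $\bar{x} = 900$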
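$$P(\bar{x} \ge 900) = P\left(Z \ge \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P\left(Z \ge \frac{900 - 800}{75}\right)$$
$$= P(Z \ge 1.33)$$
$$= 0.5000 - 0.4082$$
$$= 0.0918$$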
b) Since Z = ±1.96 bounds the middle 95% of the area under the normal curve, the formula for Z
can be solved for the values of $\bar{x}$ in terms of the known values as follows:
$$\bar{x}_1 = \mu_{\bar{x}} - Z\sigma_{\bar{x}} = 800 - 1.96(75) = 653 \text{ gms}$$
$$\bar{x}_2 = \mu_{\bar{x}} + Z\sigma_{\bar{x}} = 800 + 1.96(75) = 947 \text{ gms}$$
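Both parts of this calculation can be reproduced with the standard library. The sketch takes the mean of the sample means (800) and the standard error (75) exactly as they appear in the calculation above; the statement of the example itself is not reproduced here, so these two inputs are the only assumptions.

```python
from statistics import NormalDist

mu_xbar, se = 800, 75          # values used in the calculation above
dist = NormalDist(mu_xbar, se)

# Part a: probability that the sample mean is at least 900
p_at_least_900 = 1 - dist.cdf(900)

# Part b: limits enclosing the middle 95% of sample means
z = NormalDist().inv_cdf(0.975)    # about 1.96
lower = mu_xbar - z * se
upper = mu_xbar + z * se

print(round(p_at_least_900, 4))
print(round(z, 2), round(lower), round(upper))
```

The limits round to 653 and 947, and the tail probability is about 0.091, close to the table-based 0.0918.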
$$\mu_{\bar{p}} = p \quad \text{and} \quad \sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}$$
If the sample size is large (n ≥ 30) and satisfies the following two conditions:
a) np ≥ 5 b) nq ≥ 5
then the sampling distribution of the proportion is very closely approximated by a normal
distribution. Note that the sampling distribution of the proportion actually follows the
binomial distribution, because the population is binomially distributed.
For a finite population in which sampling is done without replacement, we have:
$$\mu_{\bar{p}} = p \quad \text{and} \quad \sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N-n}{N-1}}$$
Under the same guidelines as mentioned in the previous sections, for a large sample size n ≥
30, the sampling distribution of proportion is closely approximated by a normal distribution
with a mean and standard deviation as stated above.
Hence, to standardize the sample proportion $\bar{p}$, the standard normal variable is:
$$Z = \frac{\bar{p} - \mu_{\bar{p}}}{\sigma_{\bar{p}}} = \frac{\bar{p} - p}{\sqrt{pq/n}}$$
Example 3. A few years ago, a policy was introduced to give loans to unemployed engineers to
start their own businesses. Out of 1,000,000 engineers, 600,000 accepted the policy and got the loan.
A sample of 100 unemployed engineers is taken at the time of allotment of the loans. What is the
probability that the sample proportion would have exceeded 50% acceptance?
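Solution:
$\mu_{\bar{p}} = p = 0.60$, $n = 100$, $N = 1{,}000{,}000$ and $P(\bar{p} \ge 0.5) = ?$
$$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{(0.60)(0.40)}{100}} \sqrt{\frac{1{,}000{,}000 - 100}{1{,}000{,}000 - 1}} = 0.0490$$
$$P(\bar{p} \ge 0.5) = P\left(Z \ge \frac{\bar{p} - \mu_{\bar{p}}}{\sigma_{\bar{p}}}\right) = P\left(Z \ge \frac{0.50 - 0.60}{0.0490}\right) = P(Z \ge -2.04)$$
$$= 0.4793 + 0.5000 = 0.9793$$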
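The calculation for Example 3 can be sketched in a few lines. It uses the values given in the example and the finite population form of the standard error; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n, N = 0.60, 100, 1_000_000   # values from Example 3
q = 1 - p

fpc = math.sqrt((N - n) / (N - 1))   # finite population correction factor
se = math.sqrt(p * q / n) * fpc      # standard error of the sample proportion

# Probability that the sample proportion is at least 0.50
prob = 1 - NormalDist(p, se).cdf(0.50)

print(round(fpc, 5), round(se, 4), round(prob, 4))
```

With N this much larger than n, the correction factor is almost exactly 1, so it barely changes the standard error.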
Example 4. A population proportion is 0.40. A simple random sample of size 200 will be taken
and the sample proportion will be used to estimate the population proportion. What is the
probability that the sample proportion will be within ±0.03 of the population proportion?
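Given:
$\mu_{\bar{p}} = p = 0.40$ and $n = 200$
$$\sigma_{\bar{p}} = \sqrt{\frac{(0.4)(0.6)}{200}} = 0.0346$$
$$P(-0.03 \le \bar{p} - p \le 0.03) = 2P\left(0 \le Z \le \frac{\bar{p} - p}{\sigma_{\bar{p}}}\right) = 2P\left(0 \le Z \le \frac{0.03}{0.0346}\right)$$
$$= 2P(0 \le Z \le 0.87) = 2 \times 0.3078 = 0.6156$$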
Example 5. A manufacturer of watches has determined from past experience that 3% of the
watches he produces are defective. If a random sample of 300 watches is examined, what is the
probability that the proportion of defective is between 0.02 and 0.035?
$\mu_{\bar{p}} = p = 0.03$, $\bar{p}_1 = 0.02$, $\bar{p}_2 = 0.035$, $n = 300$
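$$\sigma_{\bar{p}} = \sqrt{\frac{(0.03)(0.97)}{300}} = 0.0098$$
$$P(0.02 \le \bar{p} \le 0.035) = P\left(\frac{\bar{p}_1 - p}{\sigma_{\bar{p}}} \le Z \le \frac{\bar{p}_2 - p}{\sigma_{\bar{p}}}\right)$$
$$= P\left(\frac{0.02 - 0.03}{0.0098} \le Z \le \frac{0.035 - 0.03}{0.0098}\right)$$
$$= P(-1.02 \le Z \le 0.51) = P(-1.02 \le Z \le 0) + P(0 \le Z \le 0.51) = 0.3461 + 0.1950 = 0.5411$$
Hence the probability that the proportion of defectives will lie between 0.02 and 0.035 is 0.5411.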
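Examples 4 and 5 follow the same pattern, so one small helper covers both. The function name `proportion_prob_between` is hypothetical and introduced only for this sketch; it assumes a large population (no finite population correction) and the normal approximation.

```python
import math
from statistics import NormalDist

def proportion_prob_between(p, n, low, high):
    """P(low <= sample proportion <= high) under the normal approximation."""
    se = math.sqrt(p * (1 - p) / n)   # standard error of the sample proportion
    dist = NormalDist(p, se)
    return dist.cdf(high) - dist.cdf(low)

# Example 5: p = 0.03, n = 300, proportion between 0.02 and 0.035
example5 = proportion_prob_between(0.03, 300, 0.02, 0.035)

# Example 4: p = 0.40, n = 200, proportion within 0.03 of p
example4 = proportion_prob_between(0.40, 200, 0.37, 0.43)

print(round(example5, 4), round(example4, 4))
```

The results are about 0.539 and 0.614. They differ from the table-based answers (0.5411 and 0.6156) only in the third decimal place, because the hand calculation rounds the standard error and Z before the table lookup.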
$$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
(standard error of the sampling distribution of the difference of two means)
$n_1$ and $n_2$ are the sizes of the independent random samples drawn from the first and second
populations, respectively.
Example 6. Car stereos of manufacturer A have a mean lifetime of 1,400 hours with a standard
deviation of 200 hours, while those of manufacturer B have a mean life time of 1,200 hours with a
standard deviation of 100 hours. If a random sample of 125 stereos of each manufacturer are tested,
what is the probability that manufacturer A’s stereos will have a mean life time which is at least;
A. 160 hours more than manufacturer B’s stereos?
B. 250 hours more than manufacturer B’s stereos?
Solution:
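Manufacturer A: $\mu_1 = 1{,}400$ hours, $\sigma_1 = 200$ hours, $n_1 = 125$
Manufacturer B: $\mu_2 = 1{,}200$ hours, $\sigma_2 = 100$ hours, $n_2 = 125$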
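$$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(200)^2}{125} + \frac{(100)^2}{125}} = \sqrt{320 + 80} = \sqrt{400} = 20$$
$$\mu_{(\bar{x}_1 - \bar{x}_2)} = \mu_1 - \mu_2 = 200$$
$$P(\bar{x}_1 - \bar{x}_2 \ge 160) = P\left(Z \ge \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{(\bar{x}_1 - \bar{x}_2)}}\right)$$
$$= P\left(Z \ge \frac{160 - 200}{20}\right)$$
$$= P(Z \ge -2)$$
$$= 0.5000 + 0.4772$$
$$= 0.9772 \text{ (area under normal curve)}$$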
Hence, the probability is very high that the mean lifetime of A's stereos is at least 160 hours
more than that of B's.
b) Proceeding in the same manner as in part a):
$$P(\bar{x}_1 - \bar{x}_2 \ge 250) = P\left(Z \ge \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{(\bar{x}_1 - \bar{x}_2)}}\right) = P\left(Z \ge \frac{250 - 200}{20}\right)$$
$$= P(Z \ge 2.5)$$
$$= 0.5000 - 0.4938$$
$$= 0.0062 \text{ (area under normal curve)}$$
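Both parts of Example 6 can be verified with a short sketch. It uses the figures given in the example statement and treats the difference of the two sample means as approximately normal, which the sample sizes of 125 make reasonable.

```python
import math
from statistics import NormalDist

mu1, sigma1, n1 = 1400, 200, 125   # manufacturer A
mu2, sigma2, n2 = 1200, 100, 125   # manufacturer B

# Standard error of the difference of two sample means
se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Sampling distribution of the difference (mean of A minus mean of B)
diff_dist = NormalDist(mu1 - mu2, se)

p_160 = 1 - diff_dist.cdf(160)   # part A: at least 160 hours more
p_250 = 1 - diff_dist.cdf(250)   # part B: at least 250 hours more

print(se, round(p_160, 4), round(p_250, 4))
```

The standard error is 20, and the two probabilities round to 0.9772 and 0.0062, in line with the worked solution.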
Example 7. The strength of a wire produced by company A has a mean of 4,500 kg and a standard
deviation $\sigma_1$ of 200 kg. Company B has a mean of 4,000 kg and a $\sigma_2$ of 300 kg. If 50
wires of company A and 100 wires of company B are selected at random and tested for strength, what
is the probability that the sample mean strength of A will be at least 600 kg more than that of B?
Given:
$\mu_1 = 4{,}500$, $\sigma_1 = 200$
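$n_1 = 50$, $\sigma_2 = 300$
$\mu_2 = 4{,}000$, $n_2 = 100$
$$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(200)^2}{50} + \frac{(300)^2}{100}} = \sqrt{800 + 900} = \sqrt{1700} = 41.23$$
$$P(\bar{x}_1 - \bar{x}_2 \ge 600) = P\left(Z \ge \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{(\bar{x}_1 - \bar{x}_2)}}\right)$$
$$= P\left(Z \ge \frac{600 - 500}{41.23}\right)$$
$$= P(Z \ge 2.43)$$
$$= 0.5000 - 0.4925 = 0.0075 \text{ (area under normal curve)}$$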
of $n_2$ from the second population, compute the sample proportion $\bar{p}_2$ and standard deviation $\sigma_{\bar{p}_2}$.
For all combinations of these samples from these populations, we can obtain a sampling distribution
of the difference P1−P2 of sample proportion. Such a distribution is called sampling distribution of
the difference of two proportions. The mean and standard deviation of this distribution are given
by;
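$$\mu_{(\bar{p}_1 - \bar{p}_2)} = \mu_{\bar{p}_1} - \mu_{\bar{p}_2} = p_1 - p_2$$
$$\sigma_{(\bar{p}_1 - \bar{p}_2)} = \sqrt{\sigma_{\bar{p}_1}^2 + \sigma_{\bar{p}_2}^2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$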
If the sample sizes $n_1$ and $n_2$ are large, i.e. $n_1 \ge 30$ and $n_2 \ge 30$, then the sampling
distribution of the difference of proportions is closely approximated by a normal distribution.
Example 8. 10% of the machines produced by company A are defective and 5% of those
produced by company B are defective. A random sample of 250 machines is taken from company A
and a random sample of 300 machines is taken from company B. What is the probability that the
difference in sample proportions is less than or equal to 0.02?
$\mu_{\bar{p}_1} - \mu_{\bar{p}_2} = p_1 - p_2 = 0.10 - 0.05 = 0.05$, $n_1 = 250$, $n_2 = 300$
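$$\sigma_{(\bar{p}_1 - \bar{p}_2)} = \sqrt{\sigma_{\bar{p}_1}^2 + \sigma_{\bar{p}_2}^2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{(0.10)(0.90)}{250} + \frac{(0.05)(0.95)}{300}} = \sqrt{0.00052} = 0.0228$$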
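The standard error for Example 8 can be checked directly from the formula for the difference of two proportions. The sketch below uses only the values given in the example and stops at the standard error, as the calculation above does.

```python
import math

p1, n1 = 0.10, 250   # company A: proportion defective, sample size
p2, n2 = 0.05, 300   # company B: proportion defective, sample size

var1 = p1 * (1 - p1) / n1   # variance of the first sample proportion
var2 = p2 * (1 - p2) / n2   # variance of the second sample proportion

se = math.sqrt(var1 + var2)  # standard error of the difference

print(round(var1 + var2, 5), round(se, 4))
```

The sum of the two variances is about 0.00052, and its square root is about 0.0228.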