
Statistics For Management II 2010 E.C

CHAPTER ONE
1. SAMPLING THEORY AND SAMPLING DISTRIBUTIONS
1.1. Sampling Theory
Definition of sampling
• Sampling is a technique used to select a sample from a population.
• It is the process of gathering information from part of the population.
Common terminologies of sampling
a) Population (universe): The entire collection of items or individuals of interest in a study.
b) Sample: A subset of a population. It should be a representative group of the study population.
c) Census: The process of gathering information from all elements of a population.
d) Survey: The process of gathering information from a sample.
The significance of sampling (Advantages of sampling)
a) To save time, money and effort.
b) The destructive nature of some tests. If all food products of a company were tested for quality, nothing would be left to sell.
c) It is often impossible to check all items in the population. Populations of fish or birds, for example, are very large and constantly moving.
d) The adequacy of sample results. A well-chosen sample can give sufficiently accurate information about the population.
Limitations of sampling
• It is not possible to get information about each individual in the population.
• Sampling gives rise to certain errors (sampling errors).
• Omissions cannot be checked in a sample survey.
Since the advantages of sample surveys clearly outweigh their limitations, censuses are seldom undertaken and sample surveys are extremely common.
Types of sampling
• There are two types: random sampling and non-random sampling.
1) Random sampling (representative or probability sampling)
• A random sample is selected in such a way that every element of the population has a known, non-zero chance of being included in the sample.
• In simple random sampling, every sample of size n drawn from a population of size N has the same probability of being chosen.
• Example: A company has 15 lower-level, 8 middle-level and 5 top-level managers. If one of the 28 managers is selected at random, this is random sampling; otherwise it is non-random sampling.
• There are four kinds of random samples:
a) Simple random samples
• It is a sample selected in such a way that:
Every element in the population has the same chance of being chosen.
Every sample of size n has the same chance of being chosen.
• Example: Abebe, Chaltu, Daniel and Eshetu make up the office staff of company X. All want the same vacation period, but only two can be away at the same time. Chips lettered A, C, D and E (for the name of each staff member) are therefore shaken in a container, and the office manager, blindfolded, draws out two chips. The possible samples of size 2 from this population are: AC, AD, AE, CD, CE, DE.
Each person appears in 3 of the 6 possible samples, so $p(A) = p(C) = p(D) = p(E) = 3/6 = 1/2$. This shows that each element has an equal chance of being selected.
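The enumeration in this example can be checked mechanically. The following is a minimal Python sketch (assuming Python 3; the variable names are illustrative and not part of the notes) that lists every sample of size 2 from the four chips, computes each person's chance of being included, and simulates one blindfolded draw with `random.sample`, which draws without replacement.

```python
import random
from fractions import Fraction
from itertools import combinations

# Office staff of company X, identified by the initials on the chips
staff = ["A", "C", "D", "E"]

# Every possible sample of size 2 drawn without replacement
samples = list(combinations(staff, 2))

# Inclusion probability of each person:
# (number of samples containing that person) / (total number of samples)
inclusion_prob = {
    person: Fraction(sum(person in s for s in samples), len(samples))
    for person in staff
}

# One blindfolded draw of two chips
drawn = random.sample(staff, 2)

print(["".join(s) for s in samples])  # ['AC', 'AD', 'AE', 'CD', 'CE', 'DE']
print(inclusion_prob["A"])            # 1/2
print(drawn)
```

The last line prints a different pair on each run, since no random seed is fixed.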


b) Systematic random samples


• Every Kth element of the population is included in the sample.
• First, a sampling interval K = N/n is calculated.
• Second, a number between 1 and K is chosen at random as the starting point.
• Third, the remaining sample elements are obtained by repeatedly adding the sampling interval K to the previously selected item.
Example: Suppose we have to select a sample of 50 out of 500 units.
Thus K = 500/50 = 10.
Suppose the number chosen at random between 1 and 10 happens to be 8.
The sample then consists of the 8th, 18th, 28th, 38th, ..., 488th and 498th units.
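The three steps above can be sketched in Python as follows. This is an illustration under the assumption, true in the example, that N is an exact multiple of n; `systematic_sample` is a hypothetical helper name, and positions are 1-based to match "8th, 18th, ...".

```python
import random

def systematic_sample(N, n, start=None):
    """Return 1-based positions of a systematic sample of size n from N units.

    Assumes N is a multiple of n, as in the example (500 / 50 = 10).
    """
    k = N // n                        # sampling interval K = N / n
    if start is None:
        start = random.randint(1, k)  # random start between 1 and K inclusive
    return [start + i * k for i in range(n)]

positions = systematic_sample(500, 50, start=8)
print(positions[:4], positions[-2:])  # [8, 18, 28, 38] [488, 498]
```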
c) Stratified random sampling
• The population is first divided into mutually exclusive strata (subgroups).
• A simple random sample is then selected within each stratum.
For example, XYZ Company has 10,000 workers.
First, divide the workers into strata (subgroups) according to income, age, sex or another characteristic.
Then, select some workers at random from each stratum.
Together, these workers make up a random sample of 100 in which every subgroup is properly represented. This method of sampling is called stratified sampling.
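A minimal Python sketch of stratified sampling follows. The notes do not say how many workers to take from each stratum, so this sketch assumes proportional allocation (each stratum's share of the sample equals its share of the population); the income strata and their sizes are hypothetical.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample within each stratum.

    `strata` maps a stratum name to the list of its members. Sample sizes are
    allocated in proportion to stratum size (an assumption, not a rule from
    the notes); with other stratum sizes, rounding can shift the total slightly.
    """
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(total_n * len(members) / N)  # proportional allocation
        sample[name] = rng.sample(members, n_h)  # SRS without replacement
    return sample

# Hypothetical split of XYZ Company's 10,000 workers by income group
workers = list(range(1, 10_001))
strata = {
    "low income": workers[:5000],
    "middle income": workers[5000:8500],
    "high income": workers[8500:],
}
result = stratified_sample(strata, total_n=100, seed=1)
print({name: len(s) for name, s in result.items()})
# {'low income': 50, 'middle income': 35, 'high income': 15}
```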
d) Cluster random sampling
• Divide the total population into groups called clusters, based on some shared characteristic.
• Select a random sample of clusters.
• Include all items in the chosen clusters in the sample.
• This is very cost-effective because the interviewer does not spend as much time travelling between interviews.
• The drawback is that items in a cluster tend to be more similar to one another than to items in different clusters.
For example, XYZ Company has 10,000 workers.
Divide the workers into clusters that share a characteristic, such as age, sex or income group.
Then select some clusters at random (for example, a particular income group); all workers in each selected cluster are included in the sample.
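The contrast with stratified sampling is that whole clusters are chosen at random and then taken in full. The Python sketch below illustrates this; the age-group clusters, their sizes and the helper name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random sample of cluster labels
    return {label: clusters[label] for label in chosen}

# Hypothetical clusters of XYZ Company workers (labels and sizes are illustrative)
clusters = {
    "age 18-29": list(range(0, 2500)),
    "age 30-39": list(range(2500, 5500)),
    "age 40-49": list(range(5500, 8000)),
    "age 50+": list(range(8000, 10_000)),
}
selected = cluster_sample(clusters, n_clusters=2, seed=7)

# Every worker in a selected cluster enters the sample
sample = [w for members in selected.values() for w in members]
print(list(selected), len(sample))
```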
2) Non-random sampling (also called non-representative or non-probability sampling)
• Each item or person in the population being studied has an unknown probability of being included in the sample (items do not necessarily have the same chance of being included).
a) Judgmental Sampling
• The choice of sample items depends largely on the judgment of the investigator.
• It is very practical when the investigator is highly skilled.
• Bias can be introduced if the investigator has a preconceived set of beliefs.
b) Convenience Sampling
• Cases are selected based on their availability for the study.
• For example, you could pick 251 people to survey simply from the telephone directory, or you could stand at a street corner and sample the first 251 people who pass by.
c) Quota Sampling
• The selection strategy is to choose a sample that has the same proportions as the population on easily identified variables.
• Example: Suppose an investigator is sent out to interview 528 persons in a given area about their television-watching habits, with a pre-established criterion that for every 251 persons interviewed, 126 should be housewives, 65 should be males and the remaining 60 should be children. The investigator is then free to choose the sample items within this quota.


1.2. SAMPLING DISTRIBUTION


Probability distribution
• The normal probability distribution is used to determine probabilities for normally distributed individual measurements, given the mean and the standard deviation.
Population distribution
• It is the distribution of the measured values of all members of the population, with mean $\mu$, variance $\sigma^2$ and standard deviation $\sigma$.
• The population standard deviation describes the variation among the values of members of the population, whereas the standard deviation of a sampling distribution measures the variability among values of a sample statistic (such as sample means or sample proportions) due to sampling error.
Sample distribution:
• It is the distribution of the measured values within a random sample drawn from a given population. Because the sample mean varies from sample to sample, this variability serves as the basis for the sampling distribution.
Sampling distribution
• In contrast to the distribution of individual measurements, a sampling distribution is the probability distribution of the possible values of a sample statistic, such as a sample mean.
1.2.1. Sampling distribution of the mean
• It shows how far sample means tend to fall from the population mean.
• It is the probability distribution of all possible values of the sample mean, computed from all distinct possible samples of equal size drawn from a population or a process.
• The sampling distribution of the mean has its own arithmetic mean, denoted by $\mu_{\bar X}$ (read "mu sub x-bar"), and standard deviation $\sigma_{\bar X}$ (read "sigma sub x-bar").
• The sampling distribution of the mean is the probability distribution of the means $\bar X$ of all simple random samples of a given size n that can be drawn from the population.
NB:
The sampling distribution of the mean is not the sample distribution, which is the distribution of the measured values of X in one random sample. Rather, the sampling distribution of the mean is the probability distribution of $\bar X$, the sample mean.
For any given sample size n taken from a population with mean $\mu$ and standard deviation $\sigma$, the value of the sample mean $\bar X$ would vary from sample to sample if several random samples were obtained from the population. This variability serves as the basis for the sampling distribution.
The sampling distribution of the mean is described by two parameters: the expected value $E(\bar X) = \mu_{\bar X}$, or mean of the sampling distribution of the mean, and the standard deviation $\sigma_{\bar X}$, called the standard error of the mean.


PROPERTIES OF THE SAMPLING DISTRIBUTION OF MEANS


1) The arithmetic mean $\mu_{\bar X}$ of the sampling distribution of the mean equals the population mean $\mu$, regardless of the form of the population distribution, i.e. $\mu_{\bar X} = \mu$.
2) The sampling distribution has a standard deviation (also called the standard error) equal to the population standard deviation divided by the square root of the sample size, i.e.
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}.$$
This holds exactly when the population is infinite or sampling is with replacement, and it is a good approximation when N is very large relative to n (n < 0.05N). If N is finite and n ≥ 0.05N,
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}.$$
The expression $\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction factor (finite population multiplier). In calculating the standard error of the mean, if the population standard deviation $\sigma$ is unknown, the standard error $\sigma_{\bar X}$ can be estimated by the sample standard error of the mean $S_{\bar X}$, calculated as follows:
$$S_{\bar X} = \frac{S}{\sqrt{n}} \quad\text{or}\quad S_{\bar X} = \frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}.$$
3) A sample of size n ≥ 30 is generally considered a large sample for statistical analysis, whereas a sample of size n < 30 is considered a small sample. The sampling distribution of means is approximately normal for sufficiently large sample sizes (n ≥ 30).
4) When the population standard deviation $\sigma$ is not known, the sample standard deviation s, which closely approximates $\sigma$, is used to compute the standard error, i.e. $\sigma_{\bar X} \approx \frac{s}{\sqrt{n}}$.
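Properties 2 and 4 can be combined into one small calculation. The Python sketch below is illustrative only: `standard_error_mean` is a hypothetical helper, and it applies the finite population correction exactly when a population size N is supplied and n ≥ 0.05N, mirroring the rule of thumb in property 2. The second call reproduces the population of ages used in Example 1, whose standard deviation is $\sqrt{200}$.

```python
import math

def standard_error_mean(sd, n, N=None):
    """Standard error of the sample mean (hypothetical helper for illustration).

    sd: population standard deviation sigma, or the sample estimate S when
        sigma is unknown.
    N:  population size; when given and n >= 0.05 * N, the finite population
        correction sqrt((N - n) / (N - 1)) is applied.
    """
    se = sd / math.sqrt(n)
    if N is not None and n >= 0.05 * N:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(standard_error_mean(2.5, 50), 4))                 # 0.3536
print(round(standard_error_mean(math.sqrt(200), 3, N=5), 3))  # 5.774
print(standard_error_mean(10, 25, N=10_000))                  # 2.0 (n < 0.05N, no correction)
```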
Example 1: A population consists of the following ages: 10, 20, 30, 40 and 50. A random sample of three is to be selected from this population and its mean computed. Develop the sampling distribution of the mean.
Solution: The number of simple random samples of size n that can be drawn without replacement from a population of size N is ${}_NC_n = \frac{N!}{n!(N-n)!}$. With N = 5 and n = 3, ${}_5C_3 = 10$ samples can be drawn from the population:

Sampled items    Sample mean ($\bar X$)
10, 20, 30       20.00
10, 20, 40       23.33
10, 20, 50       26.67
10, 30, 40       26.67
10, 30, 50       30.00
10, 40, 50       33.33
20, 30, 40       30.00
20, 30, 50       33.33
20, 40, 50       36.67
30, 40, 50       40.00
Total           300.00

A systematic organization of the above figures gives the following:

Sample mean ($\bar X$)   Frequency   Prob. (relative freq.) of $\bar X$
20.00                    1           0.1
23.33                    1           0.1
26.67                    2           0.2
30.00                    2           0.2
33.33                    2           0.2
36.67                    1           0.1
40.00                    1           0.1
Total                    10          1.0

Columns 1 and 2 show the frequency distribution of sample means.
Columns 1 and 3 show the sampling distribution of the mean.

$$\mu = \frac{\sum x}{N} = \frac{150}{5} = 30, \qquad \mu_{\bar X} = \frac{\sum \bar X}{{}_NC_n} = \frac{300}{10} = 30$$
Hence $\mu_{\bar X} = \mu$, regardless of the sample size.

x (observation)   x − μ   (x − μ)²
10                −20     400
20                −10     100
30                  0       0
40                 10     100
50                 20     400
Total                     1,000

$$\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}} = \sqrt{\frac{1000}{5}} = 14.142$$
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{14.142}{\sqrt{3}}\sqrt{\frac{5-3}{5-1}} = 5.774$$
Computing the standard deviation directly from the 10 sample means gives the same value:
$$\sigma_{\bar X} = \sqrt{\frac{\sum (\bar X_i - \mu_{\bar X})^2}{{}_NC_n}} = \sqrt{\frac{333.33}{10}} = 5.774$$
Since averaging reduces variability, $\sigma_{\bar X} < \sigma$ except in the cases where $\sigma = 0$ or n = 1.
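The whole of Example 1 can be reproduced by brute force. This Python sketch (assuming Python 3.8+ for `math.comb`) enumerates all 10 samples, then checks that the mean of the sample means equals $\mu$ and that the standard deviation of the sample means matches the formula with the finite population correction. `pstdev` divides by the number of values, matching the divisor N (and ${}_NC_n$) used above.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import mean, pstdev

population = [10, 20, 30, 40, 50]
N, n = len(population), 3

# Mean of every sample of size 3 drawn without replacement
sample_means = [mean(s) for s in combinations(population, n)]

mu = mean(population)            # population mean
sigma = pstdev(population)       # population standard deviation (divisor N)
mu_xbar = mean(sample_means)     # mean of the sampling distribution
sd_xbar = pstdev(sample_means)   # its standard deviation over all 10 samples

# Standard error from the formula with the finite population correction
se_formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print(len(sample_means), comb(N, n))            # 10 10
print(mu, round(mu_xbar, 2))                    # 30 30.0
print(round(sd_xbar, 3), round(se_formula, 3))  # 5.774 5.774
```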
Central Limit Theorem and the Sampling Distribution of the Mean
The Central Limit Theorem (CLT) states that:
1) If the population is normally distributed, the distribution of sample means is normal regardless of the sample size.
2) If the population from which samples are taken is not normal, the distribution of sample means will be approximately normal if the sample size n is sufficiently large (n ≥ 30). The larger the sample size, the closer the sampling distribution is to the normal curve.

The relationship between the shape of the population distribution and the shape of the sampling distribution of the mean is described by the Central Limit Theorem.
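A small simulation makes point 2 concrete. The sketch below is an illustration, not a proof: it assumes a right-skewed exponential population with mean 10 (so $\sigma = 10$ as well), draws 5,000 samples of size 30, and checks that the sample means centre on $\mu$, spread by about $\sigma/\sqrt{n}$, and fall within $\pm 1.96$ standard errors roughly 95% of the time, as a normal distribution would.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)   # fixed seed so the run is reproducible

# Illustrative non-normal population: exponential (right-skewed) with mean 10.
# For an exponential distribution the standard deviation equals the mean.
POP_MEAN = POP_SD = 10.0
n = 30          # sample size at the n >= 30 rule of thumb
trials = 5000   # number of repeated samples

sample_means = [
    sum(rng.expovariate(1 / POP_MEAN) for _ in range(n)) / n
    for _ in range(trials)
]

se = POP_SD / n ** 0.5   # sigma / sqrt(n), about 1.83

# Share of sample means within 1.96 standard errors of the population mean;
# this is close to 0.95 when the sampling distribution is nearly normal.
coverage = sum(abs(m - POP_MEAN) <= 1.96 * se for m in sample_means) / trials

print(round(mean(sample_means), 2), round(stdev(sample_means), 2), round(se, 2))
print(round(coverage, 3))
```

Increasing `n` should bring the coverage even closer to 0.95, in line with "the larger the sample size, the closer the sampling distribution is to the normal curve".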
The significance of the Central Limit Theorem is that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the population's frequency distribution other than what we can get from the sample. It also permits us to use the normal curve to analyze distributions whose shape is unknown, which makes the normal distribution applicable to many problems when the sample is sufficiently large. Given the properties above, a sample mean $\bar X$ is first converted into a value Z on the standard normal distribution, which shows how far that sample mean deviates from the mean of the sampling distribution, $\mu_{\bar X}$:
$$Z = \frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \quad \text{because } \mu_{\bar X} = \mu.$$

If the population is finite and samples of fixed size n are drawn without replacement, the standard error of the sampling distribution of the mean is modified to account for the changing size of the population as samples are drawn, by applying the finite population correction factor:
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}.$$
Example 1. The mean life of a certain tool is 41.5 hours, with a standard deviation of 2.5 hours. What is the probability that a simple random sample of size 50 drawn from this population will have a mean between 40.5 hours and 42 hours?
Given: $\mu = 41.5$, $\sigma = 2.5$, n = 50; find $P(40.5 \le \bar X \le 42.0)$.
$$\mu_{\bar X} = \mu = 41.5, \qquad \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{2.5}{\sqrt{50}} = \frac{2.5}{7.0711} = 0.3536$$
The population distribution is unknown, but the sample size n = 50 is large enough to apply the central limit theorem. Hence the normal distribution can be used to find the required probability:
$$P(40.5 \le \bar X \le 42.0) = P\left(\frac{\bar X_1 - \mu}{\sigma_{\bar X}} \le Z \le \frac{\bar X_2 - \mu}{\sigma_{\bar X}}\right) = P\left(\frac{40.5 - 41.5}{0.3536} \le Z \le \frac{42 - 41.5}{0.3536}\right)$$
$$= P(-2.8281 \le Z \le 1.4140) = P(-2.83 \le Z \le 0) + P(0 \le Z \le 1.41) = 0.4977 + 0.4207 = 0.9184$$
Thus 0.9184 is the probability that the mean tool life of the sample lies between 40.5 and 42 hours.
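The same probability can be computed without a Z table. This sketch assumes Python 3.8+ for `statistics.NormalDist`; it models $\bar X$ as normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, as the central limit theorem justifies here. Because it uses the exact normal CDF rather than Z values rounded to two decimals, it gives about 0.919 instead of 0.9184.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 41.5, 2.5, 50
se = sigma / sqrt(n)              # standard error of the mean, about 0.3536
xbar_dist = NormalDist(mu, se)    # CLT: X-bar is approximately N(mu, se)

p = xbar_dist.cdf(42.0) - xbar_dist.cdf(40.5)
print(round(se, 4), round(p, 4))
```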


Example 2. A continuous manufacturing process produces items whose weights are normally
distributed with a mean weight of 800gms and a standard deviation of 300gms. A random sample of
16 items is to be selected from the process.
a) What is the probability that the arithmetic mean of the sample exceeds 900gms? Interpret the
result.
b) Find the values of the sample arithmetic mean within which the middle 95% of all sample
means will fall.
Solution:
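a) Given: $\mu = 800$ gms, $\sigma = 300$ gms, n = 16; find $P(\bar X \ge 900)$.
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{300}{\sqrt{16}} = \frac{300}{4} = 75, \qquad \mu_{\bar X} = 800$$
$$P(\bar X \ge 900) = P\left(Z \ge \frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}}\right) = P\left(Z \ge \frac{900 - 800}{75}\right) = P(Z \ge 1.33) = 0.5000 - 0.4082 = 0.0918$$
Interpretation: about 9.18% of all random samples of 16 items would have a mean weight of more than 900 gms.
b) Since Z = 1.96 bounds the middle 95% of the area under the normal curve, the formula for Z can be solved for the values of $\bar X$ in terms of the known values:
$$\bar X_1 = \mu_{\bar X} - Z\sigma_{\bar X} = 800 - 1.96(75) = 653 \text{ gms}, \qquad \bar X_2 = \mu_{\bar X} + Z\sigma_{\bar X} = 800 + 1.96(75) = 947 \text{ gms}$$
Thus the middle 95% of all sample means will fall between 653 gms and 947 gms.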
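Both parts of Example 2 map onto two methods of Python's `statistics.NormalDist` (Python 3.8+): `cdf` for the tail probability in part a) and `inv_cdf` for the bounds in part b). This is a sketch for checking the arithmetic; the exact CDF gives about 0.0912 for part a), slightly below the table value 0.0918 because the table uses Z rounded to 1.33.

```python
from statistics import NormalDist

mu, sigma, n = 800, 300, 16
se = sigma / n ** 0.5                  # 75 gms
xbar_dist = NormalDist(mu, se)

p_above_900 = 1 - xbar_dist.cdf(900)   # part a)
lower = xbar_dist.inv_cdf(0.025)       # part b): middle 95% of sample means
upper = xbar_dist.inv_cdf(0.975)

print(round(p_above_900, 4), round(lower, 1), round(upper, 1))
```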

1.2.2. Sampling distribution of sample proportions


• It shows how far sample proportions tend to fall from the population proportion.
• The sample proportion $\bar P$ of items having the characteristic of interest (success or failure, accept or reject, head or tail) is used for statistical inferences about the population proportion P. The sample proportion is defined as:
$$\bar P = \frac{\text{number of successes}}{\text{sample size}} = \frac{X}{n}$$
• By the same logic as for the sampling distribution of the mean, the sampling distribution of sample proportions has mean $\mu_{\bar P}$ and standard deviation (also called the standard error) $\sigma_{\bar P}$ given by:
$$\mu_{\bar P} = P \quad\text{and}\quad \sigma_{\bar P} = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{P(1-P)}{n}}, \qquad Q = 1 - P$$
If the sample size is large (n ≥ 30) and satisfies the following two conditions:
a) $nP \ge 5$   b) $nQ \ge 5$
• then the sampling distribution of the proportion is very closely normally distributed. Strictly, the number of successes X in the sample follows a binomial distribution, so the sampling distribution of $\bar P = X/n$ is binomial in form, and the normal distribution is an approximation to it.
• For a finite population in which sampling is done without replacement:
$$\mu_{\bar P} = P \quad\text{and}\quad \sigma_{\bar P} = \sqrt{\frac{PQ}{n}}\sqrt{\frac{N-n}{N-1}}$$
Under the same guidelines as in the previous sections, for a large sample size (n ≥ 30) the sampling distribution of the proportion is closely approximated by a normal distribution with the mean and standard deviation stated above.


• Hence, to standardize the sample proportion $\bar P$, the standard normal variable is
$$Z = \frac{\bar P - \mu_{\bar P}}{\sigma_{\bar P}} = \frac{\bar P - P}{\sqrt{PQ/n}}.$$
Example 3. A few years ago, a policy was introduced to give loans to unemployed engineers to start their own businesses. Out of 1,000,000 engineers, 600,000 accepted the policy and got the loan. A sample of 100 unemployed engineers is taken at the time the loans are allotted. What is the probability that the sample proportion would have exceeded 50% acceptance?
Solution: $\mu_{\bar P} = P = 0.60$, n = 100, N = 1,000,000; find $P(\bar P \ge 0.5)$.
$$\sigma_{\bar P} = \sqrt{\frac{PQ}{n}}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{(0.60)(0.40)}{100}}\sqrt{\frac{1{,}000{,}000-100}{1{,}000{,}000-1}} = 0.0490$$
$$P(\bar P \ge 0.5) = P\left(Z \ge \frac{\bar P - \mu_{\bar P}}{\sigma_{\bar P}}\right) = P\left(Z \ge \frac{0.50 - 0.60}{0.0490}\right) = P(Z \ge -2.04) = 0.4793 + 0.5000 = 0.9793$$
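A quick numerical check of Example 3, assuming Python 3.8+ for `statistics.NormalDist`. It shows that the finite population correction is negligible here (n is tiny compared with N) and that the exact normal tail is about 0.9794, matching the table result to rounding.

```python
from math import sqrt
from statistics import NormalDist

P, n, N = 0.60, 100, 1_000_000
fpc = sqrt((N - n) / (N - 1))          # finite population correction, close to 1 here
se = sqrt(P * (1 - P) / n) * fpc       # standard error of the sample proportion
p = 1 - NormalDist(P, se).cdf(0.50)    # probability that the sample proportion >= 0.50
print(round(se, 4), round(p, 4))
```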

Example 4. A population proportion is 0.40. A simple random sample of size 200 will be taken, and the sample proportion will be used to estimate the population proportion. What is the probability that the sample proportion will be within ±0.03 of the population proportion?
Given: $\mu_{\bar P} = P = 0.40$ and n = 200.
$$\sigma_{\bar P} = \sqrt{\frac{(0.4)(0.6)}{200}} = 0.0346$$
$$P(-0.03 \le \bar P - P \le 0.03) = P\left(\frac{-0.03}{0.0346} \le Z \le \frac{0.03}{0.0346}\right) = 2P(0 \le Z \le 0.87) = 2 \times 0.3078 = 0.6156$$
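The "within ±0.03" probability is the area of the normal curve between $P - 0.03$ and $P + 0.03$. This sketch (Python 3.8+, `statistics.NormalDist`) computes it directly; the exact value is about 0.6135, a little below 0.6156 because the hand calculation rounds Z = 0.866 up to 0.87.

```python
from math import sqrt
from statistics import NormalDist

P, n, margin = 0.40, 200, 0.03
se = sqrt(P * (1 - P) / n)             # about 0.0346
dist = NormalDist(P, se)
p_within = dist.cdf(P + margin) - dist.cdf(P - margin)
print(round(se, 4), round(p_within, 4))
```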

Example 5. A manufacturer of watches has determined from past experience that 3% of the watches he produces are defective. If a random sample of 300 watches is examined, what is the probability that the proportion defective is between 0.02 and 0.035?
Given: $\mu_{\bar P} = P = 0.03$, $\bar P_1 = 0.02$, $\bar P_2 = 0.035$, n = 300.
$$\sigma_{\bar P} = \sqrt{\frac{(0.03)(0.97)}{300}} = 0.0098$$
$$P(0.02 \le \bar P \le 0.035) = P\left(\frac{\bar P_1 - P}{\sigma_{\bar P}} \le Z \le \frac{\bar P_2 - P}{\sigma_{\bar P}}\right) = P\left(\frac{0.02 - 0.03}{0.0098} \le Z \le \frac{0.035 - 0.03}{0.0098}\right)$$
$$= P(-1.02 \le Z \le 0.51) = P(-1.02 \le Z \le 0) + P(0 \le Z \le 0.51) = 0.3461 + 0.1950 = 0.5411$$
Hence the probability that the proportion defective will lie between 0.02 and 0.035 is 0.5411.
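A sketch of the same calculation with the exact normal CDF (Python 3.8+). It gives about 0.539; the difference from 0.5411 comes from rounding $\sigma_{\bar P}$ to 0.0098 and the Z values to two decimals in the hand calculation. Note that nP = 9 here, so the condition nP ≥ 5 for the normal approximation is met.

```python
from math import sqrt
from statistics import NormalDist

P, n = 0.03, 300
se = sqrt(P * (1 - P) / n)             # about 0.0098
dist = NormalDist(P, se)
p_between = dist.cdf(0.035) - dist.cdf(0.02)
print(round(se, 4), round(p_between, 4))
```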

1.2.3. SAMPLING DISTRIBUTION OF THE DIFFERENCE BETWEEN TWO MEANS


The concept of the sampling distribution of the sample mean introduced earlier can also be used to compare a population of size $N_1$ having mean $\mu_1$ and standard deviation $\sigma_1$ with another similar population of size $N_2$ having mean $\mu_2$ and standard deviation $\sigma_2$.
Let $\bar X_1$ and $\bar X_2$ be the means of samples drawn from the first and second populations, respectively. Then the difference between the population means $\mu_1$ and $\mu_2$ can be estimated by generalizing the formula of the standard normal variable as follows:
$$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_{\bar X_1} - \mu_{\bar X_2})}{\sigma_{\bar X_1 - \bar X_2}} = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma_{\bar X_1 - \bar X_2}}$$
where:
$\mu_{\bar X_1} - \mu_{\bar X_2} = \mu_1 - \mu_2$ (mean of the sampling distribution of the difference of sample means),
$$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\sigma_{\bar X_1}^2 + \sigma_{\bar X_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
(standard error of the sampling distribution of the difference of two means), and $n_1$ and $n_2$ are the sizes of the independent random samples drawn from the first and second populations, respectively.
Example 6. Car stereos of manufacturer A have a mean lifetime of 1,400 hours with a standard deviation of 200 hours, while those of manufacturer B have a mean lifetime of 1,200 hours with a standard deviation of 100 hours. If a random sample of 125 stereos from each manufacturer is tested, what is the probability that manufacturer A's stereos will have a mean lifetime that is at least:
A. 160 hours more than manufacturer B's stereos?
B. 250 hours more than manufacturer B's stereos?
Solution:
Manufacturer A: $\mu_1 = 1{,}400$ hours, $\sigma_1 = 200$ hours, $n_1 = 125$
Manufacturer B: $\mu_2 = 1{,}200$ hours, $\sigma_2 = 100$ hours, $n_2 = 125$
a)
$$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(200)^2}{125} + \frac{(100)^2}{125}} = \sqrt{320 + 80} = \sqrt{400} = 20$$
$$P(\bar X_1 - \bar X_2 \ge 160) = P\left(Z \ge \frac{160 - 200}{20}\right) = P(Z \ge -2) = 0.5000 + 0.4772 = 0.9772 \text{ (area under normal curve)}$$
Hence, the probability is very high that the sample mean lifetime of A's stereos is at least 160 hours more than that of B's.
b) Proceeding in the same manner as in part a):
$$P(\bar X_1 - \bar X_2 \ge 250) = P\left(Z \ge \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma_{\bar X_1 - \bar X_2}}\right) = P\left(Z \ge \frac{250 - 200}{20}\right) = P(Z \ge 2.5) = 0.5000 - 0.4938 = 0.0062 \text{ (area under normal curve)}$$
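The sampling distribution of $\bar X_1 - \bar X_2$ is itself approximately normal, so both parts of Example 6 are upper-tail areas of one normal curve. A minimal check in Python 3.8+ with `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

mu1, sd1, n1 = 1400, 200, 125   # manufacturer A
mu2, sd2, n2 = 1200, 100, 125   # manufacturer B

se_diff = sqrt(sd1**2 / n1 + sd2**2 / n2)   # 20 hours
diff_dist = NormalDist(mu1 - mu2, se_diff)  # distribution of X1-bar minus X2-bar

p_a = 1 - diff_dist.cdf(160)    # at least 160 hours more
p_b = 1 - diff_dist.cdf(250)    # at least 250 hours more
print(se_diff, round(p_a, 4), round(p_b, 4))  # 20.0 0.9772 0.0062
```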

Example 7. The strength of wire produced by company A has a mean of 4,500 kg and a standard deviation $\sigma_1$ of 200 kg. Wire from company B has a mean of 4,000 kg and a standard deviation $\sigma_2$ of 300 kg. If 50 wires of company A and 100 wires of company B are selected at random and tested for strength, what is the probability that the sample mean strength of A will be at least 600 kg more than that of B?
Given: $\mu_1 = 4{,}500$, $\sigma_1 = 200$, $n_1 = 50$; $\mu_2 = 4{,}000$, $\sigma_2 = 300$, $n_2 = 100$
$$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(200)^2}{50} + \frac{(300)^2}{100}} = \sqrt{800 + 900} = 41.23$$
$$P(\bar X_1 - \bar X_2 \ge 600) = P\left(Z \ge \frac{600 - 500}{41.23}\right) = P(Z \ge 2.43) = 0.5000 - 0.4925 = 0.0075 \text{ (area under normal curve)}$$
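The same check for Example 7, again a sketch assuming Python 3.8+. The exact tail area is about 0.0076, agreeing with the table value 0.0075 up to the rounding of Z to 2.43.

```python
from math import sqrt
from statistics import NormalDist

mu1, sd1, n1 = 4500, 200, 50    # company A
mu2, sd2, n2 = 4000, 300, 100   # company B

se_diff = sqrt(sd1**2 / n1 + sd2**2 / n2)   # sqrt(800 + 900), about 41.23 kg
p = 1 - NormalDist(mu1 - mu2, se_diff).cdf(600)
print(round(se_diff, 2), round(p, 4))
```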

1.2.4. SAMPLING DISTRIBUTION OF THE DIFFERENCE OF TWO PROPORTIONS


Suppose two populations of sizes $N_1$ and $N_2$ are given. For each sample of size $n_1$ from the first population, compute the sample proportion $\bar P_1$, whose sampling distribution has standard deviation $\sigma_{\bar P_1}$. Similarly, for each sample of size $n_2$ from the second population, compute the sample proportion $\bar P_2$, with standard deviation $\sigma_{\bar P_2}$.
For all combinations of these samples from the two populations, we obtain a sampling distribution of the difference $\bar P_1 - \bar P_2$ of sample proportions. Such a distribution is called the sampling distribution of the difference of two proportions. Its mean and standard deviation are given by:
$$\mu_{\bar P_1} - \mu_{\bar P_2} = P_1 - P_2$$
$$\sigma_{\bar P_1 - \bar P_2} = \sqrt{\sigma_{\bar P_1}^2 + \sigma_{\bar P_2}^2} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}, \qquad Q_i = 1 - P_i$$
If the sample sizes $n_1$ and $n_2$ are large (i.e. $n_1, n_2 \ge 30$), the sampling distribution of the difference of proportions is closely approximated by a normal distribution.
Example 8. 10% of the machines produced by company A are defective and 5% of those produced by company B are defective. A random sample of 250 machines is taken from company A and a random sample of 300 machines is taken from company B. What is the probability that the difference in sample proportions is less than or equal to 0.02?
$\mu_{\bar P_1} - \mu_{\bar P_2} = P_1 - P_2 = 0.10 - 0.05 = 0.05$, $n_1 = 250$, $n_2 = 300$
The standard error of the difference in sample proportions is given by
$$\sigma_{\bar P_1 - \bar P_2} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}} = \sqrt{\frac{(0.10)(0.90)}{250} + \frac{(0.05)(0.95)}{300}} = \sqrt{0.000518} = 0.0228$$
The desired probability of the difference in sample proportions is given by
$$P(\bar P_1 - \bar P_2 \le 0.02) = P\left(Z \le \frac{(\bar P_1 - \bar P_2) - (P_1 - P_2)}{\sigma_{\bar P_1 - \bar P_2}}\right) = P\left(Z \le \frac{0.02 - 0.05}{0.0228}\right) = P(Z \le -1.32) = 0.5000 - 0.4066 = 0.0934 \text{ (area under normal curve)}$$
Hence the desired probability for the difference in sample proportions is 0.0934.
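Because the question asks for "less than or equal to 0.02", the answer is a lower-tail area. This sketch (Python 3.8+, `statistics.NormalDist`) computes it with the exact CDF, giving about 0.0938 against the table value 0.0934, which uses Z rounded to −1.32.

```python
from math import sqrt
from statistics import NormalDist

P1, n1 = 0.10, 250   # company A: defective rate and sample size
P2, n2 = 0.05, 300   # company B

se_diff = sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)   # about 0.0228
p = NormalDist(P1 - P2, se_diff).cdf(0.02)                 # P(difference <= 0.02)
print(round(se_diff, 4), round(p, 4))
```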
