
CHAPTER TWO

SAMPLING AND SAMPLING DISTRIBUTION

SAMPLING THEORY
Sampling is the process of learning about a population on the basis of a sample drawn from it. Instead of studying every unit of the population, only part of the population is studied, and conclusions about the entire population are drawn from that part. The process of sampling involves three elements: selecting the sample, collecting the information, and making an inference about the population.

BASIC CONCEPTS OF SAMPLING THEORY


Population: In Statistics the term population is used to mean the totality of cases (items)
under consideration in a given investigation or research. In other words, the largest
collection of observations on a variable constitutes the population.

Census: The process of gathering data from every element in the population.

Sample: A part of the population of interest. Any non-empty subset of a population is called a sample. Many different samples can be selected from a single population; the one that best reflects or represents the behavior of the population is considered the most appropriate.

Sampling: The method of selecting a sample from a population.

Statistic: A measurable characteristic of the sample; in short, a sample result.

Parameter: A measurable characteristic of the population, that is, a numerical result obtained by measuring the whole population.

Sampling Frame: - The list of all possible units in the reference population.

Sample Size: - The number of elements/observations in a specific sample.

Sampling Error: - The difference between sample statistic and population parameter.

Sampling Unit: - Elements of the population to be sampled or the unit of selection in the
sampling process.

Sample design: The set of procedures for selecting the sample elements from the population.

REASONS FOR SAMPLING


The following are the major reasons for sampling technique:

Cost/Economy
The unit cost of collecting data in a census is significantly lower than in a sample survey. However, because the population contains many more items, the total cost of a census is far higher than that of a sample. Suppose it costs Birr 200 per unit to take a census of 10,000,000 individuals, while the unit cost of sampling 5,000 individuals is Birr 1,000. The total cost of the census is then 10,000,000 x 200 = Birr 2,000,000,000, whereas that of the sample is 5,000 x 1,000 = Birr 5,000,000.
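A quick check of the arithmetic in the cost example, using only the Birr figures given above (the helper name `total_cost` is illustrative):

```python
def total_cost(units: int, unit_cost_birr: int) -> int:
    """Total cost = number of units x cost per unit (in Birr)."""
    return units * unit_cost_birr

census = total_cost(10_000_000, 200)   # census of the whole population
sample = total_cost(5_000, 1_000)      # sample survey
print(f"Census: Birr {census:,}")      # Census: Birr 2,000,000,000
print(f"Sample: Birr {sample:,}")      # Sample: Birr 5,000,000
print(f"The census costs {census // sample} times as much")  # 400
```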

Timeliness
Because the population is large, the total time required for a census is significantly greater than for sampling; a sample can provide the necessary information more quickly.

Large Population Size


Many populations about which inferences must be made are so large that it is impossible to cover every item. The solution is to take a sample from such a population.

Inaccessibility of the Entire Population


In some cases the entire population is not accessible, for example because of disease, death, conflict, mental incapacity, or imprisonment. In such cases sampling is necessary.

Destructive Nature of Many Tests
Because many tests destroy the items being tested, information can be collected from only part of the population. Examples include a blood test on a patient, the life in hours of a tube light, and the strength of wires.

Accuracy
Non-sampling error is usually higher in a census than in a sample survey, because a census involves many less-qualified investigators and its supervision, monitoring, and quality control tend to be weaker. The higher the non-sampling error, the less reliable the results.

SAMPLING METHODS
There are two principal methods of drawing a sample from a population: Probability
sampling and Non-probability sampling.

1) Probability Sampling

In probability sampling, every unit in the population has a known, nonzero chance of being selected into the sample (in simple random sampling this chance is the same for every unit). Selection follows a random mechanism; there is no human judgment involved.

There are four basic types of probability sampling techniques.

i. Simple Random Sampling
ii. Stratified Sampling
iii. Systematic Sampling
iv. Cluster Sampling
i. Simple Random Sampling
Simple random sampling is a method of probability sampling in which every unit in the population has an equal, nonzero chance of being selected into the sample. Equivalently, when n units are drawn without replacement from N, every possible sample of size n is equally likely to be chosen. The probability that any particular unit is included in the sample is n/N.
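The claim that each unit's inclusion probability is n/N can be checked by brute force: list every possible sample of size n (all equally likely under simple random sampling) and count the share that contains a given unit. The small N and n below are arbitrary illustrative values.

```python
from fractions import Fraction
from itertools import combinations

def inclusion_probability(N: int, n: int, unit: int) -> Fraction:
    """Share of all equally likely size-n samples (without replacement) that contain `unit`."""
    samples = list(combinations(range(1, N + 1), n))
    hits = sum(1 for s in samples if unit in s)
    return Fraction(hits, len(samples))

N, n = 5, 2
for unit in range(1, N + 1):
    print(unit, inclusion_probability(N, n, unit))  # every unit: 2/5, i.e. n/N
```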

There are two methods of selecting a simple random sample:
Lottery method - Each population item is numbered from 1 to N on identical cards (same size, shape, and color). The numbered cards are placed in a bowl, mixed thoroughly, and as many cards as needed are drawn blindfolded. The subjects whose numbers are drawn constitute the sample. Because it is difficult to mix the cards thoroughly, this method may produce a biased sample, so another method of selecting sample elements is needed.

Random Number method - Because of this problem with the lottery method, statisticians use the random number method, in which the sample is selected using random numbers taken from a random number table or generated by a computer.
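As a sketch of the computer-generated version, Python's standard `random.sample` draws distinct items without replacement, which is exactly a simple random sample of serial numbers. The population size, sample size, and seed below are arbitrary; the seed only makes the draw reproducible.

```python
import random

def simple_random_sample(N: int, n: int, seed=None) -> list[int]:
    """Draw n distinct serial numbers from 1..N without replacement."""
    rng = random.Random(seed)                    # seeded only for reproducibility
    return sorted(rng.sample(range(1, N + 1), n))

print(simple_random_sample(N=100, n=10, seed=1))
```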

How to use the random number table method

a) Assign a unique number to each population element in the sampling frame. Start
with serial number 1, or 01, or 001, etc. depending on the number of digits
required.
b) Choose a random starting position, for example by closing your eyes (blindfold selection) and placing your finger on a number in the table.
c) Select serial numbers across rows or down columns or diagonally from the
starting point.
d) Discard numbers that are not assigned to any population element and ignore
numbers that have already been selected.
e) Repeat the selection process until the required number of sample elements is
selected.
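Steps a) to e) can be sketched as a function that reads fixed-width serial numbers from a row of random digits. The digit row below is a hypothetical stand-in for a printed random number table, read across from an assumed starting point; it is not taken from any real table.

```python
def select_from_table(digits: str, N: int, n: int) -> list[int]:
    """Read fixed-width serial numbers from a row of random digits (steps a-e)."""
    width = len(str(N))                          # step a: serials 01..30 need 2 digits
    stream = "".join(ch for ch in digits if ch.isdigit())
    chosen: list[int] = []
    for i in range(0, len(stream) - width + 1, width):   # step c: read across the row
        number = int(stream[i:i + width])
        if not 1 <= number <= N:                 # step d: not assigned, discard
            continue
        if number in chosen:                     # step d: already selected, ignore
            continue
        chosen.append(number)
        if len(chosen) == n:                     # step e: required size reached
            break
    return chosen

# Hypothetical row of a random number table.
row = "49 21 03 77 21 85 12 06 49 23 88 15 60"
print(select_from_table(row, N=30, n=5))         # [21, 3, 12, 6, 23]
```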
Advantage of simple random sampling
- Every unit has the same chance of selection, so the sample is free from selection bias.
Disadvantages of simple random sampling
- It requires a sampling frame, which is sometimes impossible to obtain (for example, for a population of fish).
- If the population is very large, numbering the units and selecting the sample is tedious and time consuming.
- Minority subgroups of the population may not be represented in the sample.
ii. Stratified Sampling
In stratified sampling, the population is first divided into subgroups called strata (singular: stratum), and a sample is then selected from each stratum by simple random or systematic sampling. The strata are formed so that each is homogeneous with respect to a characteristic such as sex, race, region, or institutional affiliation (for example, faculty). Stratified sampling is used when the population is heterogeneous.
Stratified random sampling is a three-step process:
- Step 1 - Divide the population into homogeneous, mutually exclusive, and collectively exhaustive groups, or strata, using some stratification variable (e.g. income level, sex, education level);
- Step 2 - Select an independent simple random sample from each stratum;
- Step 3 - Form the final sample by combining all the sample elements chosen in Step 2.

Stratified samples can be:
Proportionate: the sample elements are selected from each stratum so that the sampling fraction, the ratio of the stratum sample size to the stratum population size (n_i/N_i), is the same for all strata and equal to n/N.
Disproportionate: the sampling fraction differs from stratum to stratum.

Example: Select a proportionate stratified sample of 20 households from Addis Ababa households belonging to three income groups: low (50), middle (30), and high (20), so N = 50 + 30 + 20 = 100.
- Subdivide the households into three homogeneous subgroups, or strata, by income group: low, middle, and high.
- Calculate the overall sampling fraction f = n/N = 20/100 = 0.2, where n is the sample size and N is the population size. Applying f to each stratum gives n1 = 0.2 x 50 = 10, n2 = 0.2 x 30 = 6, and n3 = 0.2 x 20 = 4.
Thus, n = n1 + n2 + n3 = 10 + 6 + 4 = 20.
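The allocation step of the example can be sketched as a small function (the name `proportionate_allocation` is illustrative). It assumes f x N_i comes out as a whole number for every stratum, as it does here; in general the allocations would have to be rounded and adjusted so they still sum to n, which this sketch does not attempt.

```python
from fractions import Fraction

def proportionate_allocation(strata: dict[str, int], n: int) -> dict[str, int]:
    """Allocate a total sample size n to strata in proportion to stratum size N_i."""
    N = sum(strata.values())
    f = Fraction(n, N)                          # overall sampling fraction f = n/N
    alloc = {name: f * size for name, size in strata.items()}
    # Assumption of this sketch: every f * N_i is a whole number (no rounding rule).
    if any(a.denominator != 1 for a in alloc.values()):
        raise ValueError("allocations are not whole numbers; a rounding rule is needed")
    return {name: int(a) for name, a in alloc.items()}

print(proportionate_allocation({"low": 50, "middle": 30, "high": 20}, n=20))
# {'low': 10, 'middle': 6, 'high': 4}
```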

Advantage of Stratified Sampling:
- The representativeness of the sample is improved.
Disadvantages of Stratified Sampling:
- If there are many variables of interest, dividing a large population into representative subgroups requires a great deal of effort.
- If the variables are complex or ambiguous (such as beliefs or attitudes), it is difficult to assign individuals to subgroups according to them.
iii. Systematic Sampling
In systematic sampling only one random number is needed throughout the entire
sampling process. Elements of the population will be arranged in some order and the
elements to be included in the sample will be selected at a constant interval.
To use systematic sampling, a researcher needs:
a. a sampling frame of the population;
b. a skip interval (K), calculated as follows:
$$K = \frac{\text{size of population list } (N)}{\text{size of sample } (n)} = \frac{N}{n}$$
The first element, a number between 1 and K, is chosen by simple random sampling, and subsequent elements are selected using the skip interval. For instance, if the j-th unit is selected first, the (j + K)-th, (j + 2K)-th, ... units are selected next, until the required sample size is obtained.

Example: Suppose there are 2,000 subjects in the population and a sample of 50 subjects is needed. The sampling interval is K = 2000/50 = 40. Select a starting point, say x, from 1 through 40 by simple random sampling, and then include every 40th element starting from x (that is, x, x + 40, x + 80, ...).
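The procedure in the example can be sketched as follows: one random start between 1 and K, then fixed steps of K. This sketch assumes N is an exact multiple of n, as in the 2000/50 example; the seed is arbitrary.

```python
import random

def systematic_sample(N: int, n: int, rng: random.Random) -> list[int]:
    """Select n serial numbers from 1..N: random start in 1..K, then every K-th unit."""
    K = N // n                       # skip interval; assumes N is a multiple of n
    start = rng.randint(1, K)        # the only random number needed
    return [start + i * K for i in range(n)]

sample = systematic_sample(N=2000, n=50, rng=random.Random(7))
print(sample[:3], "...", sample[-1])
```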

Advantages of Systematic Sampling:
- It is less time consuming and easier to perform than SRS;
- It is more convenient to use than SRS; and
- It provides a good approximation to SRS.

Disadvantages of Systematic Sampling:
If the list has a cyclic (periodic) ordering that coincides with the skip interval, the sample will not be representative of the population. For example, suppose the items are arranged as:
- Defective item
- Non-defective item
- Defective item
- Non-defective item, etc.
With an even skip interval, every selected item is then of the same type, either all defective or all non-defective.
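A small sketch makes the periodicity problem concrete. It assumes a hypothetical production line in which defective (D) and non-defective (N) items strictly alternate, and a skip interval K = 2.

```python
def systematic_from_list(items: list[str], K: int, start: int) -> list[str]:
    """Take every K-th item, beginning at position `start` (1-based)."""
    return items[start - 1::K]

# Hypothetical line of 100 items in which D and N strictly alternate.
line = ["D", "N"] * 50
print(set(systematic_from_list(line, K=2, start=1)))   # {'D'}: every item defective
print(set(systematic_from_list(line, K=2, start=2)))   # {'N'}: no defectives at all
```

Whichever start is drawn, the estimated defective rate is 100% or 0%, although the true rate is 50%.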

iv. Cluster Sampling
Cluster sampling can be used when the population is very large and homogeneous, in the sense that the clusters resemble one another. The population is divided into non-overlapping groups, called clusters, each of which is internally heterogeneous, and clusters of elements are selected as the sampling units by simple random sampling (in two-stage cluster sampling, this is the first stage). In other words, cluster sampling involves dividing the population into groups (clusters), choosing one or more clusters at random, and then sampling the individuals within the chosen clusters.
A two-step process:
- Step 1 - Divide the defined population into a number of mutually exclusive and collectively exhaustive heterogeneous groups, or clusters;
- Step 2 - Select an independent simple random sample of clusters.
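The two steps can be sketched as one-stage cluster sampling: draw clusters by simple random sampling and keep every member of the chosen clusters. The village and household labels below are hypothetical. In practice households would need to be listed only for the villages actually chosen; they are filled in for all villages here just for illustration.

```python
import random

def cluster_sample(clusters: dict[str, list[str]], m: int, rng: random.Random) -> list[str]:
    """One-stage cluster sampling: SRS of m whole clusters, keeping all their members."""
    chosen = rng.sample(sorted(clusters), m)       # Step 2: SRS of cluster labels
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical frame: villages (clusters) and the households in each.
villages = {
    "V1": ["h1", "h2", "h3"],
    "V2": ["h4", "h5"],
    "V3": ["h6", "h7", "h8", "h9"],
    "V4": ["h10", "h11"],
}
print(cluster_sample(villages, m=2, rng=random.Random(3)))
```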
Advantages of Cluster Sampling:
- A list of all individual study units in the reference population is not required;
- It reduces cost; and
- It simplifies field work and is convenient.
Disadvantages:
- The members of a cluster are often more homogeneous than the members of the whole population, so the sample may not be representative.
- The elements in a cluster may not show the same variation in characteristics as elements selected individually from the population.

2) Non-Probability Sampling
In non-probability sampling, the chance that a given unit is included in the sample is unknown, and some units may have no chance at all of being included. Selection involves some degree of personal subjectivity instead of predetermined, probabilistic rules.

There are three basic types of non-probability sampling techniques.


i. Convenience sampling
ii. Judgmental sampling
iii. Quota sampling
i. Convenience Sampling
Convenience sampling means the sample is drawn at the convenience of the researcher. It is common in exploratory research, but its results cannot be used to draw conclusions about the population.

ii. Judgmental sampling


Sampling based on the judgment, gut feeling, or experience of the researcher. It is common in commercial marketing research projects. When statistical inference about the population is not required, such samples are quite useful.

iii. Quota sampling


In this method, the decision maker requires the sample to contain a certain number of items with a given characteristic. It resembles judgmental sampling, except that the number of items required in each category (the quota) is fixed in advance.

Example 2.5: Suppose we know that 54% of the adults in a community are female and the study requires 100 respondents. In quota sampling, we might interview the first 54 females and the first 46 males we encounter.
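The quota-filling rule of Example 2.5 can be sketched as a loop that accepts people in the order they are met until each quota is full. The arrival order below (two females for every male) is a hypothetical assumption; note that nothing random is involved, which is why quota sampling is non-probability sampling.

```python
def fill_quotas(stream: list[tuple[str, str]], quotas: dict[str, int]) -> list[tuple[str, str]]:
    """Accept respondents in the order they are met until every quota is full."""
    remaining = dict(quotas)                  # places still open in each group
    sample: list[tuple[str, str]] = []
    for person_id, group in stream:
        if remaining.get(group, 0) > 0:       # this group's quota is still open
            sample.append((person_id, group))
            remaining[group] -= 1
        if not any(remaining.values()):       # every quota filled: stop
            break
    return sample

# Hypothetical order in which adults are met: two females for every male.
arrivals = [(f"p{i}", "F" if i % 3 else "M") for i in range(1, 301)]
chosen = fill_quotas(arrivals, {"F": 54, "M": 46})
print(len(chosen), sum(g == "F" for _, g in chosen), sum(g == "M" for _, g in chosen))
```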

DETERMINANT FACTORS OF THE SAMPLE SIZE


Sample size is the number of sampling units selected from the population for investigation. If the sample is too small, it may not represent the population, and inferences drawn about the population may be misleading. If the sample is very large, it may be too costly, take a lot of time, and be difficult to manage. Hence the sample size should be neither too small nor too large; it should be optimal.

The following factors should be considered while deciding the sample size:

i. The size of the population: the larger the size of the population, the bigger should be
the sample size.
ii. The resources available: if ample resources are available, a large sample can be taken. In most cases, however, resources are a major constraint on sample size.
iii. The degree of accuracy or precision desired: the greater the degree of accuracy
desired, the larger should be the sample size. However, it does not necessarily mean
that bigger samples always ensure greater accuracy.
iv. Homogeneity or heterogeneity of the population: If the population consists of
homogeneous units a small sample may serve the purpose, but if the population
consists of heterogeneous units a large sample may be inevitable.
v. Nature of study: For an intensive and continuous study a small sample may be
suitable. But for studies which are not likely to be repeated and are quite extensive in
nature, it may be necessary to take a large sample size.
vi. Method of sampling adopted: The sample size is also influenced by the type of sampling plan adopted. For example, a simple random sample may require a larger sample size, whereas in a properly drawn stratified sampling plan even a small sample may give better results.
vii. Nature of respondents: Where many respondents are expected not to cooperate and return the questionnaire, a larger sample should be selected.

SAMPLING DISTRIBUTION
NOTE: The normal probability distribution is used to determine probabilities for normally distributed individual measurements, given the mean and the standard deviation. Symbolically, the variable is the measurement X, with population mean μ and population standard deviation σ. In contrast to such distributions of individual measurements, a sampling distribution is a probability distribution for the possible values of a sample statistic.

Population distribution: The distribution of the measured values of the members of a population, with mean μ, variance σ², and standard deviation σ. The population standard deviation describes the variation among the values of the members of the population, whereas the standard deviation of a sampling distribution measures the variability among the values of a sample statistic (such as sample means or sample proportions) that is due to sampling error.

Sample distribution: The distribution of the measured values in a single random sample drawn from a given population. If many random samples are drawn, the sample mean varies from sample to sample, and this variability is the basis of the sampling distribution. A sampling distribution is a probability distribution for the possible values of a sample statistic, such as a sample mean.

SAMPLING DISTRIBUTION OF THE MEAN


Sampling distribution of the mean: The probability distribution of the means $\bar{X}$ of all distinct possible simple random samples of a given size n that can be drawn from a population or a process. The sampling distribution of the mean has its own arithmetic mean, denoted $\mu_{\bar{X}}$ (read "mu sub x-bar"), and standard deviation, denoted $\sigma_{\bar{X}}$ (read "sigma sub x-bar").

NB: The sampling distribution of the mean is not the sample distribution, which is the distribution of the measured values of X in one random sample. Rather, the sampling distribution of the mean is the probability distribution of $\bar{X}$, the sample mean.

For any given sample size n taken from a population with mean μ and standard deviation σ, the value of the sample mean $\bar{X}$ would vary from sample to sample if several random samples were drawn from the population. This variability is the basis of the sampling distribution.

The sampling distribution of the mean is described by two parameters: its expected value, $E(\bar{X}) = \mu_{\bar{X}}$, the mean of the sampling distribution of the mean; and its standard deviation, $\sigma_{\bar{X}}$, called the standard error of the mean.

PROPERTIES OF THE SAMPLING DISTRIBUTION OF MEANS


1. The arithmetic mean $\mu_{\bar{X}}$ of the sampling distribution of the mean equals the population mean μ, regardless of the form of the population distribution, i.e. $\mu_{\bar{X}} = \mu$.
2. The sampling distribution has a standard deviation, also called the standard error, equal to the population standard deviation divided by the square root of the sample size:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
This holds exactly when sampling is with replacement or the population is infinite, and is a good approximation when N is very large relative to the sample (n < 0.05N). If N is finite, sampling is without replacement, and n > 0.05N, then
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$
The expression $\sqrt{(N-n)/(N-1)}$ is called the finite population correction factor (finite population multiplier). If the population standard deviation σ is unknown, the standard error of the mean $\sigma_{\bar{X}}$ can be estimated by the sample standard error of the mean $s_{\bar{X}}$, calculated as follows:
$$s_{\bar{X}} = \frac{s}{\sqrt{n}} \quad \text{or} \quad s_{\bar{X}} = \frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$
3. A sample of size n ≥ 30 is generally considered a large sample for statistical analysis, whereas a sample of size n < 30 is considered a small sample. The sampling distribution of the mean is approximately normal for sufficiently large sample sizes (n ≥ 30).
4. When the population standard deviation σ is not known, the sample standard deviation s, which approximates σ, is used to compute the standard error, i.e. $\sigma_{\bar{X}} \approx s/\sqrt{n}$.
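Properties 2 and 4 can be collected in one helper (the name `standard_error` is illustrative): it divides the given standard deviation (σ, or s when σ is unknown) by √n and applies the finite population correction only when a population size N is supplied.

```python
import math
from typing import Optional

def standard_error(sd: float, n: int, N: Optional[int] = None) -> float:
    """sd / sqrt(n), times the finite population correction sqrt((N - n) / (N - 1))
    when a finite population size N is given. `sd` may be sigma, or s if sigma is unknown."""
    se = sd / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(standard_error(2.5, 50), 4))                # 0.3536, no correction
print(round(standard_error(math.sqrt(200), 3, N=5), 3))  # 5.774, with correction
```

The second call uses σ = √200 ≈ 14.142 from the ages example; with N = 5 and n = 3 the correction factor is √(2/4) ≈ 0.707, far from 1, so it cannot be ignored there.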
Example 1. A population consists of the following ages: 10, 20, 30, 40, and 50. A random sample of three is to be selected from this population and its mean computed. Develop the sampling distribution of the mean.

Solution: The number of simple random samples of size n that can be drawn without replacement from a population of size N is
$${}_N C_n = \frac{N!}{n!\,(N-n)!}$$
With N = 5 and n = 3, ${}_5 C_3 = 10$ samples can be drawn from the population:

Sampled items    Sample mean ($\bar{X}$)
10, 20, 30       20.00
10, 20, 40       23.33
10, 20, 50       26.67
10, 30, 40       26.67
10, 30, 50       30.00
10, 40, 50       33.33
20, 30, 40       30.00
20, 30, 50       33.33
20, 40, 50       36.67
30, 40, 50       40.00
Total            300.00

A systematic organization of the above figures gives the following:

Sample mean ($\bar{X}$)   Frequency   Probability (relative frequency) of $\bar{X}$
20.00                     1           0.1
23.33                     1           0.1
26.67                     2           0.2
30.00                     2           0.2
33.33                     2           0.2
36.67                     1           0.1
40.00                     1           0.1
Total                     10          1.0

Columns 1 and 2 show the frequency distribution of the sample means.
Columns 1 and 3 show the sampling distribution of the mean.
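The whole of Example 1 can be reproduced by enumeration: list every sample of size 3, compute its mean, tabulate the frequencies, and then take the mean and standard deviation of the ten sample means. Exact fractions are used so that means such as 70/3 are not rounded before counting.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
import math

ages = [10, 20, 30, 40, 50]
n = 3

# Every distinct sample of size n drawn without replacement, and its mean.
means = [Fraction(sum(s), n) for s in combinations(ages, n)]
dist = Counter(means)                                  # frequency of each sample mean

for m in sorted(dist):
    print(f"{float(m):6.2f}  {dist[m]}  {dist[m] / len(means):.1f}")

mu_xbar = sum(means) / len(means)                      # mean of the sample means
se = math.sqrt(sum((m - mu_xbar) ** 2 for m in means) / len(means))
print(len(means), float(mu_xbar), round(se, 3))        # 10 30.0 5.774
```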

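$$\mu = \frac{\sum X}{N} = \frac{150}{5} = 30, \qquad \mu_{\bar{X}} = \frac{\sum \bar{X}}{{}_N C_n} = \frac{300}{10} = 30$$
Regardless of the sample size, $\mu_{\bar{X}} = \mu$.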

x (observation)   x - μ   (x - μ)²
10                -20     400
20                -10     100
30                  0       0
40                 10     100
50                 20     400
Total                     1,000

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{1000}{5}} = 14.142$$

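$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{14.142}{\sqrt{3}}\sqrt{\frac{5-3}{5-1}} = 5.774$$
Computed directly from the ten sample means, the same value is obtained:
$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{{}_N C_n}} = \sqrt{\frac{333.33}{10}} = 5.774$$
Since averaging reduces variability, $\sigma_{\bar{X}} < \sigma$, except in the cases σ = 0 and n = 1.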

Central Limit Theorem and the Sampling Distribution of the Mean
The Central Limit Theorem (CLT) states that:

1. If the population is normally distributed, the distribution of sample means is normal regardless of the sample size.
2. If the population from which samples are taken is not normal, the distribution of sample means will be approximately normal if the sample size n is sufficiently large (n ≥ 30). The larger the sample size, the closer the sampling distribution is to the normal curve.

The relationship between the shape of the population distribution and the shape of the
sampling distribution of the mean is called the Central Limit Theorem.
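Part 2 of the theorem can be seen in a small simulation. The sketch assumes a strongly skewed exponential population with mean 1 and standard deviation 1 (chosen only for illustration); the sample size 30, the 20,000 repetitions, and the seed are arbitrary. The sample means centre on μ, spread by about σ/√n, and are far less skewed than the population.

```python
import random
import statistics

def simulate_sample_means(n: int, reps: int, seed: int = 0) -> list[float]:
    """Means of `reps` samples of size n from an exponential population (mean 1, sd 1)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

def skewness(xs: list[float]) -> float:
    """Population-style skewness: average cubed standardized deviation."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

means = simulate_sample_means(n=30, reps=20_000)
print(round(statistics.fmean(means), 3))   # close to the population mean 1
print(round(statistics.pstdev(means), 3))  # close to sigma/sqrt(n) = 1/sqrt(30), about 0.183
print(round(skewness(means), 2))           # well below the population skewness of 2
```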

The significance of the Central Limit Theorem is that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the population's frequency distribution beyond what we can learn from the sample. It also permits us to use the normal curve to analyze sampling distributions of the mean even when the shape of the population distribution is unknown, so the normal distribution can be applied to many problems when the sample is sufficiently large. Given the properties listed earlier, a value of the sample mean $\bar{X}$ is converted into a value Z on the standard normal distribution, which shows how far $\bar{X}$ deviates from the mean of the sample means, $\mu_{\bar{X}}$, using the formula:
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad \text{because } \mu_{\bar{X}} = \mu$$
If the population is finite and samples of fixed size n are drawn without replacement, the population shrinks with each draw, so the standard error of the sampling distribution of the mean is adjusted by the finite population correction factor:
$$Z = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}}$$

Example 2: The mean life of a certain tool is 41.5 hours, with a standard deviation of 2.5 hours. What is the probability that a simple random sample of size 50 drawn from this population will have a mean between 40.5 hours and 42 hours?

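Given: μ = 41.5, σ = 2.5, n = 50; find $P(40.5 \le \bar{X} \le 42.0)$.
$$\mu_{\bar{X}} = \mu = 41.5, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2.5}{\sqrt{50}} = \frac{2.5}{7.0711} = 0.3536$$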

The population distribution is unknown, but sample size n=50 is large enough to apply the
central limit theorem. Hence the normal distribution can be used to find the required
probability.
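$$P(40.5 \le \bar{X} \le 42) = P\left(\frac{\bar{X}_1 - \mu}{\sigma_{\bar{X}}} \le Z \le \frac{\bar{X}_2 - \mu}{\sigma_{\bar{X}}}\right) = P\left(\frac{40.5 - 41.5}{0.3536} \le Z \le \frac{42 - 41.5}{0.3536}\right)$$
$$= P(-2.8281 \le Z \le 1.4140) = P(-2.83 \le Z \le 0) + P(0 \le Z \le 1.41) = 0.4977 + 0.4207 = 0.9184$$
Thus 0.9184 is the probability that the mean life of the sampled tools lies between 40.5 and 42 hours.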
[Figure: normal curve of $\bar{X}$ centred at μ = 41.5, with area 0.4977 between 40.5 and 41.5 and area 0.4207 between 41.5 and 42]
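The same probability can be computed with Python's standard `statistics.NormalDist`, treating $\bar{X}$ as normal with mean μ and standard deviation σ/√n (the CLT approximation used above). Because the z-values are not rounded to two decimals, the result (about 0.919) differs slightly from the table answer 0.9184.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 41.5, 2.5, 50
se = sigma / sqrt(n)                        # standard error of the mean
xbar = NormalDist(mu, se)                   # CLT: X-bar is approximately N(mu, se)
p = xbar.cdf(42.0) - xbar.cdf(40.5)         # P(40.5 <= X-bar <= 42)
print(round(se, 4), round(p, 4))            # about 0.3536 and 0.919
```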

Example 2. A continuous manufacturing process produces items whose weights are normally distributed with a mean weight of 800 gms and a standard deviation of 300 gms. A random sample of 16 items is to be selected from the process.
A. What is the probability that the arithmetic mean of the sample exceeds 900 gms? Interpret the result.
B. Find the values of the sample arithmetic mean within which the middle 95% of all sample means will fall.

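Solution:

A. Given $\mu_{\bar{X}} = \mu = 800$ gms, σ = 300 gms, n = 16; find $P(\bar{X} \ge 900)$. Because the population is normal, $\bar{X}$ is normally distributed even though n = 16 is small.
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{300}{\sqrt{16}} = \frac{300}{4} = 75$$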

[Figure: normal curve of $\bar{X}$ centred at 800, with the upper-tail area 0.0918 beyond 900 shaded]

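$$P(\bar{X} \ge 900) = P\left(Z \ge \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\right) = P\left(Z \ge \frac{900 - 800}{75}\right) = P(Z \ge 1.33) = 0.5000 - 0.4082 = 0.0918$$
That is, only about 9% of all random samples of 16 items would have a mean weight above 900 gms.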

B. Since Z = ±1.96 bounds the middle 95% of the area under the normal curve, the formula for Z is solved for the values of $\bar{X}$:
$$\bar{X}_1 = \mu_{\bar{X}} - Z\sigma_{\bar{X}} = 800 - 1.96(75) = 653 \text{ gms}, \qquad \bar{X}_2 = \mu_{\bar{X}} + Z\sigma_{\bar{X}} = 800 + 1.96(75) = 947 \text{ gms}$$
Hence the middle 95% of all sample means fall between 653 gms and 947 gms.
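Both parts can be checked with `statistics.NormalDist`: the upper tail above 900 for part A, and the 2.5% and 97.5% quantiles (`inv_cdf`) for part B. Unrounded z-values give about 0.0912 for part A, slightly below the table value 0.0918 obtained with z = 1.33.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 300, 16
xbar = NormalDist(mu, sigma / sqrt(n))               # normal population: X-bar exactly normal

p_above_900 = 1 - xbar.cdf(900)                      # part A
lo, hi = xbar.inv_cdf(0.025), xbar.inv_cdf(0.975)    # part B: middle 95% of sample means
print(round(p_above_900, 4), round(lo), round(hi))   # about 0.0912, 653, 947
```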

SAMPLING DISTRIBUTION OF SAMPLE PROPORTIONS


The sample proportion $\bar{P}$, the fraction of sample items having the characteristic of interest (success or failure, accept or reject, head or tail), is the statistic used to make inferences about the population proportion P. The sample proportion is defined as:
$$\bar{P} = \frac{\text{number of successes}}{\text{sample size}} = \frac{X}{n}$$

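By the same logic as for the sampling distribution of the mean, the sampling distribution of sample proportions has mean $\mu_{\bar{P}}$ and standard deviation (also called the standard error) $\sigma_{\bar{P}}$ given by:
$$\mu_{\bar{P}} = P \quad \text{and} \quad \sigma_{\bar{P}} = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{P(1-P)}{n}}, \qquad Q = 1 - P$$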

If the sample is large (n ≥ 30) and satisfies the following two conditions,

A. nP ≥ 5
B. nQ ≥ 5

then the sampling distribution of the proportion is very close to normal. Strictly, the number of successes X in the sample follows a binomial distribution, so the sample proportion X/n follows a binomial distribution rescaled by 1/n; the normal curve is an approximation to it.
For a finite population in which sampling is done without replacement, we have:
$$\mu_{\bar{P}} = P \quad \text{and} \quad \sigma_{\bar{P}} = \sqrt{\frac{PQ}{n}}\sqrt{\frac{N-n}{N-1}}$$
Under the same guidelines as in the previous sections, for a large sample size (n ≥ 30) the sampling distribution of the proportion is closely approximated by a normal distribution with the mean and standard deviation stated above. Hence, the sample proportion $\bar{P}$ is standardized using the standard normal variable
$$Z = \frac{\bar{P} - \mu_{\bar{P}}}{\sigma_{\bar{P}}} = \frac{\bar{P} - P}{\sqrt{PQ/n}}$$

Example 3. A few years ago, a policy was introduced to give loans to unemployed engineers to start their own businesses. Out of 1,000,000 engineers, 600,000 accepted the policy and got a loan. A sample of 100 unemployed engineers is taken at the time the loans are allotted. What is the probability that the sample proportion would have exceeded 50% acceptance?
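Solution:
Given: $\mu_{\bar{P}} = P = 0.60$, N = 1,000,000, n = 100; find $P(\bar{P} \ge 0.5)$.
$$\sigma_{\bar{P}} = \sqrt{\frac{PQ}{n}}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{(0.6)(0.4)}{100}}\sqrt{\frac{1{,}000{,}000 - 100}{1{,}000{,}000 - 1}} = 0.0489$$
$$P(\bar{P} \ge 0.5) = P\left(Z \ge \frac{\bar{P} - \mu_{\bar{P}}}{\sigma_{\bar{P}}}\right) = P\left(Z \ge \frac{0.50 - 0.60}{0.0489}\right) = P(Z \ge -2.04) = 0.4793 + 0.5000 = 0.9793$$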
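A sketch of Example 3 with the finite population correction (the helper name `prop_standard_error` is illustrative). With n = 100 out of N = 1,000,000 the correction factor is about 0.99995, so it barely changes the answer; it matters only when n is a sizeable fraction of N.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional

def prop_standard_error(P: float, n: int, N: Optional[int] = None) -> float:
    """sqrt(PQ/n), times the finite population correction when N is given."""
    se = sqrt(P * (1 - P) / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

P, n, N = 0.60, 100, 1_000_000
se = prop_standard_error(P, n, N)
p = 1 - NormalDist(P, se).cdf(0.50)          # P(sample proportion >= 0.50)
print(round(se, 4), round(p, 3))             # about 0.049 and 0.979
```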

Example 4. A population proportion is 0.40. A simple random sample of size 200 will be taken, and the sample proportion will be used to estimate the population proportion. What is the probability that the sample proportion will be within ±0.03 of the population proportion?

Given: $\mu_{\bar{P}} = P = 0.40$, n = 200.
$$\sigma_{\bar{P}} = \sqrt{\frac{(0.4)(0.6)}{200}} = 0.0346$$
$$P(-0.03 \le \bar{P} - P \le 0.03) = 2P\left(0 \le Z \le \frac{0.03}{\sigma_{\bar{P}}}\right) = 2P(0 \le Z \le 0.87) = 2 \times 0.3078 = 0.6156$$

Example 5. A manufacturer of watches has determined from past experience that 3% of the watches he produces are defective. If a random sample of 300 watches is examined, what is the probability that the proportion defective is between 0.02 and 0.035?

Given: $\mu_{\bar{P}} = P = 0.03$, n = 300, $\bar{P}_1 = 0.02$, $\bar{P}_2 = 0.035$.
$$\sigma_{\bar{P}} = \sqrt{\frac{(0.03)(0.97)}{300}} = 0.0098$$
$$P(0.02 \le \bar{P} \le 0.035) = P\left(\frac{\bar{P}_1 - P}{\sigma_{\bar{P}}} \le Z \le \frac{\bar{P}_2 - P}{\sigma_{\bar{P}}}\right) = P\left(\frac{0.02 - 0.03}{0.0098} \le Z \le \frac{0.035 - 0.03}{0.0098}\right)$$
$$= P(-1.02 \le Z \le 0.51) = P(-1.02 \le Z \le 0) + P(0 \le Z \le 0.51) = 0.3461 + 0.1950 = 0.5411$$
Hence the probability that the proportion defective will lie between 0.02 and 0.035 is 0.5411.
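Examples 4 and 5 follow the same pattern, so one helper (the name `prob_proportion_between` is illustrative) covers both. It uses the plain normal approximation with no continuity correction and refuses cases where nP or nQ is below 5. Unrounded z-values give about 0.614 and 0.539, close to the table answers 0.6156 and 0.5411.

```python
from math import sqrt
from statistics import NormalDist

def prob_proportion_between(P: float, n: int, lo: float, hi: float) -> float:
    """Normal approximation to P(lo <= sample proportion <= hi) for a sample of size n."""
    if n * P < 5 or n * (1 - P) < 5:
        raise ValueError("nP and nQ should both be at least 5 for the normal approximation")
    dist = NormalDist(P, sqrt(P * (1 - P) / n))
    return dist.cdf(hi) - dist.cdf(lo)

ex4 = prob_proportion_between(0.40, 200, 0.37, 0.43)    # Example 4: within +/-0.03 of 0.40
ex5 = prob_proportion_between(0.03, 300, 0.02, 0.035)   # Example 5
print(round(ex4, 3), round(ex5, 3))                     # about 0.614 and 0.539
```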

Sampling Distribution of the Difference between Two Means


The concept of the sampling distribution of the sample mean introduced earlier can also be used to compare a population of size $N_1$, with mean $\mu_1$ and standard deviation $\sigma_1$, with another population of the same type, of size $N_2$, with mean $\mu_2$ and standard deviation $\sigma_2$.

Let $\bar{X}_1$ and $\bar{X}_2$ be the means of independent random samples of sizes $n_1$ and $n_2$ drawn from the first and second populations, respectively. The difference between the population means $\mu_1$ and $\mu_2$ can then be studied by generalizing the standard normal variable as follows:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_{\bar{X}_1} - \mu_{\bar{X}_2})}{\sigma_{(\bar{X}_1 - \bar{X}_2)}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{(\bar{X}_1 - \bar{X}_2)}}$$
where $\mu_{\bar{X}_1} - \mu_{\bar{X}_2} = \mu_1 - \mu_2$ is the mean of the sampling distribution of the difference of sample means, and
$$\sigma_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
is the standard error of the sampling distribution of the difference of two means.
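The formula can be sketched as a small function that treats $\bar{X}_1 - \bar{X}_2$ as normal with mean $\mu_1 - \mu_2$ and the standard error above, and returns the probability that the difference is at least d. The function name is illustrative; the figures in the calls are those of the car-stereo and wire-strength examples in this section, whose table answers are 0.9772, 0.0062, and 0.0075.

```python
from math import sqrt
from statistics import NormalDist

def prob_diff_at_least(mu1: float, sd1: float, n1: int,
                       mu2: float, sd2: float, n2: int, d: float) -> float:
    """P(X-bar1 - X-bar2 >= d) for independent samples, via the normal approximation."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)     # standard error of the difference
    return 1 - NormalDist(mu1 - mu2, se).cdf(d)

a = prob_diff_at_least(1400, 200, 125, 1200, 100, 125, 160)   # stereos, part A
b = prob_diff_at_least(1400, 200, 125, 1200, 100, 125, 250)   # stereos, part B
w = prob_diff_at_least(4500, 200, 50, 4000, 300, 100, 600)    # wires (Example 6)
print(round(a, 4), round(b, 4), round(w, 4))
```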

Example: Car stereos of manufacturer A have a mean lifetime of 1,400 hours with a standard deviation of 200 hours, while those of manufacturer B have a mean lifetime of 1,200 hours with a standard deviation of 100 hours. If a random sample of 125 stereos from each manufacturer is tested, what is the probability that manufacturer A's stereos will have a mean lifetime that is at least:

A. 160 hours more than manufacturer B's stereos?
B. 250 hours more than manufacturer B's stereos?
Solution:

Manufacturer A: $\mu_1$ = 1,400 hours, $\sigma_1$ = 200 hours, $n_1$ = 125
Manufacturer B: $\mu_2$ = 1,200 hours, $\sigma_2$ = 100 hours, $n_2$ = 125
a)
$$\sigma_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(200)^2}{125} + \frac{(100)^2}{125}} = \sqrt{320 + 80} = \sqrt{400} = 20$$
$$P(\bar{X}_1 - \bar{X}_2 \ge 160) = P\left(Z \ge \frac{160 - 200}{20}\right) = P(Z \ge -2) = 0.5000 + 0.4772 = 0.9772$$
Hence, the probability is very high (0.9772) that the mean lifetime of A's sampled stereos exceeds that of B's by at least 160 hours.

b) Proceeding in the same manner as in part a):
$$P(\bar{X}_1 - \bar{X}_2 \ge 250) = P\left(Z \ge \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{(\bar{X}_1 - \bar{X}_2)}}\right) = P\left(Z \ge \frac{250 - 200}{20}\right) = P(Z \ge 2.5) = 0.5000 - 0.4938 = 0.0062$$

Example 6. The strength of wires produced by company A has a mean of 4,500 kg and a standard deviation $\sigma_1$ of 200 kg. Wires from company B have a mean of 4,000 kg and a standard deviation $\sigma_2$ of 300 kg. If 50 wires of company A and 100 wires of company B are selected at random and tested for strength, what is the probability that the sample mean strength of A will be at least 600 kg more than that of B?

Given:
$\mu_1$ = 4,500, $\mu_2$ = 4,000; $\sigma_1$ = 200, $\sigma_2$ = 300; $n_1$ = 50, $n_2$ = 100
$$\sigma_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\frac{(200)^2}{50} + \frac{(300)^2}{100}} = \sqrt{800 + 900} = \sqrt{1700} = 41.23$$
$$P(\bar{X}_1 - \bar{X}_2 \ge 600) = P\left(Z \ge \frac{600 - 500}{41.23}\right) = P(Z \ge 2.43) = 0.5000 - 0.4925 = 0.0075$$
SAMPLING DISTRIBUTION OF THE DIFFERENCE OF TWO PROPORTIONS

Suppose two populations of sizes $N_1$ and $N_2$ are given. For each sample of size $n_1$ from the first population, compute the sample proportion $\bar{P}_1$ and standard deviation $\sigma_{\bar{P}_1}$. Similarly, for each sample of size $n_2$ from the second population, compute the sample proportion $\bar{P}_2$ and standard deviation $\sigma_{\bar{P}_2}$.

For all combinations of these samples from the two populations, we obtain a sampling distribution of the difference $\bar{P}_1 - \bar{P}_2$ of sample proportions. Such a distribution is called the sampling distribution of the difference of two proportions. Its mean and standard deviation are given by:
$$\mu_{\bar{P}_1} - \mu_{\bar{P}_2} = P_1 - P_2$$
$$\sigma_{(\bar{P}_1 - \bar{P}_2)} = \sqrt{\sigma_{\bar{P}_1}^2 + \sigma_{\bar{P}_2}^2} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$$

If the sample sizes $n_1$ and $n_2$ are both large (i.e. $n_1 \ge 30$ and $n_2 \ge 30$), the sampling distribution of the difference of proportions is closely approximated by a normal distribution.

Example 7. 10% of the machines produced by company A are defective, and 5% of those produced by company B are defective. A random sample of 250 machines is taken from company A and a random sample of 300 machines is taken from company B. What is the probability that the difference in sample proportions is less than or equal to 0.02?

$$\mu_{\bar{P}_1} - \mu_{\bar{P}_2} = P_1 - P_2 = 0.10 - 0.05 = 0.05$$
$n_1$ = 250, $n_2$ = 300

The standard error of the difference in sample proportions is given by
$$\sigma_{(\bar{P}_1 - \bar{P}_2)} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}} = \sqrt{\frac{(0.10)(0.90)}{250} + \frac{(0.05)(0.95)}{300}} = \sqrt{0.00052} = 0.0228$$
The desired probability for the difference in sample proportions is given by
$$P(\bar{P}_1 - \bar{P}_2 \le 0.02) = P\left(Z \le \frac{(\bar{P}_1 - \bar{P}_2) - (P_1 - P_2)}{\sigma_{(\bar{P}_1 - \bar{P}_2)}}\right) = P\left(Z \le \frac{0.02 - 0.05}{0.0228}\right) = P(Z \le -1.32) = 0.5000 - 0.4066 = 0.0934$$
Hence the desired probability for the difference in sample proportions is 0.0934.
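Example 7 can be sketched by building the normal approximation to the sampling distribution of $\bar{P}_1 - \bar{P}_2$ as a `statistics.NormalDist` (the helper name is illustrative). Its `stdev` is the standard error, and `cdf(0.02)` gives the lower-tail probability, about 0.094 with an unrounded z-value.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_dist(P1: float, n1: int, P2: float, n2: int) -> NormalDist:
    """Normal approximation to the sampling distribution of P-bar1 - P-bar2."""
    se = sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)
    return NormalDist(P1 - P2, se)

dist = diff_proportions_dist(0.10, 250, 0.05, 300)
print(round(dist.stdev, 4))            # standard error, about 0.0228
print(round(dist.cdf(0.02), 4))        # P(P-bar1 - P-bar2 <= 0.02), about 0.094
```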
