
UNIT 8 INTERVAL ESTIMATION FOR TWO POPULATIONS
Structure
8.1 Introduction
Objectives
8.2 Confidence Interval for Difference of Two Population Means
8.3 Confidence Interval for Difference of Two Population Proportions
8.4 Confidence Interval for Ratio of Two Population Variances
8.5 Summary
8.6 Solutions / Answers

8.1 INTRODUCTION
In the previous unit, we discussed the method of obtaining confidence
intervals for the population mean, population proportion and population
variance of a single population under study. There are many situations where
two populations exist and one wants to obtain the interval estimate for the
difference or ratio of two parameters such as means, proportions or variances.
For example, the product manager of a company manufacturing two types of bulbs
may be interested in obtaining the confidence interval for the difference of
the average lives of the two types of bulbs; one may wish to obtain the
interval estimate of the difference of proportions of alcohol drinkers in two
cities; a quality control engineer may want to obtain the interval estimate
for the ratio of variances of the quality of a product; etc.
Therefore, it becomes necessary to construct confidence intervals for the
difference of means, the difference of proportions and the ratio of variances
of two populations. In this unit, we shall discuss how to construct confidence
intervals for the difference or ratio of the above-mentioned parameters of two
populations.
This unit comprises the following six sections. Section 8.1 introduces the need
for confidence intervals for the difference or ratio of the parameters of two
normal populations. Section 8.2 is devoted to the method of obtaining the
confidence interval for the difference of two population means when the
population variances are known and unknown. Section 8.3 describes the method of
obtaining the confidence interval for the difference of two population
proportions with examples, whereas the method of obtaining the confidence
interval for the ratio of population variances is explored in Section 8.4. The
unit ends with a summary of what we have discussed in Section 8.5 and the
solutions of the exercises in Section 8.6.

Objectives
After studying this unit, you should be able to:
• explain the need for confidence intervals in the case of two populations;
• describe the method of obtaining the confidence interval for the difference
of means of two normal populations when variances are known and unknown;
• describe the method of obtaining the confidence interval for the difference
of means of two normal populations when observations are paired;
• describe the method of obtaining the confidence interval for the difference
of proportions of two populations; and
• describe the method of obtaining the confidence interval for the ratio of
variances of two normal populations.

8.2 CONFIDENCE INTERVAL FOR DIFFERENCE OF TWO POPULATION MEANS
There are many problems in business and economics where one may be interested
in finding the confidence interval for the difference of two population means.
For example, the product manager of a company manufacturing two types of bulbs
may want the confidence interval for the difference of the average lives of
the two types of bulbs; or two different drugs, say, A and B, may be tried on
a certain number of patients for controlling blood pressure, and one may want
the interval estimate of the difference of the average effects of the two
drugs.
Let us suppose there are two normal populations, say, population-I and
population-II, under study. Let $X_1, X_2, \ldots, X_{n_1}$ be a random sample
of size $n_1$ taken from normal population-I with mean $\mu_1$ and variance
$\sigma_1^2$, and let $Y_1, Y_2, \ldots, Y_{n_2}$ be another random sample of
size $n_2$ taken from normal population-II with mean $\mu_2$ and variance
$\sigma_2^2$. The following three cases are to be considered when one wants
the confidence interval for the difference between the means of two normal
populations:
1. When both the samples are independent and $\sigma_1^2$ and $\sigma_2^2$ are known.
2. When both the samples are independent and $\sigma_1^2$ and $\sigma_2^2$ are unknown.
3. When the observations are paired.
For these cases, we find the confidence intervals one by one in the subsequent
sub-sections.
8.2.1 Confidence Interval for Difference of Two Population
Means when Samples are Independent and Population
Variances are Known
Let $X_1, X_2, \ldots, X_{n_1}$ be a random sample of size $n_1$ taken from
normal population-I $N(\mu_1, \sigma_1^2)$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be
another independent random sample of size $n_2$ taken from normal
population-II $N(\mu_2, \sigma_2^2)$. Let $\bar{X}$ and $\bar{Y}$ be the sample
means of the two random samples respectively. Then we know that the sample
means $\bar{X}$ and $\bar{Y}$ are normally distributed, that is,

$$\bar{X} \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right) \quad \text{and} \quad \bar{Y} \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right)$$

Also, by the property of normal distribution (described in Unit 13 of
MST-003), we have

$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

and the variate

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$

is normally distributed with mean 0 and variance unity. Since the distribution
of Z is independent of the parameters, it can be taken as the pivotal quantity.
Therefore, we introduce two constants $z_{\alpha/2}$ and $z_{1-\alpha/2} = -z_{\alpha/2}$ such that

$$P\left[-z_{\alpha/2} \le Z \le z_{\alpha/2}\right] = 1 - \alpha \qquad \ldots (1)$$

where $z_{\alpha/2}$ is the value of the variate Z having an area of α/2 under
the right tail of the probability curve of Z as shown in Fig. 8.1.

[Fig. 8.1: Standard normal curve showing the (1 − α)100% confidence interval between Z = −z_{α/2} and Z = z_{α/2}, with area α/2 in each tail]

Now, by putting the value of Z in equation (1), we have

$$P\left[-z_{\alpha/2} \le \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \le z_{\alpha/2}\right] = 1 - \alpha$$
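Now, to convert this into an interval for $(\mu_1 - \mu_2)$, we first multiply
each term in the above inequality by $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
and then subtract $(\bar{X} - \bar{Y})$ from each term. We get

$$P\left[-(\bar{X} - \bar{Y}) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le -(\mu_1 - \mu_2) \le -(\bar{X} - \bar{Y}) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right] = 1 - \alpha$$

Multiplying each term by (−1) in the above inequality (which reverses the
inequality), we have

$$P\left[(\bar{X} - \bar{Y}) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \ge (\mu_1 - \mu_2) \ge (\bar{X} - \bar{Y}) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right] = 1 - \alpha$$

This can be rewritten as

$$P\left[(\bar{X} - \bar{Y}) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le (\mu_1 - \mu_2) \le (\bar{X} - \bar{Y}) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right] = 1 - \alpha$$

Hence, the required (1 − α)100% confidence interval for $(\mu_1 - \mu_2)$ is

$$\left[(\bar{X} - \bar{Y}) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ (\bar{X} - \bar{Y}) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right] \qquad \ldots (2)$$

and the corresponding limits are given by

$$(\bar{X} - \bar{Y}) \mp z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad \ldots (3)$$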

Note 1: When $\bar{Y}$ is greater than $\bar{X}$, we take $(\bar{Y} - \bar{X})$ in place of $(\bar{X} - \bar{Y})$.
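The meaning of the probability statement behind the interval for the difference of means with known variances can be checked numerically. The following is a minimal Python sketch, not part of the course material: the function names, the simulated population values (means 10 and 8, standard deviations 3 and 4) and the number of trials are all illustrative assumptions. It computes the limits $(\bar{X} - \bar{Y}) \mp z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ and estimates how often the interval contains the true difference $\mu_1 - \mu_2$ over repeated samples.

```python
import random
from statistics import NormalDist, fmean


def z_interval_diff_means(xbar, ybar, var1, var2, n1, n2, conf=0.95):
    """Confidence limits for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * (var1 / n1 + var2 / n2) ** 0.5
    diff = xbar - ybar
    return diff - half_width, diff + half_width


def coverage(mu1, mu2, sd1, sd2, n1, n2, trials=2000, seed=1):
    """Fraction of simulated intervals that contain the true mu1 - mu2."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(mu1, sd1) for _ in range(n1)]
        y = [rng.gauss(mu2, sd2) for _ in range(n2)]
        low, high = z_interval_diff_means(
            fmean(x), fmean(y), sd1 ** 2, sd2 ** 2, n1, n2
        )
        if low <= mu1 - mu2 <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical populations: N(10, 3^2) and N(8, 4^2).
    print(coverage(10.0, 8.0, 3.0, 4.0, n1=20, n2=25))
```

The printed fraction should be close to 0.95, which is what the statement "with probability 1 − α" means for a 95% interval: the limits vary from sample to sample, and about 95% of such intervals contain the fixed value $\mu_1 - \mu_2$.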


The following example shows how the above concepts can be used in a practical
situation:
Example 1: The product manager of a company manufacturing two types of bulbs
tested a random sample of 50 bulbs of type I and 60 bulbs of type II and found
the following information:

            Mean life (hrs.)    Population standard deviation (hrs.)
Type I           1300                        41
Type II          1280                        46

Obtain the 95% confidence interval for the difference of the average lives of
the two types of bulbs, assuming that the lives of both types of bulbs follow
the normal distribution.
Solution: Here, we are given that

$$n_1 = 50,\quad \bar{X} = 1300,\quad \sigma_1 = 41$$
$$n_2 = 60,\quad \bar{Y} = 1280,\quad \sigma_2 = 46$$

Since the population standard deviations, i.e. the population variances, of
both populations are known, we use the (1 − α)100% confidence limits for the
difference of population means when population variances are known, which are
given by

$$(\bar{X} - \bar{Y}) \mp z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

For the 95% confidence interval, 1 − α = 0.95 ⇒ α = 0.05 and α/2 = 0.025. Also
$z_{\alpha/2} = z_{0.025} = 1.96$.
Thus, the 95% confidence limits for the difference of the average lives of the
two types of bulbs are

$$\begin{aligned}
(\bar{X} - \bar{Y}) \mp z_{0.025}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} &= (1300 - 1280) \mp 1.96\sqrt{\frac{(41)^2}{50} + \frac{(46)^2}{60}} \\
&= 20 \mp 1.96\sqrt{33.62 + 35.27} \\
&= 20 \mp 1.96 \times 8.30 \\
&= 20 \mp 16.27 \\
&= 3.73 \text{ and } 36.27
\end{aligned}$$

Hence, the required 95% confidence interval is
[3.73, 36.27]

8.2.2 Confidence Interval for Difference of Two Population
Means when Population Variances are Unknown
Let $X_1, X_2, \ldots, X_{n_1}$ be a random sample of size $n_1$ taken from
normal population-I with mean $\mu_1$ and unknown variance $\sigma_1^2$, and
let $Y_1, Y_2, \ldots, Y_{n_2}$ be another independent random sample of size
$n_2$ taken from the second normal population with mean $\mu_2$ and unknown
variance $\sigma_2^2$. Keeping the scope of the course in mind, we discuss the
following two cases:
Case I: When $\sigma_1^2 = \sigma_2^2 = \sigma^2$
In this case, $\sigma^2$ is estimated by the pooled sample variance $S_p^2$, where

$$S_p^2 = \frac{1}{n_1 + n_2 - 2}\left[(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2\right]$$

where

$$S_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(X_i - \bar{X})^2 \quad \text{and} \quad S_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2}(Y_i - \bar{Y})^2$$

This can also be written as

$$S_p^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum_{i=1}^{n_1}(X_i - \bar{X})^2 + \sum_{i=1}^{n_2}(Y_i - \bar{Y})^2\right]$$

In this situation, the variate

$$t = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{(n_1 + n_2 - 2)}$$

follows the t-distribution with (n1 + n2 − 2) degrees of freedom. Since the
distribution of the variate t is independent of the parameters, it can be taken
as the pivotal quantity.
Introduce two constants $t_{(n_1+n_2-2),\,\alpha/2}$ and
$t_{(n_1+n_2-2),\,(1-\alpha/2)} = -t_{(n_1+n_2-2),\,\alpha/2}$ such that

$$P\left[-t_{(n_1+n_2-2),\,\alpha/2} \le t \le t_{(n_1+n_2-2),\,\alpha/2}\right] = 1 - \alpha \qquad \ldots (4)$$

where $t_{(n_1+n_2-2),\,\alpha/2}$ is the value of the variate t having an area
of α/2 under the right tail of the probability curve of the t-variate with
(n1 + n2 − 2) degrees of freedom as shown in Fig. 8.2.

[Fig. 8.2: t-distribution curve showing the (1 − α)100% confidence interval between t = −t_{(n1+n2−2), α/2} and t = t_{(n1+n2−2), α/2}, with area α/2 in each tail]

Now, by putting the value of the variate t in equation (4), we have

$$P\left[-t_{(n_1+n_2-2),\,\alpha/2} \le \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \le t_{(n_1+n_2-2),\,\alpha/2}\right] = 1 - \alpha$$

Now, to convert this into an interval for $(\mu_1 - \mu_2)$, we first multiply
each term in the above inequality by $S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
and then subtract $(\bar{X} - \bar{Y})$ from each term. We get

$$P\left[-(\bar{X} - \bar{Y}) - t_{(n_1+n_2-2),\,\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le -(\mu_1 - \mu_2) \le -(\bar{X} - \bar{Y}) + t_{(n_1+n_2-2),\,\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right] = 1 - \alpha$$

Multiplying each term by (−1) in the above inequality (which reverses the
inequality), we get

$$P\left[(\bar{X} - \bar{Y}) + t_{(n_1+n_2-2),\,\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \ge (\mu_1 - \mu_2) \ge (\bar{X} - \bar{Y}) - t_{(n_1+n_2-2),\,\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right] = 1 - \alpha$$
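This inequality can be rewritten as

$$P\left[(\bar{X} - \bar{Y}) - t_{(n_1+n_2-2),\,\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le (\mu_1 - \mu_2) \le (\bar{X} - \bar{Y}) + t_{(n_1+n_2-2),\,\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right] = 1 - \alpha$$

Hence, the required (1 − α)100% confidence interval for $(\mu_1 - \mu_2)$ is

$$\left[(\bar{X} - \bar{Y}) - t_{(n_1+n_2-2),\,\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ (\bar{X} - \bar{Y}) + t_{(n_1+n_2-2),\,\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right] \qquad \ldots (5)$$

and the corresponding limits are given by

$$(\bar{X} - \bar{Y}) \mp t_{(n_1+n_2-2),\,\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad \ldots (6)$$

Note 2: When $\bar{Y}$ is greater than $\bar{X}$, we take $(\bar{Y} - \bar{X})$ in place of $(\bar{X} - \bar{Y})$.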

Case II: When $\sigma_1^2 \ne \sigma_2^2$

In this case, $\sigma_1^2$ and $\sigma_2^2$ are estimated by the values of
their estimators $S_1^2$ and $S_2^2$ respectively, where

$$S_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(X_i - \bar{X})^2 \quad \text{and} \quad S_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2}(Y_i - \bar{Y})^2$$

In this situation, the pivotal quantity is difficult to obtain. But we know by
the central limit theorem that for large sample sizes $n_1$ and $n_2$ (> 30)
the variate

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \sim N(0, 1)$$

approximately follows the normal distribution with mean 0 and variance unity.
So the (1 − α)100% confidence interval for the difference of population means
may be obtained by the same procedure as in the case when $\sigma_1^2$ and
$\sigma_2^2$ are known, taking $S_1^2$ and $S_2^2$ in place of $\sigma_1^2$ and
$\sigma_2^2$ respectively. Then the (1 − α)100% confidence interval for
$(\mu_1 - \mu_2)$ is given by

$$\left[(\bar{X} - \bar{Y}) - z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},\ (\bar{X} - \bar{Y}) + z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}\right] \qquad \ldots (7)$$

and the corresponding limits are given by

$$(\bar{X} - \bar{Y}) \mp z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \qquad \ldots (8)$$

Note 3: When $\bar{Y}$ is greater than $\bar{X}$, we take $(\bar{Y} - \bar{X})$ in place of $(\bar{X} - \bar{Y})$.


The following examples show how the above concepts can be used in practical
situations:
Example 2: A random sample of 120 colonies was taken from a city A and the
average population per colony was found to be 540 with a standard deviation
18. Another sample of 150 colonies taken from another city B gave an average
population 460 per colony with a standard deviation 15. Find the 95% confidence
interval for the difference of the two population averages by assuming that
the populations per colony in both cities follow the normal distribution.

Solution: Here, we are given that

$$n_1 = 120,\quad \bar{X} = 540,\quad S_1 = 18$$
$$n_2 = 150,\quad \bar{Y} = 460,\quad S_2 = 15$$

Since the population variances are unknown and both samples are large
(n1, n2 > 30), we use the (1 − α)100% confidence limits for the difference of
population means when population variances are unknown, which are given by

$$(\bar{X} - \bar{Y}) \mp z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

For the 95% confidence interval, 1 − α = 0.95 ⇒ α = 0.05 and α/2 = 0.025. Also
$z_{\alpha/2} = z_{0.025} = 1.96$.
So the 95% confidence limits for the difference of the two population averages
are given by

$$\begin{aligned}
(540 - 460) \mp 1.96\sqrt{\frac{(18)^2}{120} + \frac{(15)^2}{150}} &= 80 \mp 1.96\sqrt{\frac{324}{120} + \frac{225}{150}} \\
&= 80 \mp 1.96\sqrt{2.70 + 1.50} \\
&= 80 \mp 1.96 \times 2.05 \\
&= 80 \mp 4.02 \\
&= 75.98 \text{ and } 84.02
\end{aligned}$$

Hence, the required 95% confidence interval for $(\mu_1 - \mu_2)$ is
[75.98, 84.02]


Example 3: It is known that the heights of cadets of centres I and II follow
the normal distribution. Two independent random samples of 6 and 8 cadets of
these centres were taken and the height (in inches) of each selected cadet was
measured. The heights of the cadets had the following values:
Centre I: 70 72 80 82 78 80
Centre II: 92 100 85 94 95 90 96 100
Assuming that the variances of the heights of the cadets of both populations
are equal, compute the 95% confidence limits for the difference of the average
heights of cadets of centres I and II.
Solution: Here, we are given that
n1 = 6 and n2 = 8
Since the variances of the heights of the cadets of both populations are equal,
we use the (1 − α)100% confidence limits for the difference of the means when
population variances are equal, which are given by

$$(\bar{X} - \bar{Y}) \mp t_{(n_1+n_2-2),\,\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Calculation for $\bar{X}$, $\bar{Y}$ and $S_p$:

   X     (X − X̄)   (X − X̄)²        Y     (Y − Ȳ)   (Y − Ȳ)²
   70      −7         49           92      −2          4
   72      −5         25          100       6         36
   80       3          9           85      −9         81
   82       5         25           94       0          0
   78       1          1           95       1          1
   80       3          9           90      −4         16
                                   96       2          4
                                  100       6         36
 ΣX = 462        Σ(X − X̄)² = 118   ΣY = 752        Σ(Y − Ȳ)² = 178

From the above calculation, we have

$$\bar{X} = \frac{1}{n_1}\sum X = \frac{1}{6} \times 462 = 77, \qquad \bar{Y} = \frac{1}{n_2}\sum Y = \frac{1}{8} \times 752 = 94$$

$$S_p^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (X - \bar{X})^2 + \sum (Y - \bar{Y})^2\right] = \frac{1}{6 + 8 - 2}(118 + 178) = \frac{1}{12} \times 296 = 24.67$$

$$\Rightarrow S_p = 4.97$$

For the 95% confidence interval, 1 − α = 0.95 ⇒ α = 0.05 and α/2 = 0.025. From
the t-table, we have $t_{(n_1+n_2-2),\,\alpha/2} = t_{(12),\,0.025} = 2.18$.

Since in this case $\bar{Y} > \bar{X}$, we take $(\bar{Y} - \bar{X})$ in place
of $(\bar{X} - \bar{Y})$.

Thus, the 95% confidence limits for the difference of the average heights of
cadets of centres I and II are given by

$$\begin{aligned}
(\bar{Y} - \bar{X}) \mp t_{(n_1+n_2-2),\,0.025}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} &= (94 - 77) \mp 2.18 \times 4.97 \times \sqrt{\frac{1}{6} + \frac{1}{8}} \\
&= 17 \mp 2.18 \times 4.97 \times \sqrt{0.1667 + 0.1250} \\
&= 17 \mp 2.18 \times 4.97 \times 0.54 \\
&= 17 \mp 5.85 \\
&= 11.15 \text{ and } 22.85
\end{aligned}$$
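The pooled-variance calculation for two small independent samples can be sketched in Python as follows. This is an illustrative sketch only: the function name `pooled_t_interval` is hypothetical, and the critical value $t_{(12),\,0.025} = 2.18$ is passed in by the caller as read from the t-table, because the Python standard library has no t-distribution quantile function. The code carries unrounded intermediate values, so its limits (about 11.15 and 22.85) can differ in the second decimal place from a hand calculation that rounds $\sqrt{1/n_1 + 1/n_2}$ early.

```python
from math import sqrt
from statistics import fmean


def pooled_t_interval(x, y, t_crit):
    """Confidence limits for the difference of means (mean of x minus mean of y),
    assuming equal but unknown population variances.

    t_crit is t_{(n1+n2-2), alpha/2}, read from a t-table by the caller.
    """
    n1, n2 = len(x), len(y)
    xbar, ybar = fmean(x), fmean(y)
    ss_x = sum((v - xbar) ** 2 for v in x)  # sum of squared deviations, sample 1
    ss_y = sum((v - ybar) ** 2 for v in y)  # sum of squared deviations, sample 2
    s_p = sqrt((ss_x + ss_y) / (n1 + n2 - 2))  # pooled standard deviation
    half_width = t_crit * s_p * sqrt(1 / n1 + 1 / n2)
    diff = xbar - ybar
    return diff - half_width, diff + half_width


centre_1 = [70, 72, 80, 82, 78, 80]
centre_2 = [92, 100, 85, 94, 95, 90, 96, 100]

# Centre II has the larger mean, so it is passed first (as in Note 2).
low, high = pooled_t_interval(centre_2, centre_1, t_crit=2.18)
print(round(low, 2), round(high, 2))
```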

8.2.3 Confidence Interval for Difference of Two Population
Means when Observations are Paired
In the earlier sub-sections of Section 8.2, we assumed that both the samples
were randomly drawn from two different normal populations and that they were
independent. However, there are many situations where the two samples are not
independent and the observations are recorded on the same individuals or
items. Generally, such observations are recorded to assess the effectiveness
of a particular training, diet, treatment, medicine, etc. In such situations,
the observations are recorded "before and after" the treatment is applied to
the same object. For example, if we wish to test a new diet using, say, 15
individuals, then the weights of the individuals recorded before and after the
diet will form two different samples in which the observations are paired for
each individual. Similarly, in a blood-sugar test, the fasting sugar level
before a meal and the sugar level after the meal are both recorded for a
patient as paired observations.
Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a paired random sample of
size n and let the difference between paired observations be denoted by $D_i$,
that is,

$$D_i = X_i - Y_i \quad \text{for all } i = 1, 2, \ldots, n$$

If all or most of the $Y_i$'s are larger than the $X_i$'s, then we take

$$D_i = Y_i - X_i \quad \text{for all } i = 1, 2, \ldots, n$$

Hence, we can assume that $D_1, D_2, \ldots, D_n$ is a random sample from a
normal population with mean $\mu_D$ and unknown variance $\sigma_D^2$. This is
the same as the case of finding the confidence interval for a population mean
when the population variance is unknown, which is described in Sub-section
7.4.2 of Unit 7 of this course.
The unknown $\sigma_D^2$ is estimated by $S_D^2$, where

$$S_D^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(D_i - \bar{D})^2 \quad \text{and} \quad \bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i$$

Also, the variate

$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} \sim t_{(n-1)}$$

follows the t-distribution with (n − 1) df. Since the distribution of the
variate t is independent of the parameters, t can be taken as the pivotal
quantity.
Introduce two constants $t_{(n-1),\,\alpha/2}$ and $t_{(n-1),\,(1-\alpha/2)} = -t_{(n-1),\,\alpha/2}$ such that

$$P\left[-t_{(n-1),\,\alpha/2} \le t \le t_{(n-1),\,\alpha/2}\right] = 1 - \alpha \qquad \ldots (9)$$

where $t_{(n-1),\,\alpha/2}$ is the value of the variate t having an area of
α/2 under the right tail of the probability curve of the t-variate.
Now, by putting the value of t in equation (9), we have

$$P\left[-t_{(n-1),\,\alpha/2} \le \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} \le t_{(n-1),\,\alpha/2}\right] = 1 - \alpha$$

Now, to convert this into an interval for $\mu_D$, we first multiply each term
in the above inequality by $S_D / \sqrt{n}$ and then subtract $\bar{D}$ from
each term. We get

$$P\left[-\bar{D} - t_{(n-1),\,\alpha/2}\frac{S_D}{\sqrt{n}} \le -\mu_D \le -\bar{D} + t_{(n-1),\,\alpha/2}\frac{S_D}{\sqrt{n}}\right] = 1 - \alpha$$

Multiplying each term by (−1) in the above inequality (which reverses the
inequality), we have

$$P\left[\bar{D} + t_{(n-1),\,\alpha/2}\frac{S_D}{\sqrt{n}} \ge \mu_D \ge \bar{D} - t_{(n-1),\,\alpha/2}\frac{S_D}{\sqrt{n}}\right] = 1 - \alpha$$

This inequality can be written as

$$P\left[\bar{D} - t_{(n-1),\,\alpha/2}\frac{S_D}{\sqrt{n}} \le \mu_D \le \bar{D} + t_{(n-1),\,\alpha/2}\frac{S_D}{\sqrt{n}}\right] = 1 - \alpha$$

Hence, the required (1 − α)100% confidence interval for $\mu_D$ is

$$\left[\bar{D} - t_{(n-1),\,\alpha/2}\frac{S_D}{\sqrt{n}},\ \bar{D} + t_{(n-1),\,\alpha/2}\frac{S_D}{\sqrt{n}}\right] \qquad \ldots (10)$$

Therefore, the corresponding (1 − α)100% confidence limits are given by

$$\bar{D} \mp t_{(n-1),\,\alpha/2}\frac{S_D}{\sqrt{n}} \qquad \ldots (11)$$
Now it is time to work through an example that shows how the above concepts
can be used in a practical situation:
Example 4: Following are the scores of ten soldiers for hitting a target 10
times before and after training:

Soldier Number    1   2   3   4   5   6   7   8   9   10
Score (before)    5   6   4   7   7   6   8   4   6   5
Score (after)     7   7   6   9   8   9   9   6   6   7

Assuming that the hitting scores of the soldiers before and after the training
follow the normal distribution, estimate the 95% confidence interval for the
average change of score after training.
Solution: Since the given data are in the form of before and after
observations, we use the (1 − α)100% confidence limits for paired observations,
which are given by

$$\bar{D} \mp t_{(n-1),\,\alpha/2}\frac{S_D}{\sqrt{n}}$$

where

$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i \quad \text{and} \quad S_D^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(D_i - \bar{D})^2$$

Calculation for $\bar{D}$ and $S_D^2$:

S. No.     X     Y     D = Y − X    (D − D̄)    (D − D̄)²
   1       5     7         2          0.4        0.16
   2       6     7         1         −0.6        0.36
   3       4     6         2          0.4        0.16
   4       7     9         2          0.4        0.16
   5       7     8         1         −0.6        0.36
   6       6     9         3          1.4        1.96
   7       8     9         1         −0.6        0.36
   8       4     6         2          0.4        0.16
   9       6     6         0         −1.6        2.56
  10       5     7         2          0.4        0.16
Total     58    74      ΣD = 16               Σ(D − D̄)² = 6.40

From the calculation, we have

$$\bar{D} = \frac{1}{n}\sum D = \frac{1}{10} \times 16 = 1.6$$

$$S_D^2 = \frac{1}{n - 1}\sum (D - \bar{D})^2 = \frac{1}{9} \times 6.40 = 0.71 \quad \Rightarrow \quad S_D = 0.84$$

For the 95% confidence interval, 1 − α = 0.95 ⇒ α = 0.05 and α/2 = 0.025.
From the t-table, we have $t_{(n-1),\,\alpha/2} = t_{(9),\,0.025} = 2.26$.

Therefore, the 95% confidence limits for the average change of score after
training are

$$\begin{aligned}
1.60 \mp t_{(9),\,0.025}\frac{0.84}{\sqrt{10}} &= 1.60 \mp 2.26 \times 0.27 \\
&= 1.60 \mp 0.61 \\
&= 0.99 \text{ and } 2.21
\end{aligned}$$

Hence, the required 95% confidence interval for the average change of score
after training is
[0.99, 2.21]
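The paired calculation can be sketched in Python as below. The function name `paired_t_interval` is a hypothetical choice for illustration, and the critical value $t_{(9),\,0.025} = 2.26$ is supplied by the caller from the t-table. Because the code does not round $S_D/\sqrt{n}$ to two decimals, its limits (about 1.00 and 2.20) differ slightly in the second decimal place from the hand calculation.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t_interval(before, after, t_crit):
    """Confidence limits for the mean difference mu_D of paired observations.

    Differences are taken as after - before; t_crit is t_{(n-1), alpha/2},
    read from a t-table by the caller.
    """
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    d_bar = fmean(d)
    s_d = stdev(d)  # sample standard deviation with divisor n - 1
    half_width = t_crit * s_d / sqrt(n)
    return d_bar - half_width, d_bar + half_width


before = [5, 6, 4, 7, 7, 6, 8, 4, 6, 5]
after = [7, 7, 6, 9, 8, 9, 9, 6, 6, 7]

low, high = paired_t_interval(before, after, t_crit=2.26)
print(round(low, 2), round(high, 2))
```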
It is time for you to try the following exercises to make sure that you have
learnt about the confidence interval for the difference of two population
means in different cases.
E1) A sample of heights of 2500 Bangladeshis has a mean of 68.50 inches
and a standard deviation of 2.52 inches, while a sample of heights of 1600
Indians has a mean of 70.25 inches and a standard deviation of 2.58 inches.
Find the 90% interval estimate for the difference of mean heights of both
countries by assuming that the heights of the persons of both countries
are normally distributed.
E2) In an experiment comparing two types of pig food, the following increases
in weight (in pounds) were observed in pigs:

Pig number                               1    2    3    4    5    6    7    8    9   10
Increase in weight (pounds), Food A     10   12   16   13   12   16   12    9   16   14
Increase in weight (pounds), Food B      7   13   12   12   10   17   12    6   12    9

Assuming that the increase in weight due to both foods follows the
normal distribution, find the 99% confidence limits for the difference of
increase in weights due to foods A and B in each of the cases:
(i) When the two samples of pigs are independent,
(ii) When the same set of 10 pigs is used for both foods.

8.3 CONFIDENCE INTERVAL FOR DIFFERENCE OF TWO POPULATION PROPORTIONS
In Section 7.5 of the previous unit, we pointed out that in many situations,
in business and other areas, the data are collected in the form of counts, or
the collected data are classified into two categories or groups according to
an attribute under study. In such situations we study the proportion instead
of the mean. There are many situations where one is interested in finding the
interval estimate of the difference of two proportions of an attribute in two
different populations or groups. For example, one may wish to obtain the
interval estimate of the difference of proportions of alcohol drinkers in two
cities, or the interval estimate of the difference of proportions of literates
between two groups of people.
Let there be two populations, say, population-I and population-II, with
population proportions $P_1$ and $P_2$ respectively for the attribute under
study. Let us consider two independent random samples of sizes $n_1$ and $n_2$
taken from these populations respectively. Let $X_1$ and $X_2$ represent the
number of observations or elements possessing the attribute under study in the
samples of sizes $n_1$ and $n_2$ respectively. Also let $p_1$ and $p_2$ be the
observed sample proportions, which are defined as

$$p_1 = \frac{X_1}{n_1} \quad \text{and} \quad p_2 = \frac{X_2}{n_2}$$

As we have seen in Section 2.5 of Unit 2 of this course, if $n_1$ and $n_2$ are
sufficiently large, such that $n_1 p_1 > 5$, $n_1 q_1 > 5$, $n_2 p_2 > 5$ and
$n_2 q_2 > 5$ (where $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$), then by the central
limit theorem the sampling distributions of the sample proportions $p_1$ and
$p_2$ are approximately normal:

$$p_1 \sim N\left(P_1, \frac{P_1 Q_1}{n_1}\right) \quad \text{and} \quad p_2 \sim N\left(P_2, \frac{P_2 Q_2}{n_2}\right)$$

where $Q_1 = 1 - P_1$ and $Q_2 = 1 - P_2$. Also, by the property of normal
distribution described in Unit 13 of MST-003, we have

$$p_1 - p_2 \sim N\left(P_1 - P_2,\ \frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}\right)$$

Hence, the variate

$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}} \sim N(0, 1)$$

follows the normal distribution with mean 0 and variance unity. The probability
density function of the standard normal variate Z is given by

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}; \quad -\infty < z < \infty$$

Since the distribution of Z is independent of the parameters, it can be taken
as the pivotal quantity.
We introduce two constants $z_{\alpha/2}$ and $z_{1-\alpha/2} = -z_{\alpha/2}$ such that

$$P\left[-z_{\alpha/2} \le Z \le z_{\alpha/2}\right] = 1 - \alpha \qquad \ldots (12)$$
where $z_{\alpha/2}$ is the value of the variate Z having an area of α/2 under
the right tail of the probability curve of Z.

Now, by putting the value of Z in equation (12), we get

$$P\left[-z_{\alpha/2} \le \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}} \le z_{\alpha/2}\right] = 1 - \alpha$$

For large samples, the variance of $(p_1 - p_2)$, that is,
$\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}$, can be estimated by the value of
$\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}$, such that

$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}} \sim N(0, 1)$$
Therefore, the probability statement for $(P_1 - P_2)$ becomes

$$P\left[-z_{\alpha/2} \le \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}} \le z_{\alpha/2}\right] = 1 - \alpha$$

Multiplying each term by $\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$
and then subtracting $(p_1 - p_2)$ from each term in the above inequality, we
get

$$P\left[-(p_1 - p_2) - z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \le -(P_1 - P_2) \le -(p_1 - p_2) + z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}\right] = 1 - \alpha$$

Multiplying each term by (−1) in the above inequality (which reverses the
inequality), we get

$$P\left[(p_1 - p_2) + z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \ge (P_1 - P_2) \ge (p_1 - p_2) - z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}\right] = 1 - \alpha$$

This inequality can be rewritten as

$$P\left[(p_1 - p_2) - z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \le (P_1 - P_2) \le (p_1 - p_2) + z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}\right] = 1 - \alpha$$

Hence, the (1 − α)100% confidence interval for the difference of population
proportions is

$$\left[(p_1 - p_2) - z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}},\ (p_1 - p_2) + z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}\right] \qquad \ldots (13)$$

Therefore, the corresponding (1 − α)100% confidence limits are given by

$$(p_1 - p_2) \mp z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \qquad \ldots (14)$$

Note 4: When $p_2$ is greater than $p_1$, we take $(p_2 - p_1)$ in place of
$(p_1 - p_2)$.
The following example shows how the above concepts can be used in an applied
problem:
Example 5: In a large city A, 800 persons out of a sample of 1000 persons
were found to be alcohol drinkers. In another large city B, 800 persons were
alcohol drinkers in a sample of 1200 persons. Construct (i) 95% and (ii) 99%
confidence limits for the difference in proportions of the alcohol drinkers of
the two cities A and B.
Solution: Here, we are given that

$$n_1 = 1000,\quad X_1 = 800,\quad n_2 = 1200,\quad X_2 = 800$$

$$p_1 = \frac{X_1}{n_1} = \frac{800}{1000} = 0.80,\quad q_1 = 1 - p_1 = 1 - 0.80 = 0.20$$

$$p_2 = \frac{X_2}{n_2} = \frac{800}{1200} = 0.67,\quad q_2 = 1 - p_2 = 1 - 0.67 = 0.33$$

Since
$$n_1 p_1 = 1000 \times 0.80 = 800 > 5,\quad n_1 q_1 = 1000 \times 0.20 = 200 > 5$$
$$n_2 p_2 = 1200 \times 0.67 \approx 800 > 5,\quad n_2 q_2 = 1200 \times 0.33 \approx 400 > 5$$

the (1 − α)100% confidence limits for the difference in proportions are given
by

$$(p_1 - p_2) \mp z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

(i) For the 95% confidence interval, 1 − α = 0.95 ⇒ α = 0.05 and α/2 = 0.025.
Also $z_{\alpha/2} = z_{0.025} = 1.96$.
So the 95% confidence limits for the difference of the proportions of alcohol
drinkers of the two cities A and B are

$$\begin{aligned}
(0.80 - 0.67) \mp 1.96\sqrt{\frac{0.80 \times 0.20}{1000} + \frac{0.67 \times 0.33}{1200}} &= 0.13 \mp 1.96\sqrt{0.00016 + 0.00018} \\
&= 0.13 \mp 1.96 \times 0.018 \\
&= 0.13 \mp 0.035 \\
&= 0.095 \text{ and } 0.165
\end{aligned}$$

(ii) Similarly, for the 99% confidence interval, 1 − α = 0.99 ⇒ α = 0.01 and
α/2 = 0.005. Also $z_{\alpha/2} = z_{0.005} = 2.58$.
So the 99% confidence limits for the difference of the proportions of alcohol
drinkers of the two cities A and B are

$$\begin{aligned}
(p_1 - p_2) \mp z_{0.005}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} &= 0.13 \mp 2.58 \times 0.018 \\
&= 0.13 \mp 0.046 \\
&= 0.084 \text{ and } 0.176
\end{aligned}$$
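The limits for the difference of two proportions can be computed with a short Python sketch such as the one below. The function name `diff_proportions_interval` is a hypothetical choice for illustration. The sketch checks the large-sample conditions $n p > 5$ and $n q > 5$ through the counts, and takes $z_{\alpha/2}$ from the standard library's `NormalDist`. It keeps $p_2 = 800/1200$ unrounded, so its limits (about 0.097 and 0.170 at 95%) differ in the third decimal place from a hand calculation that rounds $p_2$ to 0.67.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_interval(x1, n1, x2, n2, conf=0.95):
    """Large-sample confidence limits for P1 - P2.

    x1, x2 are the numbers of sample elements possessing the attribute.
    """
    # n*p and n*q are just the counts with and without the attribute.
    for count in (x1, n1 - x1, x2, n2 - x2):
        if count <= 5:
            raise ValueError("normal approximation needs n*p > 5 and n*q > 5")
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


for level in (0.95, 0.99):
    low, high = diff_proportions_interval(800, 1000, 800, 1200, conf=level)
    print(level, round(low, 3), round(high, 3))
```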
If you have followed this example, you should be able to do these exercises.

E3) In a large population, 30% of a random sample of 1200 persons had
blue eyes, and 20% of a random sample of 900 persons had blue eyes
in another population. Obtain the 95% confidence limits for the
difference of proportions of blue-eyed persons in the two populations.
E4) Random samples of 500 men and 600 women were asked whether they
would like to have a flyover near their residences. 250 men and 400
women were in favour of the proposal. Find the 99% confidence limits for
the difference of proportions of men and women in favour of this
proposal.

8.4 CONFIDENCE INTERVAL FOR RATIO OF TWO POPULATION VARIANCES
In Sections 8.2 and 8.3, we discussed interval estimation for the difference
of two population means and proportions respectively. Now, one may be
interested in the interval estimate for the ratio of variances of two normal
populations. For example, a quality control engineer may want the interval
estimate for the ratio of variances of the quality of two products, or an
economist may wish to know the interval estimate for the ratio of variability
in incomes of two populations.
Suppose there are two normal populations, say, population-I and population-II.
Let $X_1, X_2, \ldots, X_{n_1}$ be a random sample of size $n_1$ taken from
normal population-I with mean $\mu_1$ and variance $\sigma_1^2$, and let
$Y_1, Y_2, \ldots, Y_{n_2}$ be another random sample of size $n_2$ taken from
normal population-II with mean $\mu_2$ and variance $\sigma_2^2$.
As we have seen in Section 4.2 of Unit 4 of this course,

$$\frac{\sum_{i=1}^{n_1}(X_i - \bar{X})^2}{\sigma_1^2} = \frac{(n_1 - 1)S_1^2}{\sigma_1^2} \sim \chi^2_{(n_1 - 1)}$$

and

$$\frac{\sum_{i=1}^{n_2}(Y_i - \bar{Y})^2}{\sigma_2^2} = \frac{(n_2 - 1)S_2^2}{\sigma_2^2} \sim \chi^2_{(n_2 - 1)}$$

Also, the variate

$$F = \frac{\chi^2_{(n_1 - 1)} / (n_1 - 1)}{\chi^2_{(n_2 - 1)} / (n_2 - 1)} = \frac{S_1^2\,\sigma_2^2}{S_2^2\,\sigma_1^2} \sim F_{(n_1 - 1,\ n_2 - 1)}$$

follows the F-distribution with (n1 − 1, n2 − 1) df.

Therefore, the probability density function of the F variate is given by

$$f(F) = \frac{\left(\dfrac{n_1 - 1}{n_2 - 1}\right)^{(n_1 - 1)/2}}{B\left(\dfrac{n_1 - 1}{2}, \dfrac{n_2 - 1}{2}\right)} \cdot \frac{F^{(n_1 - 1)/2 - 1}}{\left(1 + \dfrac{n_1 - 1}{n_2 - 1}F\right)^{(n_1 + n_2 - 2)/2}}; \quad 0 < F < \infty$$

Since the distribution of F is independent of the parameters, it can be taken
as the pivotal quantity.
Introduce constants $F_{(n_1-1,\,n_2-1),\,\alpha/2}$ and
$F_{(n_1-1,\,n_2-1),\,(1-\alpha/2)} = \dfrac{1}{F_{(n_2-1,\,n_1-1),\,\alpha/2}}$,
which are values of the variate F. The second equality uses the relation of
the F-distribution

$$F_{(\nu_1,\,\nu_2),\,(1-\alpha)} = \frac{1}{F_{(\nu_2,\,\nu_1),\,\alpha}}$$

The constants are chosen such that

$$P\left[\frac{1}{F_{(n_2-1,\,n_1-1),\,\alpha/2}} \le F \le F_{(n_1-1,\,n_2-1),\,\alpha/2}\right] = 1 - \alpha \qquad \ldots (15)$$

where $F_{(n_1-1,\,n_2-1),\,\alpha/2}$ and $F_{(n_1-1,\,n_2-1),\,(1-\alpha/2)}$
are the values of the variate F having an area of α/2 under the right tail and
the left tail respectively of the probability curve of the F-variate with
(n1 − 1, n2 − 1) df. These values can be read from the F-table given in the
Appendix at the end of Block 1 of this course.
Now, by putting the value of F in equation (15), we get

$$P\left[\frac{1}{F_{(n_2-1,\,n_1-1),\,\alpha/2}} \le \frac{S_1^2\,\sigma_2^2}{S_2^2\,\sigma_1^2} \le F_{(n_1-1,\,n_2-1),\,\alpha/2}\right] = 1 - \alpha$$

Dividing each term in the above inequality by $S_1^2 / S_2^2$, we have

$$P\left[\frac{1}{F_{(n_2-1,\,n_1-1),\,\alpha/2}} \cdot \frac{S_2^2}{S_1^2} \le \frac{\sigma_2^2}{\sigma_1^2} \le F_{(n_1-1,\,n_2-1),\,\alpha/2}\,\frac{S_2^2}{S_1^2}\right] = 1 - \alpha$$

By taking the reciprocal of each term in the above inequality (which reverses
the inequality), we have

$$P\left[F_{(n_2-1,\,n_1-1),\,\alpha/2}\,\frac{S_1^2}{S_2^2} \ge \frac{\sigma_1^2}{\sigma_2^2} \ge \frac{S_1^2 / S_2^2}{F_{(n_1-1,\,n_2-1),\,\alpha/2}}\right] = 1 - \alpha$$

This inequality can be written as

$$P\left[\frac{S_1^2 / S_2^2}{F_{(n_1-1,\,n_2-1),\,\alpha/2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le F_{(n_2-1,\,n_1-1),\,\alpha/2}\,\frac{S_1^2}{S_2^2}\right] = 1 - \alpha$$

Hence, the required (1 − α)100% confidence interval for the ratio of population
variances is given by

$$\left[\frac{S_1^2 / S_2^2}{F_{(n_1-1,\,n_2-1),\,\alpha/2}},\ F_{(n_2-1,\,n_1-1),\,\alpha/2}\,\frac{S_1^2}{S_2^2}\right] \qquad \ldots (16)$$

Therefore, the corresponding (1 − α)100% confidence limits are given by

$$\frac{S_1^2 / S_2^2}{F_{(n_1-1,\,n_2-1),\,\alpha/2}} \quad \text{and} \quad F_{(n_2-1,\,n_1-1),\,\alpha/2}\,\frac{S_1^2}{S_2^2} \qquad \ldots (17)$$

The following example shows how the above concepts can be used in a numerical
problem:

Example 6: The following data relate to the number of items produced per
shift by two workers A and B for a number of days:
A 26 37 40 35 30 30 40 26 30 35 45
B 19 22 24 27 24 18 20 19 25

Assuming that the number of items produced by both workers follows the
normal distribution, estimate the 95% confidence interval for
$\sigma_1^2 / \sigma_2^2$, where $\sigma_1^2$ and $\sigma_2^2$ are the
population variances of the number of units produced by workers A and B
respectively.
Solution: We know that the (1 − α)100% confidence interval for the ratio of
population variances is given by

$$\left[\frac{S_1^2 / S_2^2}{F_{(n_1-1,\,n_2-1),\,\alpha/2}},\ F_{(n_2-1,\,n_1-1),\,\alpha/2}\,\frac{S_1^2}{S_2^2}\right]$$

For the 95% confidence interval, 1 − α = 0.95 ⇒ α = 0.05 and α/2 = 0.025.
Therefore, with n1 = 11 and n2 = 9, the 95% confidence interval is

$$\left[\frac{S_1^2 / S_2^2}{F_{(10,\,8),\,0.025}},\ F_{(8,\,10),\,0.025}\,\frac{S_1^2}{S_2^2}\right]$$


Calculation for $S_1^2$ and $S_2^2$:

Items produced    $(X-\bar{X})$    $(X-\bar{X})^2$    Items produced    $(Y-\bar{Y})$    $(Y-\bar{Y})^2$
by A $(X)$        $= (X-34)$                          by B $(Y)$        $= (Y-22)$
26                −8               64                 19                −3               9
37                3                9                  22                0                0
40                6                36                 24                2                4
35                1                1                  27                5                25
30                −4               16                 24                2                4
30                −4               16                 18                −4               16
40                6                36                 20                −2               4
26                −8               64                 19                −3               9
30                −4               16                 25                3                9
35                1                1
45                11               121
Total = 374                        380                198                                80

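From the above calculation, we have

$$\bar{X} = \frac{1}{n_1}\sum X = \frac{1}{11}\times 374 = 34, \qquad \bar{Y} = \frac{1}{n_2}\sum Y = \frac{1}{9}\times 198 = 22$$

$$S_1^2 = \frac{1}{n_1-1}\sum (X-\bar{X})^2 = \frac{1}{10}\times 380 = 38$$

$$S_2^2 = \frac{1}{n_2-1}\sum (Y-\bar{Y})^2 = \frac{1}{8}\times 80 = 10$$

$$\frac{S_1^2}{S_2^2} = \frac{38}{10} = 3.8$$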

From the F-table, we have

$$F_{(n_1-1,\,n_2-1),\,\alpha/2} = F_{(10,\,8),\,0.025} = 4.30 \quad\text{and}\quad F_{(n_2-1,\,n_1-1),\,\alpha/2} = F_{(8,\,10),\,0.025} = 3.85.$$

So the 95% confidence interval for the ratio of population variances of the number of
units produced by workers A and B can be obtained as

$$\left[\frac{S_1^2/S_2^2}{F_{(10,\,8),\,0.025}},\ F_{(8,\,10),\,0.025}\,\frac{S_1^2}{S_2^2}\right]$$

$$\Rightarrow \left[\frac{3.8}{4.30},\ 3.85 \times 3.8\right]$$

$$\Rightarrow [0.88,\ 14.63]$$

Working through the following exercises will help you become comfortable
with applying these concepts to numerical problems.
E5) Two samples are drawn from two normal populations as given below:
Sample I 61 66 67 85 78 63 85 86 88 91
Sample II 60 65 71 74 76 82 85 87

Construct 95% confidence interval for ratio of population variances.


E6) The following information about two samples drawn from two normal
populations is given:

$$n_1 = 6,\ \sum (X-\bar{X})^2 = 60.2 \quad\text{and}\quad n_2 = 8,\ \sum (Y-\bar{Y})^2 = 58.4$$

Construct 90% confidence interval for ratio of population variances.

With this we are at the end of this unit. We now summarise our discussion.

8.5 SUMMARY
In this unit, we have discussed the following points:
1. The need of confidence interval for two populations.
2. The method of construction of confidence interval for difference of means
of two normal populations when variances are known and unknown.
3. The method of construction of confidence interval for difference of means
of two normal populations when observations are paired.
4. The method of construction of confidence interval for difference of
proportions of two populations.
5. The method of construction of confidence interval for ratio of variances of
two normal populations.

8.6 SOLUTIONS/ ANSWERS


E1) Here, we are given that

$$n_1 = 2500,\quad \bar{X} = 68.50,\quad S_1 = 2.52$$

$$n_2 = 1600,\quad \bar{Y} = 70.25,\quad S_2 = 2.58$$

Since the population variances are unknown and both samples are large, we use
the (1−α) 100% confidence limits for the difference of population means when
population variances are unknown, which are given by

$$(\bar{X} - \bar{Y}) \mp z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

For 90% confidence interval, 1 − α = 0.90 ⇒ α = 0.10 and α/2 = 0.05.
Also $z_{\alpha/2} = z_{0.05} = 1.645$.
Since in this case $\bar{Y} > \bar{X}$, we take $(\bar{Y} - \bar{X})$ in place of
$(\bar{X} - \bar{Y})$. Thus, 90% confidence limits are given by

$$(\bar{Y} - \bar{X}) \mp z_{0.05}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

$$= (70.25 - 68.50) \mp 1.645 \times \sqrt{\frac{(2.52)^2}{2500} + \frac{(2.58)^2}{1600}}$$

$$= 1.75 \mp 1.645 \times \sqrt{\frac{6.35}{2500} + \frac{6.66}{1600}}$$

$$= 1.75 \mp 1.645 \times \sqrt{0.0025 + 0.0042}$$

$$= 1.75 \mp 1.645 \times 0.08$$

$$= 1.75 \mp 0.13$$

= 1.62 and 1.88

Hence, the required 90% confidence interval is

[1.62, 1.88]
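The arithmetic in E1 can be checked with a short Python sketch. It assumes only the standard library: `NormalDist` supplies the z-value instead of the table, and the function name `diff_means_ci_large` is an illustrative choice, not something defined in this unit.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci_large(mean_a, s_a, n_a, mean_b, s_b, n_b, conf):
    """Large-sample confidence limits for (mean_a - mean_b), variances unknown."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.645 for 90%
    se = sqrt(s_a**2 / n_a + s_b**2 / n_b)    # standard error of the difference
    diff = mean_a - mean_b
    return diff - z * se, diff + z * se


# E1 data, taking (Y-bar minus X-bar) as in the solution
lower, upper = diff_means_ci_large(70.25, 2.58, 1600, 68.50, 2.52, 2500, conf=0.90)
print(round(lower, 2), round(upper, 2))
```

Because the sketch does not round the intermediate quantities, it is a slightly stricter check than the hand calculation; to two decimals it agrees with the limits 1.62 and 1.88.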
E2) We know that (1−α) 100% confidence limits for the difference of means are
given by

$$(\bar{X} - \bar{Y}) \mp t_{(n_1+n_2-2),\,\alpha/2}\; S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Case I: When samples are independent

Calculation for $\bar{X}$, $\bar{Y}$ and $S_p$:

$X$       $(X-\bar{X})$    $(X-\bar{X})^2$    $Y$       $(Y-\bar{Y})$    $(Y-\bar{Y})^2$
10        −3               9                  7         −4               16
12        −1               1                  13        2                4
16        3                9                  12        1                1
13        0                0                  12        1                1
12        −1               1                  10        −1               1
16        3                9                  17        6                36
12        −1               1                  12        1                1
9         −4               16                 6         −5               25
16        3                9                  12        1                1
14        1                1                  9         −2               4

$\sum X = 130$, $\sum (X-\bar{X})^2 = 56$, $\sum Y = 110$, $\sum (Y-\bar{Y})^2 = 90$

From the table, we have

$$\bar{X} = \frac{1}{n_1}\sum X = \frac{130}{10} = 13, \qquad \bar{Y} = \frac{1}{n_2}\sum Y = \frac{110}{10} = 11$$

$$S_p^2 = \frac{1}{n_1+n_2-2}\left[\sum (X-\bar{X})^2 + \sum (Y-\bar{Y})^2\right] = \frac{1}{18}(56 + 90) = \frac{146}{18}$$

$$S_p^2 = 8.11 \;\Rightarrow\; S_p = 2.85$$

For 99% confidence interval, 1 − α = 0.99 ⇒ α = 0.01 and α/2 = 0.005.
From the t-table, we have $t_{(n_1+n_2-2),\,0.005} = t_{(18),\,0.005} = 2.88$.
Hence, 99% confidence limits for difference of increase in weights
due to food A and B are

$$(\bar{X} - \bar{Y}) \mp t_{(n_1+n_2-2),\,0.005}\; S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$= (13 - 11) \mp t_{(18),\,0.005} \times 2.85 \times \sqrt{\frac{1}{10} + \frac{1}{10}}$$

$$= 2 \mp 2.88 \times 2.85 \times 0.45 \qquad [\because\ t_{(18),\,0.005} = 2.88]$$

$$= 2 \mp 3.69$$
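As a check on Case I, the pooled-variance calculation can be sketched in Python from the raw data. The t-value 2.88 is passed in as read from the t-table (the standard library has no t quantile function), and the names `food_a`, `food_b` and `pooled_t_limits` are illustrative assumptions.

```python
from math import sqrt

# Increase in weight under food A (X) and food B (Y), as in the table for Case I
food_a = [10, 12, 16, 13, 12, 16, 12, 9, 16, 14]
food_b = [7, 13, 12, 12, 10, 17, 12, 6, 12, 9]
T_18_0005 = 2.88  # t_(18),0.005 taken from the t-table, as in the solution


def pooled_t_limits(x, y, t_crit):
    """Return (difference of means, margin) for independent samples with pooled variance."""
    n1, n2 = len(x), len(y)
    mean_x, mean_y = sum(x) / n1, sum(y) / n2
    ss_x = sum((v - mean_x) ** 2 for v in x)   # sum of squared deviations of X
    ss_y = sum((v - mean_y) ** 2 for v in y)   # sum of squared deviations of Y
    sp = sqrt((ss_x + ss_y) / (n1 + n2 - 2))   # pooled standard deviation S_p
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    return mean_x - mean_y, margin


diff, margin = pooled_t_limits(food_a, food_b, T_18_0005)
print(diff, round(margin, 2))
```

Without intermediate rounding the margin comes out near 3.67; the hand calculation gives 3.69 because it rounds $S_p$ to 2.85 and the square root to 0.45 before multiplying.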
Case II: When samples are not independent, i.e. the same set of pigs is put
on diets A and B, then 99% confidence limits for difference of increase
in weights due to food A and B are

$$\bar{D} \mp t_{(n-1),\,0.005}\,\frac{S_D}{\sqrt{n}}$$
Calculation for $\bar{D}$ and $S_D$:

$X$       $Y$       $D = X - Y$    $(D-\bar{D})$    $(D-\bar{D})^2$
10        7         3              1                1
12        13        −1             −3               9
16        12        4              2                4
13        12        1              −1               1
12        10        2              0                0
16        17        −1             −3               9
12        12        0              −2               4
9         6         3              1                1
16        12        4              2                4
14        9         5              3                9

$\sum D = 20$, $\sum (D-\bar{D})^2 = 42$

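From the table, we have

$$\bar{D} = \frac{1}{n}\sum D = \frac{1}{10}\times 20 = 2$$

$$S_D^2 = \frac{1}{n-1}\sum (D-\bar{D})^2 = \frac{1}{9}\times 42 = 4.67$$

$$\Rightarrow S_D = 2.16$$

From the t-table, we have $t_{(n-1),\,0.005} = t_{(9),\,0.005} = 3.25$.

Thus, 99% confidence limits are

$$\bar{D} \mp t_{(n-1),\,\alpha/2}\,\frac{S_D}{\sqrt{n}} = 2 \mp t_{(9),\,0.005} \times \frac{2.16}{\sqrt{10}}$$

$$= 2 \mp 3.25 \times \frac{2.16}{3.16} = 2 \mp 2.22$$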
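The paired calculation of Case II can be sketched the same way. Again the t-value 3.25 is supplied from the t-table, and `paired_t_limits` is a hypothetical helper name used only for this illustration.

```python
from math import sqrt

# Weight increases of the same pigs under diets A (X) and B (Y)
x = [10, 12, 16, 13, 12, 16, 12, 9, 16, 14]
y = [7, 13, 12, 12, 10, 17, 12, 6, 12, 9]
T_9_0005 = 3.25  # t_(9),0.005 taken from the t-table, as in the solution


def paired_t_limits(x, y, t_crit):
    """Return (mean difference, margin) for paired observations."""
    d = [a - b for a, b in zip(x, y)]                        # D = X - Y for each pair
    n = len(d)
    d_bar = sum(d) / n
    s_d = sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))   # S_D with divisor n - 1
    return d_bar, t_crit * s_d / sqrt(n)


d_bar, margin = paired_t_limits(x, y, T_9_0005)
print(d_bar, round(margin, 2))
```

The pairing works on the 10 differences only, so the degrees of freedom drop from 18 to 9, but the standard error is based on the spread of the differences rather than on the pooled spread of both samples.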
E3) Here, we are given that

$$n_1 = 1200,\quad p_1 = 30\% = 0.30$$

$$n_2 = 900,\quad p_2 = 20\% = 0.20$$

$$q_1 = 1 - p_1 = 1 - 0.30 = 0.70$$

$$q_2 = 1 - p_2 = 1 - 0.20 = 0.80$$

We know that (1−α) 100% confidence limits for the difference in
proportions are given by

$$(p_1 - p_2) \mp z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

Therefore, 95% confidence limits for the difference in proportions of
blue-eyes in two populations are

$$(p_1 - p_2) \mp z_{0.025}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

$$= (0.30 - 0.20) \mp 1.96\sqrt{\frac{0.30 \times 0.70}{1200} + \frac{0.20 \times 0.80}{900}}$$

$$= 0.10 \mp 1.96\sqrt{0.000175 + 0.000178}$$

$$= 0.10 \mp 1.96 \times 0.019$$

$$= 0.10 \mp 0.037$$
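A minimal Python sketch of the same computation is given here as a cross-check. It assumes the standard-library `NormalDist` for the z-value; the function name `diff_proportions_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_ci(p1, n1, p2, n2, conf):
    """Large-sample confidence limits for the difference p1 - p2 of two proportions."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)          # z_{alpha/2}, about 1.96 for 95%
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)    # sqrt(p1*q1/n1 + p2*q2/n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# E3 data: 30% of 1200 and 20% of 900
lower, upper = diff_proportions_ci(0.30, 1200, 0.20, 900, conf=0.95)
print(round(lower, 3), round(upper, 3))
```

The limits correspond to 0.10 ∓ 0.037, that is, roughly 0.063 and 0.137.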
E4) Here, we are given that

$$n_1 = 500,\quad X_1 = 250,\quad n_2 = 600,\quad X_2 = 400$$

$$p_1 = \frac{X_1}{n_1} = \frac{250}{500} = 0.50, \qquad q_1 = 1 - p_1 = 1 - 0.50 = 0.50$$

$$p_2 = \frac{X_2}{n_2} = \frac{400}{600} = 0.67, \qquad q_2 = 1 - p_2 = 1 - 0.67 = 0.33$$

We know that (1−α) 100% confidence limits for the difference in
proportions are given by

$$(p_1 - p_2) \mp z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

Since $p_2$ is greater than $p_1$, we take $(p_2 - p_1)$ in place of
$(p_1 - p_2)$; then the confidence limits are

$$(p_2 - p_1) \mp z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

For 99% confidence interval, 1 − α = 0.99 ⇒ α = 0.01 and α/2 = 0.005.
Also $z_{0.005} = 2.58$.
Therefore, 99% confidence limits for the difference in proportions of
men and women in favour of the proposal are

$$(p_2 - p_1) \mp z_{0.005}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

$$= (0.67 - 0.50) \mp 2.58\sqrt{\frac{0.50 \times 0.50}{500} + \frac{0.67 \times 0.33}{600}}$$

$$= 0.17 \mp 2.58\sqrt{0.0005 + 0.0004}$$

$$= 0.17 \mp 2.58\sqrt{0.0009} = 0.17 \mp 2.58 \times 0.03$$

$$= 0.17 \mp 0.08$$
E5) We know that (1−α) 100% confidence interval for ratio of population
variances is given by

$$\left[\frac{S_1^2/S_2^2}{F_{(n_1-1,\,n_2-1),\,\alpha/2}},\ F_{(n_2-1,\,n_1-1),\,\alpha/2}\,\frac{S_1^2}{S_2^2}\right]$$

For 95% confidence interval, 1 − α = 0.95 ⇒ α = 0.05 and α/2 = 0.025.
Therefore, 95% confidence interval is given by

$$\left[\frac{S_1^2/S_2^2}{F_{(9,\,7),\,0.025}},\ F_{(7,\,9),\,0.025}\,\frac{S_1^2}{S_2^2}\right]$$

Calculation for $S_1^2$ and $S_2^2$:

Sample I     $(X-\bar{X})$    $(X-\bar{X})^2$    Sample II    $(Y-\bar{Y})$    $(Y-\bar{Y})^2$
$(X)$                                            $(Y)$
61           −16              256                60           −15              225
66           −11              121                65           −10              100
67           −10              100                71           −4               16
85           8                64                 74           −1               1
78           1                1                  76           1                1
63           −14              196                82           7                49
85           8                64                 85           10               100
86           9                81                 87           12               144
88           11               121
91           14               196

$\sum X = 770$, $\sum (X-\bar{X})^2 = 1200$, $\sum Y = 600$, $\sum (Y-\bar{Y})^2 = 636$
Therefore,

$$\bar{X} = \frac{1}{n_1}\sum X = \frac{1}{10}\times 770 = 77$$

$$\bar{Y} = \frac{1}{n_2}\sum Y = \frac{1}{8}\times 600 = 75$$

$$S_1^2 = \frac{1}{n_1-1}\sum (X-\bar{X})^2 = \frac{1}{9}\times 1200 = 133.33$$

$$S_2^2 = \frac{1}{n_2-1}\sum (Y-\bar{Y})^2 = \frac{1}{7}\times 636 = 90.86$$

Therefore, we consider

$$\frac{S_1^2}{S_2^2} = \frac{133.33}{90.86} = 1.47$$

From the F-table, we have

$$F_{(7,\,9),\,0.025} = 4.20 \quad\text{and}\quad F_{(9,\,7),\,0.025} = 4.82$$

Hence, the 95% confidence interval for ratio of population variances is
given by

$$\left[\frac{S_1^2/S_2^2}{F_{(9,\,7),\,0.025}},\ F_{(7,\,9),\,0.025}\,\frac{S_1^2}{S_2^2}\right]$$

$$= \left[\frac{1.47}{4.82},\ 4.20 \times 1.47\right]$$

$$= [0.30,\ 6.17]$$
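The steps of E5 can be reproduced from the raw samples with a small Python sketch. The two F-values are entered as read from the F-table, since the standard library has no F quantile function; `variance_ratio_ci` and the constant names are illustrative assumptions.

```python
from statistics import variance  # sample variance with divisor n - 1

sample_1 = [61, 66, 67, 85, 78, 63, 85, 86, 88, 91]
sample_2 = [60, 65, 71, 74, 76, 82, 85, 87]

F_9_7 = 4.82  # F_(9,7),0.025 from the F-table
F_7_9 = 4.20  # F_(7,9),0.025 from the F-table


def variance_ratio_ci(x, y, f_xy, f_yx):
    """Limits for sigma_x^2 / sigma_y^2.

    f_xy is the upper alpha/2 point of F(n_x - 1, n_y - 1);
    f_yx is the upper alpha/2 point of F(n_y - 1, n_x - 1).
    """
    ratio = variance(x) / variance(y)   # S1^2 / S2^2
    return ratio / f_xy, f_yx * ratio


lower, upper = variance_ratio_ci(sample_1, sample_2, F_9_7, F_7_9)
print(round(lower, 2), round(upper, 2))
```

Without intermediate rounding the upper limit comes out near 6.16 rather than 6.17; the hand calculation rounds the ratio of sample variances to 1.47 before multiplying. The lower limit agrees at 0.30.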

E6) Here, we are given

$$n_1 = 6,\quad \sum (X-\bar{X})^2 = 60.2$$

$$n_2 = 8,\quad \sum (Y-\bar{Y})^2 = 58.4$$

We know that (1−α) 100% confidence interval for ratio of population
variances is given by

$$\left[\frac{S_1^2/S_2^2}{F_{(n_1-1,\,n_2-1),\,\alpha/2}},\ F_{(n_2-1,\,n_1-1),\,\alpha/2}\,\frac{S_1^2}{S_2^2}\right]$$

For 90% confidence interval, 1 − α = 0.90 ⇒ α = 0.10 and α/2 = 0.05.
Therefore, 90% confidence interval is given by

$$\left[\frac{S_1^2/S_2^2}{F_{(5,\,7),\,0.05}},\ F_{(7,\,5),\,0.05}\,\frac{S_1^2}{S_2^2}\right]$$

From the F-table at the 0.05 level, we have

$$F_{(5,\,7),\,0.05} = 3.97 \quad\text{and}\quad F_{(7,\,5),\,0.05} = 4.88$$

Also

$$S_1^2 = \frac{1}{n_1-1}\sum (X-\bar{X})^2 = \frac{1}{5}\times 60.2 = 12.04$$

$$S_2^2 = \frac{1}{n_2-1}\sum (Y-\bar{Y})^2 = \frac{1}{7}\times 58.4 = 8.34$$

$$\frac{S_1^2}{S_2^2} = \frac{12.04}{8.34} = 1.44$$

Therefore, 90% confidence interval is given by

$$\left[\frac{S_1^2/S_2^2}{F_{(5,\,7),\,0.05}},\ F_{(7,\,5),\,0.05}\,\frac{S_1^2}{S_2^2}\right]$$

$$= \left[\frac{1.44}{3.97},\ 4.88 \times 1.44\right]$$

$$= [0.36,\ 7.03]$$
