Estimation of Difference of Means
Business Statistics / Statistical Inference
Case of Two Independent Populations
The Concept
Consider a random sample mean $\bar{X}_1$ for some random
sample 1 belonging to the sampling distribution of the
mean of population 1. Its expected value is $\mu_1$, the
mean of population 1, for which the sampling distribution
of the mean is made. The variance of $\bar{X}_1$ is given by
$s_1^2 / N_1$, i.e. $SEM_1 = s_1 / \sqrt{N_1}$, where $s_1$ is
the standard deviation of sample 1 from population 1 and
$N_1$ is the sample size of sample 1 from this sampling
distribution of the mean.
The Concept
Consider another random sample mean $\bar{X}_2$ for some
random sample 2 belonging to the sampling distribution
of the mean of population 2. Its expected value is $\mu_2$,
the mean of population 2, for which the sampling
distribution of the mean is made. The variance of
$\bar{X}_2$ is given by $s_2^2 / N_2$, i.e. $SEM_2 = s_2 / \sqrt{N_2}$,
where $s_2$ is the standard deviation of sample 2 from
population 2 and $N_2$ is the sample size of sample 2 from
this sampling distribution of the mean.
The Concept
• The two situations can be depicted as follows
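[Figure: Population cloud 1 (mean $\mu_1$) gives sample 1 (values $x_1$, mean $\bar{X}_1$), and $\bar{X}_1$ belongs to a sampling distribution of the mean centered at $\mu_1$. Population cloud 2 (mean $\mu_2$) gives sample 2 (values $x_2$, mean $\bar{X}_2$), and $\bar{X}_2$ belongs to a sampling distribution of the mean centered at $\mu_2$. The two are combined into population cloud 3.]

$variance_1 = \frac{s_1^2}{N_1} \Rightarrow SEM_1 = \frac{s_1}{\sqrt{N_1}}$

$variance_2 = \frac{s_2^2}{N_2} \Rightarrow SEM_2 = \frac{s_2}{\sqrt{N_2}}$

$\mu_3 = \mu_1 \pm \mu_2$

+ is used when the two sampling distributions are added
- is used when the two sampling distributions are subtracted

Whether the sampling distributions are added or subtracted, the law of variances states that their variances are always summed:

$variance_3 = \frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}$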
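The law of variances stated above can be sketched in a few lines of Python. This is a minimal illustration, assuming independent samples with standard deviations `s1`, `s2` and sizes `n1`, `n2`; the function name `combined_sem` and the input numbers are hypothetical and do not come from the slides.

```python
import math

def combined_sem(s1, n1, s2, n2):
    """Standard error of (X1 + X2) or (X1 - X2) for independent samples.

    The variances s1^2/n1 and s2^2/n2 are summed in both cases.
    """
    var1 = s1 ** 2 / n1
    var2 = s2 ** 2 / n2
    return math.sqrt(var1 + var2)

# Illustrative numbers only: each variance term equals 1, so the result is sqrt(2).
print(combined_sem(3.0, 9, 4.0, 16))
```

The same value is used whether the two sample means are added or subtracted; only the point estimate changes sign.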
Estimating Population Mean
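• Remember that for one sample the estimation of $\mu$ is given as,
$\mu = \bar{X} \pm t_c \frac{s}{\sqrt{N}}$
Here $t_c$ is called the critical value for a CI or level of significance $\alpha$.

• Let us have means from sampling distribution 1 and sampling distribution 2, and let them be $\bar{X}_1$ and $\bar{X}_2$ respectively.

• If the two sampling distributions are added to obtain $\mu_1 + \mu_2$, then the estimation is given using the two sample means from two separate populations as follows,
$\mu_1 + \mu_2 = (\bar{X}_1 + \bar{X}_2) \pm t_c \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$
Here $t_c$ is called the critical value for a CI or $\alpha$ or $\alpha/2$.

• If the two sampling distributions are subtracted to obtain $\mu_1 - \mu_2$, then the estimation is given using the two sample means from two separate populations as follows,
$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_c \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$
Here $t_c$ is called the critical value for a CI or $\alpha$ or $\alpha/2$.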
General Format of Interval Estimation
• The general form of an interval estimate at some confidence interval CI is:

Population estimate = estimated value $\pm$ critical value $\times$ std. dev. of the estimate

or

Population estimate = estimated value $\pm$ margin of error

For example:

Population estimate = sample mean $\pm$ critical value $\times$ standard error of mean

Standard error of mean (SEM) is the standard deviation of the sampling distribution of the mean.
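The general format above translates directly into code. The following is a minimal Python sketch; the function name `interval_estimate` and the numbers in the usage line are illustrative assumptions, not values from the slides.

```python
def interval_estimate(estimate, critical_value, std_error):
    """Return (lower, upper) = estimate -/+ critical value x standard error."""
    margin_of_error = critical_value * std_error
    return estimate - margin_of_error, estimate + margin_of_error

# Illustrative numbers only: estimate 50, critical value 2, standard error 1.5.
low, high = interval_estimate(50.0, 2.0, 1.5)
print(low, high)
```

Every formula in the rest of the deck has this shape; only the estimate, the standard error and the degrees of freedom behind the critical value change.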
General Format

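[Figure: t-distribution centered at $t = 0$. The central area is the confidence interval (CI); each tail has area $\alpha/2$ and is cut off at the critical values $-t_c = -t_{\alpha/2}$ and $+t_c = +t_{\alpha/2}$.]

$\alpha$ is also called the level of significance and it is always equal to 1 - CI.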
Example
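• Let CI = 95%. Then $\alpha = 1 - 0.95 = 0.05$ and $\alpha/2 = 0.025$.

[Figure: t-distribution centered at $t = 0$. The central area is the 95% confidence interval; each tail has area $\alpha/2 = 0.025$ and is cut off at the critical values $-t_c = -t_{0.025}$ and $+t_c = +t_{0.025}$.]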
Two cases and DF
• In the case of independent populations, the SEM
can be calculated under two cases

– Equal variances
– Unequal variances
Case of Equal variances
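• If the sample variances are equal then the SEM equation
can be simplified from

$SEM = \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$ and $s_1 = s_2 = s$, to,

$SEM = s \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$

The degree of freedom or DF in this case is calculated as follows,

$DF = N_1 + N_2 - 2$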
Case of Equal Variances
• However, in practice the two sample variances are rarely exactly equal;
they are only approximately equal.

• An accepted rule is that equal variances can be assumed if the ratio of the
sample standard deviations satisfies:
$0.5 \le \frac{s_1}{s_2} \le 2$

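• Furthermore, in this case we have to pool or group the variances into one
single value. This pooled standard deviation, $s_p$, is given by,

$s_p = \sqrt{\frac{(N_1 - 1) s_1^2 + (N_2 - 1) s_2^2}{N_1 + N_2 - 2}}$

and hence the estimation formula becomes,

$\mu_1 \pm \mu_2 = (\bar{X}_1 \pm \bar{X}_2) \pm t_c \, s_p \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$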
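The pooling step for the equal variances case can be sketched as follows. This is a minimal Python illustration; the function names `pooled_sd` and `pooled_sem` and the input numbers are hypothetical and are not taken from the slides.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two samples assumed to share one variance."""
    numerator = (n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2
    return math.sqrt(numerator / (n1 + n2 - 2))

def pooled_sem(s1, n1, s2, n2):
    """Standard error of the sum or difference of means, using the pooled SD."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)

# Illustrative numbers only.
print(pooled_sd(3.0, 11, 4.0, 11))   # weights each variance by its own DF
print(pooled_sem(3.0, 11, 4.0, 11))
```

When both samples have the same standard deviation, `pooled_sd` returns that common value, which matches the simplification made when the variances are exactly equal.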


Estimating Population Mean
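• If the two sampling distributions are added to
obtain $\mu_1 + \mu_2$, then the estimation is given using the
two sample means from two separate populations as
follows,
$\mu_1 + \mu_2 = (\bar{X}_1 + \bar{X}_2) \pm t_c \, s_p \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$
Here $t_c$ is called the critical value for a CI or $\alpha$, with $DF = N_1 + N_2 - 2$

• If the two sampling distributions are subtracted to
obtain $\mu_1 - \mu_2$, then the estimation is given using the
two sample means from two separate populations as
follows,
$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_c \, s_p \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$
Here $t_c$ is called the critical value for a CI or $\alpha$, with $DF = N_1 + N_2 - 2$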
Case of Unequal Variances and DF
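• When the condition of equal variances is not
satisfied, the estimation formula does not change;
only the degree of freedom or DF is calculated
differently.

• This new formula of DF is given as follows:

$DF = \frac{\left( \frac{s_1^2}{N_1} + \frac{s_2^2}{N_2} \right)^2}{\frac{1}{N_1 - 1} \left( \frac{s_1^2}{N_1} \right)^2 + \frac{1}{N_2 - 1} \left( \frac{s_2^2}{N_2} \right)^2}$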
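The DF calculation for the unequal variances case can be sketched in Python as below, assuming the same form of the formula that is written out in Example 2 (commonly known as the Welch-Satterthwaite approximation). The function name `unequal_variance_df` and the input numbers are hypothetical.

```python
def unequal_variance_df(s1, n1, s2, n2):
    """Degrees of freedom for two independent samples with unequal variances."""
    v1 = s1 ** 2 / n1  # variance of the first sample mean
    v2 = s2 ** 2 / n2  # variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Illustrative numbers only: unequal spreads and unequal sample sizes.
print(unequal_variance_df(2.0, 12, 5.0, 20))
```

The result is usually not a whole number. A common convention, assumed here rather than stated in the slides, is to round it down before looking up the critical value in a t-table.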


Estimating Population Mean
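• If the two sampling distributions are added to
obtain $\mu_1 + \mu_2$, then the estimation is given using the
two sample means from two separate populations as
follows,
$\mu_1 + \mu_2 = (\bar{X}_1 + \bar{X}_2) \pm t_c \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$
Here $t_c$ is called the critical value for a CI or $\alpha$, with DF as above

• If the two sampling distributions are subtracted
to obtain $\mu_1 - \mu_2$, then the estimation is given using
the two sample means from two separate
populations as follows,
$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_c \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$
Here $t_c$ is called the critical value for a CI or $\alpha$, with DF as above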
Case of One Same Population
The Concept
• Consider the case in which we take two samples from the same sampling
distribution of the mean of one single population.

• The differences of the sample values make a new difference dataset, which
estimates a new difference population as follows:

$\mu_d = \bar{X}_d \pm t_c \frac{s_d}{\sqrt{N_d}}$

Here
$\bar{X}_d$ is the mean of the new dataset, which contains the differences of measured values
from two samples taken from the same single population.
$s_d$ is the standard deviation of the new differences dataset.
$N_d$ is the sample size of the new differences dataset.
$t_c$ is the critical value at some CI or level of significance $\alpha$ with $DF = N_d - 1$
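The paired procedure described above (take point-to-point differences, then estimate from the difference dataset) can be sketched in Python. The function name `paired_interval`, the toy data and the choice to pass the critical value in by hand are assumptions made for illustration; 4.303 is the two-tailed t-table value for a 95% CI with DF = 2.

```python
import math
import statistics

def paired_interval(sample1, sample2, t_critical):
    """Interval estimate of the mean difference for paired samples."""
    diffs = [a - b for a, b in zip(sample1, sample2)]   # new difference dataset
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)                      # sample SD (divides by N - 1)
    margin = t_critical * sd_d / math.sqrt(len(diffs))
    return mean_d - margin, mean_d + margin

# Toy data: three pairs, so DF = 3 - 1 = 2.
low, high = paired_interval([5, 7, 9], [4, 5, 6], t_critical=4.303)
print(low, high)
```

Only the difference dataset enters the calculation; the individual sample means and standard deviations are never used.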
[Figure: One population cloud gives sample 1 and sample 2. Find the point-to-point difference of the two samples; this forms the new difference dataset $x_d$ with mean $\bar{X}_d$. The means of all such difference datasets form the sampling distribution of the difference of means, centered at $\mu_d$.]
Summary of Formulae for Estimation of
Difference of Means
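S. No | Formula to Use | Condition
1 | $\mu_1 \pm \mu_2 = (\bar{X}_1 \pm \bar{X}_2) \pm t_c \, s_p \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$, with $DF = N_1 + N_2 - 2$ | When we have two samples from two separate populations and the two samples have equal variances.
2 | $\mu_1 \pm \mu_2 = (\bar{X}_1 \pm \bar{X}_2) \pm t_c \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$, with DF from the unequal variances formula | When we have two samples from two separate populations and the two samples have unequal variances.
3 | $\mu_d = \bar{X}_d \pm t_c \frac{s_d}{\sqrt{N_d}}$, with $DF = N_d - 1$ | When we have two samples from one single and same population.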
Comments
• The first two cases are similar in that they involve two
separate populations and two sample means. The two sample
means, their sample variances and sample standard deviations
are calculated separately. They are then used in the estimation
formula.
– This is also called the independent means or between groups case

• The last case is different from the first two. Firstly, the
population is the same, and secondly the sample values are
subtracted, which gives a new differences dataset. After
obtaining this new differences dataset, its sample mean, variance
and standard deviation are used in the estimation formula.
– This is also called the dependent means, paired groups or repeated
measures case
Examples
Estimation of Difference of Means
Example 1
• In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will
pack faster on average than the machine currently used. An experiment was conducted
to record the time taken to pack 10 cartons by the new and the present/old machine. The results, in
seconds, are shown in the following table. Give a 99% CI for the difference between the mean
time it takes the new machine to pack 10 cartons and the mean time it takes the old/present
machine to pack 10 cartons.
Sample 1(New Machine) Sample 2 (Old machine)
42.1 42.7
41.0 43.6
41.3 43.8
41.8 43.3
42.4 42.5
42.8 43.5
43.2 43.1
42.3 41.7
41.8 44.0
42.7 44.1
$\bar{X}_1 = 42.14$, $s_1 = 0.683$, $N_1 = 10$ and $\bar{X}_2 = 43.23$, $s_2 = 0.750$, $N_2 = 10$
Example 1
• Here two samples from two separate populations are taken
– Therefore it is independent means case

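• The check of variances is made by using the ratio of the sample standard
deviations as follows:

$\frac{s_1}{s_2} = \frac{0.683}{0.750} \approx 0.91$

• This is the case of equal variances.

• Hence the difference between the mean time it takes the new
machine to pack 10 cartons and the mean time it takes the old/
present machine to pack 10 cartons can be estimated by using,

$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_c \, s_p \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$

and first finding $s_p$ as,

$s_p = \sqrt{\frac{(10 - 1)(0.683)^2 + (10 - 1)(0.750)^2}{10 + 10 - 2}} \approx 0.717$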
Example 1

and secondly by finding the $t_c$ value as follows.

For the $t_c$ value we need to know two things:

1) CI => 99% (given) at two tails
2) DF, which is $N_1 + N_2 - 2 = 10 + 10 - 2 = 18$
Using the t-table we get $t_c = 2.878$
Example 1
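Finally,

$\mu_1 - \mu_2 = (42.14 - 43.23) \pm 2.878 \times 0.717 \times \sqrt{\frac{1}{10} + \frac{1}{10}} = -1.09 \pm 0.92$

The 99% CI is $-2.01 \le \mu_1 - \mu_2 \le -0.17$

i.e. we are 99% confident that $\mu_1 - \mu_2$ is between $-2.01$ and $-0.17$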
Example 1
• In the context of the question, it means that the
estimated mean difference of packing time
between the two machines at 99% CI lies in the interval
$(-2.01, -0.17)$ (in seconds)
The negative sign shows that the new machine is faster than the
present/old machine throughout this interval.

On average, the new machine packs 10 cartons 0.17 s to 2.01 s
faster than the old/present machine.
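The calculation in Example 1 can be reproduced with a short Python sketch using only the standard library. The variable names are illustrative, and the critical value 2.878 is entered by hand from a t-table (DF = 18, 99% CI, two tails) rather than computed.

```python
import math
import statistics

# Packing times in seconds, copied from the table in Example 1.
new = [42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]
old = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1]
T_CRIT = 2.878  # t-table value, assumed input: DF = 18, 99% CI, two tails

n1, n2 = len(new), len(old)
s1, s2 = statistics.stdev(new), statistics.stdev(old)

# Equal variances case: pool the two sample variances.
sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

diff = statistics.mean(new) - statistics.mean(old)
margin = T_CRIT * sp * math.sqrt(1 / n1 + 1 / n2)
low, high = diff - margin, diff + margin

print(f"s1/s2 = {s1 / s2:.2f}, sp = {sp:.3f}")
print(f"99% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```

Run as written, the interval rounds to about (-2.01, -0.17) seconds, in agreement with the hand calculation.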
Example 2
• Independent random samples of 17 sophomores and 13 juniors attending a
large university yield the following data on grade point averages. At 5% level
of significance find the estimation of the difference of two population means
corresponding to sophomores and juniors respectively. Take the case of
unequal variances.
Example 2
• Here two samples from two separate populations are taken
– Therefore it is independent means case

• The sample statistics for two samples from two populations sophomores and juniors written
with subscripts 1 and 2 respectively are as follows:

• It is provided in the question that this is the case of unequal variances.

• Hence the difference between the mean GPAs is estimated as follows:

$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_c \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$
Example 2

Which gives

For the $t_c$ value we need to know two things:

1) CI => 95% (given) at two tails
2) DF, which we need to calculate
Example 2
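• DF is calculated as follows:

$DF = \frac{\left( \frac{s_1^2}{N_1} + \frac{s_2^2}{N_2} \right)^2}{\frac{1}{N_1 - 1} \left( \frac{s_1^2}{N_1} \right)^2 + \frac{1}{N_2 - 1} \left( \frac{s_2^2}{N_2} \right)^2}$

• After obtaining DF we can find the required $t_c$ value from the table as follows: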
Example 2
• Hence the $t_c$ value is 2.056.
• Therefore the calculation proceeds as follows:

$-0.4437 \le \mu_1 - \mu_2 \le 0.1637$

At 95% CI, the mean GPA of sophomores is between 0.4437 lower
and 0.1637 higher than the mean GPA of juniors.
Example 3
• Trace metals in drinking water affect the flavor and an unusually
high concentration can pose a health hazard. Ten pairs of data
were taken measuring zinc concentration in bottom water and
surface water. Provide a 95% CI estimate of the mean
difference in zinc concentration between bottom water
and surface water. The data obtained are as follows:
Example 3
• Here two samples are taken from the same
population.

• We first find the new difference dataset as follows.
Example 3
• For reference, the new difference dataset and its statistics.
Sample 1 Sample 2 Differences Dataset
0.430 0.415 0.015
0.266 0.238 0.028
0.567 0.390 0.177
0.531 0.410 0.121
0.707 0.605 0.102
0.716 0.609 0.107
0.651 0.632 0.019
0.589 0.523 0.066
0.469 0.411 0.058
0.723 0.612 0.111
Example 3
• The mean of the new difference dataset is $\bar{X}_d = 0.0804$

• The standard deviation of the new difference
dataset is $s_d = 0.0523$

• The DF is $N_d - 1 = 10 - 1 = 9$

• A 95% CI at two tails with DF = 9 gives $t_c = 2.262$ (as shown in
the following table)
Example 3
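• Hence the interval estimation of the difference of
means, $\mu_d$, is obtained as follows,

$\mu_d = 0.0804 \pm 2.262 \times \frac{0.0523}{\sqrt{10}} = 0.0804 \pm 0.0374$

At 95% CI, the zinc concentration is 0.043 to 0.1178 higher
in bottom water than in surface water.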
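The paired calculation in Example 3 can be checked with a short Python sketch using only the standard library. The variable names are illustrative, and the critical value 2.262 is entered by hand from a t-table (DF = 9, 95% CI, two tails) rather than computed.

```python
import math
import statistics

# Zinc concentrations copied from the table in Example 3.
sample1 = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
sample2 = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609, 0.632, 0.523, 0.411, 0.612]
T_CRIT = 2.262  # t-table value, assumed input: DF = 9, 95% CI, two tails

# Point-to-point differences form the new difference dataset.
diffs = [a - b for a, b in zip(sample1, sample2)]
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)

margin = T_CRIT * sd_d / math.sqrt(len(diffs))
low, high = mean_d - margin, mean_d + margin

print(f"mean = {mean_d:.4f}, sd = {sd_d:.4f}, DF = {len(diffs) - 1}")
print(f"95% CI for mu_d: ({low:.4f}, {high:.4f})")
```

The interval comes out at roughly 0.043 to 0.118, entirely above zero, which supports the reading that the first sample has the higher zinc concentration.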
