
CHAPTER 10

ESTIMATION AND
HYPOTHESIS TESTING:
TWO POPULATIONS
(PART 1)
Opening Example
In Chapter 9 we discussed the estimation and hypothesis-testing
procedures involving a single population.

This chapter extends the discussion of hypothesis-testing


procedures to the difference between two population means.

For example, we may want to test the hypothesis that the
mean price of houses in California is different from the mean
price of houses in New York.

We may also want to test the hypothesis that the proportion of all
adult men who do not smoke is different from the proportion of all
adult women who are non-smokers.
Independent versus Dependent Samples
Two samples drawn from two populations are
independent if the selection of one sample from one
population does not affect the selection of the second
sample from the second population.

Suppose we want to estimate the difference between the


mean salaries of all male and all female executives. To do
so, we draw two samples, one from the population of male
executives and another from the population of female
executives. These two samples are independent because
they are drawn from two different populations, and the
samples have no effect on each other.
Independent versus Dependent Samples
Two samples drawn from two populations are
dependent if the selection of one sample from one
population affects the selection of the second sample from
the second population.

Suppose we want to estimate the difference between the


mean weights of all participants before and after a weight
loss program. To accomplish this, suppose we take a
sample of 40 participants and measure their weights before
and after the completion of this program. Note that these
two samples include the same 40 participants. This is an
example of two dependent samples.
Denotations
Suppose we select two (independent) samples from two different
populations that are referred to as population 1 and population 2.
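We use the following notation:
μ1 and μ2 = the means of populations 1 and 2
σ1 and σ2 = the standard deviations of populations 1 and 2
n1 and n2 = the sizes of the samples drawn from populations 1 and 2
x̄1 and x̄2 = the means of the samples drawn from populations 1 and 2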
Inferences about the difference between two
population means for Independent Samples:
σ1 and σ2 known
If these three conditions are satisfied:
1. The two samples are independent.
2. The standard deviations σ1 and σ2 of the two
populations are known.
3. At least one of the following two conditions is fulfilled:
i. Both samples are large (i.e., n1 ≥ 30 and n2 ≥ 30).
ii. If either one or both sample sizes are small, then
both populations from which the samples are drawn are
normally distributed.

then the sampling distribution of x̄1 − x̄2 is (approximately) normally
distributed, and we use the normal distribution (z table).
Sampling Distribution of x̄1 − x̄2
When the conditions listed on the previous page are
satisfied, the sampling distribution of x̄1 − x̄2 has a mean of

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$$

and a standard deviation of

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Figure 10.1
Hypothesis Testing About μ1 – μ2
(σ1 and σ2 known)
It is often necessary to compare the means of two
populations. For example, we may want to know if the mean
price of houses in Chicago is different from that in Los
Angeles. Similarly, we may be interested in knowing if, on
average, Bangladeshi children spend fewer hours in school
than Indian children do. In both these cases, we will perform
a test of hypothesis about μ1 – μ2.

The alternative hypothesis in a test of hypothesis may be


that the means of the two populations are different, or that
the mean of the first population is greater than the mean
of the second population, or that the mean of the first
population is less than the mean of the second population.
These three situations are described next.
Hypothesis Testing About μ1 – μ2
(σ1 and σ2 known)
1. Testing an alternative hypothesis that the means of
two populations are different is equivalent to μ1 ≠
μ2, which is the same as μ1 - μ2 ≠ 0.

2. Testing an alternative hypothesis that the mean of


the first population is greater than the mean of the
second population is equivalent to μ1 > μ2, which is
the same as μ1 - μ2 > 0.

3. Testing an alternative hypothesis that the mean of


the first population is less than the mean of the
second population is equivalent to μ1 < μ2, which is
the same as μ1 - μ2 < 0.
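
The three alternatives above differ only in where the rejection region sits. The sketch below shows one way to look up the critical z value(s) for each case when σ1 and σ2 are known. It uses `statistics.NormalDist` from the Python standard library; the function name and the labels "two-sided", "greater" and "less" are illustrative choices, not part of the slides.

```python
from statistics import NormalDist


def z_critical_values(alpha, alternative):
    """Return the critical z value(s) for a test about mu1 - mu2.

    alternative: "two-sided" (H1: mu1 - mu2 != 0),
                 "greater"   (H1: mu1 - mu2 > 0),
                 "less"      (H1: mu1 - mu2 < 0).
    These labels are illustrative names, not taken from the slides.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        # alpha is split equally between the two tails
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if alternative == "greater":
        # all of alpha sits in the right tail
        return (std_normal.inv_cdf(1 - alpha),)
    if alternative == "less":
        # all of alpha sits in the left tail
        return (std_normal.inv_cdf(alpha),)
    raise ValueError(f"unknown alternative: {alternative!r}")


if __name__ == "__main__":
    for alt in ("two-sided", "greater", "less"):
        print(alt, [round(z, 3) for z in z_critical_values(0.01, alt)])
```

A two-tailed test returns a symmetric pair of critical values, while a one-tailed test returns a single value in the tail named by H1.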
Hypothesis Testing About μ1 – μ2
(σ1 and σ2 known)
Test Statistic z for x̄1 − x̄2
When using the normal distribution, the value of the
test statistic z for x̄1 − x̄2 is computed as

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$

The value of μ1 – μ2 is substituted from H0. The value
of $\sigma_{\bar{x}_1 - \bar{x}_2}$ is calculated as described earlier.
Example 10-4
A survey of low- and middle-income households showed that
consumers aged 65 years and older had an average credit
card debt of $10,235 and consumers in the 50 to 64-year
age group had an average credit card debt of $9342 at the
time of the survey. Suppose that these averages were based
on random samples of 1200 and 1400 people for the two
groups, respectively. Further assume that the population
standard deviations for the two groups were $2800 and
$2500, respectively. Let μ1 and μ2 be the respective
population means for the two groups, people aged 65 years
and older and people in the 50 to 64 year age group.

Using the critical value approach, test at the 1% significance
level whether the mean credit card debts for the two groups are
different.
Example 10-4: Solution
  

Step 1: State the Null & Alternative Hypotheses


We are to test whether the two population means are
different. The two possibilities are as follows:
The mean credit card debts for people of the two age
groups are not different. In other words, μ1 = μ2 which can
be written as μ1 – μ2 = 0
The mean credit card debts for people of the two age
groups are different. That is, μ1 ≠ μ2, which can be written as
μ1 – μ2 ≠ 0
Example 10-4: Solution
Considering these two possibilities, the null and alternative
hypotheses are, respectively,

H0: μ1 – μ2 = 0 (The two population means are not


different.)
H1: μ1 – μ2 ≠ 0 (The two population means are different.)

Step 2: Select the distribution to use.


Here
 Population standard deviations, σ1 and σ2, are known

 Both samples are large; n1 > 30 and n2 > 30


Therefore, we use the normal distribution to perform the
hypothesis test.
Example 10-4: Solution
Step 3: Determine the rejection and nonrejection regions.
The significance level is given to be α = 0.01.
The ≠ sign in the alternative hypothesis indicates that
the test is two-tailed.
The area in each tail of the normal distribution curve is
α / 2 = .01 / 2 = .005
The critical values of z are 2.58 and -2.58 (the table value
lies between 2.57 and 2.58, so you can also choose 2.57)
Figure 10.2
Example 10-4: Solution
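Step 4: Calculate the value of the test statistic.

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(2800)^2}{1200} + \frac{(2500)^2}{1400}} = 104.8695$$

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(10{,}235 - 9342) - 0}{104.8695} = 8.515$$

Here μ1 – μ2 = 0 is taken from H0.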
Example 10-4: Solution
Step 5: Make a decision.
Because the value of the test statistic z = 8.515 falls in the
rejection region, we reject the null hypothesis H0.

Therefore, we conclude that the mean credit card debt
for consumers aged 65 and older is different from the
mean credit card debt of consumers in the 50 to 64-year
age group.
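
The five steps of Example 10-4 can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the critical value is computed with `statistics.NormalDist` rather than read from a z table, so it comes out as roughly 2.576 instead of the rounded 2.58.

```python
import math
from statistics import NormalDist

# Sample information given in Example 10-4
x1_bar, sigma1, n1 = 10235, 2800, 1200   # consumers aged 65 and older
x2_bar, sigma2, n2 = 9342, 2500, 1400    # consumers aged 50 to 64
alpha = 0.01
hypothesized_diff = 0                    # mu1 - mu2 under H0

# Standard deviation of x1_bar - x2_bar (population sigmas known)
sd_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Test statistic z
z = ((x1_bar - x2_bar) - hypothesized_diff) / sd_diff

# Two-tailed test: alpha / 2 in each tail
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z) > z_crit
print(f"sd_diff = {sd_diff:.4f}, z = {z:.3f}, critical z = ±{z_crit:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```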
Inferences about the difference between two
population means for Independent Samples:
σ1 and σ2 unknown but assumed to be equal
If these three conditions are satisfied:

1. The two samples are independent.
2. The standard deviations σ1 and σ2 of the two populations
are unknown but are assumed to be equal.
3. At least one of the following two conditions is fulfilled:
i. Both samples are large (i.e., n1 ≥ 30 and n2 ≥ 30).
ii. If either one or both sample sizes are small, then
both populations from which the samples are drawn are
normally distributed.

then we use the t distribution to conduct the hypothesis test.
Hypothesis Testing About μ1 – μ2
(σ1 and σ2 unknown but assumed to be equal)
Test Statistic t for x̄1 − x̄2
The value of the test statistic t for x̄1 − x̄2 is computed as

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$

The value of μ1 – μ2 in this formula is substituted from the
null hypothesis, and $s_{\bar{x}_1 - \bar{x}_2}$ is calculated as explained in the next
slide.
Pooled Standard Deviation for Two
Samples
When the standard deviations of the two populations are equal, we
can use σ for both σ1 and σ2. Because σ is unknown, we replace it by
its point estimator sp, which is called the pooled sample standard
deviation (hence the subscript p). The value of sp is computed by
using the information from the two samples as follows:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

n1, n2 = sample sizes
s1², s2² = variances of the samples
(n1 - 1) = degrees of freedom for sample 1
(n2 - 1) = degrees of freedom for sample 2
(n1 + n2 - 2) = degrees of freedom for the two samples taken together
sp = estimator of σ
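Estimator of the Standard Deviation of x̄1 − x̄2
When sp is used as an estimator of σ, the standard deviation of
x̄1 − x̄2 is estimated by $s_{\bar{x}_1 - \bar{x}_2}$. The value of $s_{\bar{x}_1 - \bar{x}_2}$ is calculated by using the
following formula.

$$s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$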
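
The two formulas above translate directly into code. This is a minimal sketch, assuming the inputs are sample standard deviations (not variances); the function names `pooled_sd` and `se_diff_pooled` are illustrative and do not come from the slides.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled sample standard deviation s_p from two sample SDs."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)


def se_diff_pooled(s1, n1, s2, n2):
    """Estimated standard deviation of x1_bar - x2_bar using s_p."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)


if __name__ == "__main__":
    # Equal sample SDs: the pooled value is that common SD
    print(pooled_sd(5, 10, 5, 20))
    # Unequal sample SDs: the pooled value lies between them
    print(pooled_sd(3, 14, 4, 16))
```

Because each sample variance is weighted by its degrees of freedom, the pooled value always lies between s1 and s2 and sits closer to the standard deviation of the larger sample.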
Example 10-6
A sample of 14 cans of Brand I diet soda gave the
mean number of calories of 23 per can with a standard
deviation of 3 calories. Another sample of 16 cans of
Brand II diet soda gave the mean number of calories
of 25 per can with a standard deviation of 4 calories.
At the 1% significance level using the critical value
approach, can you conclude that the mean numbers of
calories per can are different for these two brands of
diet soda? Assume that the calories per can of diet
soda are normally distributed for each of the two
brands and that the standard deviations for the two
populations are equal.
Example 10-6: Solution
Let μ1 and μ2 be the mean numbers of calories per can for diet soda
of Brand I and Brand II, respectively, and let x̄1 and x̄2 be the
means of the respective samples. From the given information,
n1 = 14   x̄1 = 23   s1 = 3
n2 = 16   x̄2 = 25   s2 = 4
α = .01

Step 1: State the null and alternative hypotheses.
We are to test for the difference in the mean numbers of
calories per can for the two brands. The null and alternative
hypotheses are, respectively,
Example 10-6: Solution
H0: μ1 – μ2 = 0
(The mean numbers of calories are not different.)
H1: μ1 – μ2 ≠ 0
(The mean numbers of calories are different.)

Step 2: Select the distribution to use


 The two samples are independent.
 The σ1 and σ2 are unknown but equal.
 The sample sizes are small but both populations are
normally distributed.
 We will use the t distribution.
Example 10-6: Solution
Step 3: Determine the rejection and nonrejection regions.
The ≠ sign in the alternative hypothesis indicates that the
test is two-tailed.
α = .01.
Area in each tail = α / 2 = .01 / 2 = .005
df = n1 + n2 – 2 = 14 + 16 – 2 = 28
Critical values of t are -2.763 and 2.763.
Figure 10.3
Example 10-6: Solution
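Step 4: Calculate the test statistic

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(14 - 1)(3)^2 + (16 - 1)(4)^2}{14 + 16 - 2}} = 3.5707$$

$$s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 3.5707 \sqrt{\frac{1}{14} + \frac{1}{16}} = 1.3067$$

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(23 - 25) - 0}{1.3067} = -1.531$$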
Example 10-6: Solution

Step 5: Make a Decision


The value of the test statistic is t = -1.531.
It falls in the nonrejection region.

Therefore, we fail to reject the null hypothesis.

Consequently, we conclude that there is no significant difference in
the mean numbers of calories per can for the two brands of
diet soda.
Example 10-7
A sample of 40 children from New York State showed
that the mean time they spend watching television is
28.50 hours per week with a standard deviation of 4
hours. Another sample of 35 children from California
showed that the mean time spent by them watching
television is 23.25 hours per week with a standard
deviation of 5 hours.

Using a 2.5% significance level, can you conclude that


the mean time spent watching television by children in
New York State is greater than that for children in
California? Assume that the standard deviations for the
two populations are equal.
Example 10-7: Solution
  40
= = 28.5 =4
= 35 = 23.25 =5
=.025

Step 1:
The mean time spent watching television by children in
New York State is equal to that for children in California.
This can be written as μ1 = μ2 or μ1 – μ2 = 0

Hence H0: μ1 – μ2 = 0

The mean time spent watching television by children in New


York State is greater than that for children in California. This
can be written as μ1 > μ2 or μ1 – μ2 > 0
Hence H1: μ1 – μ2 > 0
Example 10-7: Solution
Step 2:
 The two samples are independent.
 Standard deviations of the two populations are unknown
but assumed to be equal.
 Both samples are large.
 We use the t distribution to make the test.

Step 3:
α = .025
Area in the right tail of the t distribution = α = .025
df = n1 + n2 – 2 = 40 + 35 – 2 = 73
Critical value of t is 1.993
Figure 10.4
Example 10-7: Solution
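Step 4:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(40 - 1)(4)^2 + (35 - 1)(5)^2}{40 + 35 - 2}} = 4.4935$$

$$s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 4.4935 \sqrt{\frac{1}{40} + \frac{1}{35}} = 1.0400$$

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(28.50 - 23.25) - 0}{1.0400} = 5.048$$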
Example 10-7: Solution
Step 5:
The value of the test statistic t = 5.048.
 It falls in the rejection region.

Therefore, we reject the null hypothesis H0

Hence, we conclude that children in New York State spend


more time, on average, watching TV than children in
California.
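
Examples 10-6 and 10-7 follow the same pooled procedure and differ only in the tail of the test. The sketch below recomputes both test statistics. The helper name `pooled_t_statistic` is illustrative. The critical values are the table values quoted in the slides and are passed in as constants, because the Python standard library has no t-distribution quantile function.

```python
import math


def pooled_t_statistic(x1_bar, s1, n1, x2_bar, s2, n2, hypothesized_diff=0):
    """t statistic and df for mu1 - mu2 when sigma1 = sigma2 is assumed."""
    df = n1 + n2 - 2
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se_diff = s_p * math.sqrt(1 / n1 + 1 / n2)
    t = ((x1_bar - x2_bar) - hypothesized_diff) / se_diff
    return t, df


# Example 10-6: two-tailed test, critical values -2.763 and 2.763 (df = 28)
t_soda, df_soda = pooled_t_statistic(23, 3, 14, 25, 4, 16)
reject_soda = abs(t_soda) > 2.763

# Example 10-7: right-tailed test, critical value 1.993 (df = 73)
t_tv, df_tv = pooled_t_statistic(28.50, 4, 40, 23.25, 5, 35)
reject_tv = t_tv > 1.993

print(f"Example 10-6: df = {df_soda}, t = {t_soda:.3f}, reject H0: {reject_soda}")
print(f"Example 10-7: df = {df_tv}, t = {t_tv:.3f}, reject H0: {reject_tv}")
```

The two-tailed test compares |t| with the critical value, while the right-tailed test compares t itself, so a large negative t would not lead to rejection in Example 10-7.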
Inferences about the difference between two
population means for Independent Samples:
σ1 and σ2 unknown and unequal
What if all other assumptions of the previous section
hold true, but the population standard deviations are
not only unknown but also unequal?

In this case, the procedure used to conduct the
hypothesis test about μ1 – μ2 remains the same as the
one we learned, except for two differences:

1. We use a different degrees of freedom formula.
2. The standard deviation of x̄1 − x̄2 is not calculated using
the pooled standard deviation sp.
Degrees of Freedom
If these three conditions are satisfied:
1. The two samples are independent.
2. The standard deviations σ1 and σ2 of the two populations are
unknown and unequal.
3. At least one of the following two conditions is fulfilled:
I. Both samples are large (i.e., n1 ≥ 30 and n2 ≥ 30).
II. If either one or both sample sizes are small, then both
populations from which the samples are drawn are normally
distributed.

then the t distribution is used with a special degrees of freedom
formula, which is provided in the next slide.
Degrees of Freedom
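$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(\dfrac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \dfrac{\left(\dfrac{s_2^2}{n_2}\right)^2}{n_2 - 1}}$$

The number given by this formula is always
rounded down for df.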
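
The degrees of freedom formula is easy to mistype by hand, so a short function can serve as a check. This is a minimal sketch; the function name `unequal_variance_df` is illustrative, and the inputs in the usage lines are the diet-soda sample values (s1 = 3, n1 = 14, s2 = 4, n2 = 16) used in this chapter's examples.

```python
import math


def unequal_variance_df(s1, n1, s2, n2):
    """Degrees of freedom when sigma1 and sigma2 are unknown and unequal.

    Returns the unrounded value; round it down before using a t table.
    """
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


raw_df = unequal_variance_df(s1=3, n1=14, s2=4, n2=16)
df = math.floor(raw_df)  # always round down, never to the nearest integer
print(f"unrounded df = {raw_df:.2f}, df used = {df}")
```

Rounding down gives a slightly larger critical value than rounding to the nearest integer would, which keeps the test on the conservative side.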
Hypothesis Testing About μ1 – μ2
Test Statistic t for x̄1 − x̄2
The value of the test statistic t is computed as

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$

The value of μ1 – μ2 in this formula is substituted from the null
hypothesis, and $s_{\bar{x}_1 - \bar{x}_2}$ is calculated as explained in the next slide.
Estimate of the Standard Deviation of x̄1 − x̄2
Because the standard deviations of the two populations are
not known and not assumed to be equal, we use the
following formula to calculate the standard deviation of x̄1 − x̄2:

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Example 10-9
A sample of 14 cans of Brand I diet soda gave the
mean number of calories per can of 23 with a standard
deviation of 3 calories. Another sample of 16 cans of
Brand II diet soda gave the mean number of calories
as 25 per can with a standard deviation of 4 calories.

Test at the 1% significance level using the critical value


approach whether the mean numbers of calories per
can of diet soda are different for these two brands.
Assume that the calories per can of diet soda are
normally distributed for each of these two brands and
that the standard deviations for the two populations are
not equal.
Example 10-9: Solution
  14
= = 23 =3
= 16 = 25 = 4
=.01

Step 1:
H0: μ1 – μ2 = 0
(The mean numbers of calories are not different)
H1: μ1 – μ2 ≠ 0
(The mean numbers of calories are different)
Step 2:
 The two samples are independent.
 Standard deviations of the two populations are unknown

and unequal.
 Both populations are normally distributed.
 We use the t distribution to make the test.
Example 10-9: Solution
Step 3:
The ≠ in the alternative hypothesis indicates that the test is
two-tailed.
α = .01
Area in each tail = α / 2 = .01 / 2 =.005

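$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(\dfrac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \dfrac{\left(\dfrac{s_2^2}{n_2}\right)^2}{n_2 - 1}} = \frac{\left(\dfrac{(3)^2}{14} + \dfrac{(4)^2}{16}\right)^2}{\dfrac{\left(\dfrac{(3)^2}{14}\right)^2}{14 - 1} + \dfrac{\left(\dfrac{(4)^2}{16}\right)^2}{16 - 1}} = 27.41 \approx 27$$

The critical values of t are -2.771 and 2.771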
Figure 10.5
Example 10-9: Solution
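Step 4:

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(3)^2}{14} + \frac{(4)^2}{16}} = 1.2817$$

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(23 - 25) - 0}{1.2817} = -1.560$$

Step 5:
The test statistic is t = -1.560.
It falls in the nonrejection region.

Therefore, we fail to reject the null hypothesis.

Hence, there is no significant difference in the mean numbers of
calories per can for the two brands of diet soda.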
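
Steps 4 and 5 of Example 10-9 can be verified with a few lines of code. This is a minimal sketch with illustrative variable names; the critical value 2.771 is the table value from Step 3 (df = 27, area .005 in each tail) and is entered as a constant.

```python
import math

# Sample information given in Example 10-9
n1, x1_bar, s1 = 14, 23, 3   # Brand I
n2, x2_bar, s2 = 16, 25, 4   # Brand II
t_crit = 2.771               # table value quoted in Step 3

# Standard error without pooling (sigma1 and sigma2 not assumed equal)
se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Test statistic; mu1 - mu2 = 0 under H0
t = ((x1_bar - x2_bar) - 0) / se_diff

reject_h0 = abs(t) > t_crit  # two-tailed test
print(f"se = {se_diff:.4f}, t = {t:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The same data gave t = -1.531 under the equal-standard-deviations assumption in Example 10-6. Dropping that assumption changes the test statistic and the degrees of freedom slightly, but not the decision.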
