
SESSION 9*

TWO-SAMPLE INFERENCE
WMY Chapter 5
Parts 8-9
WMY Chapter 6
Part 5

*Some slides from Prof Yang Zhenlin


Recap

Motivation
In virtually every area of human activity, people are continuously developing new methods or modifying and revising existing techniques. The new methods or techniques need to be compared with the old ones to see whether they are really "better".
Examples:
• Agricultural field trials: to see whether a new strain of seed produces a higher yield than the current major variety.
• Drug evaluation: to see whether a new drug is more effective in curing a disease.
• Effect of an advertising campaign: to see whether the campaign has an effect on daily sales of a certain product.
Comparisons can be made by interval estimation or by hypothesis testing; both require knowledge of the relevant sampling distribution.

Learning Objectives
• Inference for Two Means (Known Variances)
• Inference for Two Means (Unknown Variances)
• Inference for Two Variances
• Inference for Paired Samples

Hypothesis Testing for Two Population Means
Two Population Means, Independent Samples

• Lower-tail test: H0: μ1 = μ2 vs H1: μ1 < μ2, i.e., H0: μ1 – μ2 = 0 vs H1: μ1 – μ2 < 0
• Upper-tail test: H0: μ1 = μ2 vs H1: μ1 > μ2, i.e., H0: μ1 – μ2 = 0 vs H1: μ1 – μ2 > 0
• Two-tail test: H0: μ1 = μ2 vs H1: μ1 ≠ μ2, i.e., H0: μ1 – μ2 = 0 vs H1: μ1 – μ2 ≠ 0

If Population is Normal or Applying CLT
Independent samples of sizes n1 and n2 are drawn from two populations, yielding sample means x̄1 and x̄2 respectively.
If population 1 has mean μ1 and standard deviation σ1, the sampling distribution of X̄1 is normal (exactly if population 1 is normal, approximately by the CLT if n1 is large) with
$$\mu_{\bar X_1} = \mu_1 \quad\text{and}\quad \sigma_{\bar X_1} = \frac{\sigma_1}{\sqrt{n_1}}$$
If population 2 has mean μ2 and standard deviation σ2, the sampling distribution of X̄2 is likewise normal with
$$\mu_{\bar X_2} = \mu_2 \quad\text{and}\quad \sigma_{\bar X_2} = \frac{\sigma_2}{\sqrt{n_2}}$$
This assumes either sampling with replacement, or sampling without replacement from an infinite population.

Confidence Interval for Two Population Means
The distribution of X̄1 − X̄2 is normal if:
• The two samples are independent
• The parent populations are normally distributed.

The mean and standard deviation of X̄1 − X̄2 are given by
$$\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2, \qquad \sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

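Sampling Distribution of the Difference Between Two Population Means
Applying the laws of expected value and variance (the variances add because the two samples are independent), we have:
$$E(\bar X_1 - \bar X_2) = E(\bar X_1) - E(\bar X_2) = \mu_1 - \mu_2$$
$$\operatorname{Var}(\bar X_1 - \bar X_2) = \operatorname{Var}(\bar X_1) + \operatorname{Var}(\bar X_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
We can define
$$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
which has the standard normal distribution N(0, 1) (approximately, when the CLT is applied).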
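As an informal check of these two results, the Python sketch below (standard library only) simulates many pairs of independent normal samples and compares the empirical mean and variance of x̄1 − x̄2 with μ1 − μ2 and σ1²/n1 + σ2²/n2. The helper name `simulate_diff_of_means`, the population parameters, the sample sizes, the number of replications and the seed are all hypothetical choices for illustration only.

```python
import random
import statistics


def simulate_diff_of_means(mu1, sigma1, n1, mu2, sigma2, n2, reps=20_000, seed=1):
    """Return `reps` simulated values of xbar1 - xbar2 from independent normal samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs


mu1, sigma1, n1 = 10.0, 2.0, 8    # hypothetical population 1
mu2, sigma2, n2 = 7.0, 3.0, 12    # hypothetical population 2
diffs = simulate_diff_of_means(mu1, sigma1, n1, mu2, sigma2, n2)

theory_mean = mu1 - mu2                          # mu1 - mu2 = 3.0
theory_var = sigma1**2 / n1 + sigma2**2 / n2     # 4/8 + 9/12 = 1.25
print(f"simulated mean {statistics.fmean(diffs):.3f} vs theory {theory_mean}")
print(f"simulated var  {statistics.variance(diffs):.3f} vs theory {theory_var}")
```

The simulated mean and variance should come out close to the theoretical 3.0 and 1.25; the agreement improves as `reps` grows.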
C.I. for Two Normal Population Means with Known Variances
If x̄1 is the mean of a sample of size n1 from a first population N(μ1, σ1²), and x̄2 is the mean of a sample of size n2 from a second population N(μ2, σ2²) independent of the first, then a 100(1−α)% C.I. for μ1 − μ2 of the two independent normal populations is the Z-interval given by
$$(\bar x_1 - \bar x_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
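The Z-interval can be sketched as a small Python function. This assumes Python 3.8+ for `statistics.NormalDist`, whose `inv_cdf` supplies z_{α/2} in place of a table lookup; the function name `z_interval` and its argument order are illustrative, not from the slides. Note that it takes the variances σ1² and σ2², not the standard deviations. Applied to the data of Example 1, it should reproduce the interval found there.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar1, xbar2, var1, var2, n1, n2, conf=0.95):
    """Two-sample Z-interval for mu1 - mu2 when the population variances are known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}, about 1.96 for 95%
    margin = z * sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Example 1: variances 0.50 and 0.25, n1 = 20, n2 = 10, sample means 5.00 and 4.00
lo, hi = z_interval(5.00, 4.00, 0.50, 0.25, 20, 10, conf=0.95)
print(f"95% C.I.: ({lo:.4f}, {hi:.4f})")   # prints "95% C.I.: (0.5617, 1.4383)"
```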
Example 1
In one industry, workers' wages are normally distributed with variance 0.50. In a second industry, workers' wages are normally distributed with variance 0.25. From the first industry, 20 workers are selected randomly and their mean wage is calculated to be $5.00, while from the second industry, 10 workers are selected randomly and their mean wage is calculated to be $4.00. Find a 95% confidence interval for the difference of the mean wages of the two industries.

Example 1 (continued)
Solution:
i) n1 = 20, n2 = 10, x̄1 = 5.00, x̄2 = 4.00, σ1² = 0.50, σ2² = 0.25; α = 0.05, z0.025 = 1.96
ii) $$z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 1.96\sqrt{\frac{0.50}{20} + \frac{0.25}{10}} = 0.4383$$
iii) x̄1 − x̄2 ± 0.4383 = 1.00 ± 0.4383
iv) The 95% C.I. is (0.5617, 1.4383). The true difference of the mean wages of the two industries is unknown, but we are 95% confident that it lies in the interval (0.5617, 1.4383).
Since the C.I. does not contain zero, we conclude at the 5% significance level that the mean wage in industry 1 differs from that of industry 2; as the whole interval lies above zero, the mean wage in industry 1 appears to be higher.

C.I. for Two Non-Normal Population Means with Known Variances
If the two populations are not both normally distributed, but n1 ≥ 30 and n2 ≥ 30, the distribution of X̄1 − X̄2 is approximately normal by the Central Limit Theorem.

An approximate 100(1−α)% C.I. for μ1 − μ2 of two independent populations is then the Z-interval given by
$$(\bar x_1 - \bar x_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Example 2 (homework)
A study is made comparing the prices asked for existing one-family homes in two adjacent communities. In College Heights, the mean asking price for a random sample of 50 homes is $142,000. In University Gardens, the mean asking price for a random sample of 35 homes is $168,000. The standard deviations of asking prices in the two communities are known to be $30,000 for College Heights and $40,000 for University Gardens. Calculate a 98% C.I. for the difference in mean asking prices.

Example 2 (continued)
Solution:
i) Since both sample sizes are ≥ 30, the use of the Z-interval is justified.
ii) n1 = 50, n2 = 35, x̄1 = 142,000, x̄2 = 168,000, σ1 = 30,000, σ2 = 40,000; α = 0.02, z0.01 = 2.3263
iii) $$z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 2.3263\,(1000)\sqrt{\frac{30^2}{50} + \frac{40^2}{35}} = 18{,}569$$
iv) x̄1 − x̄2 ± 18,569 = −26,000 ± 18,569
The 98% C.I. is given by (−44,569, −7,431).
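The margin of error in step iii) can be re-checked numerically. This sketch assumes Python 3.8+; `NormalDist().inv_cdf(0.99)` plays the role of the table value z0.01, and all other numbers are the Example 2 inputs.

```python
from math import sqrt
from statistics import NormalDist

# Example 2 inputs (known population standard deviations)
n1, n2 = 50, 35
xbar1, xbar2 = 142_000, 168_000
sigma1, sigma2 = 30_000, 40_000
conf = 0.98

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)          # z_{0.01}, about 2.3263
margin = z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)    # half-width of the interval
diff = xbar1 - xbar2
print(f"z_0.01 = {z:.4f}, margin = {margin:,.0f}")
print(f"98% C.I.: ({diff - margin:,.0f}, {diff + margin:,.0f})")
```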
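Testing Two Normal Population Means with Known Variances
Null Hypothesis: H0: μ1 − μ2 = d0, where d0 is a known value
Test Statistic:
$$z = \frac{(\bar x_1 - \bar x_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Alternative Hypothesis / Reject H0 at the α significance level if:
• H1: μ1 − μ2 < d0: z < −zα
• H1: μ1 − μ2 > d0: z > zα
• H1: μ1 − μ2 ≠ d0: z < −zα/2 or z > zα/2
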
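Hypothesis Testing for Two Population Means
• Lower-tail test: rejection region of area α in the lower tail; reject H0 if z < −zα
• Upper-tail test: rejection region of area α in the upper tail; reject H0 if z > zα
• Two-tail test: rejection regions of area α/2 in each tail; reject H0 if z < −zα/2 or z > zα/2
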
Example 3
In the competition for top students, Vivford and Regins publish their admission cut-off entry points in separate releases, which indicate a standard deviation of 3.1 points and 2.4 points for Vivford and Regins respectively.
To compare both colleges, a pre-college student advisor conducts a research project with the help of his staff. Thirty-five current students at Vivford are randomly chosen, and they have a sample mean of 9.2 points. A sample of 37 current students at Regins yields a sample mean of 10.3 points.
Is the student advisor presented with sufficient evidence that there is a difference in the average admission cut-off entry points between Vivford and Regins, at the 10% significance level?

Example 3 (continued)
Solution:
Let the means for Vivford and Regins be μ1 and μ2 respectively.

H0: μ1 = μ2
H1: μ1 ≠ μ2 (two-tailed test)

With n1 = 35 ≥ 30 and n2 = 37 ≥ 30, by the Central Limit Theorem, and with σ1 and σ2 known, the test statistic is
$$z = \frac{(\bar x_1 - \bar x_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{9.2 - 10.3 - 0}{\sqrt{\dfrac{(3.1)^2}{35} + \dfrac{(2.4)^2}{37}}} = -1.68$$

Example 3 (continued)
Solution (cont'd):
Since p-value = 2P(Z > |−1.68|) = 2(0.0465) = 0.093 < 0.10, we reject the null hypothesis at the 10% significance level.
Conclusion: There is sufficient evidence to say there is a difference in the average admission cut-off entry points between Vivford and Regins, at the 10% significance level.

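Example 3 (continued)
[Figure: standard normal curve with rejection regions of area α/2 = 0.05 in each tail beyond the critical values ±1.645; the observed statistic z = −1.68 lies in the lower rejection region, with tail area 0.0465 beyond ±1.68.]
Reject H0 since p-value = 0.093 < α = 0.10.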
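The two-sample Z-test and its p-value can be sketched in Python as follows, assuming Python 3.8+ for `statistics.NormalDist`. The function `two_sample_z_test` and its `tail` argument are hypothetical names chosen for illustration. Fed the Example 3 data, it uses the unrounded z (about −1.677), so its p-value lies slightly above the 0.093 obtained with z rounded to −1.68, but still below α = 0.10.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(xbar1, xbar2, sd1, sd2, n1, n2, d0=0.0, tail="two"):
    """Two-sample Z-test of H0: mu1 - mu2 = d0; returns (z, p_value).

    sd1, sd2 are known population standard deviations (for large samples,
    sample standard deviations may be substituted).
    """
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = ((xbar1 - xbar2) - d0) / se
    phi = NormalDist().cdf
    if tail == "lower":            # H1: mu1 - mu2 < d0
        p = phi(z)
    elif tail == "upper":          # H1: mu1 - mu2 > d0
        p = 1 - phi(z)
    else:                          # H1: mu1 - mu2 != d0
        p = 2 * (1 - phi(abs(z)))
    return z, p


# Example 3: Vivford (1) vs Regins (2), two-tailed, alpha = 0.10
z, p = two_sample_z_test(9.2, 10.3, 3.1, 2.4, 35, 37)
print(f"z = {z:.3f}, p-value = {p:.4f}")
print("Reject H0" if p < 0.10 else "Do not reject H0")
```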
Learning Objectives
• Inference for Two Means (Known Variances)
• Inference for Two Means (Unknown Variances)
• Inference for Two Variances
• Inference for Paired Samples

C.I. for Two Non-normal Population Means with Unknown Variances
When both variances are unknown, they can be replaced by the corresponding sample variances s1² and s2², provided both n1 ≥ 30 and n2 ≥ 30. The approximate 100(1−α)% C.I. for μ1 − μ2 is
$$(\bar x_1 - \bar x_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Example 4
A study is conducted comparing average starting salaries offered to new B.A. recipients at two universities. A sample of 42 students from one school is offered an average of $1360 per month with a standard deviation of $320, while a sample of 48 students from the other school is offered an average of $1320 with a standard deviation of $375. Construct a 95% C.I. for the difference in the mean starting salaries.

Example 4 (continued)
Solution:
i) Since both sample sizes are ≥ 30, we can use the normal approximation.
ii) n1 = 42, n2 = 48, x̄1 = 1360, x̄2 = 1320, s1 = 320, s2 = 375; α = 0.05, z0.025 = 1.96
iii) $$z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 1.96\sqrt{\frac{320^2}{42} + \frac{375^2}{48}} = 143.60$$
iv) x̄1 − x̄2 ± 143.60 = 40 ± 143.60

The 95% C.I. is given by (−103.60, 183.60).

Testing Two Non-normal Population Means with Unknown Variances

When both sample sizes are at least 30, the two-sample Z-test can be applied to compare the means of any two independent populations.

When the population variances are unknown, replace them by the sample variances, so that the test statistic becomes
$$z = \frac{(\bar x_1 - \bar x_2) - d_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Example 5 (homework)

(Refer to Example 4) Perform a test of the hypothesis that the average starting salaries offered to new B.A. recipients at the two universities are the same, against the alternative hypothesis that they are different. Use α = 0.05. Is the conclusion consistent with the confidence interval found in Example 4?

Example 5 (continued)
Solution: H0: μ1 = μ2 vs H1: μ1 ≠ μ2
n1 = 42, n2 = 48, x̄1 = 1360, x̄2 = 1320, s1 = 320, s2 = 375
α = 0.05, z0.025 = 1.96
$$z = \frac{(\bar x_1 - \bar x_2) - d_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{1360 - 1320 - 0}{\sqrt{\dfrac{320^2}{42} + \dfrac{375^2}{48}}} = 0.546$$
Since −1.96 < 0.546 < 1.96, we do not reject H0 at the α = 0.05 level of significance. There is not sufficient evidence of a difference in mean starting salaries, which is consistent with Example 4, where the 95% C.I. (−103.60, 183.60) contains 0.
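The agreement between Example 5 and Example 4 reflects a general duality: a two-sided test at level α rejects H0: μ1 − μ2 = 0 exactly when the 100(1−α)% C.I. excludes 0, because both use the same standard error and the same critical value z_{α/2}. A minimal Python sketch of this check with the Example 4/5 numbers, assuming Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 4/5 summary statistics (large samples: sample SDs replace sigmas)
n1, n2 = 42, 48
xbar1, xbar2 = 1360, 1320
s1, s2 = 320, 375
alpha = 0.05

se = sqrt(s1**2 / n1 + s2**2 / n2)               # estimated standard error
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96

z = (xbar1 - xbar2 - 0) / se                     # test of H0: mu1 - mu2 = 0
diff = xbar1 - xbar2
ci = (diff - z_crit * se, diff + z_crit * se)    # 95% C.I. for mu1 - mu2

reject = abs(z) > z_crit
ci_excludes_zero = not (ci[0] <= 0 <= ci[1])
print(f"z = {z:.3f}, reject H0: {reject}")
print(f"95% C.I. = ({ci[0]:.2f}, {ci[1]:.2f}), excludes 0: {ci_excludes_zero}")
# The test rejects exactly when the interval excludes 0, so the two flags agree.
```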
Testing Two Normal Population Means with Variances Unknown but Equal
Consider again two normal populations N(μ1, σ1²) and N(μ2, σ2²). Independent samples of sizes n1 and n2 are drawn from the two populations, with resulting sample means x̄1 and x̄2 and sample variances s1² and s2².
When the sample sizes n1 and n2 are small and the population variances are unknown, the Z-interval previously used is no longer valid. Similar to the one-sample case, we seek a t-interval.

Assuming σ1² = σ2² = σ², the standard normal variable
$$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \quad\text{becomes}\quad Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

Pooled variances
As the two populations have the same variance, we "pool" the information from the two samples to obtain a more accurate, unbiased estimator of this common variance, called the pooled estimator of the common variance. Calculate the pooled variance estimate by:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
[Figure: the sample variances S1² (from n1 = 10) and S2² (from n2 = 15) are combined into the pooled estimator Sp².]

Example: s1² = 25, s2² = 30, n1 = 10, n2 = 15. Then
$$s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = \frac{645}{23} = 28.0435$$
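The pooled estimator is a degrees-of-freedom-weighted average of the two sample variances, so it always lies between s1² and s2². A minimal Python sketch (the function name `pooled_variance` is hypothetical), checked against the example above:

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled estimate of a common variance from two independent samples.

    Each sample variance is weighted by its degrees of freedom (n - 1).
    """
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


sp_sq = pooled_variance(25, 10, 30, 15)
print(f"s_p^2 = {sp_sq:.4f}")   # 645 / 23, prints "s_p^2 = 28.0435"
```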
Two Normal Population Means with Variances Unknown but Equal
$$t = \frac{(\bar x_1 - \bar x_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
is called the pooled two-sample t statistic. It follows a t distribution with n1 + n2 − 2 degrees of freedom.
The 100(1−α)% confidence interval for μ1 – μ2 is:
$$(\bar x_1 - \bar x_2) \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

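Testing Two Normal Population Means with Variances Unknown but Equal
When the variances are unknown and the samples are small, the common practice (followed here) is to assume the variances are equal and apply the pooled t-test; the equal-variance assumption can be checked with the F-test for two variances.
Null Hypothesis: H0: μ1 − μ2 = d0
Assumption: Variances are equal
Test Statistic:
$$t = \frac{(\bar x_1 - \bar x_2) - d_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
Alternative Hypothesis / Reject H0 at the α significance level if:
• H1: μ1 − μ2 < d0: t < −tα, n1+n2−2
• H1: μ1 − μ2 > d0: t > tα, n1+n2−2
• H1: μ1 − μ2 ≠ d0: t < −tα/2, n1+n2−2 or t > tα/2, n1+n2−2
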
Example 6
You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ? You collect the following data:

                  NYSE   NASDAQ
Number              21       25
Sample mean       3.27     2.53
Sample std dev    1.30     1.16

Assuming both populations are approximately normal with equal variances, is there a difference in average yield (α = 0.05)?

Example 6 (continued)

H0: μ1 − μ2 = 0, i.e. (μ1 = μ2)
H1: μ1 − μ2 ≠ 0, i.e. (μ1 ≠ μ2)
α = 0.05, so each of the two rejection regions has area 0.025
d.f. = 21 + 25 − 2 = 44
Critical values: ±t0.025,44 = ±2.0154 (reject H0 if t < −2.0154 or t > 2.0154)

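Example 6 (continued)
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)(1.30)^2 + (25 - 1)(1.16)^2}{(21 - 1) + (25 - 1)} = 1.5021$$

Test Statistic:
$$t = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021\left(\dfrac{1}{21} + \dfrac{1}{25}\right)}} = 2.040$$
[Figure: t distribution with 44 d.f.; rejection regions of area 0.025 beyond ±2.0154; the observed t = 2.040 falls in the upper rejection region.]
Decision: Reject H0 at α = 0.05, since 2.040 > 2.0154.
Conclusion: There is evidence of a difference in mean dividend yield between the two exchanges.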
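The pooled t-test for Example 6 can be reproduced with a short Python sketch. Only the standard library is used, so the critical value t0.025,44 = 2.0154 is typed in from the table rather than computed; `pooled_t_test` is a hypothetical helper name. With SciPy installed, `scipy.stats.t.ppf(0.975, 44)` would be one way to obtain that critical value instead.

```python
from math import sqrt


def pooled_t_test(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = d0 (equal variances assumed).

    Returns (t, df, sp_sq). The critical value t_{alpha/2, df} is not computed here.
    """
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance
    se = sqrt(sp_sq * (1 / n1 + 1 / n2))                 # standard error of xbar1 - xbar2
    t = ((xbar1 - xbar2) - d0) / se
    return t, df, sp_sq


# Example 6: NYSE (1) vs NASDAQ (2), two-tailed, alpha = 0.05
t, df, sp_sq = pooled_t_test(3.27, 1.30, 21, 2.53, 1.16, 25)
t_crit = 2.0154   # t_{0.025, 44}, from a t table
print(f"s_p^2 = {sp_sq:.4f}, t = {t:.3f}, df = {df}")
print("Reject H0" if abs(t) > t_crit else "Do not reject H0")
```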
Example 7
Suppose you wish to compare a new method of teaching reading to "slow learners" with the current standard method. You decide to base this comparison on the results of a reading test given at the end of a learning period of 6 months. Of a random sample of 20 slow learners, 8 are taught by the new method and 12 by the standard method. The results are summarized below. Estimate the true mean difference between the test scores for the new method and the standard method using a 90% C.I. What assumptions must be made in order for the estimate to be valid?

New method: n1 = 8, x̄1 = 76.9, s1 = 4.85
Standard method: n2 = 12, x̄2 = 72.7, s2 = 6.35

Example 7 (continued)
Solution:
i) Assume: a) the test scores in both populations are normally distributed; b) the variances of the two populations are the same; c) the two samples are independent.
ii) n1 = 8, n2 = 12, x̄1 = 76.9, x̄2 = 72.7, s1 = 4.85, s2 = 6.35, d.f. = n1 + n2 − 2 = 18

For α = 0.10, tα/2, n1+n2−2 = t0.05,18 = 1.734

Example 7 (continued)
iii) $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{7 \times 4.85^2 + 11 \times 6.35^2}{8 + 12 - 2} = 33.7892$$
iv) 90% C.I.:
$$(\bar x_1 - \bar x_2) \pm t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (76.9 - 72.7) \pm 1.734\sqrt{33.7892}\sqrt{\frac{1}{8} + \frac{1}{12}} = 4.20 \pm 4.60$$
The 90% C.I. is (−0.40, 8.80).
v) Since the interval contains 0, at the α = 0.10 significance level there is not sufficient evidence from the data to show a difference in mean test scores.
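A matching Python sketch for the pooled t-interval, applied to Example 7. The critical value t0.05,18 = 1.734 is supplied from a t table, and `pooled_t_interval` is a hypothetical helper name, not part of any library.

```python
from math import sqrt


def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Pooled two-sample t-interval for mu1 - mu2 (equal variances assumed).

    t_crit is t_{alpha/2, n1+n2-2}, supplied from a t table.
    """
    sp_sq = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    margin = t_crit * sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Example 7: new method (1) vs standard method (2), 90% C.I., t_{0.05,18} = 1.734
lo, hi = pooled_t_interval(76.9, 4.85, 8, 72.7, 6.35, 12, t_crit=1.734)
print(f"90% C.I.: ({lo:.2f}, {hi:.2f})")
print("contains 0" if lo <= 0 <= hi else "excludes 0")
```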
Learning Objectives
• Inference for Two Means (Known Variances)
• Inference for Two Means (Unknown Variances)
• Inference for Two Variances
• Inference for Dependent Samples

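Testing two normal population variances
Null Hypothesis: H0: σ1² = σ2² (i.e., σ1²/σ2² = 1)
Test Statistic: f = s1²/s2², because
$$f_{n_1-1,\,n_2-1} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$
follows an F distribution with n1 − 1 and n2 − 1 degrees of freedom, and under H0 the unknown σ1² and σ2² cancel.

Alternative Hypothesis / Reject H0 at the α significance level if:
• H1: σ1² < σ2²: f < f1−α, n1−1, n2−1
• H1: σ1² > σ2²: f > fα, n1−1, n2−1
• H1: σ1² ≠ σ2²: f < f1−α/2, n1−1, n2−1 or f > fα/2, n1−1, n2−1

Here f1−α/2, n1−1, n2−1 and fα/2, n1−1, n2−1 are the lower and upper α/2 points of the F distribution with n1 − 1 and n2 − 1 degrees of freedom.
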
F Distribution

Example 8

Refer to Example 6.

                  NYSE   NASDAQ
Number              21       25
Mean              3.27     2.53
Std dev           1.30     1.16

Is there a difference in the variances between the NYSE and NASDAQ at the α = 0.10 level?

Example 8 (continued)
Form the hypothesis test:
H0: σ1² = σ2² (there is no difference between variances)
H1: σ1² ≠ σ2² (there is a difference between variances)

Find the F critical values for α/2 = 0.10/2 = 0.05, with degrees of freedom as in fα/2, n1−1, n2−1:
• Numerator (NYSE has the larger standard deviation): n1 – 1 = 21 – 1 = 20 d.f.
• Denominator: n2 – 1 = 25 – 1 = 24 d.f.
Upper critical value: f0.10/2, 20, 24 = 2.03

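Example 8 (continued)
We cannot read the lower critical value directly from the F table; use the relation
$$f_{1-\alpha/2,\,n_1-1,\,n_2-1} = \frac{1}{f_{\alpha/2,\,n_2-1,\,n_1-1}}$$
First obtain f0.05, 24, 20 = 2.0825 (note the swapped degrees of freedom), then take the reciprocal: f0.95, 20, 24 = 1/2.0825 = 0.4802.
[Figure: F(20, 24) density with rejection regions of area α/2 = 0.05 below 0.4802 and above 2.0267.]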


Example 8 (continued)
H0: σ1² = σ2² vs H1: σ1² ≠ σ2², with α/2 = 0.05 in each tail
The test statistic is:
$$f = \frac{s_1^2}{s_2^2} = \frac{1.30^2}{1.16^2} = 1.256$$
f = 1.256 is not in the rejection region (f < 0.48 or f > 2.03), so we do not reject H0.

Conclusion: There is not sufficient evidence of a difference in variances at α = 0.10.
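The F-test in Example 8 amounts to one ratio and two table lookups. The Python sketch below hard-codes the table values quoted on the slides and uses the reciprocal relation for the lower point; with SciPy available, `scipy.stats.f.ppf(0.95, 20, 24)` and `scipy.stats.f.ppf(0.05, 20, 24)` would be one way to compute the upper and lower 0.05 points directly (an assumption about the reader's environment, not something the slides use).

```python
# Example 8: two-sided F-test for equal variances, alpha = 0.10
s1, n1 = 1.30, 21   # NYSE
s2, n2 = 1.16, 25   # NASDAQ

f = s1**2 / s2**2                  # test statistic, F(n1 - 1, n2 - 1) under H0

# Upper 0.05 points read from an F table (values quoted on the slides)
f_upper = 2.0267                   # f_{0.05, 20, 24}
f_upper_swapped = 2.0825           # f_{0.05, 24, 20}
f_lower = 1 / f_upper_swapped      # f_{0.95, 20, 24} via the reciprocal relation

reject = f < f_lower or f > f_upper
print(f"f = {f:.3f}, rejection region: f < {f_lower:.4f} or f > {f_upper:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```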
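CI for two normal population variances
A 100(1−α)% confidence interval for σ1²/σ2² is
$$\left(\frac{s_1^2}{s_2^2}\cdot\frac{1}{f_{\alpha/2,\,n_1-1,\,n_2-1}},\ \ \frac{s_1^2}{s_2^2}\cdot\frac{1}{f_{1-\alpha/2,\,n_1-1,\,n_2-1}}\right)$$
where f1−α/2, n1−1, n2−1 and fα/2, n1−1, n2−1 are the lower and upper α/2 points of the F distribution with n1 − 1 and n2 − 1 degrees of freedom.

To obtain the lower point, use the relation
$$f_{1-\alpha/2,\,n_1-1,\,n_2-1} = \frac{1}{f_{\alpha/2,\,n_2-1,\,n_1-1}}$$
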
Example 9
In Example 7, we used the pooled t-statistic to compare the mean reading scores of two groups of slow learners who had been taught to read using two different methods. The pooled t was based on the assumption that the population variances of the test scores were equal for the two methods. Check this assumption using α = 0.10.

Example 9 (continued)
Solution:
i) n1 = 8, n2 = 12, s1 = 4.85, s2 = 6.35; the d.f. are 7 and 11. With α/2 = 0.05, the F table gives f0.05,7,11 = 3.01 and f0.05,11,7 = 3.605, so f0.95,7,11 = 1/3.605.
ii) $$\frac{s_1^2}{s_2^2}\cdot\frac{1}{f_{\alpha/2,7,11}} = \frac{4.85^2}{6.35^2}\cdot\frac{1}{3.01} = 0.1938; \qquad \frac{s_1^2}{s_2^2}\cdot\frac{1}{f_{1-\alpha/2,7,11}} = \frac{4.85^2}{6.35^2}\times 3.605 = 2.103$$
iii) The 90% C.I. for σ1²/σ2² is (0.1938, 2.103).
iv) Since the interval covers the value 1 (which corresponds to equal variances), there is not sufficient evidence against the assumption of equal variances.
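The interval in Example 9 can be re-checked numerically. This Python sketch hard-codes the slide's table values: f0.05,7,11 = 3.01, and 3.605 for f0.05,11,7, which by the reciprocal relation equals 1/f0.95,7,11. Variable names are illustrative.

```python
# Example 9: 90% C.I. for sigma1^2 / sigma2^2 (new vs standard teaching method)
s1, n1 = 4.85, 8
s2, n2 = 6.35, 12

ratio = s1**2 / s2**2              # s1^2 / s2^2

# F-table values used on the slide (alpha/2 = 0.05)
f_upper_7_11 = 3.01                # f_{0.05, 7, 11}
f_upper_11_7 = 3.605               # f_{0.05, 11, 7} = 1 / f_{0.95, 7, 11}

lower = ratio / f_upper_7_11
upper = ratio * f_upper_11_7       # same as ratio / f_{0.95, 7, 11}
print(f"s1^2/s2^2 = {ratio:.4f}, 90% C.I.: ({lower:.4f}, {upper:.3f})")
print("Interval covers 1" if lower <= 1 <= upper else "Interval excludes 1")
```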
Learning Objectives
• Inference for Two Means (Known Variances)
• Inference for Two Means (Unknown Variances)
• Inference for Two Variances
• Inference for Dependent Samples

Paired Samples

Reducing Variability
[Figure: the range of observations in sample A and the range of observations in sample B.]
The values each sample consists of might vary markedly...

Reducing Variability
[Figure: the range of the paired differences, clustered near 0.]
...but the differences between pairs of observations might be quite close to one another, resulting in a small variability of the differences.
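The variance reduction pictured above can be illustrated with a small simulation. In this hypothetical paired design (all numbers, the seed and the variable names are assumptions for illustration), each subject has a large individual baseline that appears in both readings; the baseline cancels when the two readings are subtracted, so the differences vary far less than either sample.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is reproducible
n = 50

# Hypothetical paired design: subjects differ a lot from one another
# (baseline SD 10), while a subject's two readings differ only by a
# constant effect of 5 plus small measurement noise (SD 2 each).
baseline = [rng.gauss(75, 10) for _ in range(n)]
sample_a = [b + rng.gauss(0, 2) for b in baseline]        # e.g. "before"
sample_b = [b - 5 + rng.gauss(0, 2) for b in baseline]    # e.g. "after"
diffs = [a - b for a, b in zip(sample_a, sample_b)]       # baseline cancels

var_a = statistics.variance(sample_a)
var_b = statistics.variance(sample_b)
var_d = statistics.variance(diffs)
print(f"variance of sample A:        {var_a:.1f}")
print(f"variance of sample B:        {var_b:.1f}")
print(f"variance of the differences: {var_d:.1f}")
```

Under these assumptions each sample has a true variance of 104, while the differences have a true variance of only 8.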
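Paired t-test
$$\mu_D = E(X_1 - X_2) = E(X_1) - E(X_2) = \mu_1 - \mu_2$$
Null Hypothesis: H0: μD (= μ1 − μ2) = d0
Assumption: the paired differences come from a normal population
Test Statistic:
$$t = \frac{\bar d - d_0}{s_d/\sqrt{n}}$$
with n − 1 degrees of freedom.
Alternative Hypothesis / Reject H0 at the α significance level if:
• H1: μD < d0: t < −tα, n−1
• H1: μD > d0: t > tα, n−1
• H1: μD ≠ d0: t < −tα/2, n−1 or t > tα/2, n−1

where di (i = 1, …, n) are the (sample) pair differences, with mean and variance defined as
$$\bar d = \frac{1}{n}\sum_{i=1}^{n} d_i \quad\text{and}\quad s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar d)^2$$
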
Example 10
Manufacturers wish to determine whether an adjustment to a machine setting will improve mean output by more than 10 units. They randomly selected 15 machines and recorded the outputs before and after adjustment. The 15 pairs of outputs give a mean difference (after − before) of d̄ = 13.3 and sd = 4.2. Perform a relevant test. What assumption did you make to perform this test?

Example 10 (continued)
Solution: Let D = output after adjustment − output before adjustment.
H0: μD = 10 vs H1: μD > 10

t0.05,14 = 1.761

$$t = \frac{\bar d - d_0}{s_d/\sqrt{n}} = \frac{13.3 - 10}{4.2/\sqrt{15}} = 3.04$$

Since 3.04 > 1.761, we reject H0 at the 5% level of significance. The data give sufficient evidence that an adjustment to the machine setting will improve mean output by more than 10 units.

We assume the paired differences come from a normal population in order to perform a t-test.
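The calculation in Example 10 needs only d̄, s_d and n. A minimal Python sketch; the helper `paired_t_from_summary` is a hypothetical name, and t0.05,14 = 1.761 is taken from a t table.

```python
from math import sqrt


def paired_t_from_summary(d_bar, s_d, n, d0=0.0):
    """Paired t statistic and its degrees of freedom from summary statistics of the differences."""
    return (d_bar - d0) / (s_d / sqrt(n)), n - 1


# Example 10: H0: mu_D = 10 vs H1: mu_D > 10 (upper-tail test), alpha = 0.05
t, df = paired_t_from_summary(13.3, 4.2, 15, d0=10)
t_crit = 1.761                     # t_{0.05, 14}, from a t table
print(f"t = {t:.2f} on {df} d.f.")
print("Reject H0" if t > t_crit else "Do not reject H0")
```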
Example 11
A medical researcher wishes to determine whether a pill has the undesirable side effect of reducing the blood pressure of the user. The study involves recording the initial blood pressure of 15 college-age women. After they use the pill for six months, their blood pressures are recorded again. Construct a 95% confidence interval to draw inferences about whether the medicine has any effect on blood pressure, based on the observations given below.

Subject        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Before (x)    70  80  72  76  76  76  72  78  82  64  74  92  74  68  84
After (y)     68  72  62  70  58  66  68  52  64  72  74  60  74  72  74
d = x − y      2   8  10   6  18  10   4  26  18  -8   0  32   0  -4  10

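Example 11 (continued)
Solution:
i) Assuming the paired differences constitute a random sample from N(μD, σD²), use of the following C.I. is valid:
$$\bar d \pm t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}}$$
Now, α = 0.05, d.f. = 14, and t0.025,14 = 2.145.
ii) $$\bar d = \frac{1}{15}\sum_{i=1}^{15} d_i = 8.80, \qquad s_d = \sqrt{\frac{1}{14}\sum_{i=1}^{15}(d_i - \bar d)^2} = 10.98$$
iii) The 95% C.I. for μD is computed as
$$\bar d \pm t_{0.025,14}\frac{s_d}{\sqrt{n}} = 8.80 \pm 6.08$$
i.e., (2.72, 14.88).

Since 0 does not fall into the 95% confidence interval, we conclude at the 5% level that the medicine has an effect on blood pressure; because d = before − after and the interval is entirely positive, the pill appears to lower blood pressure.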
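Starting from the raw data of Example 11, the differences, d̄, s_d and the interval can be computed with the standard library. This sketch assumes Python 3.8+ for `statistics.fmean`; `statistics.stdev` uses the n − 1 divisor, matching s_d above, and t0.025,14 = 2.145 is supplied from a t table.

```python
import statistics
from math import sqrt

before = [70, 80, 72, 76, 76, 76, 72, 78, 82, 64, 74, 92, 74, 68, 84]
after = [68, 72, 62, 70, 58, 66, 68, 52, 64, 72, 74, 60, 74, 72, 74]

d = [x - y for x, y in zip(before, after)]   # d = x - y for each subject
n = len(d)
d_bar = statistics.fmean(d)                  # sample mean of the differences
s_d = statistics.stdev(d)                    # sample SD of the differences (divisor n - 1)
t_crit = 2.145                               # t_{0.025, 14}, from a t table

margin = t_crit * s_d / sqrt(n)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}")
print(f"95% C.I. for mu_D: ({d_bar - margin:.2f}, {d_bar + margin:.2f})")
```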
TEXTBOOK REFERENCES

Chapter 5: One- and Two-Sample Estimation Problems
Relevant Sections: 8-9
• Section 8: Excluding "Unknown and Unequal Variances"
• Section 9

Chapter 6: One- and Two-Sample Tests of Hypotheses
Relevant Sections: 5
• Section 5: Excluding "Unknown but Unequal Variances", "Problem of Interaction in a Paired t-Test", "Annotated Computer Printout for Paired t-Test"

For more information on the F-distribution, you may refer to Chapter 4, Section 7.
