TWO-SAMPLE INFERENCE
WMY Chapter 5
Parts 8-9
WMY Chapter 6
Part 5
Motivation
In virtually every area of human activity, there is a continuous
search for new modes of action and for ways to modify and revise
existing techniques. The new methods or techniques need to be
compared with the old ones to see whether they are really "better".
Examples:
• Agricultural field trials: To see whether a new strain of seed
produces a higher yield than a current major variety.
• Drug evaluation: To see whether a new drug is more effective in
curing a disease.
• Effect of advertising campaign: To see whether the
campaign has an effect on daily sales of a certain product.
Comparisons can be made by interval estimation or by hypothesis
testing; both require knowledge of the relevant sampling distribution.
Learning Objectives
Inference for Two Means (Known Variances)
Inference for Two Means (Unknown Variances)
Inference for Two Variances
Inference for Paired Samples
Hypothesis Testing for Two Population
Means
Two Population Means, Independent Samples
$$\mu_{\bar X_1-\bar X_2}=\mu_1-\mu_2,\qquad \sigma_{\bar X_1-\bar X_2}=\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$$
Sampling Distribution of the Difference
Between Two Population Means
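Applying the laws of expected value and variance (the two samples are independent, so their variances add) we have:
$$E(\bar X_1-\bar X_2)=E(\bar X_1)-E(\bar X_2)=\mu_1-\mu_2$$
$$\operatorname{Var}(\bar X_1-\bar X_2)=\operatorname{Var}(\bar X_1)+\operatorname{Var}(\bar X_2)=\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}$$
We can define:
$$Z=\frac{(\bar X_1-\bar X_2)-(\mu_1-\mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}}$$
which has a standard normal distribution when both populations are normal.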
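The mean and variance formulas above can be checked by simulation. The sketch below is a minimal illustration using only Python's standard library; the function name `simulate_mean_differences` and the parameter values (μ1 = 5, μ2 = 4, σ1² = 0.50, σ2² = 0.25, n1 = 20, n2 = 10) are hypothetical choices for illustration, not part of the course material.

```python
import random
import statistics
from math import sqrt

def simulate_mean_differences(mu1, var1, n1, mu2, var2, n2, reps=20_000, seed=1):
    """Simulate `reps` values of xbar1 - xbar2 from independent normal samples."""
    rng = random.Random(seed)            # fixed seed so the run is reproducible
    sd1, sd2 = sqrt(var1), sqrt(var2)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sd1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sd2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs

# Hypothetical settings chosen for illustration only
diffs = simulate_mean_differences(5.0, 0.50, 20, 4.0, 0.25, 10)
theory_mean = 5.0 - 4.0                  # mu1 - mu2
theory_var = 0.50 / 20 + 0.25 / 10       # sigma1^2/n1 + sigma2^2/n2 = 0.05
print(statistics.fmean(diffs), theory_mean)
print(statistics.variance(diffs), theory_var)
```

The simulated mean and variance should land close to 1 and 0.05; the agreement improves as `reps` grows.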
C.I. for Two Normal Population Means
with Known Variances
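If $\bar x_1$ is the mean of a sample of size $n_1$ from the first population,
and $\bar x_2$ is the mean of a sample of size $n_2$ from a second population
independent of the first, then a $100(1-\alpha)\%$ C.I. for $\mu_1-\mu_2$ is
$$(\bar x_1-\bar x_2)\pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$$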
Example 1
In one industry, workers' wages are normally distributed
with variance 0.50. In a second industry, workers' wages
are normally distributed with variance 0.25. From the first
industry, 20 workers are selected randomly and their mean
wage is calculated to be $5.00, while from the second
industry, 10 workers are selected randomly and their mean
wage is calculated to be $4.00. Find a 95% confidence
interval for the difference of the mean wages of the two
industries.
Example 1 (continued)
Solution:
i) $n_1=20,\ n_2=10,\ \bar x_1=5.00,\ \bar x_2=4.00,\ \sigma_1^2=0.50,\ \sigma_2^2=0.25$
$\alpha=0.05,\ z_{0.025}=1.96$
ii) $z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}=1.96\sqrt{\dfrac{0.50}{20}+\dfrac{0.25}{10}}=0.4383$
iii) $(\bar x_1-\bar x_2)\pm0.4383=1.00\pm0.4383$
iv) The 95% C.I. is (0.5617, 1.4383). The true
difference of the mean wages of the two industries is
unknown, but we are 95% confident that it lies in the interval
(0.5617, 1.4383).
Since the C.I. does not contain zero, we conclude
with 95% confidence that the mean wage in industry 1 is
different from that of industry 2.
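Example 1 can be reproduced with a short sketch that uses only Python's standard library (`statistics.NormalDist` supplies $z_{\alpha/2}$). The function name `two_sample_z_interval` is our own label for illustration, not a textbook or library routine.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_interval(xbar1, xbar2, var1, var2, n1, n2, conf=0.95):
    """100*conf% C.I. for mu1 - mu2 when the population variances var1, var2 are known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    margin = z * sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Example 1: sigma1^2 = 0.50, sigma2^2 = 0.25, n1 = 20, n2 = 10
lo, hi = two_sample_z_interval(5.00, 4.00, 0.50, 0.25, 20, 10, conf=0.95)
print(f"({lo:.4f}, {hi:.4f})")
```

Using the exact quantile (1.95996...) instead of the rounded 1.96 changes the margin only in the fifth decimal place here.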
C.I. for Two Non-Normal Population
Means with Known Variances
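If the two populations are not both normally distributed, but
$n_1\ge30$ and $n_2\ge30$, the distribution of $\bar X_1-\bar X_2$ is approximately
normal by the Central Limit Theorem, so the same interval holds approximately:
$$(\bar x_1-\bar x_2)\pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$$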
Example 2 (homework)
A study is made comparing the prices asked for existing
one-family homes in two adjacent communities. In College Heights,
the mean asking price for a random sample of 50 homes is
$142,000. In University Gardens, the mean asking price for a
random sample of 35 homes is $168,000. The standard
deviations of asking prices of the two communities were known
to be $30,000 for College Heights and $40,000 for University
Gardens. Calculate a 98% C.I. for the difference in mean
asking prices.
Example 2 (continued)
Solution:
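i) Since both sample sizes are ≥ 30, the use of the Z-interval
is justified.
ii) $n_1=50,\ n_2=35,\ \bar x_1=142{,}000,\ \bar x_2=168{,}000,\ \sigma_1=30{,}000,\ \sigma_2=40{,}000$
$\alpha=0.02,\ z_{0.01}=2.3263$
iii) $z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}=2.3263\,(1000)\sqrt{\dfrac{30^2}{50}+\dfrac{40^2}{35}}=18{,}569$
iv) $(\bar x_1-\bar x_2)\pm18{,}569=-26{,}000\pm18{,}569$
The 98% C.I. is (−44,569, −7,431).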
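Recomputing the Example 2 margin directly is a useful arithmetic check. This is a minimal standard-library sketch; it gives a margin of about 18,569, so the 98% C.I. is roughly (−44,569, −7,431).

```python
from math import sqrt
from statistics import NormalDist

# Example 2: known population SDs, 98% confidence (alpha = 0.02)
n1, n2 = 50, 35
xbar1, xbar2 = 142_000, 168_000
sigma1, sigma2 = 30_000, 40_000

z = NormalDist().inv_cdf(1 - 0.02 / 2)           # z_{0.01}, about 2.3263
margin = z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
diff = xbar1 - xbar2                              # -26,000
lo, hi = diff - margin, diff + margin
print(f"z = {z:.4f}, margin = {margin:,.0f}, C.I. = ({lo:,.0f}, {hi:,.0f})")
```

Both limits are negative, so at 98% confidence the mean asking price is lower in College Heights than in University Gardens.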
Testing Two Normal Population Means
with Known Variances
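Null hypothesis: $H_0: \mu_1-\mu_2=d_0$, where $d_0$ is a known value
Test statistic:
$$z=\frac{(\bar x_1-\bar x_2)-d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}}$$
Alternative hypothesis          Reject H0 at the α significance level if
$H_1: \mu_1-\mu_2<d_0$            $z<-z_{\alpha}$
$H_1: \mu_1-\mu_2>d_0$            $z>z_{\alpha}$
$H_1: \mu_1-\mu_2\ne d_0$         $z<-z_{\alpha/2}$ or $z>z_{\alpha/2}$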
Hypothesis Testing for Two Population
Means
[Figure: rejection regions. Lower-tail test: area α in the left tail. Upper-tail test: area α in the right tail. Two-tail test: area α/2 in each tail.]
Example 3
In the competition for top students, Vivford and Regins
publish their admission cut-off entry points in separate
releases, which indicate a standard deviation of 3.1 points
and 2.4 points for Vivford and Regins respectively.
To compare both colleges, a pre-college student advisor
conducts a research project with the help of his staff. 35
current students at Vivford are randomly chosen and they
have a sample mean of 9.2 points. A sample of 37 current
students at Regins yields a sample mean of 10.3 points.
Does the student advisor have evidence that there is
a difference in the average admission cut-off entry points
between Vivford and Regins, at the 10% significance level?
Example 3 (continued)
Solution:
Let the mean admission cut-off entry points of Vivford and Regins be μ1 and μ2, respectively.
H0: μ1 = μ2
H1: μ1 ≠ μ2 (two-tailed test)
Example 3 (continued)
Solution (cont’d):
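$$z=\frac{(\bar x_1-\bar x_2)-0}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}}=\frac{9.2-10.3}{\sqrt{\dfrac{3.1^2}{35}+\dfrac{2.4^2}{37}}}=-1.68$$
Since p-value $=2P(Z\ge|-1.68|)=0.093<\alpha=0.10$,
we reject the null hypothesis at the 10% significance level.
Conclusion: There is sufficient evidence of a
difference in the average admission cut-off entry points
between Vivford and Regins, at the 10% significance level.
Example 3 (continued)
[Figure: standard normal curve with a central "do not reject H0" region and rejection regions of area α/2 = 0.05 beyond ±1.645; the observed Z = −1.68 falls in the lower rejection region, and the tail area beyond ±1.68 is 0.0465 on each side.]
Reject H0 since p-value = 0.093 < α = 0.10.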
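The test procedure above, applied to Example 3, can be sketched with Python's standard library. The function name `two_sample_z_test` and its `tail` argument are our own conventions for illustration; the p-value branches follow the three alternative hypotheses in the rejection table.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(xbar1, xbar2, sd1, sd2, n1, n2, d0=0.0, tail="two"):
    """z statistic and p-value for H0: mu1 - mu2 = d0 with known population SDs."""
    z = (xbar1 - xbar2 - d0) / sqrt(sd1**2 / n1 + sd2**2 / n2)
    Phi = NormalDist().cdf
    if tail == "lower":            # H1: mu1 - mu2 < d0
        p = Phi(z)
    elif tail == "upper":          # H1: mu1 - mu2 > d0
        p = 1 - Phi(z)
    else:                          # H1: mu1 - mu2 != d0
        p = 2 * (1 - Phi(abs(z)))
    return z, p

# Example 3: Vivford (1) vs Regins (2), alpha = 0.10
z, p = two_sample_z_test(9.2, 10.3, 3.1, 2.4, 35, 37)
print(f"z = {z:.3f}, p-value = {p:.4f}")   # reject H0 when p < 0.10
```

Carrying more decimals gives z ≈ −1.677 and p ≈ 0.0935, matching the rounded values used in the example.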
Learning Objectives
Inference for Two Means (Known Variances)
Inference for Two Means (Unknown Variances)
Inference for Two Variances
Inference for Paired Samples
C.I. for Two Non-normal Population
Means with Unknown Variances
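When both variances are unknown, they can be replaced by
the corresponding sample variances $s_1^2$ and $s_2^2$ when both $n_1\ge30$ and $n_2\ge30$.
The approximate $100(1-\alpha)\%$ C.I. for $\mu_1-\mu_2$ is
$$(\bar x_1-\bar x_2)\pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$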
Example 4
A study is conducted comparing average starting salaries
offered to new B.A. recipients at two universities. A sample of
42 students from one school are offered an average of $1360
per month with a standard deviation of $320 while a sample of
48 students from the other school are offered an average of
$1320 with a standard deviation of $375. Construct a 95% C.I.
for the difference in the mean starting salaries.
Example 4 (continued)
Solution:
i) Since both sample sizes are ≥ 30, we can use the normal
approximation.
ii) $n_1=42,\ n_2=48,\ \bar x_1=1360,\ \bar x_2=1320,\ s_1=320,\ s_2=375$
iii) $z_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}=1.96\sqrt{\dfrac{320^2}{42}+\dfrac{375^2}{48}}=143.60$
iv) $(\bar x_1-\bar x_2)\pm143.60=40\pm143.60$, so the 95% C.I. is (−103.60, 183.60).
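A minimal standard-library sketch of the large-sample interval, applied to Example 4. The name `large_sample_interval` is hypothetical; the guard on the sample sizes encodes the slides' rule of thumb (n1, n2 ≥ 30) rather than any library convention.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_interval(xbar1, xbar2, s1, s2, n1, n2, conf=0.95):
    """Approximate C.I. for mu1 - mu2 using sample SDs; intended for n1, n2 >= 30."""
    if n1 < 30 or n2 < 30:
        raise ValueError("normal approximation needs n1 >= 30 and n2 >= 30")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    margin = z * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Example 4: starting salaries at two universities, 95% C.I.
lo, hi = large_sample_interval(1360, 1320, 320, 375, 42, 48)
print(f"({lo:.2f}, {hi:.2f})")
```

The interval contains 0, so these data do not indicate a difference in mean starting salaries.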
When both sample sizes are greater than or equal to 30, the
two-sample Z-test (with $s_1^2$ and $s_2^2$ in place of $\sigma_1^2$ and $\sigma_2^2$) can be
applied to compare the means of any two independent populations.
Example 5 (homework)
Example 5 (continued)
Solution: $H_0: \mu_1=\mu_2$ vs $H_1: \mu_1\ne\mu_2$
$n_1=42,\ n_2=48,\ \bar x_1=1360,\ \bar x_2=1320,\ s_1=320,\ s_2=375$
$\alpha=0.05,\ z_{0.025}=1.96$
$$z=\frac{(\bar x_1-\bar x_2)-d_0}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}=\frac{1360-1320-0}{\sqrt{\dfrac{320^2}{42}+\dfrac{375^2}{48}}}=0.546$$
Since −1.96 < 0.546 < 1.96, we do not reject H0 at the α = 0.05
level of significance. There is no significant evidence of a difference
in mean salaries, which is consistent with the 95% C.I. of
Example 4, which contains 0.
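The Example 5 test can be checked with a few lines of standard-library Python. This sketch simply plugs the sample SDs into the Z statistic, as the large-sample rule allows; the p-value is an addition for comparison with the critical-value decision.

```python
from math import sqrt
from statistics import NormalDist

# Example 5: H0: mu1 = mu2 vs H1: mu1 != mu2 at alpha = 0.05,
# sample SDs used in place of the unknown sigmas (n1, n2 >= 30)
n1, n2, xbar1, xbar2, s1, s2 = 42, 48, 1360, 1320, 320, 375
z = (xbar1 - xbar2 - 0) / sqrt(s1**2 / n1 + s2**2 / n2)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)        # about 1.96
p = 2 * (1 - NormalDist().cdf(abs(z)))              # two-sided p-value
reject = abs(z) > z_crit
print(f"z = {z:.3f}, p-value = {p:.3f}, reject H0: {reject}")
```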
Testing Two Normal Population Means
with Variances Unknown but Equal
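Consider again two normal populations $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$. Independent
samples of sizes $n_1$ and $n_2$ are drawn from the two populations,
resulting in sample means $\bar x_1$ and $\bar x_2$ and sample variances
$s_1^2$ and $s_2^2$.
When the sample sizes $n_1$ and $n_2$ are small and the population variances
are unknown, the Z-interval used previously is no longer valid.
Similar to the one-sample case, we seek a t-interval.
If $\sigma_1^2=\sigma_2^2=\sigma^2$,
$$Z=\frac{(\bar X_1-\bar X_2)-(\mu_1-\mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}}\quad\text{becomes}\quad Z=\frac{(\bar X_1-\bar X_2)-(\mu_1-\mu_2)}{\sigma\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$$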
Pooled variances
As the two populations have the same variance, we pool the
information from the two samples to obtain a more precise unbiased
estimator of this common variance, called the pooled
estimator of the common variance. The pooled variance
estimate is calculated by:
$$s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$$
[Figure: the sample variances $S_1^2$ ($n_1 = 10$) and $S_2^2$ ($n_2 = 15$) are combined into the pooled estimator $S_p^2$.]
$$t=\frac{(\bar x_1-\bar x_2)-(\mu_1-\mu_2)}{s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$$
is called the pooled two-sample t statistic. It follows a t distribution
with $n_1+n_2-2$ degrees of freedom.
The $100(1-\alpha)\%$ confidence interval for $\mu_1-\mu_2$ is:
$$(\bar x_1-\bar x_2)\pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$
Testing Two Normal Population Means
with Variances Unknown but Equal
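When the variances are unknown, the common practice is to
assume they are equal and apply the pooled t-test.
Null hypothesis: $H_0: \mu_1-\mu_2=d_0$          Assumption: variances are equal
Test statistic:
$$t=\frac{(\bar x_1-\bar x_2)-d_0}{s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$$
Alternative hypothesis          Reject H0 at the α significance level if
$H_1: \mu_1-\mu_2<d_0$            $t<-t_{\alpha,\,n_1+n_2-2}$
$H_1: \mu_1-\mu_2>d_0$            $t>t_{\alpha,\,n_1+n_2-2}$
$H_1: \mu_1-\mu_2\ne d_0$         $t<-t_{\alpha/2,\,n_1+n_2-2}$ or $t>t_{\alpha/2,\,n_1+n_2-2}$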
Example 6
You are a financial analyst for a brokerage firm. Is there
a difference in dividend yield between stocks listed on the
NYSE & NASDAQ? You collect the following data:
                   NYSE    NASDAQ
Number             21      25
Sample mean        3.27    2.53
Sample std dev     1.30    1.16
Example 6 (continued)
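$$s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{(n_1-1)+(n_2-1)}=\frac{(21-1)(1.30^2)+(25-1)(1.16^2)}{(21-1)+(25-1)}=1.5021$$
$$t=\frac{(\bar x_1-\bar x_2)-0}{\sqrt{s_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}=\frac{3.27-2.53}{\sqrt{1.5021\left(\dfrac{1}{21}+\dfrac{1}{25}\right)}}=2.040$$
Decision: Since $2.040>t_{0.025,44}=2.0154$, reject H0 at α = 0.05.
Conclusion: There is evidence of a difference in means.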
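A minimal sketch of the pooled variance and pooled t statistic, reproducing Example 6. The function names are hypothetical. Python's standard library has no t quantile function, so the critical value $t_{0.025,44}\approx2.0154$ is passed in as a table value; if SciPy is available, `scipy.stats.t.ppf(0.975, 44)` would supply it instead.

```python
from math import sqrt

def pooled_variance(s1, s2, n1, n2):
    """s_p^2 = [(n1 - 1) s1^2 + (n2 - 1) s2^2] / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t(xbar1, xbar2, s1, s2, n1, n2, d0=0.0):
    """Pooled two-sample t statistic and its degrees of freedom."""
    sp = sqrt(pooled_variance(s1, s2, n1, n2))
    t = (xbar1 - xbar2 - d0) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Example 6: NYSE (1) vs NASDAQ (2) dividend yields
sp2 = pooled_variance(1.30, 1.16, 21, 25)
t, df = pooled_t(3.27, 2.53, 1.30, 1.16, 21, 25)
reject = abs(t) > 2.0154          # two-sided test, t_{0.025,44} from a t table
print(f"s_p^2 = {sp2:.4f}, t = {t:.3f}, df = {df}, reject H0: {reject}")
```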
Example 7
Suppose you wish to compare a new method of teaching
reading to "slow learners" to the current standard method. You
decide to base this comparison on the results of a reading test
given at the end of a learning period of 6 months. Of a random
sample of 20 slow learners, 8 are taught by the new method
and 12 by the standard method. The results are summarized
below. Estimate the true mean difference between the test
scores for the new method and the standard method using a
90% C.I. What assumptions must be made in order for the
estimate to be valid?
New method: $n_1=8$, $\bar x_1=76.9$, $s_1=4.85$
Standard method: $n_2=12$, $\bar x_2=72.7$, $s_2=6.35$
Example 7 (continued)
Solution:
i) Assume: a) the test scores in the two populations are normally
distributed; b) the variances of the two populations are the
same; c) the two samples are independent.
ii) $n_1=8,\ n_2=12,\ \bar x_1=76.9,\ \bar x_2=72.7,\ s_1=4.85,\ s_2=6.35$
For α = 0.10, $t_{\alpha/2,\,n_1+n_2-2}=t_{0.05,18}=1.734$
Example 7 (continued)
iii) $s_p^2=\dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}=\dfrac{7\times4.85^2+11\times6.35^2}{8+12-2}=33.7892$
iv) 90% C.I.:
$$(\bar x_1-\bar x_2)\pm t_{\alpha/2,\,n_1+n_2-2}\,s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}=(76.9-72.7)\pm1.734\sqrt{33.7892}\sqrt{\frac{1}{8}+\frac{1}{12}}=4.20\pm4.60$$
The 90% C.I. is (−0.40, 8.80).
v) Since the interval contains 0, at the α = 0.10 significance level there is not
sufficient evidence from the data to show a difference between the two methods.
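The Example 7 interval can be reproduced with a short sketch. The helper name `pooled_t_interval` is our own, and `t_crit` must be supplied from a t table (here $t_{0.05,18}=1.734$, as in the example) because the standard library provides no t quantiles.

```python
from math import sqrt

def pooled_t_interval(xbar1, xbar2, s1, s2, n1, n2, t_crit):
    """(xbar1 - xbar2) +/- t_crit * s_p * sqrt(1/n1 + 1/n2).

    t_crit = t_{alpha/2, n1+n2-2} must be supplied, e.g. from a t table.
    """
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance
    margin = t_crit * sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Example 7: new (1) vs standard (2) method, 90% C.I.
lo, hi = pooled_t_interval(76.9, 72.7, 4.85, 6.35, 8, 12, t_crit=1.734)
print(f"({lo:.2f}, {hi:.2f})")   # the interval contains 0
```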
Learning Objectives
Inference for Two Means (Known Variances)
Inference for Two Means (Unknown Variances)
Inference for Two Variances
Inference for Dependent Samples
Testing two normal population variances
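Null hypothesis: $H_0: \sigma_1^2=\sigma_2^2$ (i.e., $\sigma_1^2/\sigma_2^2=1$)
Test statistic:
$$f=\frac{s_1^2}{s_2^2}$$
because, for independent samples from normal populations,
$$f_{n_1-1,\,n_2-1}=\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$
has an F distribution with $n_1-1$ and $n_2-1$ degrees of freedom, and this ratio reduces to $s_1^2/s_2^2$ when $H_0$ is true.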
Example 8
Refer to Example 6.
             NYSE    NASDAQ
Number       21      25
Mean         3.27    2.53
Std dev      1.30    1.16
Example 8 (continued)
Form the hypothesis test:
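$H_0: \sigma_1^2=\sigma_2^2$ (there is no difference between the variances)
$H_1: \sigma_1^2\ne\sigma_2^2$ (there is a difference between the variances)
Test statistic: $f=\dfrac{s_1^2}{s_2^2}=\dfrac{1.30^2}{1.16^2}=1.256$
Numerator: n1 – 1 = 21 – 1 = 20 d.f.
Denominator: n2 – 1 = 25 – 1 = 24 d.f.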
Example 8 (continued)
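We cannot read the lower critical value directly from the F
table; use the relation
$$f_{1-\alpha/2,\,\nu_1,\,\nu_2}=\frac{1}{f_{\alpha/2,\,\nu_2,\,\nu_1}}$$
[Figure: F distribution with a central "do not reject H0" region and rejection regions below 0.48 and above 2.03.]
f = 1.256 is not in the rejection region, so we do not reject H0.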
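The Example 8 decision can be expressed as a small sketch. The function names are hypothetical; the critical values 0.48 and 2.03 are taken from the example rather than computed, since the standard library has no F quantile function (`scipy.stats.f.ppf` could supply them if SciPy is available).

```python
def f_statistic(s1, s2):
    """f = s1^2 / s2^2, with (n1 - 1, n2 - 1) degrees of freedom under H0."""
    return s1**2 / s2**2

def f_test_two_sided(s1, s2, lower_crit, upper_crit):
    """Reject H0: sigma1^2 = sigma2^2 if f falls outside [lower_crit, upper_crit].

    The critical values come from an F table; they are not computed here.
    """
    f = f_statistic(s1, s2)
    return f, (f < lower_crit or f > upper_crit)

# Example 8: NYSE s1 = 1.30 (n1 = 21), NASDAQ s2 = 1.16 (n2 = 25)
f, reject = f_test_two_sided(1.30, 1.16, lower_crit=0.48, upper_crit=2.03)
print(f"f = {f:.3f}, reject H0: {reject}")
```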
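A $100(1-\alpha)\%$ C.I. for $\sigma_1^2/\sigma_2^2$ is
$$\left(\frac{s_1^2}{s_2^2}\cdot\frac{1}{f_{\alpha/2,\,n_1-1,\,n_2-1}},\ \ \frac{s_1^2}{s_2^2}\cdot\frac{1}{f_{1-\alpha/2,\,n_1-1,\,n_2-1}}\right)$$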
Example 9
In Example 7, we used the pooled t–statistic to compare the
mean reading scores of two groups of slow learners who had
been taught to read using two different methods. The pooled t
was based on the assumption that the population variances of
the test scores were equal for the two methods. Check this
assumption using α = 0.10.
Example 9 (continued)
Solution:
i) $n_1=8,\ n_2=12,\ s_1=4.85,\ s_2=6.35$; the d.f. are 7 and 11,
and $f_{0.05,7,11}=3.01$, $f_{0.05,11,7}=3.605$
ii) $\dfrac{s_1^2}{s_2^2}\cdot\dfrac{1}{f_{\alpha/2,7,11}}=\dfrac{4.85^2}{6.35^2}\cdot\dfrac{1}{3.01}=0.1938$;
$\dfrac{s_1^2}{s_2^2}\cdot\dfrac{1}{f_{1-\alpha/2,7,11}}=\dfrac{4.85^2}{6.35^2}\times3.605=2.103$
iii) The 90% C.I. for $\sigma_1^2/\sigma_2^2$ is (0.1938, 2.103).
iv) Since the interval covers the value 1 (which corresponds to equal
variances), there is not sufficient evidence against the
assumption of equal variances.
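A sketch of the variance-ratio interval, reproducing Example 9. The name `variance_ratio_interval` is hypothetical, and both upper-tail F values are table inputs. The upper limit uses the identity $1/f_{1-\alpha/2,\,n_1-1,\,n_2-1}=f_{\alpha/2,\,n_2-1,\,n_1-1}$, which is why the example multiplies by 3.605.

```python
def variance_ratio_interval(s1, s2, f_upper_12, f_upper_21):
    """100(1 - alpha)% C.I. for sigma1^2 / sigma2^2.

    f_upper_12 = f_{alpha/2, n1-1, n2-1} and f_upper_21 = f_{alpha/2, n2-1, n1-1}
    are upper-tail F critical values read from a table.
    """
    ratio = s1**2 / s2**2
    return ratio / f_upper_12, ratio * f_upper_21

# Example 9: s1 = 4.85 (n1 = 8), s2 = 6.35 (n2 = 12), alpha = 0.10,
# f_{0.05,7,11} = 3.01 and f_{0.05,11,7} = 3.605 from the table
lo, hi = variance_ratio_interval(4.85, 6.35, 3.01, 3.605)
print(f"({lo:.4f}, {hi:.3f})")   # covers 1: equal-variance assumption not rejected
```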
Learning Objectives
Inference for Two Means (Known Variances)
Inference for Two Means (Unknown Variances)
Inference for Two Variances
Inference for Dependent Samples
Paired Samples
Reducing Variability
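[Figure: the range of observations in sample A compared with the range of observations in sample B.]
Here $d_i\ (i=1,\dots,n)$ are the (sample) pair differences, with mean and variance
defined as $\bar d=\dfrac{1}{n}\sum_{i=1}^{n}d_i$ and $s_d^2=\dfrac{1}{n-1}\sum_{i=1}^{n}(d_i-\bar d)^2$.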
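The definitions of $\bar d$ and $s_d^2$ can be illustrated with a short sketch. The function `paired_summary`, the direction of differencing (after minus before), and the three-machine data are all hypothetical, chosen only to show the computation.

```python
import statistics

def paired_summary(before, after):
    """Differences d_i = after_i - before_i, their mean d-bar and variance s_d^2."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [a - b for b, a in zip(before, after)]
    # statistics.variance divides by n - 1, matching s_d^2
    return d, statistics.mean(d), statistics.variance(d)

# Hypothetical outputs for three machines (illustration only)
d, dbar, sd2 = paired_summary([10, 12, 9], [13, 15, 10])
print(d, dbar, sd2)
```

Working with the differences removes the machine-to-machine variation that is common to both measurements of a pair.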
Example 10
Manufacturers wish to determine whether an adjustment to a
machine setting will improve mean output by more than 10
units. They randomly selected 15 machines and recorded the
outputs before and after adjustment. The 15 pairs of outputs
give $\bar d=13.3$ and $s_d=4.2$. Perform a relevant test. What
assumption did you make to perform this test?
Example 10 (continued)
Solution:
$H_0: \mu_D=10$ vs $H_1: \mu_D>10$
$$t=\frac{\bar d-d_0}{s_d/\sqrt n}=\frac{13.3-10}{4.2/\sqrt{15}}=3.04$$
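The paired t statistic of Example 10 can be computed from the summary values alone. This is a minimal sketch; `paired_t` is a hypothetical helper, and the decision still requires the upper-tail critical value $t_{\alpha,14}$ from a t table for whatever α is chosen.

```python
from math import sqrt

def paired_t(dbar, sd, n, d0=0.0):
    """t = (d-bar - d0) / (s_d / sqrt(n)), with n - 1 degrees of freedom."""
    return (dbar - d0) / (sd / sqrt(n)), n - 1

# Example 10: H0: mu_D = 10 vs H1: mu_D > 10 (upper-tail test)
t, df = paired_t(13.3, 4.2, 15, d0=10)
print(f"t = {t:.2f}, df = {df}")   # compare with t_{alpha,14} from a t table
```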
Example 11 (continued)
Solution:
i) Assuming the paired differences constitute a random sample from $N(\mu_D,\sigma_D^2)$,
use of the following C.I. is valid.
ii) $\bar d=\dfrac{1}{15}\sum_{i=1}^{15}d_i=8.80$ and $s_d=\sqrt{\dfrac{1}{14}\sum_{i=1}^{15}(d_i-\bar d)^2}=10.98$
iii) The 95% C.I. for $\mu_D$ is computed as $\bar d\pm t_{0.025,14}\dfrac{s_d}{\sqrt n}=8.80\pm6.08$, i.e.
(2.72, 14.88).
Since 0 does not fall into the 95% confidence interval, we conclude
at the 5% level that the medicine has an effect on blood pressure.
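The Example 11 interval can be checked with a short sketch. `paired_t_interval` is a hypothetical helper, and $t_{0.025,14}=2.145$ is the standard t-table value supplied as an input.

```python
from math import sqrt

def paired_t_interval(dbar, sd, n, t_crit):
    """d-bar +/- t_crit * s_d / sqrt(n); t_crit = t_{alpha/2, n-1} from a t table."""
    margin = t_crit * sd / sqrt(n)
    return dbar - margin, dbar + margin

# Example 11: n = 15, d-bar = 8.80, s_d = 10.98, t_{0.025,14} = 2.145
lo, hi = paired_t_interval(8.80, 10.98, 15, t_crit=2.145)
print(f"({lo:.2f}, {hi:.2f})")
```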
TEXTBOOK REFERENCES