
Statistical Inference For Two Samples

Probability and statistics for Statistical Inference for Two Samples

Uploaded by

Li Thanjira

8

Statistical Inference for Two Samples


Determining the Target Parameters
How would you try to answer these questions?
Who gets higher grades: males or females? Which program is faster to learn: Word or Excel?

Parameter      Key Words or Phrases                                    Type of Data
μ1 – μ2        Mean difference; differences in averages                Quantitative
p1 – p2        Differences between proportions, percentages,           Qualitative
               fractions, or rates; compare proportions
σ1²/σ2²        Ratio of variances; differences in variability or       Quantitative
               spread; compare variation

A random sample of size n1 is drawn from population 1 with mean μ1 and variance σ1².
A random sample of size n2 is drawn from population 2 with mean μ2 and variance σ2².

Assumptions
1. Let 𝑋11 , 𝑋12 , … , 𝑋1𝑛1 be a random sample from population 1.
2. Let 𝑋21 , 𝑋22 , … , 𝑋2𝑛2 be a random sample from population 2.
3. The two populations X1 and X2 are independent.
4. Both X1 and X2 are normal.
Hypothesis Tests on the Difference in Means
Variances Known
Two independent populations.
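[Figure: sampling distribution of x̄1 – x̄2. Select a simple random sample of size n1 from population 1 (mean μ1, standard deviation σ1) and compute x̄1; select a simple random sample of size n2 from population 2 (mean μ2, standard deviation σ2) and compute x̄2; compute x̄1 – x̄2 for every pair of samples. The astronomical number of x̄1 – x̄2 values forms the sampling distribution, centered at μ1 – μ2.]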

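Test of Hypothesis for (µ1 – µ2)

Test statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{\bar{x}_1 - \bar{x}_2}}, \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
When μ1 – μ2 = D0, z follows the standard normal distribution N(0, 1).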


One-Tailed Test
H0: (μ1 – μ2) ≥ D0 [or H0: (μ1 – μ2) ≤ D0]
H1: (μ1 – μ2) < D0 [or H1: (μ1 – μ2) > D0]
where D0 = Hypothesized difference between the means
(the difference is often hypothesized to be equal to 0)
Rejection region: z < –zα [or z > zα when H1: (μ1 – μ2) > D0]
Two-Tailed Test
H0: (μ1 – μ2) = D0
H1: (μ1 – μ2) ≠ D0
where D0 = Hypothesized difference between the means
(the difference is often hypothesized to be equal to 0)
Rejection region: |z| > zα/2

Confidence Interval for (μ1 – μ2)


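One-Sided Confidence Bounds
Upper Confidence Bound:
$$\mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + z_\alpha \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Lower Confidence Bound:
$$\bar{x}_1 - \bar{x}_2 - z_\alpha \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2$$
Two-Sided Confidence Bounds
$$\bar{x}_1 - \bar{x}_2 - z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$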

Type II Error and Choice of Sample Size

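The β-error for the two-sided alternative, where μ1 − μ2 is the true difference in means:
$$\beta = \Phi\left(z_{\alpha/2} - \frac{\mu_1 - \mu_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right) - \Phi\left(-z_{\alpha/2} - \frac{\mu_1 - \mu_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right)$$

The β-error for the one-sided alternative:

Right-tailed, H1: μ1 − μ2 > D0:
$$\beta = \Phi\left(z_\alpha - \frac{\mu_1 - \mu_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right)$$
Left-tailed, H1: μ1 − μ2 < D0:
$$\beta = 1 - \Phi\left(-z_\alpha - \frac{\mu_1 - \mu_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right)$$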

Two-sided alternative:
For the two-sided alternative hypothesis with significance level α, the sample size n1 = n2 = n
required to detect a true difference in means of D = μ1 − μ2 with power at least 1 − β is approximately
$$n \approx \frac{(z_{\alpha/2} + z_\beta)^2 (\sigma_1^2 + \sigma_2^2)}{(D - D_0)^2}$$

One-sided alternative:
For a one-sided alternative hypothesis with significance level α, the sample size n1 = n2 = n
required to detect a true difference in means of D (≠ D0) with power at least 1 − β is
$$n = \frac{(z_\alpha + z_\beta)^2 (\sigma_1^2 + \sigma_2^2)}{(D - D_0)^2}$$
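
As a rough sketch of how such a sample-size calculation might be carried out, the Python snippet below evaluates the standard normal-theory expression n = (z + z_β)²(σ1² + σ2²)/(D − D0)², with z = z_{α/2} for the two-sided case and z = z_α for the one-sided case. The function name `n_for_power` and the numbers in the usage lines (σ1 = 15, σ2 = 20, D − D0 = 10, β = 0.10) are illustrative assumptions, not values taken from these notes.

```python
from math import ceil
from statistics import NormalDist

def n_for_power(sigma1, sigma2, delta, alpha=0.05, beta=0.10, two_sided=True):
    """Equal sample size n1 = n2 = n (hypothetical helper).

    delta is the true difference minus the hypothesized one, D - D0.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(1 - beta)
    n = (z_alpha + z_beta) ** 2 * (sigma1**2 + sigma2**2) / delta**2
    return ceil(n)  # round up to the next whole observation

# Illustrative inputs only (assumed values).
n_two = n_for_power(15, 20, delta=10)                   # two-sided test
n_one = n_for_power(15, 20, delta=10, two_sided=False)  # one-sided test
print(n_two, n_one)
```

A one-sided test needs fewer observations than a two-sided test at the same α and β, because z_α is smaller than z_{α/2}.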

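Choice of Sample Size (equal n)

With n1 = n2 = n, the two-sided interval is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{n}} = (\bar{x}_1 - \bar{x}_2) \pm e, \quad \text{so} \quad e = z_{\alpha/2} \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{n}} \quad \text{and} \quad n = \left(\frac{z_{\alpha/2}}{e}\right)^2 (\sigma_1^2 + \sigma_2^2)$$
where e is a desired margin of error. Remember to round up if n is not an integer. This
ensures that the level of confidence does not drop below 100(1 − α)%.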
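
The margin-of-error formula can be sketched in a few lines of Python. The helper name `n_for_margin` and the inputs used below (σ1 = 15, σ2 = 20, e = 5) are assumptions chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist

def n_for_margin(sigma1, sigma2, e, conf=0.95):
    """Common sample size n so that the margin of error is at most e
    (hypothetical helper; assumes known population standard deviations)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    n = (z / e) ** 2 * (sigma1**2 + sigma2**2)
    return ceil(n)  # round up so the confidence level is not reduced

n_needed = n_for_margin(15, 20, e=5)
print(n_needed)
```

Rounding up, rather than to the nearest integer, is what keeps the realized margin of error at or below e.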
Summary on Hypothesis Tests

Null hypothesis: H0: μ1 − μ2 = D0
Test statistic:
$$Z_0 = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

Alternative Hypothesis   p-value                                             Rejection Criterion for Fixed-Level Tests
H1: μ1 − μ2 ≠ D0         Probability above |z0| and probability below −|z0|, z0 > zα/2 or z0 < −zα/2
                         P = 2[1 − Φ(|z0|)]
H1: μ1 − μ2 > D0         Probability above z0, P = 1 − Φ(z0)                 z0 > zα
H1: μ1 − μ2 < D0         Probability below z0, P = Φ(z0)                     z0 < −zα
Example: Par, Inc.
Par, Inc. is a manufacturer of golf equipment and has developed a new golf ball that has
been designed to provide “extra distance.” In a test of driving distance using a mechanical
driving device, a sample of Par golf balls was compared with a sample of golf balls made by
Rap, Ltd., a competitor.
Based on data from driving distance tests, the two population standard deviations are known,
with σ1 = 15 yards and σ2 = 20 yards. Let us develop a 95% confidence interval estimate of
the difference between the mean driving distances of the two brands of golf ball.

               Sample #1 (Par, Inc.)   Sample #2 (Rap, Ltd.)
Sample Size    120 balls               80 balls
Sample Mean    275 yards               258 yards

Point estimate of μ1 − μ2 = x̄1 − x̄2 = 275 − 258 = 17 yards
where:
μ1 = mean distance for the population of Par, Inc. golf balls
μ2 = mean distance for the population of Rap, Ltd. golf balls
Interval Estimation:
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 17 \pm 1.96 \sqrt{\frac{(15)^2}{120} + \frac{(20)^2}{80}} = 17 \pm 5.14$$
or 11.86 to 22.14 yards.
We are 95% confident that the difference between the mean driving distances of Par, Inc.
balls and Rap, Ltd. balls is 11.86 to 22.14 yards. Can we conclude, using α = 0.01, that the
mean driving distance of Par, Inc. golf balls is greater than the mean driving distance of Rap,
Ltd. golf balls?
Hypothesis Tests: p–value and Critical Value Approaches
1. Develop the hypotheses. H0: μ1 − μ2 ≤ 0
                           H1: μ1 − μ2 > 0
where:
μ1 = mean distance for the population of Par, Inc. golf balls
μ2 = mean distance for the population of Rap, Ltd. golf balls
2. Specify the level of significance. α = 0.01
3. Compute the value of the test statistic.
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(275 - 258) - 0}{\sqrt{\frac{(15)^2}{120} + \frac{(20)^2}{80}}} = \frac{17}{2.62} = 6.49$$
[Figure: Population 1 is Par, Inc. golf balls, with μ1 = mean driving distance of Par golf balls; Population 2 is Rap, Ltd. golf balls, with μ2 = mean driving distance of Rap golf balls; μ1 – μ2 = difference between the mean distances. A simple random sample of n1 Par golf balls gives x̄1 = sample mean distance for the Par golf balls; a simple random sample of n2 Rap golf balls gives x̄2 = sample mean distance for the Rap golf balls; x̄1 – x̄2 = point estimate of μ1 – μ2.]
p–value Approach
4. Compute the p–value. For z = 6.49, the p–value < 0.0001.
5. Determine whether to reject H0. Because p–value < α = 0.01, we reject H0.
At the 0.01 level of significance, the sample evidence indicates the mean driving distance of
Par, Inc. golf balls is greater than the mean driving distance of Rap, Ltd. golf balls.
Critical Value Approach
4. Determine the critical value and rejection rule. For α = 0.01, z0.01 = 2.33. Reject H0 if z > 2.33.
5. Determine whether to reject H0. Because z = 6.49 > 2.33, we reject H0.
The sample evidence indicates the mean driving distance of Par, Inc. golf balls is greater than
the mean driving distance of Rap, Ltd. golf balls.
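
The Par, Inc. calculation can be reproduced with a short Python sketch. The function name `two_sample_z_test` and its argument names are hypothetical; only the sample figures (275, 258, 15, 20, 120, 80) come from the example above. The code carries the standard error at full precision, so the statistic comes out a little below the hand-rounded 17/2.62.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(xbar1, xbar2, sigma1, sigma2, n1, n2,
                      d0=0.0, alternative="two-sided"):
    """z statistic and p-value for H0: mu1 - mu2 = d0 with known variances."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # std. error of xbar1 - xbar2
    z = (xbar1 - xbar2 - d0) / se
    phi = NormalDist().cdf
    if alternative == "two-sided":
        p = 2 * (1 - phi(abs(z)))
    elif alternative == "greater":   # H1: mu1 - mu2 > d0
        p = 1 - phi(z)
    elif alternative == "less":      # H1: mu1 - mu2 < d0
        p = phi(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p

z, p_value = two_sample_z_test(275, 258, 15, 20, 120, 80, alternative="greater")
print(round(z, 2), p_value < 0.0001)
```

Either way the p-value is far below 0.0001, so the decision to reject H0 at α = 0.01 is unaffected by the rounding.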
Example: Large-Sample Confidence Interval
You’re a financial analyst for Charles Schwab. You want to estimate
the difference in dividend yield between stocks listed on NYSE and
NASDAQ. You collect the following data:
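           NYSE    NASDAQ
Number     121     125
Mean       3.27    2.53
Std Dev    1.30    1.16

What is the 95% confidence interval for the difference between the mean dividend yields?
Large-Sample Confidence Interval
Because both samples are large, the sample standard deviations are used in place of σ1 and σ2:
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (3.27 - 2.53) \pm 1.96 \sqrt{\frac{(1.30)^2}{121} + \frac{(1.16)^2}{125}}$$
$$0.43 \le \mu_1 - \mu_2 \le 1.05$$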
You’re a financial analyst for Charles Schwab. You want to find out if there is a difference in
dividend yield between stocks listed on NYSE and NASDAQ. Is there a difference in average
yield (α = 0.05)?
H0: μ1 − μ2 = 0 (μ1 = μ2)
H1: μ1 − μ2 ≠ 0 (μ1 ≠ μ2)
n1 = 121, n2 = 125
Critical values: α = 0.05, so reject H0 if z < −1.96 or z > 1.96 (area 0.025 in each tail).
Test statistic:
$$z = \frac{(3.27 - 2.53) - 0}{\sqrt{\frac{1.30^2}{121} + \frac{1.16^2}{125}}} = 4.71$$
Decision: Reject at α = 0.05
Conclusion: There is evidence of a difference in means.
Example: Thinking Challenge
You’re an economist for the Department of Education. You want to find out if there is a
difference in spending per pupil between urban and rural high schools. You collect the
following:
           Urban      Rural
Number     35         35
Mean       $6,012     $5,832
Std Dev    $602       $497
Is there any difference in population means (α = 0.10)?
H0: μ1 − μ2 = 0 (μ1 = μ2)
H1: μ1 − μ2 ≠ 0 (μ1 ≠ μ2)
n1 = 35, n2 = 35
Critical values: α = 0.10, so reject H0 if z < −1.645 or z > 1.645 (area 0.05 in each tail).
Test statistic:
$$z = \frac{(6012 - 5832) - 0}{\sqrt{\frac{602^2}{35} + \frac{497^2}{35}}} = 1.36$$
Decision: Do not reject at α = 0.10
Conclusion: There is no evidence of a difference in means.
Hypotheses Tests on the Difference in Means
Variances Unknown
Conditions Required for Valid Small-Sample Inferences

Case 1: σ1² = σ2² = σ²

We wish to test: H0: μ1 − μ2 = D0
                 H1: μ1 − μ2 ≠ D0
The pooled estimator of σ², denoted by Sp², is defined by
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$
Test statistic:
$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
has a t-distribution with n1 + n2 − 2 degrees of freedom.

Confidence Interval (Independent Samples)


If x̄1, x̄2, s1² and s2² are the sample means and variances of two random samples of sizes n1
and n2, respectively, from two independent normal populations with unknown but equal
variances, then a 100(1 − α)% confidence interval on the difference in means μ1 − μ2 is
$$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,\, n_1 + n_2 - 2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,\, n_1 + n_2 - 2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where sp is the pooled estimate of the common population standard deviation, and
tα/2, n1+n2−2 is the upper α/2 percentage point of the t-distribution with n1 + n2 − 2 degrees of
freedom.
One-Tailed Test : H0: (μ1 – μ2) ≥ D0 [or H0: (μ1 – μ2) ≤ D0]
                  H1: (μ1 – μ2) < D0 [or H1: (μ1 – μ2) > D0]
Test statistic :
$$t = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Rejection region : t < –tα [or t > tα when H1 : (μ1 – μ2) > D0]
where tα is based on (n1 + n2 – 2) degrees of freedom.
Two-Tailed Test : H0: (μ1 – μ2) = D0
                  H1: (μ1 – μ2) ≠ D0
Test statistic :
$$t = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Rejection region : |t| > tα/2 where tα/2 is based on (n1 + n2 – 2) degrees of freedom.
Case 1:Test Statistic for the Difference in Means

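Case 2: σ1² ≠ σ2²

Confidence Interval (Independent Samples)

If x̄1, x̄2, s1² and s2² are the sample means and variances of two random samples of sizes n1
and n2, respectively, from two independent normal populations with unknown and unequal
variances, then a 100(1 − α)% confidence interval on the difference in means μ1 − μ2 is
$$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,\, \nu} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,\, \nu} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where ν, the degrees of freedom, is given by
$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
Test statistic for H0: μ1 − μ2 = D0:
$$T_0^* = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
where t is based on ν degrees of freedom.

Note: The value of ν will generally not be an integer. Round ν down to the nearest
integer to use the t-table.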

Case 2:Test Statistic for the Difference in Means


What Should You Do if the Assumptions Are Not Satisfied?
If you are concerned that the assumptions are not satisfied, use the Wilcoxon rank sum test
for independent samples to test for a shift in population distributions.
Example: Two catalysts are being analyzed to determine how they affect the mean yield of a
chemical process. Specifically, catalyst 1 is currently in use, but catalyst 2 is acceptable. Since
catalyst 2 is cheaper, it should be adopted, providing it does not change the process yield. A
test is run in the pilot plant and results in the data shown in Table. Is there any difference
between the mean yields? Use α = 0.05, and assume equal variances.

Observation Number   Catalyst 1     Catalyst 2
1                    91.50          89.19
2                    94.18          90.95
3                    92.18          90.46
4                    95.39          93.21
5                    91.79          97.19
6                    89.07          97.04
7                    94.72          91.07
8                    89.21          92.75
                     x̄1 = 92.255    x̄2 = 92.733
                     s1 = 2.39      s2 = 2.98

The seven-step hypothesis-testing procedure is as follows:
1. Parameter of interest : The parameters of interest are μ1 and μ2, the mean process yield
using catalysts 1 and 2, respectively.
2. Null hypothesis : H0: μ1 = μ2
3. Alternative hypothesis : H1: μ1 ≠ μ2
4. Test statistic : The test statistic is
$$t_0 = \frac{\bar{x}_1 - \bar{x}_2 - 0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
5. Reject H0 if : Reject H0 if the p-value is less than 0.05.
6. Computations : From the table we have x̄1 = 92.255, s1 = 2.39, n1 = 8, x̄2 = 92.733, s2 = 2.98, and n2 = 8.
Therefore
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{(7)(2.39)^2 + 7(2.98)^2}{8 + 8 - 2} = 7.30$$
$$s_p = \sqrt{7.30} = 2.70 \quad \text{and} \quad t_0 = \frac{\bar{x}_1 - \bar{x}_2}{2.70 \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{92.255 - 92.733}{2.70 \sqrt{\frac{1}{8} + \frac{1}{8}}} = -0.35$$
7. Conclusions: From the Appendix, we can find t0.40,14 = 0.258 and t0.25,14 = 0.692. Since
0.258 < 0.35 < 0.692, we conclude that lower and upper bounds on the p-value are
0.50 < P < 0.80. Therefore, since the p-value exceeds α = 0.05, the null hypothesis cannot
be rejected.
Interpretation : At the 5% level of significance, we do not have strong evidence to conclude that
catalyst 2 results in a mean yield that differs from the mean yield when catalyst 1 is used.
Example: Two training procedures are compared by measuring the time that it takes trainees
to assemble a device. A different group of trainees are taught using each method. Is there a
difference in the two methods? Use α = 0.01 and assume equal variances.

Time to Assemble   Method 1   Method 2
Sample size        10         12
Sample mean        35         31
Sample Std Dev     4.9        4.5

Null hypothesis : H0: μ1 − μ2 = 0
Alternative hypothesis : H1: μ1 − μ2 ≠ 0
Test statistic :
$$t = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Calculation:
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{9(4.9^2) + 11(4.5^2)}{20} = 21.942$$
$$t = \frac{35 - 31}{\sqrt{21.942 \left(\frac{1}{10} + \frac{1}{12}\right)}} = 1.994$$
df = n1 + n2 – 2 = 10 + 12 – 2 = 20
p-value: two-sided, so p-value = 2P(t > 1.99).
0.025 < P(t > 1.99) < 0.05, so 0.05 < p-value < 0.1.
Decision: since the p-value is greater than α = 0.01, H0 is not rejected.
Conclusion: there is insufficient evidence to indicate a difference in the population means.
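
The pooled-variance arithmetic in this example can be checked with a minimal Python sketch. The helper name `pooled_t` is an assumption; the inputs are the summary statistics from the training-method example. The standard library has no t-distribution, so the sketch returns the statistic and degrees of freedom and leaves the table lookup to the reader.

```python
from math import sqrt

def pooled_t(xbar1, xbar2, s1, s2, n1, n2, d0=0.0):
    """Pooled two-sample t statistic (equal-variance case).

    Returns (t, degrees of freedom, pooled variance)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    t = (xbar1 - xbar2 - d0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, sp2

t_stat, df, sp2 = pooled_t(35, 31, 4.9, 4.5, 10, 12)
print(round(t_stat, 3), df, round(sp2, 3))
```

Comparing |t| with the tabulated t0.005,20 = 2.845 gives the same "do not reject" decision at α = 0.01 as the p-value bounds.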
Example : Cement Hydration
Ten samples of standard cement had an average weight percent calcium of x̄1 = 90.0
with a sample standard deviation of s1 = 5.0, and 15 samples of the lead-doped cement had
an average weight percent calcium of x̄2 = 87.0 with a sample standard deviation of s2 = 4.0.
Assume that weight percent calcium is normally distributed with the same standard deviation.
Find a 95% confidence interval on the difference in means, μ1 − μ2, for the two types of
cement.
The pooled estimate of the common standard deviation is found as follows:
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{9(5.0)^2 + 14(4.0)^2}{10 + 15 - 2} = 19.52$$
$$s_p = \sqrt{19.52} = 4.4$$
The 95% confidence interval is found from
$$\bar{x}_1 - \bar{x}_2 - t_{0.025,23}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{0.025,23}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Upon substituting the sample values and using t0.025,23 = 2.069,
$$90.0 - 87.0 - 2.069(4.4)\sqrt{\frac{1}{10} + \frac{1}{15}} \le \mu_1 - \mu_2 \le 90.0 - 87.0 + 2.069(4.4)\sqrt{\frac{1}{10} + \frac{1}{15}}$$
which reduces to −0.72 ≤ μ1 − μ2 ≤ 6.72
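
A small Python sketch of the same interval is given below. The function name `pooled_ci` is hypothetical, and the critical value 2.069 is passed in from the t-table rather than computed. Because the code does not round sp to 4.4 before multiplying, its endpoints differ from the hand calculation in the second decimal place.

```python
from math import sqrt

def pooled_ci(xbar1, xbar2, s1, s2, n1, n2, t_crit):
    """Two-sided CI on mu1 - mu2 assuming equal variances.

    t_crit is the tabulated t_{alpha/2, n1+n2-2} supplied by the caller."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled std. dev.
    half_width = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width

lower, upper = pooled_ci(90.0, 87.0, 5.0, 4.0, 10, 15, t_crit=2.069)
print(round(lower, 2), round(upper, 2))
```

The interval still contains zero, so the conclusion that the data do not establish a difference in mean calcium content is unchanged.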


Example: You’re a financial analyst for Charles Schwab. You want to estimate the difference
in dividend yield between stocks listed on the NYSE and NASDAQ. You collect the following
data:
           NYSE    NASDAQ
Number     11      15
Mean       3.27    2.53
Std Dev    1.30    1.16
Assuming normal populations and equal variances, what is the 95%
confidence interval for the difference between the mean dividend yields?
df = n1 + n2 – 2 = 11 + 15 – 2 = 24, t0.025 = 2.064
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{(11 - 1)(1.30)^2 + (15 - 1)(1.16)^2}{11 + 15 - 2} = 1.489$$
Confidence Interval
$$(3.27 - 2.53) \pm 2.064 \sqrt{1.489 \left(\frac{1}{11} + \frac{1}{15}\right)} \;\Rightarrow\; -0.26 \le \mu_1 - \mu_2 \le 1.74$$

Assuming normal populations with equal population variances, is there a difference in
average yield (α = 0.05)?
H0: μ1 – μ2 = 0 (μ1 = μ2) and H1: μ1 – μ2 ≠ 0 (μ1 ≠ μ2)
df = 11 + 15 – 2 = 24
Critical values: α = 0.05, so reject H0 if t < −2.064 or t > 2.064 (area 0.025 in each tail).
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{(11 - 1)(1.30)^2 + (15 - 1)(1.16)^2}{11 + 15 - 2} = 1.489$$
Test Statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.489 \left(\frac{1}{11} + \frac{1}{15}\right)}} = 1.53$$
Decision: Do not reject at α = 0.05

Conclusion: There is no evidence of a difference in means.
Example: Thinking Challenge
You’re a research analyst for General Motors. Assuming equal variances, is there a difference
in the average miles per gallon (mpg) of two car models (α = 0.05)? You collect the following:
           Sedan    Van
Number     15       11
Mean       22.00    20.27
Std Dev    4.77     3.64
H0: μ1 – μ2 = 0 (μ1 = μ2) and H1: μ1 – μ2 ≠ 0 (μ1 ≠ μ2)
df = 15 + 11 – 2 = 24
Critical values: α = 0.05, so reject H0 if t < −2.064 or t > 2.064 (area 0.025 in each tail).
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{(15 - 1)(4.77)^2 + (11 - 1)(3.64)^2}{15 + 11 - 2} = 18.793$$
Test Statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(22.00 - 20.27) - 0}{\sqrt{18.793 \left(\frac{1}{15} + \frac{1}{11}\right)}} = 1.00$$
Decision: Do not reject at α = 0.05
Conclusion: There is no evidence of a difference in means.
Example: An industrial consultant has suggested a modification of the existing method for
producing semiconductors. She claims that this modification will increase the number of
semiconductors a worker can produce in a day. To test the effectiveness of her ideas,
management has set up a small study. A group of 50 workers have been randomly divided
into two groups. One of the groups, consisting of 30 workers, has been trained in the
modification proposed by the consultant. The other group, acting as a control, has been
trained in a different modification. These two modifications are considered by
management to be roughly equal in complexity of learning and in time of implementation. In
addition, management is quite certain that the alternative (to the one proposed by the
consultant) modification would not have any real effect on productivity. Neither group was
told whether it was learning the consultant’s proposal or not.
The workers were then monitored for a period of time with the following results.
For those trained in the technique of the consultant:
The average number of semiconductors produced per worker was 242.
The sample variance was 62.2.
For those workers in the control group:
The average number of semiconductors produced per worker was 234.
The sample variance was 58.4.
Are these data sufficient to prove that the consultant’s modification will increase productivity?
Let μx denote the mean number of semiconductors that would be produced over the period of
the study by workers trained in the method of the consultant. Let μy denote the mean number
produced by workers given the alternative technique.
To prove the consultant's claim that μx > μy, we need to test
H0: μx ≤ μy against H1: μx > μy
Thus, the data are significant enough to prove that the consultant’s modification was more
effective than the one used by the control group.
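
The intermediate arithmetic for this example is not shown above, so the following Python lines sketch one way to obtain it from the stated summary figures (means 242 and 234, sample variances 62.2 and 58.4, group sizes 30 and 50 − 30 = 20). Treating the statistic as approximately standard normal is an assumption of this sketch, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, m = 30, 20                      # consultant group, control group
xbar, ybar = 242, 234              # sample means
s2x, s2y = 62.2, 58.4              # sample variances

# Test statistic for H1: mu_x > mu_y, using sample variances
# in place of the unknown population variances.
ts = (xbar - ybar) / sqrt(s2x / n + s2y / m)

# Upper-tail p-value under the (assumed) standard normal approximation.
p_value = 1 - NormalDist().cdf(ts)
print(round(ts, 2), round(p_value, 4))
```

A p-value this small is consistent with the conclusion that the data support the consultant's claim.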
Example: To test the effectiveness of a new cholesterol-lowering medication, 100 volunteers
were randomly divided into two groups of size 50 each. Members of the first group were
given pills containing the new medication, while members of the second, or control group
were given pills containing lovastatin, one of the standard medications for lowering blood
cholesterol. All the volunteers were instructed to take a pill every 12 hours for the next 3
months. None of the volunteers knew which group they were in.
Suppose that the result of this experiment was an average reduction of 8.2 with a sample
variance of 5.4 in the blood cholesterol levels of those taking the old medication, and an
average reduction of 8.8 with a sample variance of 4.5 of those taking the newer medication.
Do these results prove, at the 5 percent level, that the new medication is more effective than
the old one?
Solution
Let μx denote the mean cholesterol reduction of a volunteer who is given the new medication,
and let μy be the equivalent value for one given the control.
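The degrees of freedom for the unequal-variance t statistic are
$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$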

Since the p-value is greater than 0.05, the evidence is not strong enough to establish, at the
5 percent level of significance, that the new medication is more effective than the old.
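
To make the reasoning concrete, the Python sketch below computes the test statistic and the unequal-variance degrees of freedom from the figures quoted in this example (reductions 8.8 and 8.2, sample variances 4.5 and 5.4, 50 volunteers per group). With roughly 97 degrees of freedom the t distribution is very close to the standard normal, so the p-value here uses the normal approximation; that choice, and the variable names, are assumptions of the sketch.

```python
from math import sqrt
from statistics import NormalDist

n1 = n2 = 50
xbar_new, xbar_old = 8.8, 8.2      # mean cholesterol reductions
v_new, v_old = 4.5 / n1, 5.4 / n2  # s^2 / n for each group

# Statistic for H1: mu_x (new) > mu_y (old).
ts = (xbar_new - xbar_old) / sqrt(v_new + v_old)

# Degrees of freedom from the unequal-variance formula.
nu = (v_new + v_old) ** 2 / (v_new**2 / (n1 - 1) + v_old**2 / (n2 - 1))

# Upper-tail p-value, normal approximation (reasonable because nu is large).
p_value = 1 - NormalDist().cdf(ts)
print(round(ts, 3), round(nu, 1), round(p_value, 3))
```

The p-value falls between 0.05 and 0.10, which matches the statement that the evidence is not strong enough at the 5 percent level.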
Example: Twenty-two volunteers at a cold-research institute caught a cold after having been
exposed to various cold viruses. A random selection of 10 volunteers were given tablets
containing 1 gram of vitamin C. These tablets were taken 4 times a day. The control
group, consisting of the other 12 volunteers, was given placebo tablets that looked and
tasted exactly like the vitamin C ones. This was continued for each volunteer until a doctor,
who did not know whether the volunteer was receiving vitamin C or the placebo, decided that
the volunteer was no longer suffering from the cold. The length of time the cold lasted was
then recorded. Assume equal variances.
Do these data prove that taking 4 grams of vitamin C daily reduces the time that a cold lasts?
At what level of significance? At the end of this experiment, the following data resulted:
Solution:
To prove the foregoing hypothesis, we need to reject the null hypothesis in a test of
H0: μp ≤ μc against H1: μp > μc
where μc is the mean time a cold lasts when the vitamin C tablets are taken and μp is the
mean time when the placebo is taken.
Since, from the table, t20,.05 = 1.725, the null hypothesis is rejected at the 5 percent level of
significance. That is, the evidence is significant, at the 5 percent level, in establishing that
vitamin C reduces the mean time that a cold persists.
Example: Arsenic concentration in public drinking water supplies is a potential health risk. An
article in the Arizona Republic (May 27, 2001) reported drinking water arsenic concentrations
in parts per billion (ppb) for 10 metropolitan Phoenix communities and 10 communities in rural
Arizona. The data follow:

Metro Phoenix                    Rural Arizona
(x̄1 = 12.5, s1 = 7.63)           (x̄2 = 27.5, s2 = 15.3)
Phoenix, 3                       Rimrock, 48
Chandler, 7                      Goodyear, 44
Gilbert, 25                      New River, 40
Glendale, 10                     Apache Junction, 38
Mesa, 15                         Buckeye, 33
Paradise Valley, 6               Nogales, 21
Peoria, 12                       Black Canyon City, 20
Scottsdale, 25                   Sedona, 12
Tempe, 15                        Payson, 1
Sun City, 7                      Casa Grande, 18

Determine if there is any difference in mean arsenic concentrations between metropolitan
Phoenix communities and communities in rural Arizona.
The seven-step procedure is:
1. Parameter of interest : The parameters of interest are the mean arsenic concentrations
for the two geographic regions, say, μ1 and μ2, and we are interested in determining
whether μ1 − μ2 = 0.
2. Null hypothesis : H0 : μ1 − μ2 = 0, or H0 : μ1 = μ2
3. Alternative hypothesis : H1 : μ1 ≠ μ2
4. Test statistic : The test statistic is
$$t_0^* = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
5. Reject H0 if : The degrees of freedom on t0* are found from the unequal-variance formula as
$$\nu = \frac{\left(\frac{(7.63)^2}{10} + \frac{(15.3)^2}{10}\right)^2}{\frac{[(7.63)^2/10]^2}{9} + \frac{[(15.3)^2/10]^2}{9}} = 13.2 \approx 13$$
Therefore, using α = 0.05 and a fixed-significance-level test, we would reject H0 : μ1 = μ2 if
t0* > t0.025,13 = 2.160 or if t0* < −t0.025,13 = −2.160.

6. Computations: Using the sample data we find
$$t_0^* = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{12.5 - 27.5}{\sqrt{\frac{(7.63)^2}{10} + \frac{(15.3)^2}{10}}} = -2.77$$

7. Conclusions: Because t0* = −2.77 < −t0.025,13 = −2.160, we reject the null hypothesis.

Interpretation: We can conclude that mean arsenic concentration in the drinking water in
rural Arizona is different from the mean arsenic concentration in metropolitan Phoenix
drinking water.
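
The unequal-variance statistic and its degrees of freedom for the arsenic data can be evaluated with a short Python sketch. The helper name `welch_t` is hypothetical; the inputs are the summary statistics listed with the data table.

```python
from math import floor, sqrt

def welch_t(xbar1, xbar2, s1, s2, n1, n2, d0=0.0):
    """Unequal-variance (Welch) t statistic and its degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # s^2 / n for each sample
    t = (xbar1 - xbar2 - d0) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, nu

t_star, nu = welch_t(12.5, 27.5, 7.63, 15.3, 10, 10)
nu_table = floor(nu)  # round down before consulting the t-table
print(round(t_star, 2), round(nu, 1), nu_table)
```

Evaluating the degrees-of-freedom expression on these inputs gives ν ≈ 13.2, which rounds down to 13 for the table lookup; the statistic itself is about −2.77.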
Exercise: Sample weights (in pounds) of newborn babies born in two adjacent counties in
western Pennsylvania yielded the following data:
n = 53         m = 44
X̄ = 6.8        Ȳ = 7.2
Sx² = 5.2      Sy² = 4.9

Consider a test of the hypothesis that the mean weight of newborns is the same in both
counties. What is the resulting p value? How would you express your conclusions to an
intelligent person who has not yet studied statistics?

Exercise: To determine the effectiveness of a new method of teaching reading to young


children, a group of 20 nonreading children were randomly divided into two groups of 10 each.
The first group was taught by a standard method and the second group by an experimental
method. At the end of the school term, a reading examination was given to each of the
students, with the following summary statistics resulting:

Are these data strong enough to prove, at the 5 percent level of significance, that the
experimental method results in a higher mean test score?
Paired-Difference in Mean Test
Sometimes the assumption of independent samples is intentionally violated, resulting in a
matched-pairs or paired-difference test.
With a matched-sample design each sampled item provides a pair of data values.
This design often leads to a smaller sampling error than the independent-sample
design because variation between sampled items is eliminated as a source of sampling
error.

Pair            1                 2                 …   n
Population 1    x11               x12               …   x1n
Population 2    x21               x22               …   x2n
Difference      d1 = x11 – x21    d2 = x12 – x22    …   dn = x1n – x2n
Conditions Required for Valid Large-Sample Inferences about μd
1. A random sample of differences is selected from the target population of differences.
2. The sample size nd is large (i.e., nd ≥ 30); due to the Central Limit Theorem, this condition
guarantees that the test statistic will be approximately normal regardless of the shape
of the underlying probability distribution of the population.

Large Sample Test statistic :
$$z = \frac{\bar{d} - D_0}{\sigma_d / \sqrt{n_d}} \approx \frac{\bar{d} - D_0}{s_d / \sqrt{n_d}}$$

One-Tailed Test : H0: μd ≥ D0 [or H0: μd ≤ D0]
                  H1: μd < D0 [or H1: μd > D0]
Rejection region : z < –zα [or z > zα when H1 : μd > D0]
Two-Tailed Test : H0 : μd = D0
                  H1 : μd ≠ D0
Rejection region : |z| > zα/2

Conditions Required for Valid Small-Sample Inferences about μd

Small Sample Test statistic :
$$t = \frac{\bar{d} - D_0}{s_d / \sqrt{n_d}}$$

One-Tailed Test : H0: μd ≥ D0 [or H0: μd ≤ D0]
                  H1: μd < D0 [or H1: μd > D0]
Rejection region : t < –tα [or t > tα when H1: μd > D0]
where tα is based on (nd – 1) degrees of freedom.

Two-Tailed Test : H0: μd = D0
                  H1: μd ≠ D0
Rejection region : |t| > tα/2
where tα/2 is based on (nd – 1) degrees of freedom.
A Confidence Interval for D from Paired Samples
If d̄ and sD are the sample mean and standard deviation of the differences of n random pairs
of normally distributed measurements, a 100(1 − α)% confidence interval on the difference in
means μD = μ1 − μ2 is
$$\bar{d} - t_{\alpha/2,\, n-1}\, s_D / \sqrt{n} \le \mu_D \le \bar{d} + t_{\alpha/2,\, n-1}\, s_D / \sqrt{n}$$
where tα/2,n−1 is the upper α/2 percentage point of the t distribution with n − 1 degrees of freedom.
For the large sample:
$$\bar{d} \pm z_{\alpha/2} \frac{\sigma_d}{\sqrt{n_d}} \approx \bar{d} \pm z_{\alpha/2} \frac{s_d}{\sqrt{n_d}}$$
Example: Shear Strength of Steel Girder
An article in the Journal of Strain Analysis [1983, Vol. 18(2)] reports a comparison of
several methods for predicting the shear strength for steel plate girders. Data for two of
these methods, the Karlsruhe and Lehigh procedures, when applied to nine specific
girders, are shown in the table below.
Girder   Karlsruhe Method   Lehigh Method   Difference dj
S1/1     1.186              1.061           0.125
S2/1     1.151              0.992           0.159
S3/1     1.322              1.063           0.259
S4/1     1.339              1.062           0.277
S5/1     1.200              1.065           0.135
S2/1     1.402              1.178           0.224
S2/2     1.365              1.037           0.328
S2/3     1.537              1.086           0.451
S2/4     1.559              1.052           0.507

Determine whether there is any difference (on the average) for the two methods.
The seven-step procedure is:
1. Parameter of interest: The parameter of interest is the difference in mean shear
strength for the two methods.
2. Null hypothesis : H0 : μD = 0
3. Alternative hypothesis : H1 : μD ≠ 0
4. Test statistic : The test statistic is
$$t_0 = \frac{\bar{d}}{s_d / \sqrt{n}}$$
5. Reject H0 if: Reject H0 if the p-value is < 0.05.
6. Computations: The sample average and standard deviation of the tabulated differences dj are
d̄ = 0.2739 and sd = 0.1351, and so the test statistic is
$$t_0 = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{0.2739}{0.1351 / \sqrt{9}} = 6.08$$

7. Conclusions : Because t0.0005,8 = 5.041 and the value of the test statistic t0 = 6.08
exceeds this value, the p-value is less than 2(0.0005) = 0.001. Therefore, we conclude
that the strength prediction methods yield different results.
Interpretation: The data indicate that the Karlsruhe method produces, on the average,
higher strength predictions than does the Lehigh method.
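
For readers who want to recompute the paired statistic directly from the table, a minimal Python sketch follows. It uses only the nine differences dj listed above; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Differences d_j (Karlsruhe minus Lehigh) for the nine girders.
d = [0.125, 0.159, 0.259, 0.277, 0.135, 0.224, 0.328, 0.451, 0.507]

n = len(d)
d_bar = mean(d)            # sample mean of the differences
s_d = stdev(d)             # sample standard deviation (n - 1 divisor)
t0 = d_bar / (s_d / sqrt(n))
print(round(d_bar, 4), round(s_d, 4), round(t0, 2))
```

Computed from the listed differences, the mean is about 0.2739, the standard deviation about 0.1351, and t0 about 6.08, which is well beyond t0.0005,8 = 5.041.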

Example: Parallel Park Cars

The journal Human Factors (1962, pp. 375-380) reported a study in which n = 14 subjects
were asked to parallel park two cars having very different wheel bases and turning radii. The
time in seconds for each subject was recorded and is given in the Table. From the column of
observed differences, we calculate d̄ = 1.21 and sD = 12.68. Find the 90% confidence
interval for μD = μ1 − μ2.
$$\bar{d} - t_{0.05,13}\, s_D / \sqrt{n} \le \mu_D \le \bar{d} + t_{0.05,13}\, s_D / \sqrt{n}$$
$$1.21 - 1.771 (12.68) / \sqrt{14} \le \mu_D \le 1.21 + 1.771 (12.68) / \sqrt{14}$$
$$-4.79 \le \mu_D \le 7.21$$
Notice that the confidence interval on μD includes zero. This implies that, at the 90% level of
confidence, the data do not support the claim that the two cars have different mean parking
times μ1 and μ2. That is, the value μD = μ1 − μ2 = 0 is not inconsistent with the observed
data.
Subject   Car 1 (x1j)   Car 2 (x2j)   Difference (dj)
1         37.0          17.8          19.2
2         25.8          20.2          5.6
3         16.2          16.8          -0.6
4         24.2          41.4          -17.2
5         22.0          21.4          0.6
6         33.4          38.4          -5.0
7         23.8          16.8          7.0
8         58.2          32.2          26.0
9         33.6          27.8          5.8
10        24.4          23.2          1.2
11        23.4          29.6          -6.2
12        21.2          20.6          0.6
13        36.2          32.2          4.0
14        29.8          53.8          -24.0

Example: You work in Human Resources. You want to see if there is a difference in test
scores after a training program. You collect the following test score data:
Find a 90% confidence interval for the mean difference in test scores.

Observation   Before (1)   After (2)   Difference
Sam           85           94          –9
Tamika        94           87          7
Brian         78           79          –1
Mike          87           88          –1
Total                                  –4
d̄ = –1, sd = 6.53

Confidence Interval
df = nd – 1 = 4 – 1 = 3, t0.05 = 2.353
$$\bar{d} \pm t_{\alpha/2} \frac{s_d}{\sqrt{n_d}} \;\Rightarrow\; -1 \pm 2.353 \frac{6.53}{\sqrt{4}} \;\Rightarrow\; -8.68 \le \mu_d \le 6.68$$
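
The paired-sample interval for the training scores can be sketched in Python as follows. The differences are formed as Before minus After, as in the example, and the critical value 2.353 is taken from the t-table rather than computed; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

before = [85, 94, 78, 87]   # Sam, Tamika, Brian, Mike
after = [94, 87, 79, 88]

d = [b - a for b, a in zip(before, after)]  # paired differences
d_bar = mean(d)
s_d = stdev(d)              # sample standard deviation of the differences

t_crit = 2.353              # tabulated t_{0.05, 3} for a 90% interval
half_width = t_crit * s_d / sqrt(len(d))
lower, upper = d_bar - half_width, d_bar + half_width
print(d, d_bar, round(s_d, 2), round(lower, 2), round(upper, 2))
```

Because the interval runs from roughly −8.7 to 6.7 and so contains zero, the data are consistent with no change in mean score.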

You work in Human Resources. You want to see if a training program is effective. At the 0.10
level of significance, was the training effective?
Null Hypothesis
1. Was the training effective?
2. Effective means 'Before' < 'After'.
3. Statistically, this means μB < μA.
4. Rearranging terms gives μB – μA < 0.
5. Defining μd = μB – μA and substituting into (4) gives μd < 0.
6. The alternative hypothesis is H1: μd < 0.
H0 : μd = 0 (μd = μB – μA)
H1 : μd < 0
df = 4 – 1 = 3
Critical value : α = 0.10, so reject H0 if t < –1.638.
Test Statistic:
$$t = \frac{\bar{d} - D_0}{s_d / \sqrt{n_d}} = \frac{-1 - 0}{6.53 / \sqrt{4}} = -0.306$$
Decision: Do not reject at α = 0.10
Conclusion: There is no evidence training was effective.
Example: Thinking Challenge
You're a marketing research analyst. You want to compare a client's calculator to a
competitor's. You sample 8 retail stores. At the 0.01 level of significance, does your client's
calculator sell for less than their competitor's?

Store   (1) Client   (2) Competitor
1       $10          $11
2       8            11
3       7            10
4       9            12
5       11           11
6       10           13
7       9            12
8       8            10

H0 : μd = 0 (μd = μ1 – μ2)
H1 : μd < 0
df = 8 – 1 = 7
Critical value : α = 0.01, so reject H0 if t < –2.998.
Test Statistic:
$$t = \frac{\bar{d} - D_0}{s_d / \sqrt{n_d}} = \frac{-2.25 - 0}{1.16 / \sqrt{8}} = -5.486$$
Decision : Reject at α = 0.01
Conclusion: There is evidence client's brand (1) sells for less.

Example: The management of a chain of stores wanted to determine whether advertising
tended to increase its sales of women's shoes. To do so, management determined the
number of shoe sales at six stores during a two-week period. While there were no
advertisements in the first week, advertising was begun at the beginning of the second week.
Assuming that any change in sales is due solely to the advertising, do the resulting data prove
that advertising increases the mean number of sales? Use the 1 percent level of significance.
Solution: Letting Di denote the increase in sales at store i, we need to check if the data are
significant enough to establish that μd > 0. Hence, we should test
H0: μd ≤ 0 against H1: μd > 0
Using the data values 8, 6, 22, 15, 17, 5:
TS = 4.349, p-value = 0.0038
Thus the hypothesis that advertising does not result in increased sales is rejected at any
significance level greater than or equal to 0.0038. Therefore, it is rejected at the 1 percent level
of significance.
Example: Express Deliveries
A Chicago-based firm has documents that must be quickly distributed to district offices
throughout the U.S. The firm must decide between two delivery services, UPX (United
Parcel Express) and INTEX (International Express), to transport its documents. In testing the
delivery times of the two services, the firm sent two reports to a random sample of its district
offices with one report carried by UPX and the other report carried by INTEX. Do the data in
the table below indicate a difference in mean delivery times for the two services? Use a 0.05
level of significance.
p –value and Critical Value Approaches
1. Develop the hypotheses.
H0: μd = 0
H1: μd ≠ 0
Let μd = the mean of the difference values for the two delivery services for the population of
district offices.
2. Specify the level of significance. α = 0.05
3. Compute the value of the test statistic.

Delivery Time (Hours)
District Office   UPX   INTEX   Difference
Seattle            32     25        7
Los Angeles        30     24        6
Boston             19     15        4
Cleveland          16     15        1
New York           15     13        2
Houston            18     15        3
Atlanta            14     15       -1
St. Louis          10      8        2
Milwaukee           7      9       -2
Denver             16     11        5

$$\bar{d} = \frac{\sum d_i}{n} = \frac{7 + 6 + \dots + 5}{10} = 2.7$$
$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{76.1}{9}} = 2.9$$
$$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} = \frac{2.7 - 0}{2.9/\sqrt{10}} = 2.94$$

p –value approach
4. Compute the p–value.
For t = 2.94 and df = 9, the p–value is between 0.01 and 0.02. (This is a two-tailed test, so
we double the upper-tail areas of 0.005 and 0.01.)
5. Determine whether to reject H0. Because p–value < α = 0.05, we reject H0.
We are at least 95% confident that there is a difference in mean delivery times for the two
services.
Critical Value Approach
4. Determine the critical value and rejection rule.
For α = 0.05 and df = 9, t.025 = 2.262. Reject H0 if t < –2.262 or t > 2.262.
5. Determine whether to reject H0. Because t = 2.94 > 2.262, we reject H0.
We are at least 95% confident that there is a difference in mean delivery times for the two
services.
Paired Versus Unpaired Experiments

The pros and cons of pairing can now be summarized as follows:

1. If there is great heterogeneity between experimental units and a large correlation within
experimental units (large positive r), then the loss in degrees of freedom will be
compensated by the increased precision associated with pairing, so a paired experiment is
preferable to an independent-samples experiment.
2. If the experimental units are relatively homogeneous and the correlation within pairs is not
large, the gain in precision due to pairing will be outweighed by the decrease in degrees of
freedom, so an independent-samples experiment should be used.
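To make the trade-off concrete, the sketch below reuses the Express Deliveries data and compares the standard error of the mean difference under pairing with the standard error obtained if the same numbers were treated as two independent samples. That second calculation is for comparison only (the data really are paired), and the variable names are illustrative.

```python
import math
import statistics

# Delivery times (hours) from the Express Deliveries table.
upx = [32, 30, 19, 16, 15, 18, 14, 10, 7, 16]
intex = [25, 24, 15, 15, 13, 15, 15, 8, 9, 11]
n = len(upx)

diffs = [u - i for u, i in zip(upx, intex)]
mean_diff = statistics.mean(diffs)

# Paired analysis: standard error of the mean of the differences.
se_paired = statistics.stdev(diffs) / math.sqrt(n)

# Independent-samples analysis (equal n, so the pooled and unpooled forms coincide).
var1, var2 = statistics.variance(upx), statistics.variance(intex)
se_unpaired = math.sqrt(var1 / n + var2 / n)

# Within-pair correlation from the identity var(d) = var1 + var2 - 2 * cov.
cov = (var1 + var2 - statistics.variance(diffs)) / 2
r = cov / math.sqrt(var1 * var2)

print(f"within-pair correlation r = {r:.3f}")
print(f"paired:   se = {se_paired:.3f}, t = {mean_diff / se_paired:.2f}, df = {n - 1}")
print(f"unpaired: se = {se_unpaired:.3f}, t = {mean_diff / se_unpaired:.2f}, df = {2 * n - 2}")
```

With a within-pair correlation this large, pairing cuts the standard error to roughly a third of the unpaired value, which far outweighs going from 18 to 9 degrees of freedom.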

Exercise:
Inference on Two Population Proportions

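Large-Sample Test of Hypothesis

Test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where p̂1 = x1/n1 and p̂2 = x2/n2, and where p̂ = (x1 + x2)/(n1 + n2) and q̂ = 1 – p̂.

One-Tailed Test: H0: (p1 – p2) = 0
                 H1: (p1 – p2) < 0 [or H1: (p1 – p2) > 0]
Rejection region: z < –zα [or z > zα when H1: (p1 – p2) > 0]

Two-Tailed Test: H0: (p1 – p2) = 0
                 H1: (p1 – p2) ≠ 0
Rejection region: |z| > zα/2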
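Large-Sample 100(1 – α)% Confidence Interval

$$\hat{p}_1 - \hat{p}_2 - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \le p_1 - p_2 \le \hat{p}_1 - \hat{p}_2 + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

where zα/2 is the upper α/2 percentage point of the standard normal distribution.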
Approximate Tests on the Difference of Two Population Proportions
Type II Error and Choice of Sample Size
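If the alternative hypothesis is two sided, the β-error is

$$\beta = \Phi\left[\frac{z_{\alpha/2}\sqrt{\bar{p}\bar{q}(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right] - \Phi\left[\frac{-z_{\alpha/2}\sqrt{\bar{p}\bar{q}(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right]$$

where
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \quad \text{and} \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}.$$
Here p̄ = (n1p1 + n2p2)/(n1 + n2) and q̄ = 1 – p̄; in practice p̄ is estimated by p̂.

If the alternative hypothesis is H1: p1 > p2,
$$\beta = \Phi\left[\frac{z_{\alpha}\sqrt{\bar{p}\bar{q}(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right]$$
and if the alternative hypothesis is H1: p1 < p2,
$$\beta = 1 - \Phi\left[\frac{-z_{\alpha}\sqrt{\bar{p}\bar{q}(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right]$$

For the two-sided alternative, the common sample size is

$$n = \frac{\left[z_{\alpha/2}\sqrt{(p_1 + p_2)(q_1 + q_2)/2} + z_{\beta}\sqrt{p_1 q_1 + p_2 q_2}\right]^2}{(p_1 - p_2)^2}$$

where q1 = 1 – p1 and q2 = 1 – p2.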
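The β-error and sample-size formulas for the two-sided alternative can be sketched as follows. The function names and the planning values p1 = 0.5, p2 = 0.6, α = 0.05, β = 0.10 are hypothetical, chosen only to show that the two formulas agree with each other.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def beta_two_sided(p1, p2, n1, n2, alpha=0.05):
    """Approximate type II error of the two-sided test of H0: p1 = p2."""
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    q_bar = 1 - p_bar
    sigma = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    cutoff = Z.inv_cdf(1 - alpha / 2) * math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    delta = p1 - p2
    return Z.cdf((cutoff - delta) / sigma) - Z.cdf((-cutoff - delta) / sigma)


def common_sample_size(p1, p2, alpha=0.05, beta=0.10):
    """Common n = n1 = n2 for the two-sided test, rounded up."""
    q1, q2 = 1 - p1, 1 - p2
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(1 - beta)
    numerator = (z_a * math.sqrt((p1 + p2) * (q1 + q2) / 2)
                 + z_b * math.sqrt(p1 * q1 + p2 * q2)) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical planning values, not taken from the slides.
n = common_sample_size(0.5, 0.6, alpha=0.05, beta=0.10)
achieved_beta = beta_two_sided(0.5, 0.6, n, n, alpha=0.05)
print(f"n per group = {n}, achieved beta = {achieved_beta:.3f}")
```

Feeding the computed n back into the β formula returns a value close to the target β, which is a useful sanity check on both expressions.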


Determination of Sample Size for Estimating µ1 – µ2

To estimate (μ1 – μ2) with a given margin of error ME and with confidence level (1 – α), use
the following formula to solve for equal sample sizes that will achieve the desired reliability:

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2\,(\sigma_1^2 + \sigma_2^2)}{(\mathrm{ME})^2}$$

You will need to substitute estimates for the values of σ1² and σ2² before solving for the
sample size. These estimates might be sample variances s1² and s2² from prior sampling (e.g.,
a pilot sample) or from an educated (and conservatively large) guess based on the range R,
that is, s ≈ R/4.

Determination of Sample Size for Estimating p1 – p2

To estimate (p1 – p2) with a given margin of error ME and with confidence level (1 – α), use
the following formula to solve for equal sample sizes that will achieve the desired reliability:

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2\,(p_1 q_1 + p_2 q_2)}{(\mathrm{ME})^2}$$

You will need to substitute estimates for the values of p1 and p2 before solving for the sample
size. These estimates might be based on prior samples, obtained from educated guesses, or,
most conservatively, specified as p1 = p2 = 0.5.
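Both sample-size formulas for estimation can be evaluated with a few lines of Python. This is a sketch with hypothetical inputs: the standard-deviation guesses of 15 and the margins of error are made up for illustration, and the result is rounded up to the next whole unit.

```python
import math
from statistics import NormalDist


def z_upper(tail_area):
    """Upper percentage point of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - tail_area)


def n_for_mean_difference(sigma1, sigma2, me, conf=0.95):
    """Equal sample sizes for estimating mu1 - mu2 to within +/- me."""
    z = z_upper((1 - conf) / 2)
    return math.ceil(z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / me ** 2)


def n_for_proportion_difference(me, conf=0.95, p1=0.5, p2=0.5):
    """Equal sample sizes for estimating p1 - p2 to within +/- me.

    The defaults p1 = p2 = 0.5 are the most conservative choice.
    """
    z = z_upper((1 - conf) / 2)
    return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / me ** 2)


# Hypothetical planning inputs.
n_means = n_for_mean_difference(sigma1=15, sigma2=15, me=5)
n_props = n_for_proportion_difference(me=0.05)
print(n_means, n_props)
```

Because the margin of error enters squared, halving ME roughly quadruples the required sample size in both formulas.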
Discussions
The ideal way to test the hypothesis that the results of two different treatments are identical is
to randomly divide a group of people into a set that will receive the first treatment and one
that will receive the second. However, such randomization is not always possible. For
instance, if we want to study whether drinking alcohol increases the risk of prostate cancer,
we cannot instruct a randomly chosen sample to drink alcohol. An alternative way to study the
hypothesis is to use an observational study that begins by randomly choosing a set of
drinkers and one of nondrinkers. These sets are followed for a period of time and the
resulting data are then used to test the hypothesis that members of the two groups have the
same risk for prostate cancer.
Example: St. John's Wort
Extracts of St. John's Wort are widely used to treat depression. An article in the April 18,
2001, issue of the Journal of the American Medical Association compared the efficacy of a
standard extract of St. John's Wort with a placebo in 200 outpatients diagnosed with major
depression. Patients were randomly assigned to two groups; one group received the St.
John's Wort, and the other received the placebo. After eight weeks, 19 of the placebo-treated
patients showed improvement, and 27 of those treated with St. John's Wort improved. Is there
any reason to believe that St. John's Wort is effective in treating major depression? Use α =
0.05. The seven-step hypothesis testing procedure leads to the following results:
1. Parameter of interest: The parameters of interest are p1 and p2, the proportion of patients
who improve following treatment with St. John's Wort (p1) or the placebo (p2).
2. Null hypothesis: H0: p1 = p2
3. Alternative hypothesis: H1: p1 > p2
4. Test statistic: The test statistic is
$$z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where p̂1 = 27/100 = 0.27, p̂2 = 19/100 = 0.19, n1 = n2 = 100, and
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{19 + 27}{100 + 100} = 0.23$$
5. Reject H0 if: Reject H0: p1 = p2 if the p-value is less than 0.05.
6. Computations: The value of the test statistic is
$$z_0 = \frac{0.27 - 0.19}{\sqrt{0.23(0.77)\left(\frac{1}{100} + \frac{1}{100}\right)}} = 1.34$$
7. Conclusions: Since z0 = 1.34, the p-value is P = 1 − Φ(1.34) = 0.09, so
we cannot reject the null hypothesis.
Interpretation: There is insufficient evidence to support the claim that St. John's Wort is
effective in treating major depression.
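The computations in steps 4 to 7 can be reproduced with the standard library's `NormalDist`. The helper name `pooled_z` is illustrative; the counts are those of the example.

```python
import math
from statistics import NormalDist


def pooled_z(x1, n1, x2, n2):
    """Pooled large-sample z statistic for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)                      # pooled estimate under H0
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# St. John's Wort (group 1): 27 of 100 improved; placebo (group 2): 19 of 100.
z0 = pooled_z(27, 100, 19, 100)
p_value = 1 - NormalDist().cdf(z0)                     # upper-tail area for H1: p1 > p2

print(f"z0 = {z0:.2f}, p-value = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "cannot reject H0")
```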
Example: Consider the process of manufacturing crankshaft bearings. Suppose that a
modification is made in the surface finishing process and that, subsequently, a second
random sample of 85 bearings is obtained. The number of defective bearings in this
second sample is 8. Therefore, n1 = 85, n2 = 85, p̂1 = 10/85 = 0.1176, and p̂2 =
8/85 = 0.0941. A 95% confidence interval on the difference in the proportion of defective
bearings produced under the two processes is

$$\hat{p}_1 - \hat{p}_2 - z_{0.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \le p_1 - p_2 \le \hat{p}_1 - \hat{p}_2 + z_{0.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

$$0.1176 - 0.0941 - 1.96\sqrt{\frac{0.1176(0.8824)}{85} + \frac{0.0941(0.9059)}{85}} \le p_1 - p_2 \le 0.1176 - 0.0941 + 1.96\sqrt{\frac{0.1176(0.8824)}{85} + \frac{0.0941(0.9059)}{85}}$$

This simplifies to −0.0685 ≤ p1 − p2 ≤ 0.1155
Interpretation: This confidence interval includes zero. Based on the sample data, it seems
unlikely that the changes made in the surface finish process have reduced the proportion of
defective crankshaft bearings being produced.
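A sketch of the interval calculation, using only the counts given in the example (10 and 8 defectives in samples of 85). The function name `diff_proportion_ci` is illustrative.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - half_width, diff + half_width


lower, upper = diff_proportion_ci(10, 85, 8, 85)
print(f"{lower:.4f} <= p1 - p2 <= {upper:.4f}")
print("interval contains zero:", lower < 0 < upper)
```

Evaluated without intermediate rounding, the limits come out at roughly −0.069 and 0.116, in line with the interval quoted above to about three decimals, and the interval still contains zero.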
Example: In criminal proceedings a convicted defendant is sometimes sent to prison by the
presiding judge and is sometimes not. A question has arisen in legal circles as to whether a
judge’s decision is affected by
(1) whether the defendant pleaded guilty or
(2) whether he or she pleaded innocent but was subsequently found guilty.
The following data refer to individuals, all having previous prison records, convicted of
second-degree robbery.
74, out of 142 who pleaded guilty, went to prison
61, out of 72 who pleaded not guilty, went to prison
Do these data indicate that a convicted individual’s chance of being sent to prison depends on
whether she or he had pleaded guilty?
Solution: Let p1 denote the probability that a convicted individual who pleaded guilty will be
sent to prison, and let p2 denote the corresponding probability for one who pleaded innocent
but was adjudged guilty. To see whether the data are consistent with p1 = p2, we test
H0: p1 = p2 against H1: p1 ≠ p2
With p̂1 = 74/142 = 0.521, p̂2 = 61/72 = 0.847, and pooled estimate p̂ = 135/214 = 0.631,
the test statistic is about −4.67 and the two-sided p-value is about 0.000003.
For such a small p-value the null hypothesis will be rejected.
Example: A manufacturer has devised a new method for producing computer chips. He feels
that this new method will reduce the proportion of chips that turn out to have defects. To verify
this, 320 chips were produced by the new method and 360 by the old. The result was that 76
of the former and 94 of the latter were defective. Is this significant enough evidence for the
manufacturer to conclude that the new method will produce a smaller proportion of defective
chips? Use the 5 percent level of significance.
Let p1 denote the probability that a chip produced by the old method will be defective, and let
p2 denote the corresponding probability for a chip produced by the new method. To conclude
that p1 > p2, we need to reject H0 when testing
H0: p1 ≤ p2 against H1: p1 > p2
With p̂1 = 94/360 = 0.261, p̂2 = 76/320 = 0.2375, and pooled estimate p̂ = 170/680 = 0.25,
the value of the test statistic is TS = 0.71.
Since TS = 0.71 < z0.05 = 1.645, we cannot reject the null hypothesis at the 5 percent level of
significance.
A value of TS at least as large as the one observed will occur 24 percent of the time when the
two probabilities are equal.
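The test statistics and p-values for the sentencing and computer-chip examples can be recomputed from the counts alone. The sketch below assumes the pooled large-sample z test; the function name and the `alternative` argument are illustrative conventions, not part of the original slides.

```python
import math
from statistics import NormalDist


def two_proportion_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled z test of H0: p1 = p2; returns (z, p_value)."""
    p_hat = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    cdf = NormalDist().cdf
    if alternative == "greater":        # H1: p1 > p2
        p_value = 1 - cdf(z)
    elif alternative == "less":         # H1: p1 < p2
        p_value = cdf(z)
    else:                               # H1: p1 != p2
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value


# Sentencing data: pleaded guilty (74 of 142) vs. pleaded not guilty (61 of 72).
z_plea, p_plea = two_proportion_test(74, 142, 61, 72)

# Chips: old method (94 defective of 360) vs. new method (76 of 320), H1: p1 > p2.
z_chip, p_chip = two_proportion_test(94, 360, 76, 320, alternative="greater")

print(f"plea:  z = {z_plea:.2f}, two-sided p-value = {p_plea:.1e}")
print(f"chips: z = {z_chip:.2f}, one-sided p-value = {p_chip:.2f}")
```

The chip result matches the statement that a test statistic this large occurs about 24 percent of the time when the two probabilities are equal.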
Example: As personnel director, you want to test the perception of fairness of two methods
of performance evaluation. 63 of 78 employees rated Method 1 as fair. 49 of 82 rated
Method 2 as fair. Find a 99% confidence interval for the difference in perceptions.
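Confidence Interval
$$\hat{p}_1 = \frac{63}{78} = 0.808, \qquad \hat{q}_1 = 1 - 0.808 = 0.192$$
$$\hat{p}_2 = \frac{49}{82} = 0.598, \qquad \hat{q}_2 = 1 - 0.598 = 0.402$$
$$0.808 - 0.598 \pm 2.58\sqrt{\frac{0.808(0.192)}{78} + \frac{0.598(0.402)}{82}} \;\Rightarrow\; 0.029 \le p_1 - p_2 \le 0.391$$

H0: p1 – p2 = 0 and H1: p1 – p2 ≠ 0
α = 0.01, n1 = 78, n2 = 82
Critical values: z = ±2.58 (area 0.005 in each tail); reject H0 if |z| > 2.58.

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{63}{78} = 0.808, \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{49}{82} = 0.598$$
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{63 + 49}{78 + 82} = 0.70$$
Test statistic:
$$z \approx \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(0.808 - 0.598) - 0}{\sqrt{0.70(1 - 0.70)\left(\frac{1}{78} + \frac{1}{82}\right)}} = 2.90$$
Decision: Reject H0 at α = 0.01.
Conclusion: There is evidence of a difference in proportions.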
Example: Market Research Associates
Market Research Associates is conducting research to evaluate the effectiveness of a client’s
new advertising campaign. Before the new campaign began, a telephone survey of 150
households in the test market area showed 60 households “aware” of
the client’s product. The new campaign has been initiated with TV and newspaper
advertisements running for three weeks. A survey conducted immediately after the new
campaign showed 120 of 250 households “aware” of the client’s product. Does the data
support the position that the advertising campaign has provided an increased awareness of
the client’s product?
p1 = proportion of the population of households “aware” of the product after the new
campaign
p2 = proportion of the population of households “aware” of the product before the new
campaign
p̄1 = sample proportion of households “aware” of the product after the new campaign
p̄2 = sample proportion of households “aware” of the product before the new campaign
$$\bar{p}_1 - \bar{p}_2 = \frac{120}{250} - \frac{60}{150} = 0.48 - 0.40 = 0.08$$
Interval Estimation
For α = 0.05, z.025 = 1.96:
$$0.48 - 0.40 \pm 1.96\sqrt{\frac{0.48(0.52)}{250} + \frac{0.40(0.60)}{150}} = 0.08 \pm 1.96(0.0510) = 0.08 \pm 0.10$$
Hence, the 95% confidence interval for the difference in before and after awareness of the
product is −0.02 to 0.18.
Can we conclude, using a 0.05 level of significance, that the proportion of households aware
of the client’s product increased after the new advertising campaign?
Hypothesis Tests
p-value and Critical Value Approaches
1. Develop the hypotheses.
H0: p1 – p2 ≤ 0
H1: p1 – p2 > 0
2. Specify the level of significance. α = 0.05
3. Compute the value of the test statistic.
$$\bar{p} = \frac{250(0.48) + 150(0.40)}{250 + 150} = \frac{180}{400} = 0.45$$
$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{0.45(0.55)\left(\frac{1}{250} + \frac{1}{150}\right)} = 0.0514$$
$$z = \frac{(0.48 - 0.40) - 0}{0.0514} = \frac{0.08}{0.0514} = 1.56$$
p–value Approach
4. Compute the p–value. For z = 1.56, the p–value = 0.0594
5. Determine whether to reject H0. Because p–value > α = 0.05, we cannot reject H0.
We cannot conclude that the proportion of households aware of the client’s product
increased after the new campaign.
Critical Value Approach
4. Determine the critical value and rejection rule. For α = 0.05, z.05 = 1.645. Reject H0 if z > 1.645.
5. Determine whether to reject H0. Because 1.56 < 1.645, we cannot reject H0.

Exercise: A random sample of 220 female and 210 male coffee drinkers were questioned. The result
was that 71 of the women and 58 of the men indicated a preference for decaffeinated coffee. Do these
data establish, at the 5 percent level of significance, that the proportion of female coffee drinkers who
prefer decaffeinated coffee differs from the corresponding proportion for men? What is the p-value?
Inferences on the Variances of Two Normal Populations
Hypothesis Tests on the Ratio of Two Variances
Let X11, X12, …, X1n1 be a random sample from a normal population with mean µ1 and variance
σ1², and let X21, X22, …, X2n2 be a random sample from a second normal population with mean
µ2 and variance σ2². Assume that both normal populations are independent. Let S1² and S2² be
the sample variances. Then the ratio
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F distribution with n1 − 1 numerator degrees of freedom and n2 − 1 denominator
degrees of freedom.
Conditions Required for a Valid F-Test
1. Both sampled populations are normally distributed.
2. The samples are random and independent.
The F Distribution
We wish to test the hypotheses:
H0: σ1² = σ2²
H1: σ1² ≠ σ2²

Let W and Y be independent chi-square random variables with u and v degrees of freedom,
respectively. Then the ratio
$$F = \frac{W/u}{Y/v}$$
has the probability density function
$$f(x) = \frac{\Gamma\left(\frac{u + v}{2}\right)\left(\frac{u}{v}\right)^{u/2} x^{(u/2) - 1}}{\Gamma\left(\frac{u}{2}\right)\Gamma\left(\frac{v}{2}\right)\left[\left(\frac{u}{v}\right)x + 1\right]^{(u + v)/2}}, \qquad 0 < x < \infty$$
and is said to follow the F distribution with u degrees of freedom in the numerator and v degrees
of freedom in the denominator. It is usually abbreviated as Fu,v.
F-Test for Equal Population Variances
Test statistic:
$$F = \frac{s_1^2}{s_2^2}$$
One-Tailed Test: H0: σ1² = σ2²
                 H1: σ1² > σ2² (or H1: σ1² < σ2²)
Rejection region: F > Fα (or F < F1−α)
where Fα is based on v1 and v2, which are the degrees of freedom for the numerator and
denominator sample variances, respectively.
Two-Tailed Test: H0: σ1² = σ2²
                 H1: σ1² ≠ σ2²
Rejection region: F > Fα/2 or F < F1−α/2
where Fα/2 is based on v1 and v2, which are the degrees of freedom for the numerator and
denominator sample variances, respectively.

The upper 5 percentage point of F5,10 is f0.05,5,10 = 3.33.
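Percentage points of the F distribution are normally read from a table, but they can also be approximated directly from the density given earlier. The sketch below integrates that density with Simpson's rule and inverts it by bisection. It is a rough numerical check rather than a production routine: it assumes u > 2 (so the density is zero at x = 0) and that the percentage point lies below 50.

```python
import math


def f_pdf(x, u, v):
    """Density of the F distribution with u and v degrees of freedom (u > 2 assumed)."""
    if x <= 0:
        return 0.0
    log_const = (math.lgamma((u + v) / 2) - math.lgamma(u / 2) - math.lgamma(v / 2)
                 + (u / 2) * math.log(u / v))
    return math.exp(log_const + (u / 2 - 1) * math.log(x)
                    - ((u + v) / 2) * math.log(u * x / v + 1))


def f_cdf(x, u, v, steps=2000):
    """P(F <= x), by Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = f_pdf(0.0, u, v) + f_pdf(x, u, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, u, v)
    return total * h / 3


def f_upper_point(alpha, u, v):
    """Value f with P(F > f) = alpha, located by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if f_cdf(mid, u, v) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


f_05_5_10 = f_upper_point(0.05, 5, 10)
f_025_15_15 = f_upper_point(0.025, 15, 15)
print(f"upper 5% point of F(5,10):     {f_05_5_10:.2f}")
print(f"upper 2.5% point of F(15,15):  {f_025_15_15:.2f}")
print(f"lower 2.5% point of F(15,15):  {1 / f_025_15_15:.2f}")   # f(1-a; u, v) = 1 / f(a; v, u)
```

The last line uses the reciprocal relation between lower and upper percentage points, which is how lower-tail values are obtained from tables that list only upper-tail points.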


Confidence Interval on the Ratio of Two Variances

If s1² and s2² are the sample variances of random samples of sizes n1 and n2, respectively,
from two independent normal populations with unknown variances σ1² and σ2², then a
100(1 − α)% confidence interval on the ratio σ1²/σ2² is

$$\frac{s_1^2}{s_2^2}\, f_{1-\alpha/2,\,n_2-1,\,n_1-1} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2}\, f_{\alpha/2,\,n_2-1,\,n_1-1}$$

where fα/2,n2−1,n1−1 and f1−α/2,n2−1,n1−1 are the upper and lower α/2 percentage points of the
F-distribution with n2 – 1 numerator and n1 – 1 denominator degrees of freedom, respectively.
A confidence interval on the ratio of the standard deviations can be obtained by taking square
roots in this equation.

Example: Surface Finish for Titanium Alloy


A company manufactures impellers for use in jet-turbine engines. One of the operations
involves grinding a particular surface finish on a titanium alloy component. Two different
grinding processes can be used, and both processes can produce parts at identical mean
surface roughness. The manufacturing engineer would like to select the process having the
least variability in surface roughness.
A random sample of n1 = 11 parts from the first process results in a sample standard
deviation s1 = 5.1 micro-inches, and a random sample of n2 = 16 parts from the second
process results in a sample standard deviation of s2 = 4.7 micro-inches. (These sample sizes
match the n2 − 1 = 15 numerator and n1 − 1 = 10 denominator degrees of freedom used
below.) Find a 90% confidence interval on the ratio of the two standard deviations, σ1/σ2.
Assume that the two processes are independent and that surface roughness is normally
distributed.
$$\frac{s_1^2}{s_2^2}\, f_{0.95,15,10} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2}\, f_{0.05,15,10}$$
$$\frac{(5.1)^2}{(4.7)^2}\, 0.39 \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{(5.1)^2}{(4.7)^2}\, 2.85$$
or upon completing the implied calculations and taking square roots,
$$0.678 \le \frac{\sigma_1}{\sigma_2} \le 1.832$$
Notice that we find f0.95,15,10 = 1/f0.05,10,15 = 1/2.54 = 0.39.
Interpretation: Since this confidence interval includes unity, we cannot claim that the
standard deviations of surface roughness for the two processes are different at the 90% level
of confidence.
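The arithmetic behind the interval is short enough to script. This sketch takes the two F percentage points exactly as quoted in the example (2.85, and 0.39 as the rounded value of 1/2.54) instead of computing them; the function name is illustrative.

```python
import math


def sd_ratio_ci(s1, s2, f_lower, f_upper):
    """Confidence interval on sigma1/sigma2 from sample SDs and F percentage points."""
    variance_ratio = s1 ** 2 / s2 ** 2
    return math.sqrt(variance_ratio * f_lower), math.sqrt(variance_ratio * f_upper)


f_upper = 2.85      # f_{0.05,15,10}, table value quoted in the example
f_lower = 0.39      # f_{0.95,15,10} = 1 / f_{0.05,10,15} = 1 / 2.54, rounded as in the example

lower, upper = sd_ratio_ci(5.1, 4.7, f_lower, f_upper)
print(f"{lower:.3f} <= sigma1/sigma2 <= {upper:.3f}")
print("interval includes 1:", lower < 1 < upper)
```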
Example: Semiconductor Etch Variability
Oxide layers on semiconductor wafers are etched in a mixture of gases to achieve the proper
thickness. The variability in the thickness of these oxide layers is a critical characteristic of the
wafer, and low variability is desirable for subsequent processing steps. Two different mixtures
of gases are being studied to determine whether one is superior in reducing the variability of
the oxide thickness. Sixteen wafers are etched in each gas. The sample standard deviations
of oxide thickness are s1 = 1.96 angstroms and s2 = 2.13 angstroms, respectively. Is there any
evidence to indicate that either gas is preferable? Use a fixed-level test with α = 0.05.
The seven-step hypothesis-testing procedure is:
1. Parameter of interest: The parameters of interest are the variances of oxide thickness σ1²
and σ2².
2. Null hypothesis: H0: σ1² = σ2²
3. Alternative hypothesis: H1: σ1² ≠ σ2²
4. Test statistic: The test statistic is
$$f_0 = \frac{s_1^2}{s_2^2}$$
5. Reject H0 if: Because n1 = n2 = 16 and α = 0.05, we will reject if f0 > f0.025,15,15 = 2.86 or if
f0 < f0.975,15,15 = 1/f0.025,15,15 = 1/2.86 = 0.35.
6. Computations: Because s1² = (1.96)² = 3.84 and s2² = (2.13)² = 4.54, the test statistic is
$$f_0 = \frac{s_1^2}{s_2^2} = \frac{3.84}{4.54} = 0.85$$
7. Conclusions: Because f0.975,15,15 = 0.35 < 0.85 < f0.025,15,15 = 2.86, we cannot reject the null
hypothesis H0: σ1² = σ2² at the 0.05 level of significance.
Interpretation : There is no strong evidence to indicate that either gas results in a smaller
variance of oxide thickness.
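Steps 5 to 7 amount to comparing one ratio with two table values. The sketch below does that for the etch data; it relies on the equal sample sizes, which is what lets the lower critical value be taken as the reciprocal of the upper one, and the function name is illustrative.

```python
def two_sided_f_test(s1, s2, f_upper_crit):
    """Two-sided F test of equal variances for EQUAL sample sizes.

    With n1 = n2 the lower critical value is 1 / f_upper_crit.
    Returns (f0, lower critical value, reject H0?).
    """
    f0 = s1 ** 2 / s2 ** 2
    f_lower_crit = 1 / f_upper_crit
    reject = f0 > f_upper_crit or f0 < f_lower_crit
    return f0, f_lower_crit, reject


# s1 = 1.96 and s2 = 2.13 angstroms; f_{0.025,15,15} = 2.86 is the table value from step 5.
f0, f_lower_crit, reject = two_sided_f_test(1.96, 2.13, 2.86)
print(f"f0 = {f0:.2f}, acceptance region = ({f_lower_crit:.2f}, 2.86)")
print("reject H0" if reject else "cannot reject H0")
```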
