Statistical Inference For Two Samples
A random sample of size n1 drawn from population 1 with mean μ1 and variance σ1².
A random sample of size n2 drawn from population 2 with mean μ2 and variance σ2².
Assumptions
1. Let 𝑋11 , 𝑋12 , … , 𝑋1𝑛1 be a random sample from population 1.
2. Let 𝑋21 , 𝑋22 , … , 𝑋2𝑛2 be a random sample from population 2.
3. The two populations X1 and X2 are independent.
4. Both X1 and X2 are normal.
Hypothesis Tests on the Difference in Means
Variances Known
Two independent populations.
[Figure: Population 1 and Population 2; the astronomical number of possible X̄1 − X̄2 values forms the sampling distribution of X̄1 − X̄2, centered at μ1 − μ2.]
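Test statistic:
$$z = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sigma_{\bar{x}_1 - \bar{x}_2}}, \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Lower Confidence Bound:
$$\bar{x}_1 - \bar{x}_2 - z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2$$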
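Two-Sided Confidence Bounds:
$$\bar{x}_1 - \bar{x}_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Type II error probability β when the true difference is μ1 − μ2:
Two-sided alternative:
$$\beta = \Phi\left(z_{\alpha/2} - \frac{\mu_1 - \mu_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right) - \Phi\left(-z_{\alpha/2} - \frac{\mu_1 - \mu_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right)$$
Right-tailed alternative (H1: μ1 − μ2 > D0):
$$\beta = \Phi\left(z_{\alpha} - \frac{\mu_1 - \mu_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right)$$
Left-tailed alternative (H1: μ1 − μ2 < D0):
$$\beta = 1 - \Phi\left(-z_{\alpha} - \frac{\mu_1 - \mu_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\right)$$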
Two-sided alternative:
For the two-sided alternative hypothesis with significance level α, the sample size n1 = n2 = n
required to detect a true difference in means of D = μ1 − μ2 with power at least 1 − β is approximately
$$n \simeq \frac{(z_{\alpha/2} + z_{\beta})^2(\sigma_1^2 + \sigma_2^2)}{(D - D_0)^2}$$
One-sided alternative:
For a one-sided alternative hypothesis with significance level α, the sample size n1 = n2 = n
required to detect a true difference in means of D (≠ D0) with power at least 1 − β is
$$n = \frac{(z_{\alpha} + z_{\beta})^2(\sigma_1^2 + \sigma_2^2)}{(D - D_0)^2}$$
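The sample-size calculation can be sketched in a few lines. This is a minimal illustration, assuming the usual large-sample formula n = (z + z_β)²(σ1² + σ2²)/(D − D0)², with z = z_{α/2} for a two-sided test and z = z_α for a one-sided test. The function name and the inputs (σ1 = σ2 = 8, D − D0 = 10, α = 0.05, power 0.90) are hypothetical values chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_means(sigma1, sigma2, delta, alpha=0.05, power=0.90, two_sided=True):
    """Equal sample size n1 = n2 = n for a z-test on mu1 - mu2 (variances known).

    delta is D - D0, the gap between the true and the hypothesized difference.
    """
    std_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)   # upper-tail critical value
    z_beta = std_normal.inv_cdf(power)       # z_beta, where power = 1 - beta
    n = (z_alpha + z_beta) ** 2 * (sigma1 ** 2 + sigma2 ** 2) / delta ** 2
    return ceil(n)                           # round up to a whole observation


# Hypothetical inputs, not taken from the slides.
n_two_sided = sample_size_two_means(8.0, 8.0, delta=10.0)
n_one_sided = sample_size_two_means(8.0, 8.0, delta=10.0, two_sided=False)
print(n_two_sided, n_one_sided)
```

Rounding up is the conservative choice, since rounding down would leave the power slightly below the target.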
Null hypothesis: H0: μ1 − μ2 = D0
Test statistic:
$$Z_0 = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Point estimate of μ1 − μ2: x̄1 − x̄2 = 17
$$z = \frac{17}{2.62} = 6.49$$
p–value Approach
4. Compute the p–value. For z = 6.49, the p–value < 0.0001.
5. Determine whether to reject H0. Because p–value < α = 0.01, we reject H0.
At the 0.01 level of significance, the sample evidence indicates the mean driving distance of
Par, Inc. golf balls is greater than the mean driving distance of Rap, Ltd. golf balls.
Critical Value Approach
4. Determine the critical value and rejection rule. For α = 0.01, z.01 = 2.33. Reject H0 if z > 2.33.
5. Determine whether to reject H0. Because z = 6.49 > 2.33, we reject H0.
The sample evidence indicates the mean driving distance of Par, Inc. golf balls is greater than
the mean driving distance of Rap, Ltd. golf balls.
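Both approaches can be checked numerically. The sketch below uses only the point estimate 17 and the standard error 2.62 quoted in the example. The hypothesized difference D0 = 0 and the upper-tail form of the test are assumptions inferred from the stated conclusion.

```python
from statistics import NormalDist

point_estimate = 17.0   # x1_bar - x2_bar, from the example
std_error = 2.62        # standard error of x1_bar - x2_bar, from the example
d0 = 0.0                # hypothesized difference (assumed)
alpha = 0.01

std_normal = NormalDist()
z = (point_estimate - d0) / std_error
p_value = 1 - std_normal.cdf(z)          # upper-tail p-value
z_crit = std_normal.inv_cdf(1 - alpha)   # critical value for alpha = 0.01

print(f"z = {z:.2f}, p-value = {p_value:.2e}, critical value = {z_crit:.2f}")
print("reject H0" if z > z_crit else "do not reject H0")
```

The p-value rule and the critical-value rule always give the same decision, because p-value < α exactly when z exceeds the critical value.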
Example: Large-Sample Confidence Interval
You’re a financial analyst for Charles Schwab. You want to estimate
the difference in dividend yield between stocks listed on NYSE and
NASDAQ. You collect the following data:
            NYSE    NASDAQ
Number      121     125
Mean        3.27    2.53
Std Dev     1.30    1.16
What is the 95% confidence interval for the difference between the mean dividend yields?
Large-Sample Confidence Interval
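A minimal sketch of the large-sample interval for the dividend-yield data, assuming the sample standard deviations stand in for the population values (reasonable here because both samples exceed 100). Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, mean1, sd1 = 121, 3.27, 1.30   # NYSE
n2, mean2, sd2 = 125, 2.53, 1.16   # NASDAQ
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
std_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
diff = mean1 - mean2
lower, upper = diff - z * std_error, diff + z * std_error

print(f"{lower:.3f} <= mu1 - mu2 <= {upper:.3f}")
```

Because the whole interval lies above zero, the data suggest a higher mean dividend yield for the NYSE stocks in this sample.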
Hypothesis Tests on the Difference in Means
Variances Unknown
Conditions Required for Valid Small-Sample Inferences
$$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where sp is the pooled estimate of the common population standard deviation, and
tα/2,n1+n2−2 is the upper α/2 percentage point of the t-distribution with n1 + n2 − 2 degrees of
freedom.
One-Tailed Test:
H0: (μ1 − μ2) ≥ D0 [or H0: (μ1 − μ2) ≤ D0]
H1: (μ1 − μ2) < D0 [or H1: (μ1 − μ2) > D0]
Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Rejection region: t < −tα [or t > tα when H1: (μ1 − μ2) > D0]
where tα is based on (n1 + n2 − 2) degrees of freedom.
Two-Tailed Test:
H0: (μ1 − μ2) = D0
H1: (μ1 − μ2) ≠ D0
Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Rejection region: |t| > tα/2, where tα/2 is based on (n1 + n2 − 2) degrees of freedom.
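Case 1: Test Statistic for the Difference in Means
Case 2: σ1² ≠ σ2²
If x̄1, x̄2, s1², and s2² are the sample means and variances of two random samples of sizes n1
and n2, respectively, from two independent normal populations with unknown and unequal
variances, then a 100(1 − α)% confidence interval on the difference in means μ1 − μ2 is
$$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where tα/2,ν is the upper α/2 percentage point of the t-distribution with ν (approximate) degrees of freedom.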
Example data:
                  Sample 1   Sample 2
Sample size       10         12
Sample mean       35         31
Sample Std Dev    4.9        4.5
Hypotheses: H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0
Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Calculation:
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9(4.9^2) + 11(4.5^2)}{20} = 21.942$$
$$t = \frac{35 - 31}{\sqrt{21.942\left(\frac{1}{10} + \frac{1}{12}\right)}} = 1.994$$
df = n1 + n2 − 2 = 10 + 12 − 2 = 20
p-value: two-sided ⇒ p-value = 2P(t > 1.99)
0.025 < P(t > 1.99) < 0.05 ⇒ 0.05 < p-value < 0.1
Decision: since the p-value is greater than α = 0.01, H0 is not rejected.
Conclusion: there is insufficient evidence to indicate a difference in the population means.
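The pooled calculation above can be reproduced with a short function. This is a sketch from summary statistics only. The helper name `pooled_t` is hypothetical, and the two tail values for 20 degrees of freedom are copied from a standard t table rather than computed.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = d0 (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (mean1 - mean2 - d0) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, pooled_var, df


t, pooled_var, df = pooled_t(35, 4.9, 10, 31, 4.5, 12)

# Upper-tail t-table values for 20 degrees of freedom.
t_05, t_025 = 1.725, 2.086

print(f"s^2 = {pooled_var:.3f}, t = {t:.3f}, df = {df}")
if t_05 < t < t_025:
    print("0.025 < P(T > t) < 0.05, so the two-sided p-value is between 0.05 and 0.10")
```

Bracketing the statistic between two table values is how the p-value bounds in the example are obtained without software for the t distribution.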
Example : Cement Hydration
Ten samples of standard cement had an average weight percent calcium of x̄1 = 90.0
with a sample standard deviation of s1 = 5.0, and 15 samples of the lead-doped cement had
an average weight percent calcium of x̄2 = 87.0 with a sample standard deviation of s2 = 4.0.
Assume that weight percent calcium is normally distributed with the same standard deviation.
Find a 95% confidence interval on the difference in means, μ1 − μ2, for the two types of
cement.
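The pooled estimate of the common standard deviation is found as follows:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9(5.0)^2 + 14(4.0)^2}{10 + 15 - 2} = 19.52$$
$$s_p = \sqrt{19.52} = 4.4$$
The 95% confidence interval is found from
$$\bar{x}_1 - \bar{x}_2 - t_{0.025,23}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{0.025,23}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Upon substituting the sample values and using t0.025,23 = 2.069,
$$90.0 - 87.0 - 2.069(4.4)\sqrt{\frac{1}{10} + \frac{1}{15}} \le \mu_1 - \mu_2 \le 90.0 - 87.0 + 2.069(4.4)\sqrt{\frac{1}{10} + \frac{1}{15}}$$
which reduces to −0.72 ≤ μ1 − μ2 ≤ 6.72.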
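As a check on the arithmetic, the interval can be computed directly from the summary statistics. This sketch keeps sp unrounded, so its endpoints differ in the second decimal from a hand calculation that rounds sp to 4.4. The critical value 2.069 is the one quoted in the example.

```python
from math import sqrt

n1, mean1, s1 = 10, 90.0, 5.0   # standard cement
n2, mean2, s2 = 15, 87.0, 4.0   # lead-doped cement
t_crit = 2.069                  # t_{0.025,23}, quoted in the example

pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
pooled_sd = sqrt(pooled_var)
half_width = t_crit * pooled_sd * sqrt(1 / n1 + 1 / n2)
diff = mean1 - mean2
lower, upper = diff - half_width, diff + half_width

print(f"sp^2 = {pooled_var:.2f}, sp = {pooled_sd:.3f}")
print(f"{lower:.2f} <= mu1 - mu2 <= {upper:.2f}")
```

The interval contains zero, so at the 95% level the data do not show a difference in mean calcium content between the two cements.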
Assuming normal populations with equal population variances, is there a difference in
average yield (α = 0.05)?
H0: μ1 − μ2 = 0 (μ1 = μ2) and H1: μ1 − μ2 ≠ 0 (μ1 ≠ μ2)
df = 11 + 15 − 2 = 24
Critical values (α = 0.05, with 0.025 in each tail): reject H0 if t < −2.064 or t > 2.064.
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(11 - 1)(1.30)^2 + (15 - 1)(1.16)^2}{11 + 15 - 2} = 1.489$$
Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.489\left(\frac{1}{11} + \frac{1}{15}\right)}} = 1.53$$
Thus, the data are significant enough to prove that the consultant’s modification was more
effective than the one used by the control group.
Example: To test the effectiveness of a new cholesterol-lowering medication, 100 volunteers
were randomly divided into two groups of size 50 each. Members of the first group were
given pills containing the new medication, while members of the second, or control group
were given pills containing lovastatin, one of the standard medications for lowering blood
cholesterol. All the volunteers were instructed to take a pill every 12 hours for the next 3
months. None of the volunteers knew which group they were in.
Suppose that the result of this experiment was an average reduction of 8.2 with a sample
variance of 5.4 in the blood cholesterol levels of those taking the old medication, and an
average reduction of 8.8 with a sample variance of 4.5 of those taking the newer medication.
Do these results prove, at the 5 percent level, that the new medication is more effective than
the old one?
Solution
Let μx denote the mean cholesterol reduction of a volunteer who is given the new medication,
and let μy be the equivalent value for one given the control.
$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
Since the p-value is greater than 0.05, the evidence is not strong enough to establish, at the
5 percent level of significance, that the new medication is more effective than the old.
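The unequal-variance calculation for the cholesterol example can be sketched as follows. The function name `welch_statistic` is hypothetical. The p-value uses the standard normal in place of the t distribution, which is an approximation justified here only because the degrees of freedom turn out to be large.

```python
from math import sqrt
from statistics import NormalDist


def welch_statistic(mean1, var1, n1, mean2, var2, n2):
    """Unequal-variance test statistic and its approximate degrees of freedom."""
    a, b = var1 / n1, var2 / n2
    t = (mean1 - mean2) / sqrt(a + b)
    nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, nu


# New medication: mean 8.8, variance 4.5; old medication: mean 8.2, variance 5.4.
t, nu = welch_statistic(8.8, 4.5, 50, 8.2, 5.4, 50)

# Normal approximation to the upper-tail p-value (assumption: nu is large).
p_approx = 1 - NormalDist().cdf(t)

print(f"t = {t:.3f}, nu = {nu:.1f}, approximate p-value = {p_approx:.3f}")
```

With roughly 97 degrees of freedom the t and normal tails are close, so the conclusion that the p-value exceeds 0.05 does not depend on the approximation.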
Example: Twenty-two volunteers at a cold-research institute caught a cold after having been
exposed to various cold viruses. A random selection of 10 volunteers were given tablets
containing 1 gram of vitamin C. These tablets were taken 4 times a day. The control
group, consisting of the other 12 volunteers, was given placebo tablets that looked and
tasted exactly like the vitamin C ones. This was continued for each volunteer until a doctor,
who did not know whether the volunteer was receiving vitamin C or the placebo, decided that
the volunteer was no longer suffering from the cold. The length of time the cold lasted was
then recorded. Assume equal variances.
Do these data prove that taking 4 grams of vitamin C daily reduces the time that a cold lasts?
At what level of significance? At the end of this experiment, the following data resulted:
Solution:
To prove the foregoing hypothesis, we need to reject the null hypothesis in a test of
H0: μp ≤ μc versus H1: μp > μc
where μc is the mean time a cold lasts when the vitamin C tablets are taken and μp is the
mean time when the placebo is taken.
Since, from the t table, t0.05,20 = 1.725, the null hypothesis is rejected at the 5 percent level of
significance. That is, the evidence is significant, at the 5 percent level, in establishing that
vitamin C reduces the mean time that a cold persists.
Example: Arsenic concentration in public drinking water supplies is a potential health risk. An
article in the Arizona Republic (May 27, 2001) reported drinking water arsenic concentrations
in parts per billion (ppb) for 10 metropolitan Phoenix communities and 10 communities in rural
Arizona. The data follow:
7. Conclusions: Because t0* = −2.77 < −t0.025,18 = −2.101, we reject the null hypothesis.
Interpretation: We can conclude that mean arsenic concentration in the drinking water in
rural Arizona is different from the mean arsenic concentration in metropolitan Phoenix
drinking water.
Exercise: Sample weights (in pounds) of newborn babies born in two adjacent counties in
western Pennsylvania yielded the following data:
n = 53        m = 44
X̄ = 6.8       Ȳ = 7.2
Sx² = 5.2     Sy² = 4.9
Consider a test of the hypothesis that the mean weight of newborns is the same in both
counties. What is the resulting p value? How would you express your conclusions to an
intelligent person who has not yet studied statistics?
Are these data strong enough to prove, at the 5 percent level of significance, that the
experimental method results in a higher mean test score?
Paired-Difference in Mean Test
Sometimes the assumption of independent samples is intentionally violated, resulting in a
matched-pairs or paired-difference test.
With a matched-sample design each sampled item provides a pair of data values.
This design often leads to a smaller sampling error than the independent-sample
design because variation between sampled items is eliminated as a source of sampling
error.
Pair            1                 2                 …   n
Population 1    x11               x12               …   x1n
Population 2    x21               x22               …   x2n
Difference      d1 = x11 − x21    d2 = x12 − x22    …   dn = x1n − x2n
Conditions Required for Valid Large-Sample Inferences about μd
1. A random sample of differences is selected from the target population of differences.
2. The sample size nd is large (i.e., nd ≥ 30); by the Central Limit Theorem, this condition
guarantees that the test statistic will be approximately normal regardless of the shape
of the underlying probability distribution of the population.
Large-sample test statistic:
$$z = \frac{\bar{d} - D_0}{\sigma_d/\sqrt{n_d}} \approx \frac{\bar{d} - D_0}{s_d/\sqrt{n_d}}$$
Small-sample test statistic:
$$t = \frac{\bar{d} - D_0}{s_d/\sqrt{n_d}}$$
Small-sample confidence interval:
$$\bar{d} \pm t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}}$$
where tα/2,n−1 is the upper α/2 percentage point of the t distribution with n − 1 degrees of freedom.
Large-sample confidence interval:
$$\bar{d} \pm z_{\alpha/2}\frac{\sigma_d}{\sqrt{n_d}} \approx \bar{d} \pm z_{\alpha/2}\frac{s_d}{\sqrt{n_d}}$$
Summary Paired t-Test
Example: Shear Strength of Steel Girder
An article in the Journal of Strain Analysis [1983, Vol. 18(2)] reports a comparison of
several methods for predicting the shear strength for steel plate girders. Data for two of
these methods, the Karlsruhe and Lehigh procedures, when applied to nine specific
girders, are shown in the table below.
Girder Karlsruhe Method Lehigh Method Difference dj
Determine whether there is any difference (on the average) for the two methods.
The seven-step procedure is:
1. Parameter of interest: The parameter of interest is the difference in mean shear
strength for the two methods.
2. Null hypothesis: H0: μD = 0
3. Alternative hypothesis: H1: μD ≠ 0
4. Test statistic: The test statistic is
$$t_0 = \frac{\bar{d}}{s_d/\sqrt{n}}$$
5. Reject H0 if: Reject H0 if the p-value is < 0.05.
6. Computations: The sample average and standard deviation of the differences dj are
d̄ = 0.2769 and sd = 0.1350, and so the test statistic is
$$t_0 = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{0.2769}{0.1350/\sqrt{9}} = 6.15$$
7. Conclusions: Because t0.0005,8 = 5.041 and the value of the test statistic t0 = 6.15
exceeds this value, the p-value is less than 2(0.0005) = 0.001. Therefore, we conclude
that the strength prediction methods yield different results.
Interpretation: The data indicate that the Karlsruhe method produces, on the average,
higher strength predictions than does the Lehigh method.
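The computation in step 6 needs only the summary statistics of the differences. A minimal sketch, with a hypothetical helper name and the table value t0.0005,8 = 5.041 taken from the example:

```python
from math import sqrt


def paired_t_from_summary(d_bar, s_d, n, d0=0.0):
    """Paired t statistic computed from the mean and std dev of the differences."""
    return (d_bar - d0) / (s_d / sqrt(n))


t0 = paired_t_from_summary(d_bar=0.2769, s_d=0.1350, n=9)
t_table = 5.041   # t_{0.0005,8}, quoted in the example

print(f"t0 = {t0:.2f}")
if abs(t0) > t_table:
    print("two-sided p-value < 2(0.0005) = 0.001")
```

The positive sign of t0 is what supports the directional interpretation that follows: the mean difference is above zero.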
The journal Human Factors (1962, pp. 375-380) reported a study in which n = 14 subjects
were asked to parallel park two cars having very different wheel bases and turning radii. The
time in seconds for each subject was recorded and is given in the Table. From the column of
observed differences, we calculate d̄ = 1.21 and sd = 12.68. Find the 90% confidence
interval for μD = μ1 − μ2.
$$\bar{d} - t_{0.05,13}\, s_d/\sqrt{n} \le \mu_D \le \bar{d} + t_{0.05,13}\, s_d/\sqrt{n}$$
With t0.05,13 = 1.771, this gives
$$1.21 - 1.771(12.68)/\sqrt{14} \le \mu_D \le 1.21 + 1.771(12.68)/\sqrt{14}, \qquad -4.79 \le \mu_D \le 7.21$$
Notice that the confidence interval on μD includes zero. This implies that, at the 90% level of
confidence, the data do not support the claim that the two cars have different mean parking
times μ1 and μ2. That is, the value μD = μ1 − μ2 = 0 is not inconsistent with the observed
data.
Example: You work in Human Resources. You want to see if there is a difference in test
scores after a training program. You collect the following test score data:
Find a 90% confidence interval for the mean difference in test scores.
Observation   Before (1)   After (2)   Difference
Sam           85           94          –9
Tamika        94           87           7
Brian         78           79          –1
Mike          87           88          –1
Total                                  –4
d̄ = –1, sd = 6.53
Confidence Interval
df = nd – 1 = 4 – 1 = 3, t.05 = 2.353
$$\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n_d}} \Rightarrow -1 \pm 2.353\,\frac{6.53}{\sqrt{4}} \Rightarrow -8.68 \le \mu_d \le 6.68$$
You work in Human Resources. You want to see if a training program is effective. At the 0.10
level of significance, was the training effective?
1. Was the training effective?
2. Effective means ‘Before’ < ‘After’.
3. Statistically, this means μB < μA.
4. Rearranging terms gives μB – μA < 0.
5. Defining μd = μB – μA and substituting into (4) gives μd < 0.
6. The alternative hypothesis is H1: μd < 0.
Null Hypothesis: H0: μd = 0 (μd = μB – μA)
Alternative Hypothesis: H1: μd < 0
df = 4 – 1 = 3
Critical value (α = 0.10): reject H0 if t < –1.638.
Test Statistic:
$$t = \frac{\bar{d} - D_0}{s_d/\sqrt{n_d}} = \frac{-1 - 0}{6.53/\sqrt{4}} = -0.306$$
Decision: Do not reject H0 at α = 0.10.
Conclusion: There is no evidence training was effective.
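Starting from the raw scores rather than the summary values, the same test can be sketched as below. The differences are formed as Before minus After, matching the definition used in the example, and the critical value –1.638 is the one quoted there.

```python
from math import sqrt
from statistics import mean, stdev

before = [85, 94, 78, 87]   # Sam, Tamika, Brian, Mike
after = [94, 87, 79, 88]

# Paired differences d = Before - After.
d = [b - a for b, a in zip(before, after)]
d_bar, s_d, n_d = mean(d), stdev(d), len(d)

t = (d_bar - 0) / (s_d / sqrt(n_d))
t_crit = -1.638   # lower-tail critical value for alpha = 0.10 and 3 df, from the example

print(f"d = {d}, d_bar = {d_bar}, s_d = {s_d:.2f}, t = {t:.3f}")
print("reject H0" if t < t_crit else "do not reject H0")
```

Note that `stdev` uses the n − 1 divisor, which is the sample standard deviation the t statistic requires.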
Example: Thinking Challenge
You’re a marketing research analyst. You want to compare a client’s calculator to a
competitor’s. You sample 8 retail stores. At the 0.01 level of significance, does your
client’s calculator sell for less than their competitor’s?
Store   (1) Client   (2) Competitor
1       $10          $11
2       8            11
3       7            10
4       9            12
5       11           11
6       10           13
7       9            12
8       8            10
H0: μd = 0 (μd = μ1 – μ2)
H1: μd < 0
df = 8 – 1 = 7
Critical value (α = 0.01): reject H0 if t < –2.998.
Test Statistic:
$$t = \frac{\bar{d} - d_0}{s_d/\sqrt{n_d}} = \frac{-2.25 - 0}{1.16/\sqrt{8}} = -5.486$$
Decision: Reject H0 at α = 0.01.
Conclusion: There is evidence client’s brand (1) sells for less.
p –value approach
4. Compute the p–value.
For t = 2.94 and df = 9, the p–value is between 0.02 and 0.01. (This is a two-tailed test, so
we double the upper-tail areas of 0.01 and 0.005.)
5. Determine whether to reject H0. Because p–value < α = 0.05, we reject H0.
We are at least 95% confident that there is a difference in mean delivery times for the two
services.
Critical Value Approach
4. Determine the critical value and rejection rule.
For α = 0.05 and df = 9, t.025 = 2.262. Reject H0 if |t| > 2.262.
5. Determine whether to reject H0. Because t = 2.94 > 2.262, we reject H0.
We are at least 95% confident that there is a difference in mean delivery times for the two
services.
Paired Versus Unpaired Experiments
Exercise:
Inference on Two Population Proportions
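An approximate 100(1 − α)% confidence interval on the difference p1 − p2 is
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
where zα/2 is the upper α/2 percentage point of the standard normal distribution.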
Approximate Tests on the Difference of Two Population Proportions
Type II Error and Choice of Sample Size
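If the alternative hypothesis is two sided, the β-error is
$$\beta = \Phi\left[\frac{z_{\alpha/2}\sqrt{\bar{p}\bar{q}(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right] - \Phi\left[\frac{-z_{\alpha/2}\sqrt{\bar{p}\bar{q}(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right]$$
where
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}, \qquad \bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}$$
and the pooled estimate used in the test statistic is
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
If the alternative hypothesis is H1: p1 > p2,
$$\beta = \Phi\left[\frac{z_{\alpha}\sqrt{\bar{p}\bar{q}(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right]$$
and if the alternative hypothesis is H1: p1 < p2,
$$\beta = 1 - \Phi\left[\frac{-z_{\alpha}\sqrt{\bar{p}\bar{q}(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}\right]$$
For the two-sided alternative with n1 = n2 = n, the required sample size is
$$n = \frac{\left[z_{\alpha/2}\sqrt{(p_1 + p_2)(q_1 + q_2)/2} + z_{\beta}\sqrt{p_1 q_1 + p_2 q_2}\right]^2}{(p_1 - p_2)^2}$$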
To estimate (μ1 – μ2) with a given margin of error ME and with confidence level (1 – α), use
the following formula to solve for equal sample sizes that will achieve the desired reliability:
$$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2}$$
You will need to substitute estimates for the values of σ1² and σ2² before solving for the
sample size. These estimates might be sample variances s1² and s2² from prior sampling (e.g.,
a pilot sample) or from an educated (and conservatively large) guess based on the range,
that is, s ≈ R/4.
To estimate (p1 – p2) with a given margin of error ME and with confidence level (1 – α), use
the following formula to solve for equal sample sizes that will achieve the desired reliability:
$$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1 q_1 + p_2 q_2)}{(ME)^2}$$
You will need to substitute estimates for the values of p1 and p2 before solving for the sample
size. These estimates might be based on prior samples, obtained from educated guesses, or,
most conservatively, specified as p1 = p2 = 0.5.
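The two margin-of-error formulas translate directly into code. In this sketch the function names and all numeric inputs are hypothetical: a planning value σ1 = σ2 = 2 with ME = 1 for means, and the conservative p1 = p2 = 0.5 with ME = 0.05 for proportions.

```python
from math import ceil
from statistics import NormalDist


def n_for_mean_difference(sigma1, sigma2, me, confidence=0.95):
    """Equal sample sizes needed to estimate mu1 - mu2 to within +/- me."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / me ** 2)


def n_for_proportion_difference(p1, p2, me, confidence=0.95):
    """Equal sample sizes needed to estimate p1 - p2 to within +/- me."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / me ** 2)


# Hypothetical planning values, not taken from the slides.
n_means = n_for_mean_difference(sigma1=2.0, sigma2=2.0, me=1.0)
n_props = n_for_proportion_difference(p1=0.5, p2=0.5, me=0.05)
print(n_means, n_props)
```

Setting p1 = p2 = 0.5 maximizes p(1 − p), so it yields the largest sample size any pair of proportions could require for the chosen margin of error.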
Discussions
The ideal way to test the hypothesis that the results of two different treatments are identical is
to randomly divide a group of people into a set that will receive the first treatment and one
that will receive the second. However, such randomization is not always possible. For
instance, if we want to study whether drinking alcohol increases the risk of prostate cancer,
we cannot instruct a randomly chosen sample to drink alcohol. An alternative way to study the
hypothesis is to use an observational study that begins by randomly choosing a set of
drinkers and one of nondrinkers. These sets are followed for a period of time and the
resulting data are then used to test the hypothesis that members of the two groups have the
same risk for prostate cancer.
Example: St. John's Wort
Extracts of St. John's Wort are widely used to treat depression. An article in the April 18,
2001, issue of the Journal of the American Medical Association compared the efficacy of a
standard extract of St. John's Wort with a placebo in 200 outpatients diagnosed with major
depression. Patients were randomly assigned to two groups; one group received the St.
John's Wort, and the other received the placebo. After eight weeks, 19 of the placebo-treated
patients showed improvement, and 27 of those treated with St. John's Wort improved. Is there
any reason to believe that St. John's Wort is effective in treating major depression? Use α =
0.05. The seven-step hypothesis testing procedure leads to the following results:
1. Parameter of interest: The parameters of interest are p1 and p2, the proportion of patients
who improve following treatment with St. John's Wort (p1) or the placebo (p2).
2. Null hypothesis : H0 : p1 = p2
3. Alternative hypothesis: H1: p1 > p2
4. Test statistic: The test statistic is
$$z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where p̂1 = 27/100 = 0.27, p̂2 = 19/100 = 0.19, n1 = n2 = 100, and
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{19 + 27}{100 + 100} = 0.23$$
5. Reject H0 if: Reject H0: p1 = p2 if the p-value is less than 0.05.
6. Computations: The value of the test statistic is
$$z_0 = \frac{0.27 - 0.19}{\sqrt{0.23(0.77)\left(\frac{1}{100} + \frac{1}{100}\right)}} = 1.34$$
7. Conclusions: Since z0 = 1.34, the p-value is P = 1 − Φ(1.34) = 0.09. Because this exceeds 0.05,
we cannot reject the null hypothesis.
Interpretation: There is insufficient evidence to support the claim that St. John's Wort is
effective in treating major depression.
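The computations in steps 4 through 7 can be reproduced with the standard library. A minimal sketch using the counts from the example, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 27, 100   # improved with St. John's Wort
x2, n2 = 19, 100   # improved with placebo
alpha = 0.05

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)   # pooled estimate under H0: p1 = p2

z0 = (p1_hat - p2_hat) / sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
p_value = 1 - NormalDist().cdf(z0)   # upper-tail test, H1: p1 > p2

print(f"pooled p = {p_pooled:.2f}, z0 = {z0:.2f}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "cannot reject H0")
```

The pooled estimate is used in the standard error because the null hypothesis asserts a single common proportion.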
Example: Consider the process of manufacturing crankshaft bearings. Suppose that a
modification is made in the surface finishing process and that, subsequently, a second
random sample of 85 bearings is obtained. The number of defective bearings in this
second sample is 8. Therefore, n1 = 85, n2 = 85, p̂1 = 10/85 = 0.1176, and p̂2 =
8/85 = 0.0941. Find an approximate 95% confidence interval on the difference in the
proportion of defective bearings produced under the two processes.
Since z0.05 = 1.645, we cannot reject the null hypothesis at the 5 percent level of significance.
A value of TS at least as large as the one observed will occur 24 percent of the time when the
two probabilities are equal.
Example: As personnel director, you want to test the perception of fairness of two methods
of performance evaluation. 63 of 78 employees rated Method 1 as fair. 49 of 82 rated
Method 2 as fair. Find a 99% confidence interval for the difference in perceptions.
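Confidence Interval
$$\hat{p}_1 = \frac{63}{78} = 0.808, \qquad \hat{q}_1 = 1 - 0.808 = 0.192$$
$$\hat{p}_2 = \frac{49}{82} = 0.598, \qquad \hat{q}_2 = 1 - 0.598 = 0.402$$
$$0.808 - 0.598 \pm 2.58\sqrt{\frac{0.808(0.192)}{78} + \frac{0.598(0.402)}{82}} \Rightarrow 0.029 \le p_1 - p_2 \le 0.391$$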
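A sketch of the same interval computed from the raw counts. It uses the exact normal quantile rather than the rounded 2.58 and keeps the proportions unrounded, so intermediate values differ slightly from the hand calculation while the endpoints agree to three decimals.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 63, 78   # rated Method 1 as fair
x2, n2 = 49, 82   # rated Method 2 as fair
confidence = 0.99

p1_hat, p2_hat = x1 / n1, x2 / n2
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 2.576

# Unpooled standard error: each sample keeps its own proportion.
std_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
diff = p1_hat - p2_hat
lower, upper = diff - z * std_error, diff + z * std_error

print(f"{lower:.3f} <= p1 - p2 <= {upper:.3f}")
```

Unlike the hypothesis test, the interval does not pool the two samples, because no assumption of equal proportions is being made.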
H0: p1 – p2 = 0 and H1: p1 – p2 ≠ 0
n1 = 78, n2 = 82
[Figure: two-tailed rejection regions; reject H0 in either tail.]
$$0.48 - 0.40 \pm 1.96\sqrt{\frac{0.48(0.52)}{250} + \frac{0.40(0.60)}{150}} = 0.08 \pm 1.96(0.0510) = 0.08 \pm 0.10$$
Hence, the 95% confidence interval for the difference in before and after awareness of the
product is -0.02 to 0.18.
Can we conclude, using a 0.05 level of significance, that the proportion of households aware
of the client’s product increased after the new advertising campaign?
Hypothesis Tests
p-value and Critical Value Approaches
1. Develop the hypotheses. H0: p1 − p2 ≤ 0
H1: p1 − p2 > 0
2. Specify the level of significance. α = 0.05
3. Compute the value of the test statistic.
$$\bar{p} = \frac{250(0.48) + 150(0.40)}{250 + 150} = \frac{180}{400} = 0.45$$
$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{0.45(0.55)\left(\frac{1}{250} + \frac{1}{150}\right)} = 0.0514$$
Exercise: A random sample of 220 female and 210 male coffee drinkers were questioned. The result
was that 71 of the women and 58 of the men indicated a preference for decaffeinated coffee. Do these
data establish, at the 5 percent level of significance, that the proportion of female coffee drinkers who
prefer decaffeinated coffee differs from the corresponding proportion for men? What is the p-value?
Inferences on the Variances of Two Normal Populations
Hypothesis Tests on the Ratio of Two Variances
Let X11, X12, …, X1n1 be a random sample from a normal population with mean µ1 and variance
σ1², and let X21, X22, …, X2n2 be a random sample from a second normal population with mean
µ2 and variance σ2². Assume that both normal populations are independent. Let S1² and S2² be
the sample variances. Then the ratio
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F distribution with n1 − 1 numerator degrees of freedom and n2 − 1 denominator degrees of freedom.
Let W and Y be independent chi-square random variables with u and v degrees of freedom
respectively. Then the ratio
$$F = \frac{W/u}{Y/v}$$
has the probability density function
$$f(x) = \frac{\Gamma\left(\frac{u + v}{2}\right)\left(\frac{u}{v}\right)^{u/2} x^{(u/2) - 1}}{\Gamma\left(\frac{u}{2}\right)\Gamma\left(\frac{v}{2}\right)\left[\left(\frac{u}{v}\right)x + 1\right]^{(u + v)/2}}, \qquad 0 < x < \infty$$
and is said to follow the F distribution with u degrees of freedom in the numerator and v degrees
of freedom in the denominator. It is usually abbreviated as Fu,v.
F-Test for Equal Population Variances
Test statistic:
$$F = \frac{s_1^2}{s_2^2}$$
One-Tailed Test:
H0: σ1² = σ2²
H1: σ1² > σ2² (or H1: σ1² < σ2²)
Rejection region: F > Fα (or F < F1−α)
where Fα is based on v1 and v2, which are the degrees of freedom for the numerator and
denominator sample variances, respectively.
Two-Tailed Test:
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
Rejection region: F > Fα/2 or F < F1−α/2
where Fα/2 is based on v1 and v2, which are the degrees of freedom for the numerator and
denominator sample variances, respectively.
If s1² and s2² are the sample variances of random samples of sizes n1 and n2, respectively,
from two independent normal populations with unknown variances σ1² and σ2², then a
100(1 − α)% confidence interval on the ratio σ1²/σ2² is
$$\frac{s_1^2}{s_2^2}\, f_{1-\alpha/2,\,n_2-1,\,n_1-1} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2}\, f_{\alpha/2,\,n_2-1,\,n_1-1}$$
where fα/2,n2−1,n1−1 and f1−α/2,n2−1,n1−1 are the upper and lower α/2 percentage points of the
F-distribution with n2 – 1 numerator and n1 – 1 denominator degrees of freedom, respectively.
A confidence interval on the ratio of the standard deviations can be obtained by taking square
roots in the equation above.