
5.3 Hypothesis Testing
two-sample tests
Dr. Jyotika Doshi
Comparing
two means or two proportions
• Compare the average income of all adults in
one region of the country with the average
income of those in another region
• Compare the proportion of all men who are
vegetarians with the proportion of all women
who are vegetarians

7/10/2021 Dr. Jyotika Doshi 4


Independent Sampling
from Two Populations
• Independent: each sample is drawn without
reference to the other
• Estimate difference μ1−μ2
• Test whether μ1 and μ2 are significantly different

          Population 1    Population 2
mean      μ1              μ2
s.d.      σ1              σ2

          Sample 1        Sample 2
size      n1              n2
mean      X̅1              X̅2
s.d.      s1              s2
• Large samples: sample size ≥ 30
• Small samples: sample size < 30
• Independent samples: each sample is drawn
without reference to the other
• Hypothesized difference : ∆ or (μ1 – μ2 )H0
– Difference under H0
– Ex. H0: μ1 – μ2 = 0 OR H0: μ1 – μ2 = 5
– Ex. H0: p1 – p2 = 0 OR H0: p1 – p2 = 0.2



Large, Independent samples
σ known
• Independent: each sample is drawn without
reference to the other
• Large: each sample size at least 30
– With statistical tools, may consider > 100
• Comparing means
– σ is known: use Z-test
– σ is unknown: may use Z-test with σ̂ = s,
but better to use t-test
• Z test: ∆ is hypothesized difference under H0
– Ex. H0: μ1 – μ2 = 0 OR H0: μ1 – μ2 = 5



Small, Independent samples
σ unknown
• Independent: each sample is drawn without
reference to the other
• Small: sample size < 30
– either n1 < 30 or n2 < 30
• Comparing means
• Use Student’s t-test (for equal variance)
• Use Welch’s t-test (for unequal variance)



Comparing two means
Student’s t-test (equal variance)
• small samples, σ unknown, equal variance
• Two population distributions assumed to have same
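variance: σ1² = σ2²
• Use pooled s.d.: s1 and s2 pooled together, each
weighted by the number of cases in each sample
• Pooled s.d. and t-statistic, with d.f. = n1+n2−2:
\[ s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}, \qquad t = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)_{H_0}}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \]
• Often hard to know whether the variances of the two
populations are equal (may apply variance ratio F test)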
• pooled variance method should be used with caution



Comparing two means
Welch’s t-test (unequal variance)
• Small samples, σ is unknown, unequal variance
• Two population distributions may not have same
variance
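• t-statistic (no pooling; d.f. from the Welch–Satterthwaite approximation):
\[ t = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)_{H_0}}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} \]
• If two samples don’t have the same means, why
should we expect them to have the same standard
deviation?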
• Better to apply Welch’s t-test instead of Student’s t-test



Forming hypothesis
• Similar to one sample test: 3 forms
• Null Hypothesis H0:μ1−μ2= ∆
– ∆: specified diff. in H0 (hypothesised difference)
• Alternative hypothesis Ha (also written H1):

Form of Ha          Terminology
Ha: μ1−μ2 < ∆       Left-tailed
Ha: μ1−μ2 > ∆       Right-tailed
Ha: μ1−μ2 ≠ ∆       Two-tailed
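
The three forms of Ha determine which tail area is used as the p-value. A minimal Python sketch of that mapping for a Z statistic follows; the function name `z_p_value` and the tail labels are illustrative choices, not part of the slides.

```python
from statistics import NormalDist

def z_p_value(z: float, tail: str) -> float:
    """Return the p-value of a Z statistic for the given form of Ha.

    tail is a hypothetical label: 'left', 'right' or 'two'.
    """
    cdf = NormalDist().cdf(z)              # P(Z <= z) under the standard normal
    if tail == "left":                     # Ha: mu1 - mu2 < delta
        return cdf
    if tail == "right":                    # Ha: mu1 - mu2 > delta
        return 1.0 - cdf
    if tail == "two":                      # Ha: mu1 - mu2 != delta
        return 2.0 * min(cdf, 1.0 - cdf)
    raise ValueError("tail must be 'left', 'right' or 'two'")

print(round(z_p_value(-1.96, "two"), 3))
```

The same statistic gives different p-values depending on the direction of Ha, so the form of the alternative should be fixed before looking at the data.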



Two-sample Z-test
Comparing two means
• For Independent samples
• Two-sample Z-test can be applied when
– Populations are normally distributed, OR
sample sizes are sufficiently large to apply the CLT
– AND standard deviations of the populations are known



Difference between means: μ1−μ2
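• Point estimate: X̅1 − X̅2
• Sampling distribution of X̅1 − X̅2
– For independent samples with large size n1 and n2,
\[ (\bar{X}_1-\bar{X}_2) \sim N\left(\mu_1-\mu_2,\ \sigma_{\bar{X}_1-\bar{X}_2}\right) \]
where
\[ \mu_{\bar{X}_1-\bar{X}_2} = \mu_1-\mu_2, \qquad \sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} \]
• When σ1, σ2 are unknown, estimate using s1, s2
i.e. use σ̂i = si (for sufficiently large samples)
• Z statistic:
\[ Z = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)_{H_0}}{\sigma_{\bar{X}_1-\bar{X}_2}} \]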



Example 1
• To compare the height of two male populations from
the United States and Sweden, a sample of 30 males
from each country is randomly selected. Mean height
of US male and Swedish male is observed as 69.70 and
71.51 inches respectively. The population standard
deviations are known: σ1 = 3.12 inches (US) and
σ2 = 2.44 inches (Sweden). Is the height difference
significant?



Example 1 ( σ known case) …
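• Step 1: H0: μ1 – μ2 = 0 vs. H1: μ1 – μ2 ≠ 0
– μ1: mean height of US male population
– μ2: mean height of Swedish male population
• Step 2: α = 0.05
• Step 3: Z-test, Z-statistic:
\[ Z = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)_{H_0}}{\sigma_{\bar{X}_1-\bar{X}_2}} \]
• Step 4: compute Z statistic (denote as Zcalc or Zstat)
\[ \sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(3.12)^2}{30}+\frac{(2.44)^2}{30}} \approx 0.72 \]
\[ Z_{stat} = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)_{H_0}}{\sigma_{\bar{X}_1-\bar{X}_2}} = \frac{(69.70-71.51) - 0}{0.72} \approx -2.51 \]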
Example 1…
• Step 5: rejection rule
• Ha: μ1−μ2 ≠ 0 (two-tailed test: 0.025 in each tail, 95% in between)
• Critical value approach:
– For α = 0.05, Ztab = ±1.96
– Zcalc = −2.51 < −1.96, in rejection region: reject null hypothesis H0
• OR p-value approach: p-value = 2 × 0.0060 = 0.012
– p-value (0.012) < significance level (0.05) → reject null hypothesis
• Step 6: Conclusion
– Statistically, US and Swedish male populations are
significantly different with respect to the height.
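
As a cross-check, Example 1 can be reproduced in a few lines of Python. This is a sketch using only the figures quoted in the example; the variable names are illustrative. Without rounding the standard error to 0.72 the statistic comes out near −2.50 rather than −2.51, which does not change the decision.

```python
from math import sqrt
from statistics import NormalDist

# Figures quoted in Example 1 (population s.d. known)
n1, n2 = 30, 30
x1_bar, x2_bar = 69.70, 71.51      # sample means: US, Sweden (inches)
sigma1, sigma2 = 3.12, 2.44        # known population standard deviations
delta0 = 0.0                       # hypothesized difference under H0
alpha = 0.05

se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)       # s.d. of X1_bar - X2_bar
z = (x1_bar - x2_bar - delta0) / se              # Z statistic
p_two_tailed = 2 * NormalDist().cdf(-abs(z))     # two-tailed p-value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96
reject = abs(z) > z_crit

print(f"se={se:.4f}, z={z:.3f}, p={p_two_tailed:.4f}, reject H0: {reject}")
```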



Example 2
• Compare customer satisfaction levels of two competing cable
television companies Company 1 and Company 2: samples
were randomly selected and were asked to rate their cable
companies on a five-point scale, with 1 being least satisfied
and 5 most satisfied. The survey results are summarized in the
following table. Test at α = 0.01, whether the data provide
sufficient evidence to conclude that Company 1 has a higher
mean satisfaction rating than does Company 2.

            Company 1    Company 2
Size        174          355
Mean        3.51         3.24
Std. dev.   0.51         0.52
Example 2 (σ unknown, large sample)…
• Step 1: H0: μ1−μ2 = 0, Ha: μ1−μ2 > 0
• Step 2: α = 0.01
• Step 3: Use Z distribution (two indep. samples,
large sample size, use sample s.d. as estimate of
population s.d.)
• Step 4: compute Z statistic (denote as Zcalc or Zstat)
\[ \sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{(0.51)^2}{174}+\frac{(0.52)^2}{355}} \approx 0.0475 \]
\[ Z_{stat} = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)_{H_0}}{\sigma_{\bar{X}_1-\bar{X}_2}} = \frac{(3.51-3.24) - 0}{0.0475} \approx 5.684 \]
Example 2…
• Step 5: rejection Rule
• Ha:μ1−μ2>0 (right tail test)
• Critical value approach:
– For α=0.01, Ztab = 2.326
– Zcalc = 5.684 > Ztab, falls in rejection region
– Reject Null hypo. H0
• P-value approach: p-value?
• Step 6: Conclusion
– The data provides sufficient evidence, at the 1% level of
significance, to conclude that the mean customer
satisfaction for Company 1 is higher than mean customer
satisfaction for Company 2
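
The Example 2 computation, including the right-tail p-value, can be sketched in Python as follows. The sample standard deviations stand in for the unknown population values, as the example itself assumes; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Survey summary from Example 2
n1, x1_bar, s1 = 174, 3.51, 0.51   # Company 1
n2, x2_bar, s2 = 355, 3.24, 0.52   # Company 2
delta0 = 0.0                       # hypothesized difference under H0
alpha = 0.01

se = sqrt(s1**2 / n1 + s2**2 / n2)          # estimated s.d. of X1_bar - X2_bar
z = (x1_bar - x2_bar - delta0) / se         # Z statistic
p_value = NormalDist().cdf(-z)              # right tail: P(Z > z) = P(Z < -z)
z_crit = NormalDist().inv_cdf(1 - alpha)    # about 2.326
reject = z > z_crit

print(f"z={z:.3f}, p={p_value:.2e}, reject H0: {reject}")
```

Evaluating the upper tail as `cdf(-z)` rather than `1 - cdf(z)` avoids losing precision when the p-value is very small.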



Example 2…
• Step 5: using p-value approach
• P-value=P(Z > Zcalc)
= P(Z>5.684) ≈ 0.00
• P-value < α (=0.01)
– Decision: Reject H0



Example 3
• Test whether girls on average score 10 marks more
than the boys. We have the information that the
standard deviation for girls’ score is 100 and for boys’
score is 90. Then we collect the data of 20 girls and 20
boys by using random samples and record their marks.
Mean Score for Girls (Sample Mean) is 641 and Mean
Score for Boys (Sample Mean) is 613.
– Let mean score of girls = μ1, mean score of boys = μ2
– Here, n1=n2=20, X̅ 1= 641, X̅ 2= 613, σ1=100, σ2=90
– Step1: H0: μ1 - μ2 = 10 vs. H1: μ1 - μ2 > 10
– Perform Step 2 to Step 6
– Note: Hypothesized diff . = 10
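
One way to carry out Steps 2 to 6 for Example 3 is sketched below in Python. The example does not state a significance level, so α = 0.05 is an assumption made here purely for illustration; the key point is that the hypothesized difference of 10 is subtracted in the numerator.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 3 (population s.d. known)
n1, x1_bar, sigma1 = 20, 641.0, 100.0   # girls
n2, x2_bar, sigma2 = 20, 613.0, 90.0    # boys
delta0 = 10.0                           # hypothesized difference under H0
alpha = 0.05                            # ASSUMPTION: not given in the example

se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z = (x1_bar - x2_bar - delta0) / se     # (28 - 10) / se
p_value = 1 - NormalDist().cdf(z)       # right-tailed test (H1: mu1 - mu2 > 10)
reject = p_value < alpha

print(f"z={z:.3f}, p={p_value:.3f}, reject H0: {reject}")
```

Under this assumed α the statistic is well inside the non-rejection region, so the data would not support a difference greater than 10 marks.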



Exercises
• Perform the test of hypotheses indicated, using the
data from independent samples. Use the p-value
approach.

a. Test H0: μ1−μ2 = 57 vs. Ha: μ1−μ2 < 57 @ α = 0.10
– n1=117, X̅1=1309, s1=42
– n2=133, X̅2=1258, s2=37

b. Test H0: μ1−μ2 = −1.5 vs. Ha: μ1−μ2 ≠ −1.5 @ α = 0.20
– n1=65, X̅1=16.9, s1=1.3
– n2=57, X̅2=18.6, s2=1.1



Exercises…
• The amount of a certain trace element in blood is known to vary
with a standard deviation of 14.1 ppm (parts per million) for male
blood donors and 9.5 ppm for female donors. Random samples of
75 male and 50 female donors yield concentration means of 28 and
33 ppm, respectively. What is the likelihood that the population
means of concentrations of the element are the same for men and
women?
• A high school math teacher claims that students in her class will
score higher on the math portion of the ACT than students in a
colleague’s math class. The mean ACT math score for 49 students in
her class is 22.1 and the standard deviation is 4.8. The mean ACT
math score for 44 of the colleague’s students is 19.8 and the
standard deviation is 5.4. At α = 0.10, can the teacher’s claim be
supported?



t-test for independent samples

Small samples, unknown pop. var.



2-Sample t test: Comparing 2 Means
• In practice, σ1 and σ2 are usually unknown
– With small samples, the two-sample z-test cannot be used
• With large samples, z-test may be applied
using sample standard deviations as estimates
for population s.d.
– May also use t-test
• Requirements: Two normally distributed but
independent populations, σ is unknown



Comparison of Two Population Means
unknown σ, small, independent samples
• Small sized samples (either n1 < 30 or n2 < 30)
– Central Limit Theorem does not apply
• Independent samples
• Assumption: both populations have approx.
normal probability distribution
• Under assumption of equal variance:
Student’s t-test with pooled variance
• Under assumption of unequal variance:
Welch’s t-test with separately estimated variances



Comparing two means
small samples, σ unknown
• Student’s t-test (assumed equal variance: σ1² = σ2²)
– Use pooled s.d., t-statistic with d.f. = n1+n2−2

• Welch’s t-test (assumed unequal variance: σ1² ≠ σ2²)



Example 4: Student’s t-test
Equal Variance
• A random sample of 17 police officers in Brownsville has a
mean annual income of $35,800 and a standard deviation
of $7,800. In Greensville, a random sample of 18 police
officers has a mean annual income of $35,100 and a
standard deviation of $7,375. Test the claim at α = 0.01
that the mean annual incomes in the two cities are not the
same. Assume the population variances are equal.
• 1. H0: μ1 = μ2 vs. Ha: μ1 ≠ μ2
• 2. α = 0.01
• 3. Student’s t-test with pooled s.d.:
\[ t = \frac{(\bar{x}_1-\bar{x}_2) - (\mu_1-\mu_2)_{H_0}}{\sigma_{\bar{x}_1-\bar{x}_2}} \]
• d.f. = n1 + n2 – 2 = 17 + 18 – 2 = 33



Example 4…
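• 4. pooled estimate of σ = σ̂ = Sp
\[ \sigma_{\bar{x}_1-\bar{x}_2} = \hat{\sigma}\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} = \sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}\ \sqrt{\frac{1}{n_1}+\frac{1}{n_2}} \]
\[ = \sqrt{\frac{(17-1)(7800)^2+(18-1)(7375)^2}{17+18-2}}\ \sqrt{\frac{1}{17}+\frac{1}{18}} \approx 7584.0355\,(0.3382) \approx 2564.92 \]
\[ t = \frac{(\bar{x}_1-\bar{x}_2) - (\mu_1-\mu_2)_{H_0}}{\sigma_{\bar{x}_1-\bar{x}_2}} = \frac{(35800-35100) - 0}{2564.92} \approx 0.273 \]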



Example 4…
• 5. d.f. = n1 + n2 – 2 = 17 + 18 – 2 = 33
– α/2 = 0.005 in each tail, d.f. = 33
• Critical value approach: t0 = ±2.733 for d.f. = 33
(the normal approximation gives ±2.576)
– tcalc = 0.273 lies between –t0 and t0 (not in rejection area)
– Fail to reject H0
• P-value approach
– Two-tailed p-value ≈ 0.79 (close to 1, far above α = 0.01)
– Fail to reject H0
• 6. not enough evidence at the 1% level to support the
claim that the mean annual incomes differ
– No significant diff. in mean incomes of police officers in
Brownsville and Greensville
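
The pooled-variance computation in Example 4 can be verified with a short Python sketch. The critical value 2.733 is taken from a standard t table for d.f. = 33 and a two-tailed α of 0.01; it is hard-coded as an assumption because the Python standard library has no t distribution.

```python
from math import sqrt

# Sample summaries from Example 4
n1, x1_bar, s1 = 17, 35800.0, 7800.0   # Brownsville
n2, x2_bar, s2 = 18, 35100.0, 7375.0   # Greensville
delta0 = 0.0                           # hypothesized difference under H0

df = n1 + n2 - 2
# Pooled s.d.: each sample variance weighted by its degrees of freedom
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
se = sp * sqrt(1 / n1 + 1 / n2)        # s.d. of x1_bar - x2_bar
t = (x1_bar - x2_bar - delta0) / se

t_crit = 2.733   # ASSUMPTION: t-table value for df = 33, alpha/2 = 0.005
reject = abs(t) > t_crit

print(f"df={df}, sp={sp:.2f}, se={se:.2f}, t={t:.3f}, reject H0: {reject}")
```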
Exercise (Student’s t-test)
• Claim: on average, boys score 15 marks more
than girls in the SAT exam. To test this, a sample of
students is taken from a class having 15 girls
and 20 boys. The sample mean score of girls is
610 and that of boys is 630. The standard
deviation of girls’ scores is 14.5 and the s.d.
of boys’ scores is 15. Test the hypothesis at 1%
significance.
– Single sample divided into two groups. Assume
population s.d. equal
Welch’s Test for Unequal Variances
• also called Welch’s t-test, Welch’s adjusted T
or unequal variances t-test
• Modification of Student’s t-test
– degrees of freedom are adjusted (Welch–Satterthwaite approximation;
tedious by hand, usually computed with software tools)
– uses sample s.d. s1 and s2 separately, no pooled s.d.



Example 5: Welch’s t-test
• Consider a new computer software package developed to help
systems analysts reduce the time required to design, develop, and
implement an information system.
• To evaluate the benefits of the new software package, a random
sample of 12 systems analysts are instructed to produce the
information system by using current technology. Another sample of
12 analysts are trained in the use of the new software package and
then instructed to use it to produce the information system.
Consider Sample 1 of system analysts using current technology, and
Sample 2 of analysts using the new s/w package.
• Sample 1: mean = 325 hours, s.d. = 40 hours
• Sample 2: mean = 286 hours, s.d. = 44 hours
• The researcher in charge of the new software evaluation project
hopes that the new software package will provide a shorter mean
project completion time. Formulate and test the hypothesis.



Example 5: Welch’s t-test…
• Two populations of system analysts:
– using the current technology
– using the new software package
• Population means
– mean project completion time for systems analysts using
the current technology: μ1
– mean project completion time for systems analysts using
the new s/w package: μ2
• Sample 1: n1 = 12, mean = 325 hours, s.d. = 40 hours
• Sample 2: n2 = 12, mean = 286 hours, s.d. = 44 hours



Example 5: Welch’s t-test…
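• 1. H0: μ1 ≤ μ2 (μ1 − μ2 ≤ 0) vs. Ha: μ1 > μ2
• 2. α = 0.05
• 3, 4. Welch’s t-statistic, with d.f. from the Welch–Satterthwaite approximation:
\[ t = \frac{(\bar{x}_1-\bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} = \frac{325-286}{\sqrt{\frac{40^2}{12}+\frac{44^2}{12}}} \approx 2.27, \qquad \text{d.f.} \approx 21.8 \]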



Example 5: Welch’s t-test…
• 5. p-value = 0.017 < 0.05 → reject H0

• 6. Conclusion:
At the 5% significance level, the sample results support
the conclusion that the new software package
provides a smaller population mean completion time
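
The Welch computation for Example 5 can be sketched in Python as below. The degrees-of-freedom expression is the Welch–Satterthwaite approximation. Because the standard library has no t distribution, the upper-tail probability is obtained here by numerically integrating the t density with Simpson's rule; that helper is an illustrative stand-in for what statistical software would report, not a method taken from the slides.

```python
from math import sqrt, gamma, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus a Simpson's-rule integral of the pdf on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Sample summaries from Example 5
n1, x1_bar, s1 = 12, 325.0, 40.0   # current technology
n2, x2_bar, s2 = 12, 286.0, 44.0   # new software package

v1, v2 = s1**2 / n1, s2**2 / n2                     # per-sample variance of the mean
t_stat = (x1_bar - x2_bar - 0.0) / sqrt(v1 + v2)    # no pooled s.d.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # Welch-Satterthwaite
p_value = t_upper_tail(t_stat, df)                  # right-tailed test

print(f"t={t_stat:.3f}, df={df:.1f}, p={p_value:.3f}")
```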



t-test for dependent samples



Difference between means
Dependent samples
• Dependent/paired/matched samples
• Three conditions are required to conduct the test.
– The samples must be randomly selected
– The samples must be dependent (paired)
– Both populations must be normally distributed
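• Sampling distribution of d̅, where d = x1 − x2
– d̅: mean of the differences between paired data entries
in the dependent samples
– The standardized d̅ follows a t distribution with n−1 d.f.,
n = number of data pairs
• Paired t-test statistic (sd = s.d. of the differences):
\[ t = \frac{\bar{d} - (\mu_d)_{H_0}}{s_d/\sqrt{n}} \]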


Example 6: Paired t-test
• A reading centre claims that students will perform
better on a standardized reading test after going
through the reading course offered by their centre. The
table shows the reading scores of 6 students before
and after the course. At α = 0.05, is there enough
evidence to conclude that the students’ scores after
the course are better than the scores before the
course?

Student          1    2    3    4    5    6
Score (Before)   85   96   70   76   81   78
Score (After)    88   85   89   86   92   89



Example 6: paired t-test…
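• 1. With d = before − after, better scores after the course mean μd < 0:
H0: μd ≥ 0 vs. Ha: μd < 0 (left-tailed)
• 2. α = 0.05
• 3. Paired t-test statistic:
\[ t = \frac{\bar{d} - (\mu_d)_{H_0}}{s_d/\sqrt{n}} \]
• 4.
Student          1    2    3    4    5    6    sum
Score (Before)   85   96   70   76   81   78
Score (After)    88   85   89   86   92   89
d                −3   11   −19  −10  −11  −11  −43
d²               9    121  361  100  121  121  833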
Example 6: paired t-test…
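• 4. d̅ = ∑d/n = −43/6 = −7.167
sd² = (∑d² − n·d̅²)/(n−1) = (833 − 6·(−7.167)²)/5 ≈ 104.967
sd = 10.245
\[ t = \frac{\bar{d} - (\mu_d)_{H_0}}{s_d/\sqrt{n}} = \frac{-7.167-0}{10.245/\sqrt{6}} \approx -1.714 \]
• 5. Left-tailed test (d = before − after), d.f. = 5, ttab = 2.015
tcalc = −1.714 > −ttab = −2.015 → do not reject H0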


• 6. Conclusion: not enough evidence at the 5%
significance level to support that students’ scores after
the course are better than the scores before the course
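
The paired computation for Example 6 can be reproduced with the Python standard library. This sketch takes d = before − after, matching the table of differences, so an improvement after the course shows up as a negative mean difference and the test is left-tailed. The tabulated value 2.015 is the one quoted in the example.

```python
from math import sqrt
from statistics import mean, stdev

before = [85, 96, 70, 76, 81, 78]
after = [88, 85, 89, 86, 92, 89]

d = [b - a for b, a in zip(before, after)]   # paired differences, before - after
n = len(d)
d_bar = mean(d)                              # mean difference
s_d = stdev(d)                               # sample s.d. of differences (n - 1 divisor)
t = (d_bar - 0.0) / (s_d / sqrt(n))          # paired t statistic, d.f. = n - 1

t_tab = 2.015                                # one-tailed, alpha = 0.05, d.f. = 5 (from the example)
reject = t < -t_tab                          # left-tailed rejection rule

print(f"sum d={sum(d)}, d_bar={d_bar:.3f}, s_d={s_d:.3f}, t={t:.3f}, reject H0: {reject}")
```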



Testing
the Difference Between
Proportions



Z-test for p1 – p2
• Point estimate: p̂1 – p̂2
• Sampling distribution of (p̂1 – p̂2) ~ N(p1−p2, σp̂1–p̂2), with q = 1 − p:
\[ \sigma_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1}+\frac{p_2 q_2}{n_2}} \]
• Z statistic:
\[ Z = \frac{(\hat{p}_1-\hat{p}_2) - (p_1-p_2)_{H_0}}{\sigma_{\hat{p}_1-\hat{p}_2}} \]
• When p1, p2 are unknown, estimate using pooled p
• Under null hypo. p1 = p2, use pooled/combined p:
\[ p = \frac{x_1+x_2}{n_1+n_2} = \frac{n_1\hat{p}_1+n_2\hat{p}_2}{n_1+n_2}, \qquad \sigma_{\hat{p}_1-\hat{p}_2} = \sqrt{pq\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \]
• Applies to large sample situations where n1p1, n1q1, n2p2,
n2q2 are all greater than or equal to 5



Example 7: Z-test
for diff. in population proportions
• A tax preparation firm is interested in comparing the
quality of work at two of its regional offices. The firm is
interested to know the difference between the
proportion of erroneous returns prepared at each
office. Independent simple random samples from the
two offices provide the following information.
Office 1: n1 = 250, x1 (number of returns with errors) = 35
Office 2: n2 = 300, x2 (number of returns with errors) = 27
• Step 1. H0: p1-p2=0 vs. H1: p1-p2≠0
• Step 2: α = 0.05
• Step 3: Z-test for diff. between proportion
Example 7…
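• Step 4: p̂1 = 35/250 = 0.14, p̂2 = 27/300 = 0.09
\[ p = \frac{x_1+x_2}{n_1+n_2} = \frac{35+27}{250+300} \approx 0.1127 \]
\[ \sigma_{\hat{p}_1-\hat{p}_2} = \sqrt{pq\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \]
\[ Z = \frac{(\hat{p}_1-\hat{p}_2) - (p_1-p_2)_{H_0}}{\sigma_{\hat{p}_1-\hat{p}_2}} = \frac{(0.14-0.09) - 0}{\sqrt{0.1127\,(1-0.1127)\left(\frac{1}{250}+\frac{1}{300}\right)}} \approx 1.85 \]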



Example 7…
• Step 5: rejection rule (H1: p1−p2 ≠ 0, two-tailed test)
– Critical value: for α = 0.05 in two tails, ztab = 1.96
– zcalc = 1.85 < ztab = 1.96 → no rejection of H0
– OR p-value approach:
– p-value = 2(0.0322) = 0.0644 > α = 0.05 → do not reject H0
– At 10%, reject H0 (0.0644 < 0.10); at 1%, do not reject H0
• Step 6: Conclusion
– At 5% significance, no significant difference between the proportions
of erroneous returns prepared at the two offices.
– At 10% significance, there is a significant difference between the
proportions of erroneous returns prepared at the two offices.
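
Example 7 can be checked with the following Python sketch, which also verifies the large-sample condition using the sample proportions. Variable names are illustrative. Computed without rounding, z is about 1.846 and the two-tailed p-value about 0.065, slightly different from the 0.0644 obtained with z rounded to 1.85.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 7
n1, x1 = 250, 35    # Office 1: returns sampled, returns with errors
n2, x2 = 300, 27    # Office 2

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)     # pooled proportion under H0: p1 = p2
q_pooled = 1 - p_pooled

# Large-sample condition: all expected counts at least 5
large_sample_ok = min(n1 * p1_hat, n1 * (1 - p1_hat),
                      n2 * p2_hat, n2 * (1 - p2_hat)) >= 5

se = sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat - 0.0) / se
p_value = 2 * NormalDist().cdf(-abs(z))   # two-tailed p-value
reject_at_5pct = p_value < 0.05

print(f"p={p_pooled:.4f}, z={z:.3f}, p-value={p_value:.4f}, reject at 5%: {reject_at_5pct}")
```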



Compare two population variances
– Independent samples
– Populations: approx. normal distribution
– Fisher’s F-test



Compare two population variances
• Two independent random samples
• Sampling distribution of s1²/s2² when σ1² = σ2² (under H0)
– F distribution with (n1−1) d.f. for numerator and (n2−1) d.f.
for denominator
• F-test or Variance ratio test
– By Fisher
– Value from 0 to infinity
– Shape depends on d.f. in numerator and denominator
• F test statistic is constructed with the larger sample
variance in the numerator, so F-value ≥ 1
– Value of the test statistic will be in the upper tail



F-table
• http://socr.ucla.edu/Applets.dir/F_Table.html
• https://www.stat.purdue.edu/~jtroisi/STAT350Spring2015/tables/FTable.pdf
• https://home.ubalt.edu/ntsbarsh/business-stat/StatistialTables.pdf



F table: https://www.stat.purdue.edu/~jtroisi/STAT350Spring2015/tables/FTable.pdf
– Gives the F value for 5 probability values in one table



F table: http://socr.ucla.edu/Applets.dir/F_Table.html
– F table for α = 0.10: F value for a single probability value
– Separate table for each p



Example 8: F-test
compare two variances
• A sample of 26 arrival times for the Milbank service
provides a sample variance of 48 and a sample of 16
arrival times for the Gulf Park service provides a
sample variance of 20.
– The lower the variance, the more consistent the service
– As the Milbank sample provided the larger sample
variance, denote Milbank as population 1 (higher variance
in numerator)
• Step 1. H0: σ1² = σ2² vs. H1: σ1² > σ2²
• Step 2: α = 0.05
• Step 3: F-test to compare variances
• Step 4: Fcalc = s1²/s2² = 48/20 = 2.4, d.f. (25, 15)



Example 8…
• Step 5: rejection in right tail (num. var. ≥ deno. var.)
• Critical value: F(25,15,0.05) = 2.28, F(25,15,0.01) = 3.28
– Fcalc = 2.4 > F(25,15,0.05) = 2.28 → reject H0 at 5% signi. level
– Fcalc = 2.4 < F(25,15,0.01) = 3.28 → do not reject H0 at 1% level
• For Fcalc = 2.4, p-value lies between 0.025 and 0.05:
< 0.05 → reject H0 at 5%; > 0.01 → do not reject H0 at 1%
• Step 6: Conclusion: signi. diff. in variances at 5% level of signi.,
BUT no signi. diff. in variances at 1%
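
The F statistic and its upper-tail p-value for Example 8 can be approximated in Python without a statistics package. The sketch below integrates the F density numerically with Simpson's rule; this helper is an illustrative substitute for an F table or software and assumes numerator d.f. greater than 2, so that the density is zero at the origin.

```python
from math import lgamma, exp, log

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0                     # valid at x = 0 only when d1 > 2
    log_beta = lgamma(d1 / 2) + lgamma(d2 / 2) - lgamma((d1 + d2) / 2)
    log_pdf = ((d1 / 2) * log(d1 / d2) + (d1 / 2 - 1) * log(x)
               - ((d1 + d2) / 2) * log(1 + d1 * x / d2) - log_beta)
    return exp(log_pdf)

def f_upper_tail(f, d1, d2, steps=4000):
    """P(F > f): 1 minus a Simpson's-rule integral of the pdf on [0, f]."""
    h = f / steps                      # steps must be even for Simpson's rule
    total = f_pdf(0.0, d1, d2) + f_pdf(f, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3

# Data from Example 8: larger sample variance goes in the numerator
n1, var1 = 26, 48.0    # Milbank
n2, var2 = 16, 20.0    # Gulf Park

f_calc = var1 / var2
d1, d2 = n1 - 1, n2 - 1
p_value = f_upper_tail(f_calc, d1, d2)   # right-tailed test

print(f"F={f_calc:.2f}, d.f.=({d1},{d2}), p={p_value:.4f}")
```

The resulting p-value should fall in the bracket implied by the tabulated critical values, below 0.05 and above 0.025.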



F-test …
• population must be approximately normally
distributed
• independent samples
• larger variance should always go in the numerator
– to force the test into a right-tailed test
– right-tailed tests are easier to calculate
• For two-tailed tests, divide alpha by 2 before finding
the right critical value
• If degrees of freedom aren’t listed in the F Table, use
the larger critical value
– This is the conservative choice: it reduces the chance of a Type I error



End of 2-sample test !!!
