
Comparing (means of) two groups

Here we compare two groups (or populations) with respect to a variable of interest:
the variable that we measure in each group, which serves as the basis of our
comparison. The samples may be, for example, a control group and an experimental
group, or two groups under different treatments (one under treatment A, the other
under treatment B).

Assumptions:
A. The populations or groups are normally distributed.
B. The sample size is less than 30.
C. The standard deviations (variances) of the populations/groups are not known (that is
to say, they are estimated from the samples).
D. The samples are independent and randomly selected from each group (population).

At the end of the hypothesis testing, we can conclude any of the following:
a) There is no significant difference between the two groups (or populations).
b) One group is greater than the other (which also means that the other group is
less).
=================================================
Hypotheses

Ho: The two groups do not significantly differ with respect to the variable of
interest (μA = μB or μA - μB = 0) [or: the treatment has no effect]. Alternatively, Ho
may be a statement that mentions the groups and the variable being measured and is
predicated with a phrase equivalent to ≤ or ≥. For example: the physics grades of
batch A are not higher than those of batch B (μA ≤ μB or μA - μB ≤ 0).

In the hypothesis "The physics grades of batch A are not higher than those of batch B,"
the two groups (populations) are batches A and B; the variable of interest (the one
measured to compare the two groups) is the physics grades.

Ha: The two groups significantly differ with respect to the variable of interest
(μA ≠ μB or μA - μB ≠ 0) [or: the treatment has an effect]. Alternatively, Ha may be a
statement that mentions the groups and the variable being measured and is predicated
with a phrase equivalent to > or <. For example: the physics grades of batch A are
higher than those of batch B (μA > μB or μA - μB > 0).

Test-statistic:

A. If we assume that the samples are randomly and independently selected from
populations that are normally distributed and that the population variances are equal
(but unknown), we use the pooled-variance t-test to determine whether there is a
significant difference between the means of the two populations.

Formula for t-computed

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]   [pooled variance]
s1 = sample standard deviation of group 1 (or population 1)
s2 = sample standard deviation of group 2 (or population 2)
n1 = sample size of group 1
n2 = sample size of group 2

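\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]
where \bar{x}_1 and \bar{x}_2 are the sample means of group 1 and group 2.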

Degrees of freedom = n1 + n2 - 2
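
The pooled-variance computation can be sketched in Python as follows. This is only an illustrative sketch: the function name pooled_t and the summary statistics passed to it are hypothetical and are not part of the lesson. The pooled variance is divided by n1 + n2 - 2, the same quantity used for the degrees of freedom.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (pooled variance, t-computed, degrees of freedom)."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by (n - 1).
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return sp2, t, df

# Hypothetical summary statistics, for illustration only.
sp2, t, df = pooled_t(10.0, 2.0, 5, 8.0, 2.0, 5)
print(f"pooled variance = {sp2:.2f}, t = {t:.2f}, df = {df}")
```

The function takes sample standard deviations, so it squares them itself; if variances are given instead, they should be used directly without squaring.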

B. If we assume that the samples are randomly and independently selected from
populations that are normally distributed and that the population variances are not
equal (and unknown), we use the separate-variance t-test to determine whether there
is a significant difference between the means of the two populations.
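Formula for t-computed

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

Degrees of freedom

\[ df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}} \]   [just use the integral portion]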
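
A minimal Python sketch of the separate-variance computation is given below. The function name separate_variance_t and the numbers passed to it are hypothetical and serve only as an illustration. The degrees of freedom are truncated with math.floor to keep the integral portion only.

```python
import math

def separate_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t-computed, raw df, integral portion of df)."""
    v1 = sd1**2 / n1  # s1^2 / n1
    v2 = sd2**2 / n2  # s2^2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df_raw = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df_raw, math.floor(df_raw)

# Hypothetical summary statistics, for illustration only.
t, df_raw, df = separate_variance_t(10.0, 2.0, 5, 8.0, 3.0, 10)
print(f"t = {t:.2f}, raw df = {df_raw:.2f}, df used = {df}")
```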

Sample:

Suppose a sample of eight 35- to 39-year-old nonpregnant, premenopausal oral
contraceptive (OC) users who work in a company is identified, with a mean systolic
blood pressure (SBP) of 132.86 mmHg and a sample standard deviation of 15.34 mmHg. A
sample of 21 nonpregnant, premenopausal non-OC users in the same age group is
similarly identified, with a mean SBP of 124.44 mmHg and a sample standard
deviation of 18.23 mmHg. Does the blood pressure of nonpregnant, premenopausal OC
users significantly differ from the blood pressure of nonpregnant, premenopausal
non-OC users? Set the significance level at 0.05 and assume normality.

Assuming first that the variances are equal

1. Ho: The blood pressures of nonpregnant, premenopausal OC users do not
significantly differ from the blood pressures of nonpregnant, premenopausal non-OC
users. (μOC = μnon-OC → μOC - μnon-OC = 0)
Ha: The blood pressures of nonpregnant, premenopausal OC users significantly differ
from the blood pressures of nonpregnant, premenopausal non-OC users.
(μOC ≠ μnon-OC → μOC - μnon-OC ≠ 0)

Note: In translating your hypothesis to its mathematical equivalent, whichever group
is mentioned first shall come first in the mathematical formulation. In the
hypothesis “The blood pressures of nonpregnant, premenopausal OC users do not
significantly differ from the blood pressures of nonpregnant, premenopausal non-OC
users,” OC users are mentioned first, hence they should appear first in the
translation. Observe: μOC = μnon-OC → μOC - μnon-OC = 0.

2. Significance level = 0.05.


3. t-statistic (assume the variances are equal)
4. t-tabular
Let 1 = OC; 2= non-OC
Df = n1+ n2 - 2= 8 + 21 - 2= 27
Since our Ha uses ≠, we employ a two-tailed test
α = 0.05

t-tabular = 2.052

5. t-computed = 1.16
Given:
Let: 1 = OC; 2 = non-OC
                        OC user (1)       non-OC user (2)
Sample mean             x̄1 = 132.86       x̄2 = 124.44
Standard deviation      s1 = 15.34        s2 = 18.23
Sample size             n1 = 8            n2 = 21

Pooled estimate of the population variances:

\[ s_p^2 = \frac{(8 - 1)(15.34)^2 + (21 - 1)(18.23)^2}{8 + 21 - 2} = 307.18 \]

t-computed:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{132.86 - 124.44}{\sqrt{307.18 \left( \frac{1}{8} + \frac{1}{21} \right)}} = 1.16 \]

6. Compare t-computed and t-tabular (use the absolute values only; that is, ignore any
negative sign).
1.16 < 2.052. Statistical decision: accept Ho.

Decision rule:
(if |t-computed| < t-tabular, accept Ho; if |t-computed| > t-tabular, reject Ho)
7. Conclusion: The blood pressures of nonpregnant, premenopausal OC users do not
significantly differ from the blood pressures of nonpregnant, premenopausal non-OC
users.
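
Steps 4 to 6 of this example can be checked with a short Python sketch. The variable names are our own; the critical value 2.052 is typed in from the t table rather than computed, and the code is only a check on the arithmetic above.

```python
import math

# Summary statistics from the example (1 = OC users, 2 = non-OC users).
mean1, sd1, n1 = 132.86, 15.34, 8
mean2, sd2, n2 = 124.44, 18.23, 21
t_tabular = 2.052  # two-tailed, alpha = 0.05, df = 27 (read from the t table)

df = n1 + n2 - 2
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
t_computed = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-tailed decision rule: compare the absolute value with t-tabular.
decision = "reject Ho" if abs(t_computed) > t_tabular else "accept Ho"
print(f"df = {df}, sp2 = {sp2:.2f}, t = {t_computed:.2f}, {decision}")
# prints: df = 27, sp2 = 307.18, t = 1.16, accept Ho
```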

Suppose we do not assume that the population variances are equal (that is, we treat
them as not equal).

1. Ho: The blood pressures of nonpregnant, premenopausal OC users do not
significantly differ from the blood pressures of nonpregnant, premenopausal non-OC
users. (μOC = μnon-OC → μOC - μnon-OC = 0)

Ha: The blood pressures of nonpregnant, premenopausal OC users significantly differ
from the blood pressures of nonpregnant, premenopausal non-OC users.
(μOC ≠ μnon-OC → μOC - μnon-OC ≠ 0)

2. Significance level = 0.05.

3. t-statistic (assume the variances are not equal)

4. t-tabular

Let 1 = OC; 2= non-OC

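Degrees of freedom:

\[ df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}} = \frac{\left( \frac{15.34^2}{8} + \frac{18.23^2}{21} \right)^2}{\frac{\left( 15.34^2 / 8 \right)^2}{8 - 1} + \frac{\left( 18.23^2 / 21 \right)^2}{21 - 1}} = \frac{2046.64}{136.12} = 15.04 \]

We will use the integral part only, so our df = 15.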

Since our Ha uses ≠, we employ a two-tailed test

α = 0.05

t-tabular = 2.131

5. t-computed = 1.25

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{132.86 - 124.44}{\sqrt{\frac{15.34^2}{8} + \frac{18.23^2}{21}}} = \frac{8.42}{6.726055768} = 1.25 \]

6. Compare t-computed and t-tabular (use the absolute values only).

1.25 < 2.131. Statistical decision: accept Ho.

7. Conclusion: The blood pressures of nonpregnant, premenopausal OC users do not
significantly differ from the blood pressures of nonpregnant, premenopausal non-OC
users.
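
The separate-variance version of the same example can be recomputed from the given summary statistics (s1 = 15.34, s2 = 18.23) with the sketch below. The variable names are our own, and the critical value 2.131 is typed in from the t table; the code only checks the integral part of the degrees of freedom, t-computed, and the decision.

```python
import math

# Summary statistics from the example (1 = OC users, 2 = non-OC users).
mean1, sd1, n1 = 132.86, 15.34, 8
mean2, sd2, n2 = 124.44, 18.23, 21
t_tabular = 2.131  # two-tailed, alpha = 0.05, df = 15 (read from the t table)

v1, v2 = sd1**2 / n1, sd2**2 / n2
# Keep only the integral part of the degrees of freedom.
df = math.floor((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))
t_computed = (mean1 - mean2) / math.sqrt(v1 + v2)

decision = "reject Ho" if abs(t_computed) > t_tabular else "accept Ho"
print(f"df = {df}, t = {t_computed:.2f}, {decision}")
# prints: df = 15, t = 1.25, accept Ho
```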

=========
Another way to construct our hypothesis is to compare the two groups (or
populations) directly, without mentioning the variable of interest (the basis of our
comparison); the variable of interest is then implied.

In the hypothesis “Men are taller than women,” the two groups are named and are
directly compared (the variable of interest, though not mentioned, is clearly
height).

In the hypothesis “Medicine A is as good as medicine B,” the two groups are named and
the variable of interest is again not mentioned. Someone who knows the medicines
would know the implied basis of the comparison; someone who does not would at least
understand that something has been measured (the variable of interest) to arrive at
such a comparison.

Sample:

A researcher wishes to compare the Math 100 grades of NDDU students enrolled
during the 2nd semester of SY 2016-2017 with the Math 100 grades of the NDDU
students enrolled in the summer of 2017. Using the results below, can we
conclude that the Math 100 students enrolled in the 2nd semester of 2016-17 are
as good as the Math 100 students enrolled in the summer of 2017? Let α = 0.01.
Assume that the populations are normally distributed and that the samples are
picked randomly.

                        2nd semester      Summer 2017
Average grade           78.68             70.31
Standard deviation      4.40              14.86
Sample size             25                25

Solution (assuming population variances are equal)

1. Ho: The Math 100 students enrolled in the 2nd semester of 2016-17 are as good as
the Math 100 students enrolled in the summer of 2017
(μ2nd sem = μsummer → μ2nd sem - μsummer = 0).

Ha: The Math 100 students enrolled in the 2nd semester of 2016-17 are not as good as
the Math 100 students enrolled in the summer of 2017
(μ2nd sem ≠ μsummer → μ2nd sem - μsummer ≠ 0).
2. α = 0.01
3. t-statistic (assume the variances are equal)
4. t-tabular
Let 1 = 2nd semester; 2= summer
Df = n1+ n2 - 2= 25 + 25 - 2= 48
Since our Ha uses ≠, we employ a two-tailed test
α = 0.01

t-tabular = 2.682
Note: If your df is not found in the table, use the nearest smaller df that is. For
example, if df = 43 and the table has values for df = 42 and df = 44, use the one at
df = 42. If df = 65 and the table has values for df = 60 and df = 70, use the value at
df = 60.
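
The table-lookup rule in this note can be sketched in Python. The function name table_row_for and the lists of tabulated df rows are hypothetical; the sketch only picks which row of the table to read and does not contain any t values.

```python
from bisect import bisect_right

def table_row_for(df, table_dfs):
    """Return the largest tabulated df that does not exceed the computed df."""
    rows = sorted(table_dfs)
    i = bisect_right(rows, df)
    if i == 0:
        raise ValueError("computed df is below the smallest row in the table")
    return rows[i - 1]

# Hypothetical df rows printed in a t table.
print(table_row_for(43, [40, 42, 44, 46]))  # prints 42
print(table_row_for(65, [50, 60, 70, 80]))  # prints 60
print(table_row_for(48, [46, 48, 50]))      # prints 48 (exact match is used as is)
```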

5. t-computed
Given:
Let 1 = 2nd semester; 2 = summer
                        2nd sem (1)       summer (2)
Sample mean             x̄1 = 78.68        x̄2 = 70.31
Standard deviation      s1 = 4.40         s2 = 14.86
Sample size             n1 = 25           n2 = 25

Pooled estimate of the population variances:

\[ s_p^2 = \frac{(25 - 1)(4.40)^2 + (25 - 1)(14.86)^2}{25 + 25 - 2} = 120.0898 \]

t-computed:

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{78.68 - 70.31}{\sqrt{120.0898 \left( \frac{1}{25} + \frac{1}{25} \right)}} = \frac{8.37}{3.099545773} = 2.70 \]

6. Compare t-computed and t-tabular (use the absolute values only; that is, ignore any
negative sign).
2.70 > 2.682. Statistical decision: reject Ho.

Decision rule:
(if |t-computed| < t-tabular, accept Ho; if |t-computed| > t-tabular, reject Ho)

7. Conclusion: The Math 100 students enrolled in the 2nd semester of 2016-17 are not
as good as (that is, they differ significantly from) the Math 100 students enrolled
in the summer of 2017. In fact, the Math 100 students enrolled in the 2nd semester of
2016-17 performed better than the ones enrolled in the summer of 2017. We are able
to say this because Ho has been rejected and the 2nd semester students have the
greater mean (78.68) compared with the summer students (70.31).

Note: In hypothesis testing where Ho uses = and has been rejected, state in your
conclusion which group is better (by comparing the means).
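
The Math 100 example, including the final step of naming the group with the greater mean once Ho is rejected, can be checked with the sketch below. The variable names are our own, the values are those listed under "Given" in step 5, and the critical value 2.682 is typed in from the t table.

```python
import math

# Values from step 5 of the example (1 = 2nd semester, 2 = summer).
mean1, sd1, n1 = 78.68, 4.40, 25
mean2, sd2, n2 = 70.31, 14.86, 25
t_tabular = 2.682  # two-tailed, alpha = 0.01, df = 48 (read from the t table)

df = n1 + n2 - 2
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
t_computed = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

if abs(t_computed) > t_tabular:
    # Ho (equality) is rejected, so report which group has the greater mean.
    higher = "2nd semester" if mean1 > mean2 else "summer"
    conclusion = f"reject Ho; {higher} has the greater mean"
else:
    conclusion = "accept Ho"

print(f"df = {df}, sp2 = {sp2:.4f}, t = {t_computed:.2f}, {conclusion}")
# prints: df = 48, sp2 = 120.0898, t = 2.70, reject Ho; 2nd semester has the greater mean
```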

=======

Task 1: Perform a seven-step hypothesis test for each of the problems below. Answers
are to be handwritten on short bond paper. Submit them in PDF format to my email
address on or before 6:00 PM today (April 17, 2021). Just download an app that
can convert a photo to PDF. If you have a scanner, use it.

1. Your company can buy a certain type of yarn from one of two vendors.
The vendors’ products appear to be compatible in all respects except
price and, possibly, breaking strength. You will buy from vendor 1
(whose price is lower) unless there is reason to believe that vendor 1’s
product has a lower mean breaking strength than vendor 2’s. Random
samples are drawn from the two vendors’ stocks, with the following results.
Assume that the breaking strengths are approximately normally
distributed and that the populations’ variances are NOT equal. With α =
0.01, can you conclude that vendor 1’s product is as strong as vendor 2’s?
From which vendor will you buy?

Vendor    Sample size    Mean breaking strength    Standard deviation
1         10             94                        3.7417
2         12             98                        3

2. Your company can buy a certain type of yarn from one of two vendors.
The vendors’ products appear to be compatible in all respects except
price and, possibly, breaking strength. You will buy from vendor 1
(whose price is lower) unless there is reason to believe that vendor 1’s
product has a lower mean breaking strength than vendor 2’s. Random
samples are drawn from the two vendors’ stocks, with the following results.
Assume that the breaking strengths are approximately normally
distributed and that the populations’ variances are equal. With α = 0.05,
can you conclude that vendor 1’s product is weaker than vendor 2’s?
From which vendor will you buy?

Vendor    Sample size    Mean breaking strength    Variance (s^2)
1         10             94                        14
2         12             98                        9

Note: If what is given is already the variance (s^2), do not square the value
again in the formula where s^2 is required.
