103 T-Test Difference of Means Sept 2020
(T-test)
Dr. Lloyd C. Bautista
BOL Plebiscite
Have you ever wondered about the
DIFFERENCES OF MEANS between
the groups voting in the BOL
plebiscite?
Muslims vs Non-Muslims; Male
vs Female; Old vs. Young; or
Working vs. Non-working.
We collect sample data from the same population
and use them to decide whether to retain or reject
the null hypothesis (H0).
Remember, the null hypothesis (H0) states that (μA
= μB): “NO DIFFERENCE… if there are any differences,
discrepancies, or suspiciously outlying results, they
are purely due to sampling error.”
The alternative hypothesis (Ha) states
that (μA ≠ μB): “YES, THERE IS A DIFFERENCE… the difference
between the samples is too large to ignore and is
statistically significant.”
Comparing two means
• Is there a difference between the annual incomes of college-degree
and graduate-degree holders?
• Does compensation prior to making decisions have an effect on
the corruption index?
• Is the training program effective in increasing residents before and
after the program?
Let us go back to BOL.
There are two sample groups drawn from the same
voters’ population: 30 Female respondents (F)
were evaluated with respect to their extent of
support for the BOL vis-à-vis 30 Males (M).
If the mean score of Females (x̄F) is 45 while that of Males (x̄M) is
40, there is a difference of 5.
The question is whether the difference of
5 is statistically significant enough to reject the NULL
HYPOTHESIS. Put another way: is the
difference between the samples large enough to indicate a true
difference in the population?
We could draw thousands of pairs of samples (M vs F) to obtain
a frequency distribution of the differences, but this is inefficient.
Testing our Hypothesis
Null hypothesis (H0) means that (μA = μB): “THERE IS NO
DIFFERENCE BETWEEN THE POPULATION MEAN & SAMPLE MEAN OF
AGES AMONG FEMALE AND MALE VOTERS… if there are any
differences, discrepancies, or suspiciously outlying results, they are
purely due to sampling error.”
Alternative hypothesis (Ha) means that (μA ≠ μB): “THERE IS A BIG
DIFFERENCE, or the difference between population and samples is
too large to ignore and is statistically significant.”
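TABLE A. Percentage of Area under the Normal Curve
z       Area between Mean and z    Area beyond z
2.50    49.38                      0.62

$$ z = \frac{\bar{x}_F - \bar{x}_M}{\sigma_{\bar{x}_F - \bar{x}_M}} = \frac{45 - 40}{2} = +2.5 $$

[Figure: normal curve with z = +2.5 marked; 49.38% of cases lie between the mean and z, 0.62% lie beyond z in each tail, and the whole area under the curve is 100% of cases, or 1.]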
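The z computation above can be checked with a short sketch. It assumes, as the slide does, a standard error of 2 for the difference; the function names are illustrative only, and the tail area is computed with Python's standard `math.erfc` instead of being read from Table A.

```python
import math

def z_score(mean_a, mean_b, std_error):
    """z = difference between sample means / standard error of the difference."""
    return (mean_a - mean_b) / std_error

def area_beyond(z):
    """Proportion of the normal curve lying beyond z in one tail."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))

z = z_score(45, 40, 2)      # Females 45, Males 40, standard error 2 (from the slide)
beyond = area_beyond(z)     # one-tail area beyond z
between = 0.5 - beyond      # area between the mean and z

print(f"z = {z:+.2f}")
print(f"area between mean and z = {between * 100:.2f}%")
print(f"area beyond z = {beyond * 100:.2f}%")
```

The printed percentages should agree with the Table A row for z = 2.50.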
Rules
Standard of Error = 4.4%
[Figure: normal curve marking Z = +2.0 (47.72% of cases between the mean and z, 2.28% beyond) and Z = +2.5 (0.62% beyond).]
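[Figure: two panels showing the same column of example sample means (23, 25, 21, 22, 22, 19); the second panel also marks differences of +2, -1, and +3.]
First panel: Standard error = standard deviation of the Sample Means from the Population Mean.
Second panel: Standard error = standard deviation of the Sample Means DIFFERENCE from the Population Mean DIFFERENCE.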
Case study
Remember your survey forms. We want to test the difference of
means between two barangays with respect to their perception of
item 11.a., “The presence and visibility of law enforcers in the
community is adequate.” (Rate the perception with 1 as the lowest and
10 as the highest.)
We want to know whether the difference between the mean perceptions
of item 11.a. on DETERRENCE is significant or not.
Null hypothesis (H0) means that (μA = μB): “THERE IS NO DIFFERENCE BETWEEN
THE MEANS… if there are any differences, discrepancies, or
suspiciously outlying results, they are purely due to
sampling error.”
Alternative hypothesis (Ha) means that (μA ≠ μB): “THERE IS A
BIG DIFFERENCE, or the difference between population and
samples is too large to ignore and is statistically significant.”
                  Brgy Upper Hills    Brgy Lower Hills
N                 10                  10
Mean              7.8                 4
s2 (variance)     0.56                1.8
s (stdev)         0.75                1.3

Mean:
$$ \bar{x} = \frac{\sum fx}{N} $$
Variance:
$$ s^2 = \frac{\sum (x - \bar{x})^2}{N} $$
Standard deviation:
$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} $$
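As a minimal sketch of the three formulas, the functions below compute the mean, variance, and standard deviation with N in the denominator, as on the slide. The ratings list is hypothetical (it is not the survey data), and the sketch works on raw scores, not the frequency form Σfx.

```python
import math

def mean(values):
    """Sum of the scores divided by N."""
    return sum(values) / len(values)

def variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def stdev(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

ratings = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical ratings, for illustration only

print("mean     =", mean(ratings))
print("variance =", variance(ratings))
print("stdev    =", stdev(ratings))
```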
Go to 103_T-Test_visibility_11a
Find the STANDARD ERROR OF THE DIFFERENCE between two sample means (pooled)

$$ s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left(\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\frac{N_1 + N_2}{N_1 N_2}\right)} $$

$$ s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left(\frac{10(.56) + 10(1.8)}{10 + 10 - 2}\right)\left(\frac{10 + 10}{(10)(10)}\right)} $$

$$ s_{\bar{x}_1 - \bar{x}_2} = 0.80 $$
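The pooled formula can be sketched as a small function; the name `pooled_standard_error` is illustrative. Note that when it is evaluated on the rounded table values (variances 0.56 and 1.8, N = 10 each), it returns about 0.51, not the 0.80 shown on the slide. The slide's figure may reflect the worksheet data rather than the rounded table, so treat the output here only as an illustration of the formula.

```python
import math

def pooled_standard_error(n1, var1, n2, var2):
    """Standard error of the difference between two sample means (pooled)."""
    pooled_variance = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (n1 + n2) / (n1 * n2))

# Rounded table values for Brgy Upper Hills and Brgy Lower Hills
se = pooled_standard_error(10, 0.56, 10, 1.8)
print(f"pooled standard error from the table values = {se:.3f}")
```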
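T test of difference between means of independent samples
Find the t ratio by dividing the difference between means ($\bar{x}_1 - \bar{x}_2$) by the standard error of the difference between means

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{7.8 - 4}{0.80} = 4.75 $$

Find df

$$ df = N_1 + N_2 - 2 = 10 + 10 - 2 = 18 $$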
The calculated t (two-tailed test) is 4.75 with degrees of freedom = 18 and α = .05.
Since the calculated t is MORE THAN the critical t value of 2.101 from the table, WE REJECT THE NULL HYPOTHESIS, which
means the difference between the means is STATISTICALLY SIGNIFICANT and too large to ignore.
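See the graph below.
TABLE C. Critical Values of t
df    Level of significance for two-tailed test (α)
      0.05
18    2.101

[Figure: t distribution centered at 0, with a rejection area of α/2 = 2.5% in each tail and 47.50% between the mean and each critical value; t critical = 2.101 and t calculated = 4.75, which falls in the rejection area. Decision: Reject the Ho.]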
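The decision rule for the case study can be sketched as below. The sketch takes the slide's standard error (0.80) and the Table C critical value (2.101) as given; it does not look up critical values itself, and the function names are illustrative.

```python
def t_ratio(mean_1, mean_2, std_error):
    """Difference between means divided by the standard error of the difference."""
    return (mean_1 - mean_2) / std_error

def degrees_of_freedom(n1, n2):
    """df for two independent samples."""
    return n1 + n2 - 2

def decision(t_calculated, t_critical):
    """Two-tailed rule: reject H0 only if |t| exceeds the critical value."""
    return "reject H0" if abs(t_calculated) > t_critical else "retain H0"

t = t_ratio(7.8, 4, 0.80)          # means and standard error from the slide
df = degrees_of_freedom(10, 10)
result = decision(t, 2.101)        # critical t for df = 18, alpha = .05 (Table C)

print(f"t = {t:.2f}, df = {df}, decision: {result}")
```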
Workshop 1
Go back to your survey forms. We want to test the difference of
means between two barangays, Upper Hills and Lower Hills, with
respect to their perception of item 12.b., “Law enforcers respect
human rights and due process.” (Rate the perception with 1 as the
lowest and 10 as the highest.)
We want to know whether the difference between the mean perceptions
of item 12.b. is significant or not.
[Worksheet: N = 10 for each group; ΣX1 = 55.8 and ΣX2 = 54.11; the S2 and s (StDev) cells read 0.000.]
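$$ s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left(\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\frac{N_1 + N_2}{N_1 N_2}\right)} $$

$$ s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left(\frac{10(.0316) + 10(.2423)}{10 + 10 - 2}\right)\left(\frac{10 + 10}{(10)(10)}\right)} = 0.174 $$

                  PNP       Bureau of Fire
N                 10        10
Mean              5.58      5.411
s2 (variance)     0.0316    0.2423
s (stdev)         0.178     0.492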
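Find the t ratio by dividing the difference between means ($\bar{x}_1 - \bar{x}_2$) by the standard error of the difference between means

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}} $$

$$ df = N_1 + N_2 - 2 = 10 + 10 - 2 = 18 $$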
The calculated t (two-tailed test) is .969 with degrees of freedom = 18 and α = .05.
Since the calculated t is LESS THAN the critical t value of 2.101 from the table, WE RETAIN THE NULL HYPOTHESIS, which
means the difference between the means MIGHT SIMPLY BE SAMPLING ERROR.
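See the graph below.
TABLE C. Critical Values of t
df    Level of significance for two-tailed test (α)
      0.05
18    2.101

[Figure: t distribution centered at 0, with a rejection area of α/2 = 2.5% in each tail and 47.50% between the mean and each critical value; t calculated = 0.969 lies short of t critical = 2.101. Decision: Retain the Ho.]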
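The whole Workshop 1 calculation can be reproduced in a few lines. This is a sketch using the table values above (means 5.58 and 5.411, variances 0.0316 and 0.2423, N = 10 each) and the Table C critical value; the helper name is illustrative.

```python
import math

def pooled_standard_error(n1, var1, n2, var2):
    """Standard error of the difference between two sample means (pooled)."""
    pooled_variance = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (n1 + n2) / (n1 * n2))

n1, mean_1, var_1 = 10, 5.58, 0.0316
n2, mean_2, var_2 = 10, 5.411, 0.2423

se = pooled_standard_error(n1, var_1, n2, var_2)
t = (mean_1 - mean_2) / se
df = n1 + n2 - 2
t_critical = 2.101                      # df = 18, alpha = .05, two-tailed (Table C)
result = "reject H0" if abs(t) > t_critical else "retain H0"

print(f"standard error = {se:.3f}")
print(f"t = {t:.3f}, df = {df}, decision: {result}")
```

Run as written, this gives a standard error of 0.174 and t = 0.969, matching the worked values.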
What if we are testing the difference
between means of two different
populations?
This test is used when the sample variances are so dissimilar that we reject the assumption that the population
variances are the same. We can call this non-pooled testing of the differences between means.
Observe the different sample sizes.
We want to know whether there is a difference between the mean number of days required for processing
fire safety permits in two separate provinces, Bulacan and Batangas. The researcher took random
samples of applicants for fire safety permits in each province from January to June. Here are
the results.
Bulacan Batangas
N 36 23
Mean 6.5 5.6
s2(variance) 7.8 3.6
s (stdev) 2.8 1.8
Find the STANDARD ERROR OF THE DIFFERENCE between means (Non-Pooled), for different populations

$$ s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{N_1 - 1} + \frac{s_2^2}{N_2 - 1}} $$

$$ s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{7.8}{36 - 1} + \frac{3.6}{23 - 1}} $$

$$ s_{\bar{x}_1 - \bar{x}_2} = .626 $$
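A minimal sketch of the non-pooled formula follows; the function name is illustrative. With the rounded table values (variances 7.8 and 3.6) it returns about 0.622, close to the slide's .626; the small gap is consistent with the slide working from unrounded data, though that is an assumption.

```python
import math

def nonpooled_standard_error(n1, var1, n2, var2):
    """Standard error of the difference between means without pooling the variances."""
    return math.sqrt(var1 / (n1 - 1) + var2 / (n2 - 1))

# Bulacan: N = 36, variance 7.8; Batangas: N = 23, variance 3.6
se = nonpooled_standard_error(36, 7.8, 23, 3.6)
print(f"non-pooled standard error = {se:.3f}")
```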
Find the t ratio by dividing the difference between means ($\bar{x}_1 - \bar{x}_2$) by the standard error of the difference between means

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}} $$

$$ t = \frac{6.5 - 5.6}{.626} $$

$$ t = 1.457 $$

Find df
We use the smaller of the two sample sizes, so df = 23.
The calculated t (two-tailed test) is 1.46 with degrees of freedom = 23 and α = .05. Since it does not
exceed the critical t of 2.069 from the table (it is closer to the mean difference of 0), we retain the null
hypothesis, which means there is no statistically significant difference between the mean processing times of Bulacan and
Batangas.
The difference might only be sampling error.
However, if we think we might commit a Type II error (retaining a false null hypothesis), we can raise the
level of significance to .20, at which the critical t from the table is 1.319. Since the calculated t (1.457) is now greater
than 1.319, we can reject the null hypothesis.
Critical values of t Table
df    Level of significance for two-tailed test (α)
      0.05     0.20
23    2.069    1.319

[Figure: t distribution centered at 0. At α = .05 (α/2 = 2.5% in each tail, 47.50% between the mean and the critical value), t critical = 2.069 and t calculated = 1.46 falls short of it: Retain Ho. At α = .20 (α/2 = 10% in each tail), t critical = 1.319 and t calculated = 1.46 exceeds it: Reject Ho.]
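The effect of the chosen significance level on the decision can be sketched as below. The critical values are copied from the slide's table for df = 23; the dictionary and variable names are illustrative.

```python
# Critical t values for df = 23, two-tailed test, copied from the slide's table
critical_t_df23 = {0.05: 2.069, 0.20: 1.319}
t_calculated = 1.457

decisions = {}
for alpha, t_critical in critical_t_df23.items():
    # Reject H0 only when the calculated t exceeds the critical value
    decisions[alpha] = "reject H0" if abs(t_calculated) > t_critical else "retain H0"
    print(f"alpha = {alpha:.2f}: t critical = {t_critical}, decision: {decisions[alpha]}")
```

The same calculated t leads to opposite decisions at the two levels, which is why the level of significance should be fixed before looking at the result.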
Concept of Significance Levels
We need a cut-off to determine whether the difference is
significant, that is, too large to ignore.
When we say that the DIFFERENCE BETWEEN MEANS is
statistically significant, it means the difference is real and large
enough to be generalized to the population.
Remember, with large samples a small difference can be
statistically significant, while with small
samples a large difference might still be due to sampling error.
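[Figure: normal curve with rejection areas of α/2 = 2.5% in each tail and 47.50% between the mean and each cut-off.]
At 1.96 standard errors from the mean (z = ±1.96), there is a 5% chance that the sampled difference falls at or beyond this point.
Our level of significance opens us to the chance of making an error.
At 2.58 standard errors from the mean (z = ±2.58), there is 1 chance out of 100 that the sampled difference could happen due to sampling error.
The more stringent our α, the farther out in the tail the cut-off lies.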
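The two cut-offs can be checked numerically. This sketch computes the combined area in both tails of the normal curve with the standard library's `math.erfc`; the function name is illustrative.

```python
import math

def two_tailed_area(z):
    """Combined area in both tails of the normal curve at or beyond +/- z."""
    return math.erfc(abs(z) / math.sqrt(2))

areas = {z: two_tailed_area(z) for z in (1.96, 2.58)}
for z, area in areas.items():
    print(f"z = +/-{z}: {area * 100:.2f}% of cases fall at or beyond this point")
```

The areas come out at roughly 5% and 1%, the .05 and .01 levels of significance.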
Workshop 2
Go back to your survey forms. We want to test the difference of
means between two barangays, A-1 and Lower Hills, with respect to
their perception of item 13.a., “Law enforcers can put offenders
behind bars.” (Rate the perception with 1 as the lowest and 10 as the
highest.) They have different populations.
We want to know whether the difference between the mean perceptions
of item 13.a. is significant or not.
Go to 103_T-Test _arrest_non-pooled
What if we are testing the difference
between related means (like before-
and-after)?
This is a t-test of the difference between means for the same sample measured twice.
Usually, this is a before-and-after comparison.
Example: a sample of informal settlers was asked about their level of satisfaction before and after they were
transferred by NHA to another housing settlement, with 1 being the lowest satisfaction and 4 being the
highest. The mean before the program is 2.33 and after is 1.33. Is there a significant difference?
Null hypothesis (H0): (μ1 = μ2) The degree of satisfaction does not differ before and after.
Research hypothesis (Ha): (μ1 ≠ μ2) The degree of satisfaction differs before and after.

Respondent    Before program (X1)    After program (X2)    Difference D = (X1 - X2)    D2
1             2                      1                     1                           1
2             1                      2                     -1                          1
3             3                      1                     2                           4
4             3                      1                     2                           4
5             1                      2                     -1                          1
6             4                      1                     3                           9

ΣX1 = 14      ΣX2 = 8       ΣD2 = 20
X̅1 = 2.33     X̅2 = 1.33     ΣD2/N = 3.33
N = 6
Find the STANDARD DEVIATION OF THE DIFFERENCES between related means

$$ S_D = \sqrt{\frac{\sum D^2}{N} - (\bar{x}_2 - \bar{x}_1)^2} $$

where ΣD2 = 20, N = 6, x̄1 = 2.33, and x̄2 = 1.33.

$$ S_D = \sqrt{\frac{20}{6} - (1.33 - 2.33)^2} $$

$$ S_D = \sqrt{3.33 - 1} $$

Use the means, not variances

$$ S_D = 1.526 $$
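The standard deviation of the differences can also be computed straight from the six before-and-after scores. This is a sketch with illustrative variable names; because it does not round ΣD2/N to 3.33 first, it yields about 1.528, against the slide's 1.526.

```python
import math

# Before (X1) and after (X2) scores for the six respondents in the table
before = [2, 1, 3, 3, 1, 4]
after = [1, 2, 1, 1, 2, 1]

n = len(before)
differences = [x1 - x2 for x1, x2 in zip(before, after)]   # D = X1 - X2
sum_d_squared = sum(d ** 2 for d in differences)           # sum of D squared

mean_before = sum(before) / n
mean_after = sum(after) / n

# S_D = sqrt( sum(D^2) / N - (mean_after - mean_before)^2 )
s_d = math.sqrt(sum_d_squared / n - (mean_after - mean_before) ** 2)

print(f"sum of D squared = {sum_d_squared}")
print(f"S_D = {s_d:.3f}")
```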
Find the STANDARD ERROR OF THE DIFFERENCES between means of related samples

$$ s_{\bar{D}} = \frac{S_D}{\sqrt{N - 1}} = \frac{1.526}{\sqrt{6 - 1}} = \frac{1.526}{2.236} = 0.682 $$
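Find the T-TEST OF THE DIFFERENCES between means of related samples
Divide the difference between the means by the standard error of the differences (0.682):

$$ t = \frac{2.33 - 1.33}{0.682} = 1.47 $$

Degree of freedom

$$ df = N - 1 = 6 - 1 = 5 $$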
The calculated t (test of difference between means of related samples) is 1.47 with degrees of
freedom = 5 and α = .05. Since it does not exceed the critical t of 2.571 from the table (it is closer to the mean
difference of 0), we retain the null hypothesis, which means there is no statistically significant difference
between the means.
The difference might only be sampling error.
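See the graph below.
TABLE C. Critical Values of t
df    Level of significance for two-tailed test (α)
      0.05
5     2.571

[Figure: t distribution centered at 0, with a rejection area of α/2 = 2.5% in each tail and 47.50% between the mean and each critical value; t calculated = 1.47 lies short of t critical = 2.571. Decision: Retain Ho.]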
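The last two steps of the related-samples test can be sketched as one function. It takes the slide's rounded values as inputs (means 2.33 and 1.33, S_D = 1.526, N = 6) and the Table C critical value; the function name is illustrative.

```python
import math

def related_samples_t(mean_1, mean_2, s_d, n):
    """Return (standard error of the differences, t ratio) for related samples."""
    std_error = s_d / math.sqrt(n - 1)
    return std_error, (mean_1 - mean_2) / std_error

se, t = related_samples_t(2.33, 1.33, 1.526, 6)
df = 6 - 1
t_critical = 2.571                       # df = 5, alpha = .05, two-tailed (Table C)
result = "reject H0" if abs(t) > t_critical else "retain H0"

print(f"standard error = {se:.3f}")
print(f"t = {t:.2f}, df = {df}, decision: {result}")
```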
Calculating the t statistic on a computer
• When we use an estimate of the SE, we do not use the z distribution
• We use the t distribution and calculate the t statistic
Go to 103_T-Test Before&After_exercise.
What if we are testing the difference
between proportions?
This is a z-test of the difference between proportions.
We want to determine whether the proportions of male and female lawmakers who support the bill
increasing the age of criminal responsibility differ.
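$$ S_{p_1 - p_2} = \sqrt{P^*(1 - P^*)\left(\frac{N_1 + N_2}{N_1 N_2}\right)} $$

where P* = .39, N1 = 180, and N2 = 150.

$$ S_{p_1 - p_2} = \sqrt{.39(1 - .39)\left(\frac{180 + 150}{180 \times 150}\right)} $$

$$ S_{p_1 - p_2} = .0539 $$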
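Find the Z-TEST OF THE DIFFERENCES between SAMPLE PROPORTIONS

$$ z = \frac{p_1 - p_2}{S_{p_1 - p_2}} $$

where p1 = .45, p2 = .32, and the standard error of the difference is .0539.

$$ z = \frac{.13}{.0539} $$

$$ z = 2.4118 $$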
The calculated z (test of difference between sample proportions) is 2.41 with degrees of freedom ∞ and
α = .05. Since IT IS MORE THAN the critical value of 1.960 from the t table (df is ∞), we REJECT the null hypothesis, which
means there is a statistically significant difference between the PROPORTIONS.
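See the graph below.
Critical values of t Table
df    Level of significance for two-tailed test (α)
      0.05
∞     1.96

[Figure: distribution centered at 0, with a rejection area of α/2 = 2.5% in each tail and 47.50% between the mean and each critical value; t critical = 1.96 and z calculated = 2.41, which falls in the rejection area. Decision: Reject the Ho.]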
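The proportion test can be sketched in the same way. The combined proportion (.39), the sample sizes (180 and 150), the sample proportions (.45 and .32), and the critical value (1.96) are all taken from the slide as given; the function name is illustrative.

```python
import math

def proportion_standard_error(p_combined, n1, n2):
    """Standard error of the difference between two sample proportions."""
    return math.sqrt(p_combined * (1 - p_combined) * (n1 + n2) / (n1 * n2))

se = proportion_standard_error(0.39, 180, 150)
z = (0.45 - 0.32) / se
z_critical = 1.96                        # alpha = .05, two-tailed
result = "reject H0" if abs(z) > z_critical else "retain H0"

print(f"standard error = {se:.4f}")
print(f"z = {z:.2f}, decision: {result}")
```

Carrying the unrounded standard error through gives z of about 2.41, in line with the slide's 2.4118 (which divides by the rounded .0539).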