
Statistics with Computer Applications
Hypothesis Testing
Objectives: At the end of the class, the students
shall be able to:
1. Differentiate when to use the independent t-test and paired t-test
2. Interpret the SPSS outputs for parametric and non-parametric
tests for 2 samples
3. Answer the midterm examination with accuracy, speed, and
honesty
Lecture Outline:
• What is Hypothesis Testing?
• Hypothesis Formulation
• Statistical Errors
• Effect of Study Design
• Test Procedures
• Test Selection
Statistics
• Descriptive: organizing, summarizing & describing data
• Correlational: relationships
• Inferential: generalising; significance

Sampling Error
• Effective sampling is essential to correctly generalise back to our
target population
• The dependent variable can be generalised from n to N
What is Hypothesis Testing?

Null Hypothesis: A = B
Alternative Hypothesis: A ≠ B
We also need to establish:
1) How unequal are these observations?
2) Are these observations reflective of the general population?
Example Hypotheses: Isometric Torque
• Is there any difference in the length of time that males and females
can sustain an isometric muscular contraction?

Null Hypothesis: mean♂ = mean♀
Alternative Hypothesis: mean♂ ≠ mean♀
Example Hypotheses: Isometric Torque
• Is there any difference in the length of time that males and
females can sustain an isometric muscular contraction?

Alternative Hypothesis (HA) or experimental (HE)
There is a significant difference in the length of time of sustaining an
isometric muscular contraction between males and females.

Null Hypothesis (H0)
There is no significant difference in the length of time of sustaining
an isometric muscular contraction between males and females.

n.b. these are 2-tailed hypotheses, the most common and most recommended form.
Example Hypotheses: Isometric Torque
• Is there any difference in the length of time that males and
females can sustain an isometric muscular contraction?
Useful analogy: the criminal trial
Imagine you are the prosecutor

H0 = Defendant not guilty
HA = Defendant guilty

Your job is to provide sufficient evidence (i.e. ‘beyond
reasonable doubt’) that the defendant is not innocent.
Remember: the p-value does NOT tell us the probability
they are innocent but rather the probability of finding
evidence at least as strong as ours, assuming they are innocent.
Example Hypotheses: Isometric Torque
• Is there any difference in the length of time that males and
females can sustain an isometric muscular contraction?
[Figure: distributions of Sustained Isometric Torque (seconds, x-axis 16 to 20) against Number of People (y-axis) for females and males, showing the populations (N♀, N♂) and the samples (n♀, n♂).]
n.b. This is why effective sampling is so important...
Example Hypotheses: Isometric Torque
• Is there any difference in the length of time that males and
females can sustain an isometric muscular contraction?
[Figure: distributions of Sustained Isometric Torque (seconds, x-axis 16 to 20) against Number of People (y-axis) for females and males, showing the populations (N♀, N♂) and the samples (n♀, n♂).]
…poor/insufficient sampling can lead to errors…
Statistical Errors
• Type 1 Errors
-Rejecting H0 when it is actually true
-Concluding a difference when one does not actually exist

• Type 2 Errors
-Failing to reject H0 when it is actually false (e.g. previous slide)
-Concluding no difference when one does exist
Errors can occur due to biased/inadequate sampling, poor
experimental design or the use of inappropriate/non-parametric tests.
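A Type 1 error rate can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: both groups of n = 25 are drawn from the same normal population (so H0 is true by construction), and H0 is rejected whenever |t| exceeds 2.011, the 0.05 two-tailed critical value for df = 48 used in this lecture. The function names, the seed and the number of trials are hypothetical choices.

```python
import math
import random
import statistics


def independent_t(a, b):
    """t statistic for two independent samples, using the SEM of each mean."""
    sem_a = statistics.stdev(a) / math.sqrt(len(a))
    sem_b = statistics.stdev(b) / math.sqrt(len(b))
    sem_diff = math.sqrt(sem_a ** 2 + sem_b ** 2)
    return (statistics.mean(a) - statistics.mean(b)) / sem_diff


def type1_error_rate(trials=2000, n=25, t_critical=2.011, seed=1):
    """Share of trials that reject H0 although both groups share one population."""
    rng = random.Random(seed)  # fixed seed (assumption) so the run is repeatable
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(independent_t(group_a, group_b)) > t_critical:
            rejections += 1  # a difference was "found" where none exists
    return rejections / trials


rate = type1_error_rate()
print(f"Observed Type 1 error rate: {rate:.3f}")
```

The observed rate should sit close to the chosen 0.05 level: that level is the long-run share of Type 1 errors we accept when H0 is true.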
Back to Study Design
• Independent Measures
• Individual scores in each data set are independent of one another

• Repeated Measures
• Individual scores in each data set are dependent/paired/correlated
Back to Study Design
• Independent Measures
• Individual scores in each data set are independent of one another
• 2 distinct groups: T O1 / P Oa
• Repeated Measures
• Individual scores in each data set are dependent/paired/correlated
• Same individuals tested twice: O1 T O2
Pre-Experimental designs.
Back to Study Design
• Independent Measures
• Individual scores in each data set are independent of one another
• True-Experimental design with Random Group Assignment (R): O1 T O2 / O3 P O4
• Repeated Measures
• Cross-Over Design
Depends on how equivalent groups were achieved
Example Hypotheses: Isometric Torque
• Is there any difference in the length of time that males and
females can sustain an isometric muscular contraction?

• So the above example is an independent measures design

• Which therefore requires an independent t-test.

Student’s (Gosset’s) t-test
Independent t-test: Calculation
Is this a significant effect?
[Figure: sample distributions (n♀, n♂) of Sustained Isometric Torque (seconds, x-axis 16 to 20) against Number of People (y-axis).]
Mean SD n
♀ 18.5 1.74 25
♂ 16.5 1.72 25
Independent t-test: Calculation
Step 1:
Calculate the Standard Error for Each Mean

SEM♀ = SD/√n = 1.74/5 = 0.348

SEM♂ = SD/√n = 1.72/5 = 0.344

Mean SD n
♀ 18.5 1.74 25
♂ 16.5 1.72 25
Independent t-test: Calculation
Step 2:
Calculate the Standard Error for the difference in means

SEMdiff = √(SEM♀² + SEM♂²) = √0.23944 = 0.489

Mean SD n
♀ 18.5 1.74 25
♂ 17.5 1.72 25
Independent t-test: Calculation
Step 3:

Calculate the t statistic

t = (Mean♀ - Mean♂) / SEMdiff = 2.00

Mean SD n
♀ 18.5 1.74 25
♂ 16.5 1.72 25
Independent t-test: Calculation
Step 4:
Calculate the degrees of freedom (df)

df = (n♀ - 1) + (n♂ - 1) = 48

Mean SD n
♀ 18.5 1.74 25
♂ 16.5 1.72 25
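Steps 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and only the SD and n values from the table are used to reproduce the standard errors and df shown above.

```python
import math


def sem(sd, n):
    """Step 1: standard error of a mean, SD / sqrt(n)."""
    return sd / math.sqrt(n)


def sem_diff(sem_a, sem_b):
    """Step 2: standard error of the difference between two means."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)


def t_statistic(mean_a, mean_b, se_diff):
    """Step 3: difference in means divided by its standard error."""
    return (mean_a - mean_b) / se_diff


def degrees_of_freedom(n_a, n_b):
    """Step 4: df for an independent t-test."""
    return (n_a - 1) + (n_b - 1)


sem_female = sem(1.74, 25)   # 1.74 / 5
sem_male = sem(1.72, 25)     # 1.72 / 5
se_diff = sem_diff(sem_female, sem_male)
df = degrees_of_freedom(25, 25)

print(round(sem_female, 3), round(sem_male, 3), round(se_diff, 3), df)
```

Calling `t_statistic` with the two group means and `se_diff` then gives the value that is compared with the critical t in the following steps.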
Independent t-test: Calculation
Step 5:
Determine the critical value for t using a t-distribution table
n.b. Use 0.05
Degrees of Freedom   Critical t-ratio for 2-tailed test
44                   2.015
46                   2.013
48                   2.011
50                   2.009

Mean SD n
♀ 18.5 1.74 25
♂ 16.5 1.72 25
Independent t-test: Calculation
Step 6:
Compare t calculated with t critical

Calculated t = 2.00
Critical t = 2.01
Therefore, t calculated < t critical
Effect: n.s. (not significant)

Mean SD n
♀ 18.5 1.74 25
♂ 16.5 1.72 25
Independent t-test: Calculation
Interpretation:
P > 0.05, so we fail to reject H0

Conclusion:

There is no significant difference in the length of time of sustaining
an isometric muscular contraction between males and females.

Mean SD n
♀ 18.5 1.74 25
♂ 16.5 1.72 25
Independent t-test: Calculation
Evaluation:
The wealth of available literature supports that females can sustain
isometric contractions longer than males. This may suggest that the
findings of the present study represent a type 2 error (no difference
was concluded when one may actually exist).
Possible solution:
Increase n
Mean SD n
♀ 18.5 1.74 25
♂ 16.5 1.72 25
Independent t-test: SPSS Output
Swim Data from SPSS session 8

Group Statistics
              Group           N    Mean      Std. Deviation   Std. Error Mean
SwimTime50m   Control         10   24.7720   1.25246          .39606
              Visualisation   10   26.4680   1.92823          .60976

Independent Samples Test (SwimTime50m)
                                          Equal variances   Equal variances
                                          assumed           not assumed
Levene's Test for Equality of Variances
  F                                       7.842
  Sig.                                    .012
t-test for Equality of Means
  t                                       -2.333            -2.333
  df                                      18                15.447
  Sig. (2-tailed)                         .031              .034
  Mean Difference                         -1.69600          -1.69600
  Std. Error Difference                   .72710            .72710
  95% Confidence Interval, Lower          -3.22358          -3.24188
  95% Confidence Interval, Upper          -.16842           -.15012

Calculated t (ignore sign) = 2.333
df 18 = critical t 2.101
2.333 > 2.101
So P < 0.05
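The t and df values in this output can be re-derived from the Group Statistics alone. The sketch below is an illustration with a hypothetical function name; it assumes the standard formulas for the standard error of a difference and for the Welch (equal variances not assumed) df. Because both groups have n = 10, the pooled and unpooled standard errors coincide here, which is why both rows report the same t.

```python
import math


def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df_equal_variances, df_welch) from group summary statistics."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    se_diff = math.sqrt(var_a + var_b)
    t = (mean_a - mean_b) / se_diff
    df_equal = (n_a - 1) + (n_b - 1)
    # Welch-Satterthwaite approximation for unequal variances
    df_welch = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df_equal, df_welch


# Control vs Visualisation, values taken from the Group Statistics table
t, df_equal, df_welch = t_from_summary(24.7720, 1.25246, 10, 26.4680, 1.92823, 10)
print(round(t, 3), df_equal, round(df_welch, 3))
```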
Repeated Measures Designs
• As shown earlier, a repeated measures design implies that data in each
data set can be paired or correlated with one another

• An independent t-test is inappropriate to analyse such data

• Instead, a paired t-test should be used…


Advantages of using Paired Data
• Data from independent samples is heavily influenced by variance
between subjects
[Figure: Number of Press-Ups (y-axis, 0 to 200) in Week 1 and Week 2 (x-axis), annotated "Large SD (variance)".]
i.e. this data would have a large SD associated with an independent
t-test simply because some subjects performed better than others
HOWEVER…
Advantages of using Paired Data
• Data from independent samples is heavily influenced by variance
between subjects
[Figure: Number of Press-Ups (y-axis, 20 to 200) in Week 1 and Week 2 (x-axis).]
…using the same participants on two occasions allows us to pair up the data…
…now we can remove between subject variance from subsequent analysis…
Paired t-test: Calculation
Subject   Week 1   Week 2   Diff (D)   Diff² (D²)
1         10       12
2         50       52
3         20       25
4         8        10
5         115      120
6         75       80
7         45       50
8         170      175
Steps 1 & 2: Complete this table      ∑D =      ∑D² =
Paired t-test: Calculation
Step 3:
Calculate the t statistic

t = \frac{\sum D}{\sqrt{\dfrac{n \sum D^2 - (\sum D)^2}{n - 1}}} =

∑D =      ∑D² =
Paired t-test: Calculation
Step 3:
Calculate the t statistic

t = \frac{31}{\sqrt{\dfrac{8 \times 137 - (31)^2}{7}}} = 7.06

∑D = 31      ∑D² = 137
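Steps 1 to 3 of the paired t-test can be checked with a short script. This is a minimal sketch using the Week 1 and Week 2 scores from the table; the variable names are hypothetical, and differences are taken as Week 2 minus Week 1.

```python
import math

week1 = [10, 50, 20, 8, 115, 75, 45, 170]
week2 = [12, 52, 25, 10, 120, 80, 50, 175]

# Steps 1 & 2: differences and squared differences for each subject
diffs = [after - before for before, after in zip(week1, week2)]
sum_d = sum(diffs)
sum_d2 = sum(d * d for d in diffs)

# Step 3: t = sum(D) / sqrt((n * sum(D^2) - sum(D)^2) / (n - 1))
n = len(diffs)
t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
df = n - 1

print(sum_d, sum_d2, round(t, 2), df)
```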
Paired t-test: Calculation
Steps 4 & 5:
Calculate the df and use a t-distribution table to find t critical
df = n - 1 = 7
Degrees of Freedom   Critical t-ratio (0.05 level)   Critical t-ratio (0.01 level)
1                    12.71                           63.657
2                    4.303                           9.925
3                    3.182                           5.841
4                    2.776                           4.604
5                    2.571                           4.032
6                    2.447                           3.707
7                    2.365                           3.499
8                    2.306                           3.355
9                    2.262                           3.250
Paired t-test: Calculation
Step 6:
Compare t calculated with t critical

Calculated t = 7.06
Critical t = 3.499
Therefore, t calculated > t critical
Effect: sig. (significant)
Mean SD n
Week 1 61.6 56.6 8
Week 2 65.5 57.5 8
Paired t-test: Calculation
Interpretation:
P < 0.05 Reject H0

Conclusion:

There is a significant difference in the DV between
week 1 and week 2.
Mean SD n
Week 1 61.6 56.6 8
Week 2 65.5 57.5 8
Paired t-test: SPSS Output
Push-up Data from lecture 3

Paired Samples Statistics
                    Mean      N   Std. Deviation   Std. Error Mean
Pair 1   VAR00001   61.6250   8   56.64157         20.02582
         VAR00002   65.5000   8   57.54005         20.34348

Paired Samples Test (Pair 1: VAR00001 - VAR00002)
Paired Differences
  Mean                             -3.87500
  Std. Deviation                   1.55265
  Std. Error Mean                  .54894
  95% Confidence Interval, Lower   -5.17305
  95% Confidence Interval, Upper   -2.57695
t                                  -7.059
df                                 7
Sig. (2-tailed)                    .000

Calculated t (ignore sign) = 7.059
df 7 = critical t 2.365 (0.05), 3.499 (0.01)
7.059 > 3.499
So P < 0.01
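The Paired Differences columns can be reproduced from the raw press-up scores. The sketch below is an illustration rather than SPSS's own procedure; it assumes VAR00001 is Week 1 and VAR00002 is Week 2, and uses the equivalent form t = mean(D) / (SD(D) / √n).

```python
import math
import statistics

week1 = [10, 50, 20, 8, 115, 75, 45, 170]   # assumed to be VAR00001
week2 = [12, 52, 25, 10, 120, 80, 50, 175]  # assumed to be VAR00002

# SPSS reports VAR00001 - VAR00002, so the differences are negative here
diffs = [a - b for a, b in zip(week1, week2)]

mean_d = statistics.mean(diffs)        # mean of the paired differences
sd_d = statistics.stdev(diffs)         # sample SD of the paired differences
se_d = sd_d / math.sqrt(len(diffs))    # standard error of the mean difference
t = mean_d / se_d

print(round(mean_d, 3), round(sd_d, 5), round(se_d, 5), round(t, 3))
```

Only the variance of the differences enters the calculation, which is how the paired design removes between-subject variance.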
Parametric versus Non-Parametric
• Both the t-tests just shown are parametric tests

• These examine for differences in the mean

• Therefore the mean must be an accurate descriptor


Normal or non-normal?
Example Hypotheses: Isometric Torque
• Is there any difference in the length of time that males and
females can sustain an isometric muscular contraction?
Normal Distribution
t-test: mean is appropriate
[Figure: two normal distributions, A and B, of Sustained Isometric Torque (seconds, x-axis 16 to 20) against Number of People (y-axis), with Mean A and Mean B marked.]
Example Hypotheses: Isometric Torque
• Is there any difference in the length of time that males and
females can sustain an isometric muscular contraction?
NON-Normal Distribution
mean is INappropriate
Type 2 error
[Figure: two non-normal distributions, A and B, of Sustained Isometric Torque (seconds, x-axis 16 to 20) against Number of People (y-axis), with Mean A and Mean B marked.]
…assumptions of parametric analyses

• All means and paired differences are normally distributed (ND); this
is the main consideration

• N acquired through random sampling

• Data must be of at least the interval level of measurement

• Data must be continuous.

…but see Norman (2010) Adv. Health Sci. Educ.


Non-Parametric Tests
• These tests use the median and do not assume anything about
distribution, i.e. ‘distribution free’

• Mathematically, value is ignored (i.e. the magnitude of differences is
not compared)

• Instead, data is analysed simply according to rank.


Non-Parametric Tests
• Independent Measures

• Mann-Whitney Test
e.g. Exam grades (ordinal) from 14 students in 2 separate schools

• Repeated Measures

• Wilcoxon Test
Mann-Whitney U: Calculation
Step 1:
Rank all the data from both groups in one series, then total each
School A                     School B
Student   Grade   Rank       Student    Grade   Rank
J. S.     B-                 T. J.      D
L. D.     B-                 M. M.      C+
H. L.     A+                 K. S. P.   C+
M. J.     D-                 S. R.      B-
T. M.     B+                 M. P.      E
T. S.     A-                 W. A.      C-
P. H.     F                  F.         A-
Median = B-; ∑RA =           Median = C+; ∑RB =
Mann-Whitney U: Calculation
Step 2:
Calculate two versions of the U statistic using:

U_1 = (n_A \times n_B) + \frac{(n_A + 1) \times n_A}{2} - \sum R_A

AND…

U_2 = (n_A \times n_B) + \frac{(n_B + 1) \times n_B}{2} - \sum R_B

Median = B-; ∑RA =           Median = C+; ∑RB =


Mann-Whitney U: Calculation
Step 2:
Calculate two versions of the U statistic using:

U_1 = (n_A \times n_B) + \frac{(n_A + 1) \times n_A}{2} - \sum R_A

…OR to save time you can calculate U1 and then U2 as follows

U_2 = (n_A \times n_B) - U_1

Median = B-; ∑RA =           Median = C+; ∑RB =
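The ranking and the two U statistics can be worked through in code. The sketch below is an illustration under assumptions: the ordering of letter grades (`GRADE_ORDER`) is not stated on the slides and is assumed here, rank 1 is given to the lowest grade, tied grades share the average of their ranks, and all names are hypothetical.

```python
# Assumed ordering of letter grades from lowest to highest (not given in the lecture)
GRADE_ORDER = ["F", "E", "D-", "D", "D+", "C-", "C", "C+",
               "B-", "B", "B+", "A-", "A", "A+"]

school_a = ["B-", "B-", "A+", "D-", "B+", "A-", "F"]
school_b = ["D", "C+", "C+", "B-", "E", "C-", "A-"]


def average_ranks(values):
    """Map each distinct value to its rank (1 = lowest), averaging over ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2  # mean of positions i+1 .. j
        i = j
    return ranks


scores_a = [GRADE_ORDER.index(g) for g in school_a]
scores_b = [GRADE_ORDER.index(g) for g in school_b]

# Step 1: rank both groups in one series, then total each group
ranks = average_ranks(scores_a + scores_b)
sum_ra = sum(ranks[s] for s in scores_a)
sum_rb = sum(ranks[s] for s in scores_b)

# Step 2: the two versions of U
n_a, n_b = len(scores_a), len(scores_b)
u1 = n_a * n_b + (n_a + 1) * n_a / 2 - sum_ra
u2 = n_a * n_b + (n_b + 1) * n_b / 2 - sum_rb

print(sum_ra, sum_rb, u1, u2)
```

The shortcut U2 = (nA × nB) − U1 gives the same second value, so only one rank total is strictly needed.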


Mann-Whitney U: Calculation
Step 3:
Select the smaller of the two U statistics (U1 = 17.5; U2 = 31.5)
…now consult a table of critical values for the Mann-Whitney test

n      6   7   8    9
0.05   5   8   13   17
0.01   2   4   7    11

Calculated U must be less than critical U to conclude a significant difference

Conclusion
Median A = Median B
Mann-Whitney U: SPSS Output
Ranks
           VAR00002   N    Mean Rank   Sum of Ranks
VAR00001   1.00       7    8.50        59.50
           2.00       7    6.50        45.50
           Total      14

Test Statistics (b)
                                   VAR00001
Mann-Whitney U                     17.500
Wilcoxon W                         45.500
Z                                  -.900
Asymp. Sig. (2-tailed)             .368
Exact Sig. [2*(1-tailed Sig.)]     .383 (a)
a. Not corrected for ties.
b. Grouping Variable: VAR00002

Calculated U (lower value) = 17.5
17.5 > 8
So P > 0.05 n.s.
Non-Parametric Tests
• Independent Measures

• Mann-Whitney Test

• Repeated Measures
e.g. One group pre-test post-test, assumed non-normal

• Wilcoxon Test
Wilcoxon Signed Ranks: Calculation
Step 1:
Rank all the differences in one series (ignoring signs), then total each
Athlete   Pre-training   Post-training   Diff.   Rank   Signed Ranks
          OBLA (kph)     OBLA (kph)                     -      +
J. S.     15.6           16.1            0.5     6             6
L. D.     17.2           17.5            0.3     4.5           4.5
H. L.     17.7           16.7            -1      -7     -7
M. J.     16.5           16.8            0.3     4.5           4.5
T. M.     15.9           16.0            0.1     1.5           1.5
T. S.     16.7           16.5            -0.2    -3     -3
P. H.     17.0           17.1            0.1     1.5           1.5
Medians = 16.7           16.7            ∑Signed Ranks =
Wilcoxon Signed Ranks: Calculation
Step 2:
The smaller of the T values is our test statistic (T+ = 18; T- = 10)
…now consult a table of critical values for the Wilcoxon test

n      6   7   8   9
0.05   0   2   3   5

Calculated T must be less than critical T to conclude a significant difference

Conclusion
Median A = Median B
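The signed-rank totals can be reproduced from the pre- and post-training OBLA values. This is a minimal sketch with hypothetical names; it assumes differences are post minus pre, rounds them to one decimal place so that floating-point noise does not break ties, and gives tied absolute differences the average of their ranks.

```python
pre = [15.6, 17.2, 17.7, 16.5, 15.9, 16.7, 17.0]
post = [16.1, 17.5, 16.7, 16.8, 16.0, 16.5, 17.1]

# Round so that e.g. 16.8 - 16.5 and 17.5 - 17.2 are recognised as tied
diffs = [round(after - before, 1) for before, after in zip(pre, post)]
nonzero = [d for d in diffs if d != 0]  # zero differences carry no rank


def average_ranks(values):
    """Map each distinct value to its rank (1 = smallest), averaging over ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2  # mean of positions i+1 .. j
        i = j
    return ranks


# Step 1: rank the absolute differences, then total by sign
ranks = average_ranks([abs(d) for d in nonzero])
t_plus = sum(ranks[abs(d)] for d in nonzero if d > 0)
t_minus = sum(ranks[abs(d)] for d in nonzero if d < 0)

# Step 2: the smaller total is the test statistic
t_stat = min(t_plus, t_minus)

print(t_plus, t_minus, t_stat)
```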
Wilcoxon Signed Ranks: SPSS Output
Ranks
                                       N       Mean Rank   Sum of Ranks
VAR00002 - VAR00001   Negative Ranks   2 (a)   3.00        6.00
                      Positive Ranks   5 (b)   4.40        22.00
                      Ties             0 (c)
                      Total            7
a. VAR00002 < VAR00001
b. VAR00002 > VAR00001
c. VAR00002 = VAR00001

Test Statistics (b)
                         VAR00002 - VAR00001
Z                        -1.364 (a)
Asymp. Sig. (2-tailed)   .172
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

10 > 2
So P > 0.05 n.s.
So which stats test should you use?
Q1. What is the LOM (level of measurement)? Nominal / Ordinal / Interval/Ratio
Q2. Are the data ND? No / Yes
Q3. Are the data paired or independent?
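The three questions can be read as a small decision function. The sketch below is one possible reading rather than the lecture's definitive flowchart: it assumes interval/ratio data that are ND lead to a t-test, that ordinal or non-ND data lead to a rank-based test, and that nominal data fall outside the four tests covered here. The function name and argument names are hypothetical.

```python
def choose_test(level_of_measurement, normally_distributed, paired):
    """Suggest one of the four tests in this lecture from Q1, Q2 and Q3."""
    lom = level_of_measurement.lower()
    if lom == "nominal":
        # Assumption: nominal data need a test not covered in this lecture
        raise ValueError("nominal data are not covered by these four tests")
    if lom not in ("ordinal", "interval", "ratio"):
        raise ValueError(f"unknown level of measurement: {level_of_measurement}")

    # Q1 and Q2: parametric only for interval/ratio data that are ND
    parametric = lom in ("interval", "ratio") and normally_distributed

    # Q3: paired (repeated measures) or independent measures
    if parametric:
        return "paired t-test" if paired else "independent t-test"
    return "Wilcoxon signed ranks test" if paired else "Mann-Whitney U test"


print(choose_test("ratio", True, False))     # torque example
print(choose_test("ordinal", False, False))  # exam grades example
```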
Why do we use Hypothesis Testing?

• It is easy (i.e. data in → P value out)

• It provides the ‘Illusion of Scientific Objectivity’

• Everybody else does it.


Problems with Hypothesis Testing?
• P<0.05 is an arbitrary probability (P<0.06?)

• The size of the effect is not expressed

• The variability of this effect is not expressed

• Induction/deduction - reproducibility

• Overall, hypothesis testing ignores ‘judgement’.
