Hypothesis Testing with Parametric and Non-Parametric Tests

Statistics with Computer
Applications
Hypothesis Testing
Objectives: At the end of the class, the students shall be able to:
1. Differentiate when to use the independent t-test and the paired t-test
2. Interpret the SPSS outputs for parametric and non-parametric tests for 2 samples
3. Answer the midterm examination with accuracy, speed, and honesty
Lecture Outline:
• What is Hypothesis Testing?
• Hypothesis Formulation
• Statistical Errors
• Effect of Study Design
• Test Procedures
• Test Selection
Statistics
• Descriptive: organizing, summarizing & describing data
• Inferential
-Correlational: relationships
-Generalising: significance
Sampling Error
• Effective sampling is essential to correctly generalise back to our target population
• The dependent variable can be generalised from n to N
What is Hypothesis Testing?
Null Hypothesis: A = B
Alternative Hypothesis: A ≠ B
We also need to establish:
1) How unequal are these observations?
2) Are these observations reflective of the general population?
Example Hypotheses: Isometric Torque
• Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
Null Hypothesis: mean♂ = mean♀
Alternative Hypothesis: mean♂ ≠ mean♀
Null Hypothesis (H0)
There is no significant difference in the length of time of sustaining an isometric muscular contraction between males and females.
Alternative Hypothesis (HA) or experimental (HE)
There is a significant difference in the length of time of sustaining an isometric muscular contraction between males and females.
n.b. These are 2-tailed hypotheses, the most common and more recommended form.
Useful analogy: the criminal trial
Imagine you are the prosecutor
H0 = Defendant not guilty
HA = Defendant guilty
Your job is to provide sufficient evidence (i.e. ‘beyond reasonable doubt’) that the defendant is not innocent.
Remember: the p-value does NOT tell us the probability that they are innocent, but rather the probability of finding our evidence assuming they are innocent.
[Figure: population distributions (N♀, N♂) with the samples drawn from them (n♀, n♂). x-axis: Sustained Isometric Torque (seconds), 16 to 20; y-axis: Number of People]
n.b. This is why effective sampling is so important: poor/insufficient sampling can lead to errors.
Statistical Errors
• Type 1 Errors
-Rejecting H0 when it is actually true
-Concluding a difference when one does not actually exist
• Type 2 Errors
-Failing to reject H0 when it is actually false (e.g. previous slide)
-Concluding no difference when one does exist
Errors can occur due to biased/inadequate sampling, poor experimental design or the use of inappropriate/non-parametric tests.
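A Type 1 error can be made concrete with a small simulation. The sketch below is illustrative only: the population mean (17.5) and SD (1.7) are made-up values, the function names are hypothetical, and the critical value 2.011 is the 2-tailed 0.05 value for df = 48. Both groups are drawn from the same population, so H0 is true and every rejection is a Type 1 error.

```python
import math
import random
import statistics


def t_statistic(a, b):
    """Independent t: difference in means divided by the SE of the difference."""
    sem_a = statistics.stdev(a) / math.sqrt(len(a))
    sem_b = statistics.stdev(b) / math.sqrt(len(b))
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sem_a**2 + sem_b**2)


def type1_error_rate(trials=2000, n=25, critical_t=2.011, seed=1):
    """Share of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Both groups come from the SAME population (assumed mean 17.5, SD 1.7),
        # so any observed difference is sampling error.
        group_a = [rng.gauss(17.5, 1.7) for _ in range(n)]
        group_b = [rng.gauss(17.5, 1.7) for _ in range(n)]
        if abs(t_statistic(group_a, group_b)) > critical_t:
            false_positives += 1
    return false_positives / trials


rate = type1_error_rate()
print(f"Observed Type 1 error rate: {rate:.3f}")
```

The observed rate should sit close to the chosen 0.05 level; it is the price paid for using that threshold, not a flaw in the data.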
Back to Study Design
• Independent Measures
-Individual scores in each data set are independent of one another
• Repeated Measures
-Individual scores in each data set are dependent/paired/correlated
• Independent Measures: 2 distinct groups
T O1
P O2
• Repeated Measures: pre-experimental designs, same individuals tested twice
O1 T O2
• True-experimental design: depends on how equivalent groups were achieved
R O1 T O2
  O3 P O4
-Random group assignment: Independent Measures
-Cross-over design
• So the above example is an independent measures design
• Which therefore requires an independent t-test.

Student’s (Gosset’s) t-test
Independent t-test: Calculation
[Figure: sample distributions n♀ and n♂. x-axis: Sustained Isometric Torque (seconds), 16 to 20; y-axis: Number of People]
Is this a significant effect?
Mean SD n
♀ 18.5 1.74 25
♂ 16.5 1.72 25
Step 1:
Calculate the Standard Error for Each Mean
SEM♀ = SD/√n = 1.74/5 = 0.348
SEM♂ = SD/√n = 1.72/5 = 0.344
Step 2:
Calculate the Standard Error for the difference in means
SEMdiff = √(SEM♀² + SEM♂²) = √(0.348² + 0.344²) = √0.23944 = 0.489
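Steps 1 and 2 can be checked with a few lines of Python. This is a minimal sketch using only the SD and n values from the table; the helper name `sem` is hypothetical.

```python
import math


def sem(sd, n):
    """Standard error of a mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


sem_f = sem(1.74, 25)  # females: SD 1.74, n 25
sem_m = sem(1.72, 25)  # males:   SD 1.72, n 25

# Standard error of the difference between the two means
sem_diff = math.sqrt(sem_f**2 + sem_m**2)

print(round(sem_f, 3), round(sem_m, 3), round(sem_diff, 3))
```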
Mean SD n
♀ 18.5 1.74 25
♂ 17.5 1.72 25
Step 3:
Calculate the t statistic
t = (Mean♀ - Mean♂) / SEMdiff = 2.00
Step 4:
Calculate the degrees of freedom (df)
df = (n♀ - 1) + (n♂ - 1) = 48
Step 5:
Determine the critical value for t using a t-distribution table
n.b. Use the 0.05 level
Degrees of Freedom    Critical t-ratio for 2-tailed test
44                    2.015
46                    2.013
48                    2.011
50                    2.009
Step 6 (final step):
Compare t calculated with t critical
Calculated t = 2.00
Critical t = 2.01
Therefore, t calculated < t critical
Effect: n.s. (not significant)
Interpretation:
P > 0.05: fail to reject H0
Conclusion:
There is no significant difference in the length of time of sustaining muscular contraction between males and females.
Evaluation:
The wealth of available literature supports that females can sustain isometric contractions longer than males. This may suggest that the findings of the present study represent a type II error (failing to detect a difference that does exist).
Possible solution:
Increase n
Independent t-test: SPSS Output (swim data from SPSS session 8)
Group Statistics
              Group           N    Mean      Std. Deviation   Std. Error Mean
SwimTime50m   Control         10   24.7720   1.25246          .39606
              Visualisation   10   26.4680   1.92823          .60976
Independent Samples Test (SwimTime50m)
                                            Equal variances   Equal variances
                                            assumed           not assumed
Levene's Test for Equality of Variances
  F                                         7.842
  Sig.                                      .012
t-test for Equality of Means
  t                                         -2.333            -2.333
  df                                        18                15.447
  Sig. (2-tailed)                           .031              .034
  Mean Difference                           -1.69600          -1.69600
  Std. Error Difference                     .72710            .72710
  95% Confidence Interval of the Difference
    Lower                                   -3.22358          -3.24188
    Upper                                   -.16842           -.15012
Calculated t (ignore sign) = 2.333
Critical t for df 18 = 2.101
2.333 > 2.101, so P < 0.05
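The key numbers in this output can be reproduced from the Group Statistics alone. The sketch below assumes the standard pooled-variance formula for the "equal variances assumed" row and the Welch-Satterthwaite formula for the df in the "not assumed" row; the function names are hypothetical, and no P value is computed (the comparison uses the critical t from the slide).

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t from summary statistics, pooling the two variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df, se_diff


def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite degrees of freedom (equal variances not assumed)."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Control vs Visualisation, values copied from Group Statistics
t, df, se_diff = pooled_t(24.7720, 1.25246, 10, 26.4680, 1.92823, 10)
df_welch = welch_df(1.25246, 10, 1.92823, 10)

critical_t = 2.101  # 2-tailed, 0.05 level, df 18 (from the slide)
significant = abs(t) > critical_t

print(f"t = {t:.3f}, df = {df}, SE diff = {se_diff:.5f}, Welch df = {df_welch:.3f}")
print("P < 0.05" if significant else "n.s.")
```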
Repeated Measures Designs
• As shown earlier, a repeated measures design implies that the data in each data set can be paired or correlated with one another
• An independent t-test is inappropriate to analyse such data
• Instead, a paired t-test should be used…

Advantages of using Paired Data
• Data from independent samples is heavily influenced by variance between subjects
i.e. this data would have a large SD (variance) associated with an independent t-test simply because some subjects performed better than others
HOWEVER…
…using the same participants on two occasions allows us to pair up the data…
…now we can remove between subject variance from subsequent analysis…
[Figure: Number of Press-Ups (y-axis, 0 to 200) for each subject at Week 1 and Week 2 (x-axis)]
Paired t-test: Calculation
Subject   Week 1   Week 2   Diff (D)   Diff² (D²)
1         10       12
2         50       52
3         20       25
4         8        10
5         115      120
6         75       80
7         45       50
8         170      175
Steps 1 & 2: Complete this table and find ∑D and ∑D²
Step 3:
t = ∑D / √[ (n × ∑D² − (∑D)²) / (n − 1) ]
With ∑D = 31 and ∑D² = 137:
t = 31 / √[ (8 × 137 − (31)²) / 7 ] = 7.06
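Steps 1 to 3 can be worked through in Python using the Week 1 and Week 2 scores from the table. This is a minimal sketch; the variable names are illustrative, and `t_alt` is included only to show that the slide formula equals the mean difference divided by its standard error.

```python
import math
import statistics

week1 = [10, 50, 20, 8, 115, 75, 45, 170]
week2 = [12, 52, 25, 10, 120, 80, 50, 175]

# Steps 1 & 2: differences (Week 2 minus Week 1) and their squares
diffs = [w2 - w1 for w1, w2 in zip(week1, week2)]
n = len(diffs)
sum_d = sum(diffs)
sum_d2 = sum(d * d for d in diffs)

# Step 3: t from the sums, as on the slide
t = sum_d / math.sqrt((n * sum_d2 - sum_d**2) / (n - 1))

# Same statistic written as mean difference / standard error of the differences
sd_d = statistics.stdev(diffs)
t_alt = statistics.mean(diffs) / (sd_d / math.sqrt(n))

print(f"sum D = {sum_d}, sum D^2 = {sum_d2}, t = {t:.2f}, df = {n - 1}")
```

Subtracting Week 1 from Week 2 gives a positive t; SPSS subtracts in the other order, which only flips the sign.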
Steps 4 & 5:
Calculate the df and use a t-distribution table to find t critical
df = n − 1 = 8 − 1 = 7
Degrees of Freedom   Critical t-ratio (0.05 level)   Critical t-ratio (0.01 level)
1                    12.71                           63.657
2                    4.303                           9.925
3                    3.182                           5.841
4                    2.776                           4.604
5                    2.571                           4.032
6                    2.447                           3.707
7                    2.365                           3.499
8                    2.306                           3.355
9                    2.262                           3.250
Step 6 (final step):
Compare t calculated with t critical
Calculated t = 7.06
Critical t = 3.499 (0.01 level, df = 7)
Therefore, t calculated > t critical
Effect: sig. (significant)
Mean SD n
Week 1 61.6 56.6 8
Week 2 65.5 57.5 8
Interpretation:
P < 0.05: reject H0
Conclusion:
There is a significant difference in the DV between week 1 and week 2.
Paired t-test: SPSS Output (push-up data from lecture 3)
Paired Samples Statistics
                    Mean      N   Std. Deviation   Std. Error Mean
Pair 1   VAR00001   61.6250   8   56.64157         20.02582
         VAR00002   65.5000   8   57.54005         20.34348
Paired Samples Test (Pair 1: VAR00001 - VAR00002)
Paired Differences
  Mean                                        -3.87500
  Std. Deviation                              1.55265
  Std. Error Mean                             .54894
  95% Confidence Interval of the Difference
    Lower                                     -5.17305
    Upper                                     -2.57695
t                                             -7.059
df                                            7
Sig. (2-tailed)                               .000
Calculated t (ignore sign) = 7.059
Critical t for df 7 = 2.365 (0.05), 3.499 (0.01)
7.059 > 3.499, so P < 0.01
Parametric versus Non-Parametric
• Both the t-tests just shown are parametric tests
• These test for differences in the mean
• Therefore the mean must be an accurate descriptor
Is the distribution normal or non-normal?
Normal Distribution: the mean is appropriate, so use a t-test
[Figure: two normal distributions with Mean A and Mean B marked. x-axis: Sustained Isometric Torque (seconds), 16 to 20; y-axis: Number of People]
NON-Normal Distribution: the mean is INappropriate (risk of a Type 2 error)
[Figure: two non-normal distributions with Mean A and Mean B marked. x-axis: Sustained Isometric Torque (seconds), 16 to 20; y-axis: Number of People]

…assumptions of parametric analyses
• All means and paired differences are normally distributed (ND); this is the main consideration
• N acquired through random sampling
• Data must be of at least the interval level of measurement
• Data must be continuous.
…but see Norman (2010) Adv. Health Sci. Educ.

Non-Parametric Tests
• These tests use the median and do not assume anything about the distribution, i.e. they are ‘distribution free’
• Mathematically, value is ignored (i.e. the magnitude of differences is not compared)
• Instead, data is analysed simply according to rank.

• Mann-Whitney Test
e.g. Exam grades (ordinal) from 14 students in 2 separate schools
• Wilcoxon Test
Mann-Whitney U: Calculation
Step 1:
Rank all the data from both groups in one series, then total the ranks for each group
School A                     School B
Student   Grade   Rank       Student    Grade   Rank
J. S.     B-                 T. J.      D
L. D.     B-                 M. M.      C+
H. L.     A+                 K. S. P.   C+
M. J.     D-                 S. R.      B-
T. M.     B+                 M. P.      E
T. S.     A-                 W. A.      C-
P. H.     F                  F.         A-
Median = B-; ∑RA =           Median = C+; ∑RB =
Step 2:
Calculate two versions of the U statistic using:
U1 = (nA × nB) + [nA × (nA + 1)] / 2 − ∑RA
AND…
U2 = (nA × nB) + [nB × (nB + 1)] / 2 − ∑RB
…OR to save time you can calculate U1 and then U2 as follows:
U2 = (nA × nB) − U1
Step 3 (final step):
Select the smaller of the two U statistics (U1 = 17.5; U2 = 31.5)
…now consult a table of critical values for the Mann-Whitney test
n      6   7   8    9
0.05   5   8   13   17
0.01   2   4   7    11
Calculated U must be less than critical U to conclude a significant difference
Conclusion: Median A = Median B
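The three steps can be sketched in Python with the grades from the two schools. One assumption is needed that the slides do not state: the ordering of the letter grades, taken here as F lowest to A+ highest (`GRADE_ORDER`). Only the order matters, since the test works on ranks; the helper name `average_ranks` is hypothetical.

```python
# Assumed grade scale, lowest to highest (not given in the slides)
GRADE_ORDER = ["F", "E", "D-", "D", "D+", "C-", "C", "C+",
               "B-", "B", "B+", "A-", "A", "A+"]


def average_ranks(values):
    """Map each value to its rank (1 = lowest); tied values share the mean rank."""
    ordered = sorted(values)
    ranks = {}
    for v in set(ordered):
        first = ordered.index(v) + 1        # 1-based position of the first tie
        count = ordered.count(v)
        ranks[v] = first + (count - 1) / 2  # mean of the tied positions
    return ranks


school_a = ["B-", "B-", "A+", "D-", "B+", "A-", "F"]
school_b = ["D", "C+", "C+", "B-", "E", "C-", "A-"]

# Step 1: rank all 14 grades in one series, then total the ranks per school
scores_a = [GRADE_ORDER.index(g) for g in school_a]
scores_b = [GRADE_ORDER.index(g) for g in school_b]
rank_of = average_ranks(scores_a + scores_b)
sum_ranks_a = sum(rank_of[s] for s in scores_a)
sum_ranks_b = sum(rank_of[s] for s in scores_b)

# Step 2: the two U statistics
n_a, n_b = len(scores_a), len(scores_b)
u1 = n_a * n_b + n_a * (n_a + 1) / 2 - sum_ranks_a
u2 = n_a * n_b - u1

# Step 3: the smaller U must be LESS than the critical U (8 for n = 7 at 0.05)
u = min(u1, u2)
significant = u < 8

print(f"sum RA = {sum_ranks_a}, sum RB = {sum_ranks_b}, U1 = {u1}, U2 = {u2}")
print("significant" if significant else "n.s.")
```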
Mann-Whitney U: SPSS Output
Ranks
           VAR00002   N    Mean Rank   Sum of Ranks
VAR00001   1.00       7    8.50        59.50
           2.00       7    6.50        45.50
           Total      14
Test Statistics (b)
                                   VAR00001
Mann-Whitney U                     17.500
Wilcoxon W                         45.500
Z                                  -.900
Asymp. Sig. (2-tailed)             .368
Exact Sig. [2*(1-tailed Sig.)]     .383 (a)
a. Not corrected for ties.
b. Grouping Variable: VAR00002
Calculated U (lower value) = 17.5
17.5 > 8, so P > 0.05 n.s.
• Mann-Whitney Test
• Wilcoxon Test
e.g. One group pre-test post-test, assumed non-normal
Wilcoxon Signed Ranks: Calculation
Step 1:
Rank all the differences in one series (ignoring signs), then total the negative and positive ranks separately
          Pre-training   Post-training                   Signed Ranks
Athlete   OBLA (kph)     OBLA (kph)      Diff.   Rank    -      +
J. S.     15.6           16.1            0.5     6              6
L. D.     17.2           17.5            0.3     4.5            4.5
H. L.     17.7           16.7            -1      7       -7
M. J.     16.5           16.8            0.3     4.5            4.5
T. M.     15.9           16.0            0.1     1.5            1.5
T. S.     16.7           16.5            -0.2    3       -3
P. H.     17.0           17.1            0.1     1.5            1.5
Medians = 16.7           16.7            ∑Signed Ranks =
Step 2:
The smaller of the T values is our test statistic (T+ = 18; T- = 10)
…now consult a table of critical values for the Wilcoxon test
n      6   7   8   9
0.05   0   2   3   5
Calculated T must be less than critical T to conclude a significant difference
Conclusion: Median A = Median B
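The hand calculation of T+ and T- can be sketched in Python with the OBLA values from the table. Two choices here are assumptions rather than part of the slides: differences are rounded to 1 decimal place so that floating-point noise does not break the ties, and zero differences are dropped before ranking (the usual convention; there are none in this data). The helper name `rank_of` is hypothetical.

```python
pre = [15.6, 17.2, 17.7, 16.5, 15.9, 16.7, 17.0]
post = [16.1, 17.5, 16.7, 16.8, 16.0, 16.5, 17.1]

# Differences (post minus pre), rounded so that tied values compare equal
diffs = [round(b - a, 1) for a, b in zip(pre, post)]
diffs = [d for d in diffs if d != 0]  # drop zero differences (none here)

# Step 1: rank the absolute differences; ties share the mean rank
abs_sorted = sorted(abs(d) for d in diffs)


def rank_of(value):
    first = abs_sorted.index(value) + 1  # 1-based position of the first tie
    count = abs_sorted.count(value)
    return first + (count - 1) / 2


t_plus = sum(rank_of(abs(d)) for d in diffs if d > 0)
t_minus = sum(rank_of(abs(d)) for d in diffs if d < 0)

# Step 2: the smaller T must be LESS than the critical T (2 for n = 7 at 0.05)
t_stat = min(t_plus, t_minus)
significant = t_stat < 2

print(f"T+ = {t_plus}, T- = {t_minus}, test statistic T = {t_stat}")
print("significant" if significant else "n.s.")
```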
Wilcoxon Signed Ranks: SPSS Output
Ranks
                                       N      Mean Rank   Sum of Ranks
VAR00002 - VAR00001   Negative Ranks   2 (a)  3.00        6.00
                      Positive Ranks   5 (b)  4.40        22.00
                      Ties             0 (c)
                      Total            7
a. VAR00002 < VAR00001
b. VAR00002 > VAR00001
c. VAR00002 = VAR00001
Test Statistics (b)
                          VAR00002 - VAR00001
Z                         -1.364 (a)
Asymp. Sig. (2-tailed)    .172
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test
10 > 2, so P > 0.05 n.s.
So which stats test should you use?
Q1. What is the LOM (level of measurement)? Nominal / Ordinal / Interval/Ratio
Q2. Are the data ND? No / Yes
Q3. Are the data paired or independent?
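The three questions can be written as a small decision helper. This is only a sketch: the mapping from answers to tests is inferred from the four tests covered in this lecture (the flowchart's end points did not survive on the slide), the function name `choose_test` is hypothetical, and nominal data is rejected because no test for it is covered here.

```python
def choose_test(level_of_measurement, normally_distributed, paired):
    """Suggest one of the four 2-sample tests covered in this lecture."""
    lom = level_of_measurement.lower()
    if lom == "nominal":
        # Q1: no test for nominal data is covered in this lecture
        raise ValueError("nominal data: not covered by these four tests")
    if lom not in ("ordinal", "interval", "ratio"):
        raise ValueError(f"unknown level of measurement: {level_of_measurement}")

    # Q1 + Q2: parametric tests need interval/ratio data that are ND
    parametric = lom in ("interval", "ratio") and normally_distributed

    # Q3: paired or independent data
    if parametric:
        return "paired t-test" if paired else "independent t-test"
    return "Wilcoxon signed ranks test" if paired else "Mann-Whitney U test"


print(choose_test("ratio", normally_distributed=True, paired=False))
print(choose_test("ordinal", normally_distributed=False, paired=False))
```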
Why do we use Hypothesis Testing?
• It is easy (i.e. data in → P value out)
• It provides the ‘Illusion of Scientific Objectivity’
• Everybody else does it.

Problems with Hypothesis Testing?
• P<0.05 is an arbitrary probability (P<0.06?)
• The size of the effect is not expressed
• The variability of this effect is not expressed
• Induction/deduction - reproducibility
• Overall, hypothesis testing ignores ‘judgement’.
