27/10/2021
Week 5; Topic 5
In terms of prep.
• No prior exams will be provided
• Use the practice quizzes and practice practicals
• Revision week is the end of December
• Ask questions!
Resources
• King’s Academic Skills for Learning (KASL)
https://keats.kcl.ac.uk/course/view.php?id=62576.
Students can browse workshops and book one-to-one study skills sessions with
PhD students, who will support academic writing, scientific writing, academic reading
and statistics.
Learning outcomes
• To be familiar with the underlying assumptions of statistical tests and recognize when these assumptions
have been violated.
• To calculate relevant effect indices for comparing two samples when assumptions are violated.
• To understand how to use non-parametric testing methods to evaluate such indices in the underlying
population.
• Hypothesis testing
• T-tests for continuous variables
• Chi-squared tests for categorical variables
• Understanding their assumptions and what test to run (Non-parametric tests) when these are violated
• How to do this and identify this in SPSS
Look at your data!!
Topic 4 cheat slide
What can we conclude
This is Jack. Jack wants to know if the amount of candy children get differs between children who dress
up in a happy way and those who dress up in a scary way. What can we conclude?
                             Levene’s Test
                             F      Sig.     t      df     Sig.   Diff   STD err
Equal variance assumed       4.16   0.06     1.45   60     0.01   1.36   0.10
Equal variance not assumed                   1.48   59.3   0.03   1.19   0.12
A. There is a significant difference in candy between a happy and scary costume (t = 1.45, df = 60, p = 0.01)
B. There is no significant difference in candy between a happy and scary costume (t = 1.45, df = 160, p = 0.06)
C. There is a significant difference in candy between a happy and scary costume (t = 1.48, df = 60, p = 0.03)
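Levene’s test! Its significance (Sig. = 0.06) is above 0.05, so equal variances can be assumed and the first row is the one to read (option A).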
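The row-selection rule for this kind of output can be sketched in a few lines of Python. This is only an illustration of the decision logic, not part of SPSS: the function name `pick_t_test_row` and the 0.05 threshold are assumptions, and the numbers are copied from the output table in the question.

```python
def pick_t_test_row(levene_p, equal_var_row, unequal_var_row, alpha=0.05):
    """Choose which t-test row to report, based on Levene's test.

    A non-significant Levene's test (p >= alpha) means equal variances
    can be assumed, so the first row is used; otherwise the second.
    """
    if levene_p >= alpha:
        return "equal variance assumed", equal_var_row
    return "equal variance not assumed", unequal_var_row


# Values taken from the output table in the question
equal_row = {"t": 1.45, "df": 60, "p": 0.01}
unequal_row = {"t": 1.48, "df": 59.3, "p": 0.03}

label, row = pick_t_test_row(0.06, equal_row, unequal_row)
print(label, row)
```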
Different tests for different answers
         Jedi
Female   30%
Male     70%

         Jedi: Yes   No
Female   25%         75%
Male     50%         50%
Assumptions
• The observations are randomly and independently drawn
• There are no outliers (within each group if appropriate)
If data are normally distributed we can use the mean and standard deviation, which are parameters of the
normal distribution
When data are not normally distributed, we cannot use these parameters of the normal distribution (mean
and SD) to describe the data, thus we use non-parametric approaches
When these assumptions are violated (one sample), we use the Wilcoxon signed-rank test. It is suited to:
• Skewed continuous data
• Ordinal or discrete data
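For anyone curious what happens behind the SPSS output, here is a minimal Python sketch of a one-sample Wilcoxon signed-rank test using the normal approximation. It is a simplified illustration, not SPSS's exact procedure: it applies no tie or continuity correction, the function names are made up for this example, and the scary scores are hypothetical data tested against a median of 24.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(sample, hypothesised_median):
    """Return (W+, z, two-sided p) using the normal approximation."""
    # Observations equal to the hypothesised median are dropped
    diffs = [x - hypothesised_median for x in sample if x != hypothesised_median]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, z, p


# Hypothetical scary scores (illustration only, not the lecture data)
scary_scores = [25, 27, 30, 22, 33, 36, 29, 41, 20, 31]
w_plus, z, p = wilcoxon_signed_rank(scary_scores, 24)
print(f"W+ = {w_plus}, z = {z:.3f}, p = {p:.3f}")
```

With a sample this small, an exact test would normally be preferred over the normal approximation; the sketch only shows where the z statistic comes from.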
Wilcoxon Signed Rank Test
Step 1. Test whether our data are normally distributed
A. H1. The median preference for brains is significantly different from 50-50
B. H1. The median scary score is significantly different from 24
C. Ha. The median scary score is not significantly different between liking brains or not
D. Ha. The mean scary score is significantly different from 24
Example
I have a sample of monsters with data on weight, preference for brains, plasma, and
scary scores. Based on what I am showing, what can we write?
A. The mean weight of the sample was not different from 26 kg (t = 1.683, p = 0.092)
B. The mean weight of the sample was significantly different from 26 kg (z = 1.683, p = 0.092)
C. The median weight of the sample was not significantly different from 26 kg (t = 1.683, p = 0.092)
D. The median weight of the sample was not significantly different from 26 kg (z = 1.683, p = 0.092)
In a way, SPSS is really informative
The distribution of Scary Score was statistically different across groups (Mann-Whitney U = 137.5, p = 0.005), with ghosts
having a higher scary score than zombies.
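To see where a Mann-Whitney U value comes from, the statistic can be sketched in plain Python by counting, for every pair of observations, which group has the higher score. This is a simplified illustration with a normal approximation and no tie correction; the function name and the ghost and zombie scores are hypothetical, so the result will not match the U = 137.5 reported above.

```python
import math
from statistics import NormalDist


def mann_whitney_u(group_a, group_b):
    """Return (U, z, two-sided p) using the normal approximation."""
    # U for group A: number of (a, b) pairs where a beats b; ties count 0.5
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1
            elif a == b:
                u_a += 0.5
    n_a, n_b = len(group_a), len(group_b)
    u_b = n_a * n_b - u_a
    u = min(u_a, u_b)  # the smaller U is conventionally reported
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u  # never positive, because u is the smaller U
    p = min(1.0, 2 * NormalDist().cdf(z))
    return u, z, p


# Hypothetical scary scores (illustration only, not the lecture data)
ghosts = [30, 35, 28, 40, 33, 38]
zombies = [22, 25, 27, 31, 20, 24]
u, z, p = mann_whitney_u(ghosts, zombies)
print(f"U = {u}, z = {z:.3f}, p = {p:.3f}")
```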
Example from real-life!
The median difference between the ‘weight after’ and the ‘weight before’ was significantly different from zero
(Wilcoxon signed-rank Z = -14.88, p < 0.001). Weight decreased significantly after the programme.
In a way, SPSS is easier
Non-parametric hypotheses…
Assumptions
• Up to 20% of the cells can have expected counts less than 5
• The minimum expected count is larger than 1
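SPSS prints these checks in the footnote under the chi-squared table, but they are easy to reproduce by hand. The sketch below computes expected counts (row total × column total / grand total) and applies the two rules above; the function names and both example tables are hypothetical.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def check_chi_squared_assumptions(table):
    """Apply the two expected-count rules from the slide."""
    expected = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    min_expected = min(expected)
    return {
        "min_expected": min_expected,
        "share_below_5": share_below_5,
        # at most 20% of cells below 5, and minimum expected count above 1
        "ok": share_below_5 <= 0.20 and min_expected > 1,
    }


# Hypothetical 2x2 tables of observed counts (illustration only)
sparse_table = [[12, 3], [9, 1]]
full_table = [[20, 30], [25, 25]]

print(check_chi_squared_assumptions(sparse_table))
print(check_chi_squared_assumptions(full_table))
```

When the check fails, as it does for the sparse table here, Fisher's exact test is the usual alternative.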
Assumptions
• Unpaired
Cells – Column (which means columns add to 100%)
Cells – Row (which means rows add to 100%)
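The difference between the two options is easiest to see on a small table. The following sketch computes both sets of percentages from the same counts; the function names and the counts are hypothetical (rows are Female and Male, columns are Jedi Yes and No).

```python
def row_percentages(table):
    """Each row adds to 100%."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


def column_percentages(table):
    """Each column adds to 100%."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * cell / total for cell, total in zip(row, col_totals)]
            for row in table]


# Hypothetical counts: rows = Female, Male; columns = Jedi Yes, Jedi No
counts = [[3, 9],
          [7, 7]]

print("Row %:   ", row_percentages(counts))
print("Column %:", column_percentages(counts))
```

The same counts answer two different questions: row percentages say what share of women are Jedi, column percentages say what share of Jedi are women.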
Pearson, Fisher (and McNemar) can be used for more than two groups too!
Ethnicity
White Black Asian Other
Female
Male
And McNemar
Assumptions
• We need at least 25 discordant observations and paired (categorical) data.
Discordant (Yes -> No, and No -> Yes) (must be more than 25)
But make sure you write it down correctly: Value A and B are not related (McNemar, binomial, exact p = 0.1)
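The "binomial, exact" wording reflects how the test works: only the discordant pairs are used, and under the null hypothesis each one is equally likely to go either way. A minimal Python sketch of that exact calculation is shown below; the function name and the counts are hypothetical, and this is not a reproduction of the SPSS output.

```python
from math import comb


def mcnemar_exact(yes_to_no, no_to_yes):
    """Two-sided exact McNemar p-value from the discordant pairs only.

    Under H0 each discordant pair is Yes->No or No->Yes with probability
    0.5, so the smaller count follows a Binomial(n, 0.5) distribution.
    """
    n = yes_to_no + no_to_yes
    k = min(yes_to_no, no_to_yes)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts (28 in total, so above the 25 threshold)
p_value = mcnemar_exact(yes_to_no=18, no_to_yes=10)
print(f"McNemar exact p = {p_value:.3f}")
```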
And McNemar for more than 2 groups…
What test should I run?
I work in a zoo. I want to test whether the mean height of African animals is different
from that of Asian animals. In African animals it looks like this.
What test should I run?
I work in a zoo. I want to test whether the mean height of African animals is different
from that of Asian animals. Height is skewed.
I want to test if the proportion of people doing the KEATS quiz (YES/NO) is linked
to age bracket (<23, 23-28, > 28) but there was no one over 23 last year.
What test should I run?
And another one!
Learning outcomes: “What was discussed”
• When to use t-tests and when to use χ²
• When to use parametric and when to use non-parametric tests
• Understand how to test the null hypothesis and interpret SPSS (and read the fine print)
Thank you/Questions?
Contact details/for more information:
Nicolaas Puts
E1.05/IOPPN
nicolaas.puts@kcl.ac.uk