Understanding “Effect Size” (Cohen’s d) and Statistical Significance

In recent years, effect size has been reported as an indicator of how teaching and learning at St
Stephen’s School compare with the whole State. It is an easy statistic to calculate, but it is not
always meaningful. To interpret these effect sizes, it is important to have some understanding of
the idea of statistical significance.
Statistical significance in this context is a measure of how likely it is that a given score could
be obtained by random sampling from the population. A score a long way above or below the
mean is unlikely to be achieved by random chance, while scores close to the mean are more likely
to be achieved by chance. If the probability of a score being achieved by random chance is less
than 0.05 (written p<0.05), the score is said to be statistically significant.
It’s important to understand that a statistically significant score can still be achieved by
random chance. Even with purely random sampling, about one in twenty scores will still achieve
p<0.05, so cherry-picking results can show what looks like a significant result even when it’s
really just random.
A small result might represent a real difference, but if it is not statistically significant there
is no way we can be sure that it’s not just due to the luck of the draw.
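
The one-in-twenty point above can be made concrete with a little arithmetic. The sketch below assumes the results being compared are independent of one another (an assumption, not something stated here), and the function name is purely illustrative. It computes the chance that at least one result out of several reaches p<0.05 when nothing but random sampling is at work.

```python
def chance_of_false_positive(num_results: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_results` independent results
    falls below `alpha` purely by chance, i.e. with no real effect anywhere.

    Assumes the results are independent (an illustrative simplification).
    """
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    # Each result avoids a false positive with probability (1 - alpha);
    # all of them avoid it together with probability (1 - alpha) ** num_results.
    return 1.0 - (1.0 - alpha) ** num_results


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:2d} results: {chance_of_false_positive(k):.2f}")
```

Under that independence assumption, looking across twenty results gives roughly a 64% chance that at least one of them appears significant by luck alone, which is why picking out the best-looking result is misleading.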

With Effect Size, small values are not statistically significant even for quite large groups (and even
if they were statistically significant, an Effect Size of less than about 0.2 is a trivial difference in
any case). However, for small sample sizes, even quite large values of Effect Size can fail to meet the
test of statistical significance. The table below shows the absolute value of Effect Size that must be
exceeded in order to achieve p<0.05 for different sample sizes n. In the context of the school, n is
the number of students taking a subject.

 n    Minimum d        n    Minimum d
 1      1.64          12      0.47
 2      1.16          14      0.44
 3      0.95          16      0.41
 4      0.82          18      0.39
 5      0.74          20      0.37
 6      0.67          25      0.33
 7      0.62          30      0.30
 8      0.58          35      0.28
 9      0.55          40      0.26
10      0.52          45      0.25
11      0.50          50      0.23
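
The tabulated thresholds appear to follow a simple rule: each value matches z / sqrt(n), where z is about 1.645, the one-sided 5% critical value of the standard normal distribution. This is inferred from the numbers themselves; the formula behind the table is not stated, so treat the sketch below as an assumption that happens to reproduce the table, not as the author's method. The function name `minimum_d` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def minimum_d(n: int, alpha: float = 0.05) -> float:
    """Effect size that must be exceeded for a group of n students.

    Assumed rule (inferred from the table, not stated in the text):
    threshold = z / sqrt(n), with z the one-sided critical value of the
    standard normal distribution at the given alpha.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # about 1.645 for alpha = 0.05
    return z_crit / sqrt(n)


if __name__ == "__main__":
    for n in (1, 2, 8, 20, 50):
        print(f"n = {n:2d}: minimum d = {minimum_d(n):.2f}")
```

If the assumed rule is right, it also gives thresholds for class sizes the table skips, such as about 0.35 for 22 students.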

Based on this, for example, a Mathematics Methods class with eight students will have a
significant effect size only if the calculated value of d is about 0.6 or more (positive or
negative). A Mathematics Applications class with 22 students will have a significant effect size
only if it is about 0.35 or more (between the table values for 20 and 25 students), and a
Mathematics Specialist class with only two students will not have a significant effect size
unless it is at least about 1.2.

So does a great effect size of 0.9 for my Mathematics Specialist class mean I’m a great teacher?
No, there’s every likelihood that I just got lucky with the students I had. With only two students,
an effect size must exceed 1.16 to be significant, and 0.9 does not.
Similarly, a disappointing -0.3 for my Mathematics Methods class doesn’t mean I’m failing them.
For a class of eight the effect size must exceed 0.58 in absolute value, so -0.3 is not significant
and we cannot draw any such conclusion.
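
The two checks above amount to looking up the class size in the table and comparing the absolute effect size with the threshold. A minimal sketch of that lookup follows. The thresholds are copied from the table. The function name and the handling of class sizes the table does not list (falling back to the nearest smaller tabulated size, which gives a slightly stricter threshold) are assumptions made for illustration.

```python
from bisect import bisect_right

# Thresholds copied from the table: class size n -> minimum |d| for p<0.05.
MINIMUM_D = {
    1: 1.64, 2: 1.16, 3: 0.95, 4: 0.82, 5: 0.74, 6: 0.67,
    7: 0.62, 8: 0.58, 9: 0.55, 10: 0.52, 11: 0.50, 12: 0.47,
    14: 0.44, 16: 0.41, 18: 0.39, 20: 0.37, 25: 0.33, 30: 0.30,
    35: 0.28, 40: 0.26, 45: 0.25, 50: 0.23,
}
_SIZES = sorted(MINIMUM_D)


def is_significant(d: float, n: int) -> bool:
    """Return True if effect size d exceeds the tabulated threshold for n students.

    For an n not listed in the table, the nearest smaller listed size is used
    (an assumption; it errs on the side of calling fewer results significant).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    nearest = _SIZES[bisect_right(_SIZES, n) - 1]
    return abs(d) > MINIMUM_D[nearest]


if __name__ == "__main__":
    print(is_significant(0.9, 2))   # Mathematics Specialist, two students
    print(is_significant(-0.3, 8))  # Mathematics Methods, eight students
```

Both calls report that the result is not significant, matching the reasoning in the text.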
