
120MP Lecture 26 Type I and Type II Errors

Analogy
A statistical test is like a court case:
H0 is the defendant. H1 is the prosecutor.
H0 is assumed innocent until proved guilty
Data is the evidence. The statistician is the jury.
If the evidence is sufficient, H0 is convicted
But, there can be miscarriages of justice

Risks
Whenever we carry out a significance test we risk making two mistakes:

Type I error - Rejecting H0 when it is true


P(Type I error) = α = significance level of the test

Type II error - Not rejecting H0 when it is not true


P(Type II error) = β

Example:
You have been asked by a local health authority to test the hypothesis that there are more
girls being born than boys in that area.

H0: p = 0.5 H1: p > 0.5 where p = proportion of girls born

Hence you visit the maternity ward of a local hospital and record the genders of the next 16 babies
born

Type I Error
You decide that you will reject H0 if there are at least 11 girls out of the 16 births

X= no. of girls born

If we assume that p = 0.5 ie. X ~ Bin(16, 0.5)


then P(X ≥ 11) = 16C11 × 0.5^11 × 0.5^5 + … + 0.5^16 = 0.105

ie significance level = P(Type I error), α = 0.105


This means that there is a 10.5% chance of rejecting H0 when in fact it is true.
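As a cross-check on this arithmetic (not part of the original lecture, which works in Excel), the tail probability can be summed directly from the binomial formula, eg in Python:

```python
from math import comb

def tail_prob(n, p, a):
    """P(X >= a) for X ~ Bin(n, p), summed directly from the binomial formula."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))

# Significance level of the 'reject if at least 11 girls out of 16' rule
print(round(tail_prob(16, 0.5, 11), 3))  # → 0.105
```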

Demonstrating this in a spreadsheet simulation:

Using the random number generator in Excel, the gender of each child is randomly generated with the probability of a girl being 0.5. The genders of these 16 randomly generated children are in Column B.

In this sample, 7 of the children were girls. This is < 11, so we don't reject H0: p = 0.5.

We could randomly generate lots of samples of 16 children, and collate the results. Eg,

Num of Samples    15    50    100    200
Num rejected       1     4     12     20
% rejected        6.7    8     12     10

We can see that as the number of samples increases, the % of rejections in our simulation
approaches 10.5% which is the figure expected according to theory (the binomial distribution).
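The same experiment the spreadsheet performs can be repeated in code; a sketch (not from the lecture), treating a random number below 0.5 as a girl just as the Excel sheet does:

```python
import random

random.seed(1)  # fix the seed so the run is reproducible

n_samples = 10_000
rejections = 0
for _ in range(n_samples):
    girls = sum(random.random() < 0.5 for _ in range(16))  # one sample of 16 births
    if girls >= 11:
        rejections += 1  # we would (wrongly) reject H0 for this sample

print(round(100 * rejections / n_samples, 1))  # % rejected, close to 10.5
```

With 10,000 samples rather than 200, the percentage of rejections settles much closer to the theoretical 10.5%.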

We may want to reduce this Type I error, ie reduce the probability of rejecting a true H0.
Clearly this can be done by making the rejection criterion stricter
eg. Reject if number of girls ≥ 12

Now P(X ≥ 12) = 16C12 × 0.5^12 × 0.5^4 + … + 0.5^16 = α = 0.038

Again we could demonstrate this in a spreadsheet simulation (part of which is shown – there are a further 196 samples (columns of data) not shown):

% rejected in the simulation = 3.5 ≈ α = 3.8%

So, we are committing a Type I error less often now.

Type II Error
Now, what if H0 is not true?

Suppose that the proportion of girls born in the area (population) as a whole is actually 0.6

Let’s assume we are still rejecting H0 when X≥12:


P(X ≥ 12) = 16C12 × 0.6^12 × 0.4^4 + … + 0.6^16 = 0.167
P(Reject H0) = 0.167 so P(Not Reject H0) = 1-0.167 =0.833
ie Here P(Type II error), β = 0.833
This means there is an 83.3% chance of not rejecting H0: p = 0.5 when you should have.
(Note that α is still 0.038 as that is based on the assumption that p = 0.5)

Now let’s assume we are rejecting H0 when X≥11, (α = 0.105)


P(X ≥ 11) = 16C11 × 0.6^11 × 0.4^5 + … + 0.6^16 = 0.329
So β = 0.671
(and from earlier, α = 0.105)

Trade-off between Type I and Type II Error


Compiling the above results in the table below:

Rejection criterion     α       β (if p = 0.6)
X ≥ 11                0.105    0.671
X ≥ 12                0.038    0.833

Making the rejection criterion stricter reduced α but increased β. We can see that this continues to
be true:
eg. Consider rejecting H0 when X≥13.
α = 16C13 × 0.5^13 × 0.5^3 + … + 0.5^16 = 0.011
β = 1 – (16C13 × 0.6^13 × 0.4^3 + … + 0.6^16) = 0.935

Decreasing the Type I error increases the Type II error and vice-versa 
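The whole trade-off table can be reproduced for any cutoff; a short Python sketch (not part of the lecture) that recovers the figures above:

```python
from math import comb

def tail_prob(n, p, a):
    """P(X >= a) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))

print("cutoff    alpha   beta (if p = 0.6)")
for a in (11, 12, 13):
    alpha = tail_prob(16, 0.5, a)       # P(reject | H0 true)
    beta = 1 - tail_prob(16, 0.6, a)    # P(fail to reject | p really 0.6)
    print(f"X >= {a}   {alpha:.3f}   {beta:.3f}")
```

Raising the cutoff shrinks the α column and grows the β column, line by line.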

Increasing the Sample Size


Imagine we recorded the genders of 50 babies in our sample.

Let’s say, as before, we want α = 0.105

Hence we Reject H0 when X ≥ A, where A is the smallest value satisfying P(X ≥ A) ≤ 0.105


We can use the Inverse Binomial function in Excel to calculate that A = 30 satisfies the above
inequality.
Thus we will Reject H0: p = 0.5 if at least 30 out of the 50 births are girls
{Actually here α = 0.101 ie P(X≥30 | p = 0.5) = 0.101}

Again assuming p is actually 0.6, we can (use Excel to) calculate 1 - P(X≥30 | p = 0.6) = β = 0.439
We have roughly the same α (≈ 0.1), but we have reduced β.

Increasing the sample size can decrease the Type II error without increasing the Type I error 

eg. Let the sample size be 100.


If we Reject H0: p = 0.5 if at least 57 out of the 100 births are girls,
we could calculate α = 0.097 and (again if p = 0.6) β = 0.237
Thus β has been reduced.
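The 'Inverse Binomial' step can be mimicked by searching for the smallest cutoff whose tail probability stays within the chosen α; a sketch (Python standing in for Excel) covering both sample sizes used above:

```python
from math import comb

def tail_prob(n, p, a):
    """P(X >= a) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))

def critical_value(n, target_alpha):
    """Smallest cutoff A with P(X >= A | p = 0.5) <= target_alpha."""
    a = n
    while a > 0 and tail_prob(n, 0.5, a - 1) <= target_alpha:
        a -= 1
    return a

for n in (50, 100):
    A = critical_value(n, 0.105)
    alpha = tail_prob(n, 0.5, A)        # achieved significance level
    beta = 1 - tail_prob(n, 0.6, A)     # Type II error if p is really 0.6
    print(n, A, round(alpha, 3), round(beta, 3))
```

For n = 50 this recovers the cutoff A = 30, and for n = 100 the cutoff A = 57, with β falling as n grows while α stays near 0.1.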

Power
The power of a test is the probability that it correctly detects when H0 is false

ie Power = 1- β = 1 – P(not rejecting H0 when it is not true)

We saw earlier that if we have a sample of 16 babies and we reject H0 if X≥11, and p actually = 0.6,
then β = 0.671. Hence under those conditions, the Power of the test = 1 – 0.671 = 0.329

We discovered above that increasing the sample size decreases β. Hence it follows that
increasing the sample size increases the power

Changes in the True Value


For our calculation of β and Power above, we have taken the true value of p as 0.6

What if the true value of p is 0.7?


With a sample size of 16 and an α of 0.105 (ie reject H0 when X ≥ 11), we find that Power = 0.66. It has increased.

If p=0.8, then Power = 0.92. It has increased again (quite considerably)

The further the actual value moves away from 0.5, the higher the power ie

The further H0 is from the truth, the more likely we are to detect that it is not true
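Plugging a range of true values of p into the same tail-probability sum shows this numerically; a sketch (not from the lecture) for the n = 16, 'reject when X ≥ 11' test:

```python
from math import comb

def power(n, cutoff, true_p):
    """P(X >= cutoff) when X ~ Bin(n, true_p):
    the probability of correctly rejecting H0."""
    return sum(comb(n, k) * true_p**k * (1 - true_p)**(n - k)
               for k in range(cutoff, n + 1))

for p in (0.6, 0.7, 0.8):
    print(p, round(power(16, 11, p), 2))  # power rises as p moves away from 0.5
```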

It follows that the closer your hypothesised value is to the true value, the more likely you are to fail to reject it when you should. Hence in your conclusions to a statistical test you should never really say
'Accept H0', because there is a good chance that your hypothesised value (point estimate) isn't exactly right. It is better to give a Confidence Interval.
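As a sketch of that last recommendation (the numbers here are illustrative, not from the lecture): a normal-approximation (Wald) confidence interval for the proportion, supposing we had observed 11 girls in 16 births:

```python
from math import sqrt

def approx_ci(successes, n, z=1.96):
    """Rough 95% normal-approximation CI for a proportion.
    With n as small as 16, an exact or Wilson interval would be
    preferable in practice; this is only a sketch."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

low, high = approx_ci(11, 16)
print(round(low, 2), round(high, 2))  # a wide interval, reflecting the small sample
```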
