
School of Health Systems and Public Health

HME 712 Week 1 Fact sheet bos_3

The general principles of hypothesis testing


A hypothesis is a proposed relationship or theory. For example, we might suggest that, at the population
level, blood cholesterol is higher among those who have previously experienced a heart attack than
among those (of the same sex and similar age) who have never experienced one.

We would then wish to test whether this hypothesis is true. We would perform a study with two samples of
participants, measuring and comparing cholesterol levels among those who have experienced a heart attack
and those who have never experienced one. We would then subject the results to statistical analysis using
an appropriate hypothesis test.

First we propose our hypothesis, known as the “alternative hypothesis”. Then we derive the null hypothesis
that proposes there is no relationship between heart attack and blood cholesterol level.

The problem is that any observed difference in the two sample means might have arisen purely by chance.
“Sampling error” may give rise to differences between sample means, even if the population means are
identical. Each sample has its own unique mean depending only on the members of the sample, and
membership depends on chance if we work with random samples.
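A quick simulation illustrates this point: the two samples below are drawn from the very same population, yet their means still differ, purely through sampling error. (The population mean of 5.0 mmol/L, SD of 1.0, and sample size of 30 are invented for illustration and are not values from the study described above.)

```python
# Simulation: "sampling error" alone produces different sample means,
# even when both samples come from the SAME population.
import random

random.seed(1)

def sample_mean(n, mu=5.0, sigma=1.0):
    """Mean of n random draws from a Normal(mu, sigma) population."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

mean_a = sample_mean(30)
mean_b = sample_mean(30)
print(f"sample A mean: {mean_a:.3f}")
print(f"sample B mean: {mean_b:.3f}")
print(f"difference:    {mean_a - mean_b:.3f}")  # non-zero purely by chance
```

Running this repeatedly (with different seeds) shows a different pair of sample means every time, even though the population mean never changes.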

To address this concern, we calculate the probability of having obtained the observed difference in the
group means, or an even greater difference, if the null hypothesis (no difference) is true. If this
probability is very low (say, less than 5%, i.e. p < 0.05), then we reject the null hypothesis and argue
that the alternative hypothesis would seem to be true after all.

If the probability that we calculated was high, then we would fail to reject the null hypothesis. This does not
mean the null hypothesis is true, only that we failed to show that it is false.

It may seem puzzling that we choose to test, and then either reject or fail to reject, the null hypothesis.
Why not just test the alternative hypothesis directly and see, from statistical testing and calculation of
its probability, whether it has a high probability of being true? In the slide show that follows we will
explain why we take the indirect route (testing the null hypothesis rather than testing the alternative
hypothesis directly).

The following concepts should be understood before going on. The explanations are given for a null hypothesis
about the difference between two means. However, the principles and arguments are the same if one is
performing other kinds of hypothesis tests.

1. The null hypothesis – H0


This is a hypothesis of no difference. In the cholesterol study above, the H0 would be that “there is no
difference between the mean cholesterol level of those with and without a history of previous heart
attack”. We usually write this formally as (Greek letters are used for population parameters):
H0: μ1 – μ2 = 0 (μ is the Greek letter for “m”, which stands for the mean)
2. The alternative hypothesis – HA
This is the initial hypothesis. In the example above it would be that “there is a difference in mean
cholesterol levels of those with and without a previous history of heart attack”. Formally:
HA: μ1 – μ2 ≠ 0
3. The p-value: p
When a statistical hypothesis test is carried out, using your sample data, a “p-value” is obtained. This
p-value is a probability and so it must always lie between 0 and 1. It can never be negative and it can
never be greater than 1. Actually, a p-value should never be equal to zero or unity either, although it
may be very close to zero or unity. For example, a p-value of 0.0000213 may be obtained using
software and may be presented by the software as 0.000; this is due to rounding, and you should
report it as, say, “p < 0.001” rather than the (incorrect) p = 0.
What does this p-value mean? It is the probability of obtaining the observed difference in the sample
means, or an even greater difference, just by chance if the population null hypothesis is true.
Remember that the null hypothesis refers to the equality of the population means. If this p-value is
very low, we would be inclined to reject the null hypothesis and conclude that there is a difference in
the population means. If it is high, we would be inclined to fail to reject the null hypothesis and
conclude that we failed to demonstrate a difference in the population means.
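To make this concrete, here is a sketch of how such a probability can be computed by brute force using a permutation test (one valid hypothesis test for a difference in two means; this fact sheet's examples could equally be analysed with a t-test). The cholesterol values below are invented purely for illustration.

```python
# Sketch of a two-sided permutation test for a difference in two means.
# If H0 is true, the group labels are arbitrary, so we shuffle them many
# times and ask how often a difference as great as (or greater than) the
# observed one arises by chance. That proportion estimates the p-value.
import random

random.seed(42)

heart_attack    = [6.1, 5.8, 6.4, 5.9, 6.7, 6.2, 5.6, 6.0]   # mmol/L (invented)
no_heart_attack = [5.2, 5.6, 4.9, 5.4, 5.1, 5.7, 5.0, 5.3]   # mmol/L (invented)

observed = abs(sum(heart_attack) / len(heart_attack)
               - sum(no_heart_attack) / len(no_heart_attack))

pooled = heart_attack + no_heart_attack
n1 = len(heart_attack)
n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    random.shuffle(pooled)                           # relabel groups at random
    diff = abs(sum(pooled[:n1]) / n1
               - sum(pooled[n1:]) / (len(pooled) - n1))
    if diff >= observed:                             # "as great or greater"
        extreme += 1

p_value = extreme / n_perm
print(f"observed difference: {observed:.2f} mmol/L, p = {p_value:.4f}")
```

Note how the code mirrors the definition above: the p-value is the chance, under H0, of a difference at least as great as the one observed.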
4. Alpha: α
Having obtained a p-value from our hypothesis test, how do we decide if the value is low enough for
us to reject H0? Or is it high enough for us to fail to reject H0? In the natural and agricultural sciences,
levels of significance that are typically used are 0.1, 0.05, 0.01, and 0.001.
HOWEVER: in the biostatistical world (public health and clinical medical statistics) it is usual to use
only one specific level of p for deciding whether to reject a null hypothesis or not. That value is 0.05.
This value of p (0.05) was chosen after taking into account the trade-off between rejecting a null
hypothesis when it is true (a “Type 1 error”); failing to reject when the null hypothesis is false (a “Type
2 error”); and the expense, inconvenience, and patient-ethical issues when aiming for very low p-
values by using larger sample sizes.
This critical p-value, that is used as a benchmark in statistical hypothesis testing, and against which
the calculated p-value will be compared when deciding whether or not to reject a null hypothesis, is
called “alpha” (α). For practical purposes, in the field of biostatistics, α is usually taken to be 0.05. If
your calculated p-value is less than or equal to 0.05, reject H0; otherwise, fail to reject H0. When H0
is rejected we may state that the difference is “statistically significant” or “statistically significant at
the α = 0.05 level”. We could also just state that a difference is “statistically significant (p = ….. insert
the p-value here)”. So we compare the p-value (calculated) to α (a predetermined value) when
deciding on statistical significance.
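The decision rule just described can be written down directly. A minimal sketch (the function name decide is our own, not standard terminology):

```python
# The decision rule from the text: compare the calculated p-value to a
# pre-chosen alpha (0.05 by convention in biostatistics).
ALPHA = 0.05

def decide(p_value, alpha=ALPHA):
    """Return the hypothesis-test decision for a given p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.03))   # reject H0
print(decide(0.20))   # fail to reject H0
```

Note that alpha is fixed before the analysis; only the p-value comes from the data.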
5. Type 1 error (α)
The Type 1 error is the probability of rejecting H0 when H0 is actually true. A moment’s reflection
should convince you that Type 1 error = α. It is determined before the study is carried out.
6. Type 2 error (β)
The Type 2 error is the probability of failing to reject H0 when H0 is in fact false. As it is also a
probability, the value must lie between 0 and 1. It is denoted by β (“beta”). Beta is largely dependent
on the sample size. Larger sample sizes have lower values of β. Type 2 error is set before the study is
carried out. It is used to calculate the sample size for the study. It is never used to interpret the
analysis results. By convention in biostatistics, we set β = 4α (four times α; with α = 0.05 this gives
β = 0.20), and we then use this value of β to compute the required sample size.

7. The power of a test

The power of the study is also a probability (lies between zero and one). It is the probability that a real
difference between the population means will be detected and found to be statistically significant.
Since the Type 2 error (β) is the probability that a real difference will be missed, it stands to reason
that the power of the study is simply 1 - β:

Power = 1 - β

The obvious way to increase power is by increasing sample size. It can also be increased by improving
the precision of measurement (for example do all measurements in triplicate and work with the
median of the three readings).
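The effect of sample size on power can be demonstrated with a small Monte Carlo sketch: simulate many studies in which a real population difference exists and count how often it is declared statistically significant. (The population means of 5.0 and 5.5 mmol/L, the common SD of 1.0, and the use of a z-test with known SD are invented simplifying assumptions, not part of the fact sheet.)

```python
# Monte Carlo sketch of power: the proportion of simulated studies in
# which a REAL population difference is found statistically significant.
import math
import random

random.seed(7)

def estimated_power(n, mu1=5.0, mu2=5.5, sigma=1.0, reps=2000):
    """Estimate power for a two-sample z-test with n subjects per group."""
    z_crit = 1.96                     # two-sided critical value for alpha = 0.05
    rejections = 0
    for _ in range(reps):
        m1 = sum(random.gauss(mu1, sigma) for _ in range(n)) / n
        m2 = sum(random.gauss(mu2, sigma) for _ in range(n)) / n
        z = (m1 - m2) / (sigma * math.sqrt(2 / n))   # known-SD z statistic
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

for n in (10, 30, 60):
    print(f"n = {n:3d} per group -> power ~ {estimated_power(n):.2f}")
```

As the text states, the estimated power rises as the sample size per group grows, because larger samples reduce sampling error and make a real difference easier to detect.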
