Bias
Learning objectives
- define random error and bias and describe how they differ
- describe how random error can be reduced
- define sampling error and selection bias and describe how they differ
- explain the difference between random and systematic measurement error
- explain the difference between differential and non-differential misclassification
- define confounding and describe how it can be reduced
- define effect modification and describe how to differentiate between confounding and effect modification
Learning activities
1. View video 1 Introduction: random error and bias
2. Complete activity 1 Random error and bias
3. View video 2 Selection
4. Complete activity 2 Selection
5. View video 3 Measurement
6. Complete activity 3 Measurement
7. View video 4 Confounding and effect modification
8. Complete activity 4 Confounding and effect modification
9. Review the module 4 course notes
10. Complete the module 4 quiz by the due date in the timetable
11. Complete tutorial 4
Additional resources
Fletcher RW, Fletcher SW, Fletcher GS. Clinical Epidemiology: the essentials. 5th ed.
Philadelphia: Wolters Kluwer/Lippincott Williams & Wilkins Health, 2014. Chapters 1, 3
and 5.
Random Error
When something we are measuring diverges from the true value due to chance alone, this is called random error. For example, suppose we are measuring someone's systolic blood pressure (SBP). The true blood pressure is actually 120 mmHg, but each time we measure it we might get a slightly different reading.
Another way to visualise this is to imagine that the true result of a study or a test is the bulls-eye on a target. Because of random error or chance, we won't hit the bulls-eye every time, but with multiple throws the hits are more likely to land close to the bulls-eye than far away, and if we took the mean of all of the hits we would get an estimate very close to the true answer.
Now imagine a second set of throws that is more widely scattered. In this case the results would be less closely clustered around the truth than before. We would say that the first example, where the hits are close to each other, has less random error, or is more precise. In contrast, in the second example there is more random error, or less precision.
Bias
Now suppose that, as well as random error, there is a fault in the way we measure, for example a blood pressure cuff that consistently reads too high. We will still get slightly different readings each time we repeat the measure due to random error, but on top of this we now have some bias in the measurement. We call bias systematic error because, unlike random error, which gives results scattered either side of the true answer with the mean of all results close to the truth, biased results differ systematically, in other words in one direction, from the truth.
Clearly if the results are different from the target or the truth, then this is something
that we would want to know when reading a study, so working out the risk of bias of a
study is going to be the main focus of many of the following modules in this unit.
It's important to remember that when we use the word bias in clinical epidemiology, we use it differently from everyday English, where it means that someone holds a certain prejudice. In epidemiology, bias is anything that systematically shifts the results away from the truth, leading the study to the wrong answer.
Bias arises mostly from how a study is designed, but how the results are analysed can also contribute to it. We can make a judgement about the risk of bias (high or low) in a study's results through the process of critical appraisal. Bias affects the accuracy of the results, i.e. how close the results are to the truth; however, the direction and the size of the distortion are usually not known.
Random error or chance is something that cannot be avoided altogether but can be
reduced by doing things such as increasing the size of the sample in the study and by
repeating measurements and taking an average. The main way that we assess the
effect of chance or random error on the results in a study is by looking at the width of
the confidence interval around the results. This is a measure of the precision of the
study. A narrow confidence interval means better precision or less random error, a
wider confidence interval means more random error.
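To make the link between sample size and precision concrete, here is a minimal Python sketch (not from the course notes; the true value and spread are invented) showing how the 95% confidence interval around a sample mean narrows as the sample grows:

```python
# A minimal sketch (values assumed, not from the notes): the 95% confidence
# interval around a sample mean narrows as the sample size increases.
import random
import statistics

random.seed(42)
TRUE_SBP, SD = 120.0, 10.0  # hypothetical true systolic BP and between-reading spread

for n in (10, 100, 1000):
    sample = [random.gauss(TRUE_SBP, SD) for _ in range(n)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5        # standard error of the mean
    lo, hi = mean - 1.96 * se, mean + 1.96 * se     # approximate 95% CI
    print(f"n={n:5d}  mean={mean:6.1f}  95% CI width={hi - lo:5.2f}")
```

Each tenfold increase in sample size shrinks the interval by roughly a factor of three (the square root of ten), which is why very precise estimates need large studies.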
Sampling Error
Sampling error comes about because when we take a sample from a population, by
chance the people that we select might not be representative of all the characteristics
of the study population. You can imagine that the smaller the sample we take, the less
likely it is that the sample will be representative of the population. However, as we
increase the size of the sample, we are more likely to get a better distribution of the
characteristics of the population so this type of random error will decrease. If we
increased the size of the sample to include the entire population, then obviously we
would have complete coverage of all the different types of characteristics in the
population and so the sampling error would reach zero.
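A rough simulation can illustrate this; all the population values below are invented for the sketch. As the sample grows towards the full population, the sampling error of the estimate shrinks towards zero:

```python
# A rough illustration (population values invented): the sampling error of an
# estimate shrinks as the sample grows, and is zero when n equals the population.
import random
import statistics

random.seed(1)
N = 10_000
population = [random.gauss(120, 10) for _ in range(N)]  # hypothetical SBP values
true_mean = statistics.mean(population)

for n in (10, 100, 1000, N):
    sample = random.sample(population, n)  # simple random sample, no replacement
    error = statistics.mean(sample) - true_mean
    print(f"n={n:6d}  sampling error of the mean = {error:+.3f}")
```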
Selection Bias
Selection bias is what happens when we select an inappropriate group of people to be
in the study or we make comparisons in the analysis between inappropriate groups of
people. This results in the estimate of frequency or effect being incorrect or different
from the truth i.e. biased.
Selection bias can occur at a number of different steps during a study. It can occur
during selection of people to include in the study sample, the selection of people into
the two or more comparison groups used in an analytic study, the loss of people from
the study during follow up, and inappropriate analysis of study participants.
How these different steps can contribute to selection bias differs somewhat according
to study type and you will learn more about the specifics of selection bias by study
type in the later critical appraisal modules. But for now, we’ll just go through some
general principles.
Imagine a study that aims to estimate the prevalence of arthritis in Sydney. Volunteers for the study are recruited using media advertisements. The study finds a prevalence of arthritis of 5%. Would you be happy to take this as the prevalence of arthritis in Sydney? Probably not: people who volunteer in response to advertisements may differ systematically from the rest of the population, so the sample is at risk of selection bias.
Now let’s consider an analytic study where we are comparing 2 or more groups to
each other. Because we are comparing groups, these groups should be as similar as
possible to each other apart from the risk factor or treatment of interest; otherwise this
comparison will be biased.
In an RCT, randomisation should make the comparison groups similar. Sometimes, however, the randomisation might not be done correctly, so that it is not truly random; this causes selection bias because the two groups differ from the outset. For example, if the researchers interfered with the randomisation so that the sickest people all ended up in the intervention group, then the comparison between the intervention and control groups would be biased.
Other times we might start off with two similar groups but during the study some
participants are lost to follow up. If these losses mean that the groups now differ in
terms of their risk of getting the outcome, the comparison is going to be biased, in
other words, we have selection bias.
What about if we used volunteers in the study? Would this cause selection bias in an RCT? In an RCT, we consider that the steps before randomisation don't lead to selection bias; rather, they affect the generalisability of the results, i.e. the ability to apply the results to other groups of people. So using volunteers won't lead to selection bias in an RCT. Even if these people are healthier than the rest of the population, because we randomise the sample to 2 groups, group A will still start off being as healthy as group B, so the comparison between group A and B won't be biased. In an RCT, it is randomisation and the steps after it that can lead to selection bias.
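The following toy simulation (not from the notes; the risk scores are invented) illustrates the point: even though the volunteers are systematically healthier than the general population, randomisation still yields two comparable groups.

```python
# A toy sketch (risk scores invented): volunteers are healthier than the general
# population, yet randomisation still produces two comparable groups, so the
# within-trial comparison between group A and group B is not biased.
import random
import statistics

random.seed(7)
# Volunteers' risk scores: systematically lower (healthier) than the
# population average of 1.0.
volunteers = [random.gauss(0.8, 0.2) for _ in range(2000)]

random.shuffle(volunteers)                                # randomisation
group_a, group_b = volunteers[:1000], volunteers[1000:]

print(f"group A mean risk score: {statistics.mean(group_a):.3f}")
print(f"group B mean risk score: {statistics.mean(group_b):.3f}")  # ~ equal
```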
Internal validity refers to how likely it is that the results are correct for the sample of
participants being studied. It is the internal validity of a study that we are assessing
when we critically appraise a paper for risk of bias. Selection bias is a type of bias that
impacts the internal validity of a study.
External validity refers to how likely it is that the results will hold true for other
settings. It is the external validity of a study that we take into account when we are
thinking about whether the study results might hold true for patients that are different in
some way to those included in the study or in other words it is what we refer to as the
generalisability of the study.
Measurement Error
When we are taking measurements in a study, the measurements might differ from the
true value of whatever we are trying to measure due to random or systematic error.
These errors in measurement can occur: when we measure the risk factor or
prognostic factor we are interested in, the outcome factor of interest or other factors
that we might want to adjust for in the analysis (known as confounders).
Chance or random error in the measurements that we take means that each time we
take a measurement of the same thing we get a slightly different result each time. For
example, as we saw previously, if we took a person’s BP several times it is likely that
the reading will be slightly different each time we take it. We can reduce this type of
random measurement error by repeating the measurement several times and then
taking an average of the results.
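As a hedged illustration of this averaging effect (the true value and error size below are assumed, not taken from the notes), the spread of the averaged result falls roughly as the per-reading error divided by the square root of the number of readings:

```python
# A small sketch (true value and error size assumed): averaging k repeated
# readings shrinks the random error roughly in proportion to 1 / sqrt(k).
import random
import statistics

random.seed(3)
TRUE_BP, SIGMA = 120.0, 8.0  # true value and per-reading random error (mmHg)

def measured_bp(k):
    """Return the average of k repeated readings of the same true value."""
    return statistics.mean(random.gauss(TRUE_BP, SIGMA) for _ in range(k))

for k in (1, 4, 16):
    results = [measured_bp(k) for _ in range(5000)]
    print(f"k={k:2d} readings averaged -> spread (SD) = {statistics.stdev(results):.2f}")
```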
Repeatability
We assess the degree to which chance or random error affects the measurements by looking at the repeatability of the measure, that is, how likely we are to get the same result when we repeat the measurement. Repeatability can also be referred to as reliability, reproducibility or the precision of the measurement, so you might come across any of these terms when you read a paper, but they all mean the same thing.
There are a few different ways that we can compare the repeatability of
measurements. For example, if we wanted to know how measurements taken at
different times by the one person or observer compare, we would call this the intra-
observer repeatability. Alternatively if we wanted to know how measurements taken by
different observers compare we would call this the inter-observer repeatability. Papers
will often use these terms when describing the measurements used in the methods
section of a paper.
For categorical measurements, agreement between observers or between repeated measurements is commonly summarised with the kappa (κ) statistic, which measures agreement beyond that expected by chance. The κ value can be interpreted as follows (Altman DG. Practical Statistics for Medical Research. London: Chapman and Hall, 1991):

Value of κ      Strength of agreement
< 0.20          Poor
0.21–0.40       Fair
0.41–0.60       Moderate
0.61–0.80       Good
0.81–1.00       Very good
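As an illustration, here is a minimal sketch of calculating κ for two observers making a yes/no assessment; the agreement counts are invented for the example:

```python
# A minimal sketch of Cohen's kappa for two observers making a yes/no rating.
# The agreement counts below are invented for illustration.

# 2x2 agreement table: rows = observer 1, columns = observer 2
a, b = 40, 10  # observer 1 says "yes": observer 2 says "yes" (a) or "no" (b)
c, d = 5, 45   # observer 1 says "no":  observer 2 says "yes" (c) or "no" (d)
n = a + b + c + d

p_observed = (a + d) / n                 # proportion of ratings that agree
p_yes = ((a + b) / n) * ((a + c) / n)    # chance agreement on "yes"
p_no = ((c + d) / n) * ((b + d) / n)     # chance agreement on "no"
p_expected = p_yes + p_no                # total agreement expected by chance

kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"observed agreement = {p_observed:.2f}, kappa = {kappa:.2f}")
```

With these counts the observers agree 85% of the time, but κ is 0.70, i.e. "good" rather than "very good" agreement once chance agreement is removed.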
Measurements can also be affected by systematic error, or bias. For example, as we saw earlier, if we are taking someone's BP but the cuff has not been correctly calibrated, so that each measurement overestimates blood pressure by 5 mmHg, the measurement of blood pressure will be biased. We assess the degree to which systematic error affects the measurements by talking about the accuracy of the measure.
When assessing the accuracy of a measure, we can ask questions such as:
1. Does the measure include all the important aspects of what we are trying to measure?
2. Is the measure the gold standard or reference standard for this outcome, or, if not, how well does it compare to the gold standard?
3. Does the measure seem reasonable? For example, a 10-point scale for measuring pain.
Many clinical measurements are what are known as dichotomous measurements, i.e. the process of measurement, whether it involves a questionnaire, assessment of symptoms or use of a blood test or X-ray, allows us to put patients into one of two categories: for example, smokers versus non-smokers, or dead versus alive.
Sometimes the measurement itself might not actually be dichotomous but we still end
up grouping the results in a dichotomous way. For example, when we diagnose
someone with diabetes we are measuring their blood glucose level which is a
continuous measurement, in other words it is a measure that involves a continuous
scale of values. However, we use a cut-off on this continuous scale to put people in
one category (those with diabetes) or the other (those without diabetes). This way of
grouping results is important to keep in mind when we are talking about the effect of
measurement error on the results.
If we have a continuous measure that we are not grouping with cut-points, then
random error will not have an impact on the average of these measures in the study. If
however we had a systematic error this would result in the average for the sample or
each group being an over or underestimation of the truth.
For example, if we were measuring the BP of 100 people in a study, random error would mean that we over-estimate the blood pressure of some and under-estimate the blood pressure of others, but the mean blood pressure of the 100 people should still be close to the true mean. If, however, there were an error in the calibration of the instrument, the mean blood pressure in the study would be under- or overestimated by that amount.
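A short sketch (with assumed values) makes the contrast concrete: random error roughly cancels out in the group mean, while a 5 mmHg calibration error shifts the mean by 5 mmHg:

```python
# A quick sketch (all values assumed): random error roughly cancels out in the
# group mean, while a calibration error shifts every reading, and so the mean,
# by the same amount.
import random
import statistics

random.seed(11)
true_bps = [random.gauss(130, 15) for _ in range(100)]  # true BPs of 100 people

with_random_err = [bp + random.gauss(0, 5) for bp in true_bps]  # random error only
with_systematic = [bp + 5 for bp in true_bps]                   # cuff reads 5 mmHg high

print(f"true mean:             {statistics.mean(true_bps):.1f}")
print(f"with random error:     {statistics.mean(with_random_err):.1f}")  # ~ unchanged
print(f"with systematic error: {statistics.mean(with_systematic):.1f}")  # ~ +5 mmHg
```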
If, however, we dichotomise the readings using a cut-off, random error starts to matter: some people with normal blood pressure might get a higher reading by chance and end up in the hypertensive category (a false positive). Similarly, some of those with hypertension might get a lower reading by chance and end up in the normotensive category (a false negative).
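Here is a small simulation of this idea (the cut-off, true BP distribution and error size are assumptions for illustration):

```python
# A small simulation (cut-off, BP distribution and error size all assumed):
# random error plus a dichotomising cut-off produces false positives and
# false negatives even though the average reading is unbiased.
import random

random.seed(5)
CUTOFF = 140.0  # hypertension cut-off, mmHg

false_pos = false_neg = 0
for _ in range(10_000):
    true_bp = random.gauss(135, 15)         # person's true systolic BP
    reading = true_bp + random.gauss(0, 8)  # single reading with random error
    if true_bp < CUTOFF <= reading:
        false_pos += 1                      # normotensive labelled hypertensive
    elif true_bp >= CUTOFF > reading:
        false_neg += 1                      # hypertensive labelled normotensive

print(f"false positives: {false_pos}, false negatives: {false_neg}")
```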
Misclassification
This incorrect grouping of participants in a study is called misclassification, and it can bias the study results. In an analytic study that makes comparisons between groups, the most important thing to assess is whether these classification errors are the same or different between the two study groups.
Non-differential
When the measurement error that results in misclassification occurs equally in the two study groups we are comparing, we call this non-differential misclassification. For example, suppose we were doing an RCT looking at the effect of a particular exercise program on weight loss, and the scales were incorrectly calibrated so that weight was overestimated by 5 kg in all participants. We would misclassify some participants as being overweight when they are not, but this would happen equally in those participating in the exercise program and those not participating.
Non-differential misclassification will still bias the results, but it usually biases the
results towards no effect or in other words it means that you are likely to
underestimate the effect that you are trying to measure. So if the exercise program
actually works, it might look like the exercise program is less effective than it really is.
One exception is measurement error in confounders. Measurement error in a confounder means that you cannot completely adjust for its effect on the relationship between your factor of interest and the outcome, and since adjustment for a confounder can either increase or decrease the size of the effect, under-adjustment for confounding due to measurement error can bias your result in either direction.
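Returning to the usual case, here is a numerical sketch (the risks, sensitivity and specificity are invented) of non-differential misclassification pulling a true risk ratio of 0.5 towards 1:

```python
# A numerical sketch (risks, sensitivity and specificity invented): with the
# same outcome measurement error in both arms (non-differential), a true risk
# ratio of 0.5 is observed as a value closer to 1 (no effect).
def observed_risk(true_risk, sens, spec):
    """Apparent outcome risk after imperfect outcome measurement."""
    return true_risk * sens + (1 - true_risk) * (1 - spec)

TRUE_RISK_EXERCISE, TRUE_RISK_CONTROL = 0.2, 0.4  # true RR = 0.5
SENS, SPEC = 0.8, 0.9                             # identical error in both groups

rr_observed = (observed_risk(TRUE_RISK_EXERCISE, SENS, SPEC)
               / observed_risk(TRUE_RISK_CONTROL, SENS, SPEC))
print(f"true RR = 0.50, observed RR = {rr_observed:.2f}")  # pulled towards 1
```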
Differential
If misclassification is more likely to occur in one group than the other, this is called differential misclassification. For example, imagine that those measuring participants' weight knew whether each participant was in the exercise or the control group; this could influence how the measurements were taken in the two groups.
Differential misclassification will also bias the results, but it can actually bias the results
in any direction, so it might make an effect look bigger or smaller than it really is.
Because the impact of this kind of misclassification on the results is harder to
determine and can lead to an overestimation as well as an underestimation of the
effect, it is differential misclassification that we are particularly concerned about when
appraising a study for measurement bias. The most common reason for differences in
the amount of measurement error between groups that results in differential
misclassification is lack of blinding. This lack of blinding can be in those taking the
measurements, those interpreting the measurements or the participants themselves.
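Extending the earlier sketch (again with invented error rates), if the measurement error differs between the groups, the observed risk ratio can end up either further from or closer to the null than the truth:

```python
# A companion sketch (error rates invented): when measurement error differs
# between the groups, the observed risk ratio can be pushed in either direction.
def observed_risk(true_risk, sens, spec):
    """Apparent outcome risk after imperfect outcome measurement."""
    return true_risk * sens + (1 - true_risk) * (1 - spec)

TRUE_RISK_EXERCISE, TRUE_RISK_CONTROL = 0.2, 0.4  # true RR = 0.5

# Outcome under-detected in the exercise group only -> effect exaggerated.
rr_exaggerated = (observed_risk(TRUE_RISK_EXERCISE, sens=0.6, spec=0.95)
                  / observed_risk(TRUE_RISK_CONTROL, sens=0.9, spec=0.95))
# Outcome over-detected in the exercise group only -> effect hidden entirely.
rr_hidden = (observed_risk(TRUE_RISK_EXERCISE, sens=0.9, spec=0.7)
             / observed_risk(TRUE_RISK_CONTROL, sens=0.9, spec=0.95))

print(f"true RR = 0.50, biased RRs = {rr_exaggerated:.2f} and {rr_hidden:.2f}")
```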
Confounding
Another important type of bias that can occur in clinical research is confounding. Confounding occurs when the risk factor or exposure we are interested in is associated with, or travels together with, some other factor that is also associated with the outcome of interest. This results in a confusion or distortion of the apparent effect of the risk factor on the outcome.
For example, imagine we wanted to compare the incidence of cancer between people
taking anti-oxidant vitamins and those not taking anti-oxidant vitamins in a cohort
study. We do the study and find that the rate of cancer is much lower in those taking
the vitamins.
We note however that people taking the vitamins are much less likely to be smokers
than those not taking the vitamins. So if we found that the vitamin takers had a lower
incidence of cancer can we be sure that the vitamins have lowered the risk of cancer?
Or is it just that this group has a lower risk of cancer because they are less likely to
smoke? In other words, we want to know if the observed association between the
vitamins and cancer is being confounded by smoking.
Imagine now that we are doing another cohort study, investigating the relationship between obesity and myocardial infarction. We have a group of people in the overweight or obese weight range and a group in the normal weight range, and we are following them over time to compare the rate of heart attack between the two groups.
At the start of the study participants have a variety of measurements taken including
blood pressure and it is found that the BPs are much higher in the overweight/obese
group than in the normal weight group. This means that BP is associated with the risk
factor and we also know that high BP is associated with an increased risk of heart
attack. Does this mean that we should adjust for blood pressure as a confounder?
It is likely that part of the effect of obesity on myocardial infarction is acting through the
effect that obesity has on raising blood pressure.
In other words, blood pressure is on the causal pathway between obesity and MI so it
doesn’t make sense to adjust for the effect of blood pressure, as if we did this we
would just be removing some of the real effect of obesity.
Recall the three criteria for a confounder: it must be associated with the exposure, it must be independently associated with the outcome, and it must not lie on the causal pathway between the exposure and the outcome. In this example, blood pressure meets the first two criteria but not the third, so it should not be adjusted for as a confounder.
We can try to reduce the impact of bias from confounding on the results in a number of
different ways, either in the design or the analysis stages of the study.
Design stage
Techniques include:
Randomisation
Restriction
Matching
Techniques at the design stage include randomisation, that is, doing a randomised controlled trial; this is the best approach to reducing confounding where this study type is possible.
Two other approaches in the design stage are restriction and matching. In restriction, we restrict the sample to a particular subgroup of the population; for example, if we thought there might be confounding by gender, we might restrict the study to women only. However, this would mean that we couldn't apply the results to men, nor could we examine the effect of gender on the outcome, so this approach is not commonly used. Restriction can also only deal with one or a very small number of characteristics, so it can't be used to address all of the confounding in a study. In matching, we select participants so that the comparison groups have the same distribution of potential confounders; for example, each exposed participant might be matched to an unexposed participant of the same age and sex.
Analysis stage
Techniques include:
Multivariate analysis
Stratification
Standardisation
The other approach to reducing bias from confounding is to adjust for it at the analysis stage. The most common way to do this is multivariate analysis, for example multivariable regression, which estimates the effect of the factor of interest while adjusting for several potential confounders at once. Stratification involves examining the association separately within levels (strata) of the confounder, and standardisation adjusts rates to a common reference distribution of the confounder.
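As a hedged sketch of adjustment by multivariable analysis (this uses the statsmodels library and invented data based on the vitamins/smoking example above; it is not taken from the course notes), the crude odds ratio for vitamins looks protective, but adjusting for smoking moves it back towards 1:

```python
# A hedged sketch of adjustment by multivariable (logistic) regression, using
# the statsmodels library and invented data based on the vitamins/smoking
# example above; it is not taken from the course notes.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
smoking = rng.binomial(1, 0.3, n)
# Vitamin use is negatively associated with smoking (the confounder travels
# with the exposure) ...
vitamins = rng.binomial(1, np.where(smoking == 1, 0.2, 0.5))
# ... and cancer risk depends on smoking only, not on vitamins.
cancer = rng.binomial(1, 0.05 + 0.10 * smoking)

crude = sm.Logit(cancer, sm.add_constant(vitamins.astype(float))).fit(disp=0)
adj = sm.Logit(cancer, sm.add_constant(
    np.column_stack([vitamins, smoking]).astype(float))).fit(disp=0)

print(f"crude OR for vitamins:    {np.exp(crude.params[1]):.2f}")  # looks protective
print(f"adjusted OR for vitamins: {np.exp(adj.params[1]):.2f}")    # moves towards 1
```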
Effect modification
Effect modification occurs when the size (or direction) of the effect of a risk factor or intervention on an outcome differs according to the level of some third factor. Here are some examples of effect modification for risk factors for disease and for interventions against disease:
- Sun, or more accurately UV exposure, increases the risk of melanoma, but the
risk of disease with the same exposure would be higher in those with fair skin
compared to those with darker skin. So the effect of UV exposure is being
modified by skin type.
- Taking NSAIDs can cause the side effect of gastrointestinal (GI) bleeding, but the risk of developing this side effect is higher in those with a past history of peptic ulcer disease than in those without. So the risk of GI bleeding is modified by a positive past history of peptic ulcer disease.
Effect modification is not something that we want to adjust for like confounding as it is
not a bias. Effect modification often tells us something about the biological process of
the disease or intervention we are interested in. Whenever effect modification is found,
it is not appropriate to combine the estimates of the different subgroups to give an
overall effect as this would be misleading. Rather, the effects in the different
subgroups should be presented separately. Effect modification is an important concept
to consider when applying results to individual patients and so this will be discussed in
more detail in the later module on applicability.
Accuracy, as discussed above, is about how close the results are to the truth and is affected by bias. Precision, in contrast, has nothing to do with bias: it is how close two or more measurements are to each other and is determined by random error. Confidence intervals are a measure of the precision of the estimates.
As a final example, smoking confounds the association between alcohol consumption and heart attack. Smoking is a risk factor for heart attack, and alcohol consumption may appear to be associated with an increased risk of heart attack simply because smokers tend to drink more than non-smokers (the high alcohol consumption group may include a higher proportion of smokers than the low alcohol consumption group).
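To close, here is a worked toy version of this example using stratification (all counts are invented): the crude risk ratio suggests alcohol doubles the risk, but within each smoking stratum the risk ratio is 1.0.

```python
# A worked toy version of this example (all counts invented): the crude risk
# ratio suggests alcohol doubles the risk of heart attack, but stratifying by
# smoking shows no effect of alcohol within either stratum.
strata = {
    # stratum: (MI cases in high-alcohol, n high-alcohol,
    #           MI cases in low-alcohol,  n low-alcohol)
    "smokers":     (30, 300, 10, 100),
    "non-smokers": (2, 100, 6, 300),
}

totals = [0, 0, 0, 0]
for name, (a, n1, b, n0) in strata.items():
    rr = (a / n1) / (b / n0)
    print(f"{name:12s} RR = {rr:.2f}")                 # 1.00 in each stratum
    totals = [t + v for t, v in zip(totals, (a, n1, b, n0))]

crude_rr = (totals[0] / totals[1]) / (totals[2] / totals[3])
print(f"crude RR = {crude_rr:.2f}")                    # 2.00: distorted by smoking
```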