Bias
Learning objectives
- define random error and bias and describe how they differ
- describe how random error can be reduced
- define sampling error and selection bias and describe how they differ
- explain the difference between random and systematic measurement error
- explain the difference between differential and non-differential misclassification
- define confounding and describe how it can be reduced
- define effect modification and describe how to differentiate between confounding and effect modification
Learning activities
1. View video 1 Introduction: random error and bias
2. Complete activity 1 Random error and bias
3. View video 2 Selection
4. Complete activity 2 Selection
5. View video 3 Measurement
6. Complete activity 3 Measurement
7. View video 4 Confounding and effect modification
8. Complete activity 4 Confounding and effect modification
9. Review the module 4 course notes
10. Complete the module 4 quiz by the due date in the timetable
11. Complete tutorial 4
Additional resources
Fletcher RW, Fletcher SW, Fletcher GS. Clinical Epidemiology: the essentials. 5th ed.
Philadelphia: Wolters Kluwer/Lippincott Williams & Wilkins Health, 2014. Chapters 1, 3
and 5.
Random Error
When something we are measuring diverges from the true value due to chance alone, this is called random error. For example, suppose we are measuring someone's systolic blood pressure (SBP). The true blood pressure is actually 120 mmHg, but each time we measure it we might get a slightly different reading.
Another way to visualise this is to imagine that the true result of a study or a test is the bulls-eye on a target. Because of random error or chance, we won't hit the bulls-eye every time, but with multiple throws the hits are more likely to land close to the bulls-eye than far away, and if we took the mean of all of the hits we would get an estimate very close to the true answer.
Now imagine a second set of throws that is more widely scattered. In this case the results would be less closely clustered around the truth than before. We would say that the first example, where the hits are close to each other, has less random error, or is more precise. In contrast, in the second example there is more random error, or less precision.
Bias
Now suppose that, as well as random error, there is a fault in the way we measure, for example a blood pressure cuff that consistently reads too high. We will still get slightly different readings each time we repeat the measure due to random error, but on top of this we now have some bias in the measurement. We call bias systematic error because, unlike random error, which gives results scattered either side of the true answer with the mean of all results close to the truth, biased results differ systematically, in other words in one direction, from the truth.
Clearly if the results are different from the target or the truth, then this is something
that we would want to know when reading a study, so working out the risk of bias of a
study is going to be the main focus of many of the following modules in this unit.
It's important to remember that when we use the word bias in clinical epidemiology, we use it differently from everyday English, where it means that someone holds a certain prejudice. In epidemiology, bias is anything that systematically shifts the results away from the truth, leading the study to the wrong answer.
Bias arises mostly from how a study is designed, but how the results are analysed can also contribute to it. We can make a judgement about the risk of bias (high or low) in a study's results through the process of critical appraisal. Bias affects the accuracy of the results, i.e. how close the results are to the truth; however, the direction and the size of the distortion are usually not known.
Random error or chance is something that cannot be avoided altogether but can be
reduced by doing things such as increasing the size of the sample in the study and by
repeating measurements and taking an average. The main way that we assess the
effect of chance or random error on the results in a study is by looking at the width of
the confidence interval around the results. This is a measure of the precision of the
study. A narrow confidence interval means better precision or less random error, a
wider confidence interval means more random error.
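To make the link between sample size and precision concrete, here is a minimal Python sketch (not from the course notes; the true value and spread are invented) showing how the 95% confidence interval around a sample mean narrows as the sample grows:

```python
# A minimal sketch (values assumed, not from the notes): the 95% confidence
# interval around a sample mean narrows as the sample size increases.
import random
import statistics

random.seed(42)
TRUE_SBP, SD = 120.0, 10.0  # hypothetical true systolic BP and between-reading spread

for n in (10, 100, 1000):
    sample = [random.gauss(TRUE_SBP, SD) for _ in range(n)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5        # standard error of the mean
    lo, hi = mean - 1.96 * se, mean + 1.96 * se     # approximate 95% CI
    print(f"n={n:5d}  mean={mean:6.1f}  95% CI width={hi - lo:5.2f}")
```

Each tenfold increase in sample size shrinks the interval by roughly a factor of three (the square root of ten), which is why very precise estimates need large studies.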
Sampling Error
Sampling error comes about because when we take a sample from a population, by
chance the people that we select might not be representative of all the characteristics
of the study population. You can imagine that the smaller the sample we take, the less
likely it is that the sample will be representative of the population. However, as we
increase the size of the sample, we are more likely to get a better distribution of the
characteristics of the population so this type of random error will decrease. If we
increased the size of the sample to include the entire population, then obviously we
would have complete coverage of all the different types of characteristics in the
population and so the sampling error would reach zero.
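A rough simulation can illustrate this; all the population values below are invented for the sketch. As the sample grows towards the full population, the sampling error of the estimate shrinks towards zero:

```python
# A rough illustration (population values invented): the sampling error of an
# estimate shrinks as the sample grows, and is zero when n equals the population.
import random
import statistics

random.seed(1)
N = 10_000
population = [random.gauss(120, 10) for _ in range(N)]  # hypothetical SBP values
true_mean = statistics.mean(population)

for n in (10, 100, 1000, N):
    sample = random.sample(population, n)  # simple random sample, no replacement
    error = statistics.mean(sample) - true_mean
    print(f"n={n:6d}  sampling error of the mean = {error:+.3f}")
```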
Selection Bias
Selection bias is what happens when we select an inappropriate group of people to be
in the study or we make comparisons in the analysis between inappropriate groups of
people. This results in the estimate of frequency or effect being incorrect or different
from the truth i.e. biased.
Selection bias can occur at a number of different steps during a study. It can occur
during selection of people to include in the study sample, the selection of people into
the two or more comparison groups used in an analytic study, the loss of people from
the study during follow up, and inappropriate analysis of study participants.
How these different steps can contribute to selection bias differs somewhat according
to study type and you will learn more about the specifics of selection bias by study
type in the later critical appraisal modules. But for now, we’ll just go through some
general principles.
Imagine a study that aims to estimate the prevalence of arthritis in Sydney. Volunteers for the study are recruited using media advertisements. The study finds a prevalence of arthritis of 5%. Would you be happy to take this as the prevalence of arthritis in Sydney? Probably not: people who volunteer in response to advertisements may differ systematically from the rest of the population, so the sample is at risk of selection bias.
Now let’s consider an analytic study where we are comparing 2 or more groups to
each other. Because we are comparing groups, these groups should be as similar as
possible to each other apart from the risk factor or treatment of interest; otherwise this
comparison will be biased.
In an RCT, randomisation should make the comparison groups similar. Sometimes, however, the randomisation might not be done correctly, so that it is not truly random; this causes selection bias because the two groups differ from the outset. For example, if the researchers interfered with the randomisation so that the sickest people all ended up in the intervention group, then the comparison between the intervention and control groups would be biased.
Other times we might start off with two similar groups but during the study some
participants are lost to follow up. If these losses mean that the groups now differ in
terms of their risk of getting the outcome, the comparison is going to be biased, in
other words, we have selection bias.
What about if we used volunteers in the study? Would this cause selection bias in an RCT? In an RCT, we consider that the steps before randomisation don't lead to selection bias; rather, they affect the generalisability of the results, i.e. the ability to apply the results to other groups of people. So using volunteers won't lead to selection bias in an RCT. Even if these people are healthier than the rest of the population, because we randomise the sample to 2 groups, group A will still start off being as healthy as group B, so the comparison between group A and B won't be biased. In an RCT, it is randomisation and the steps after it that can lead to selection bias.
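The following toy simulation (not from the notes; the risk scores are invented) illustrates the point: even though the volunteers are systematically healthier than the general population, randomisation still yields two comparable groups.

```python
# A toy sketch (risk scores invented): volunteers are healthier than the general
# population, yet randomisation still produces two comparable groups, so the
# within-trial comparison between group A and group B is not biased.
import random
import statistics

random.seed(7)
# Volunteers' risk scores: systematically lower (healthier) than the
# population average of 1.0.
volunteers = [random.gauss(0.8, 0.2) for _ in range(2000)]

random.shuffle(volunteers)                                # randomisation
group_a, group_b = volunteers[:1000], volunteers[1000:]

print(f"group A mean risk score: {statistics.mean(group_a):.3f}")
print(f"group B mean risk score: {statistics.mean(group_b):.3f}")  # ~ equal
```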
Internal validity refers to how likely it is that the results are correct for the sample of
participants being studied. It is the internal validity of a study that we are assessing
when we critically appraise a paper for risk of bias. Selection bias is a type of bias that
impacts the internal validity of a study.
External validity refers to how likely it is that the results will hold true for other
settings. It is the external validity of a study that we take into account when we are
thinking about whether the study results might hold true for patients that are different in
some way to those included in the study or in other words it is what we refer to as the
generalisability of the study.
Measurement Error
When we are taking measurements in a study, the measurements might differ from the
true value of whatever we are trying to measure due to random or systematic error.
These errors in measurement can occur: when we measure the risk factor or
prognostic factor we are interested in, the outcome factor of interest or other factors
that we might want to adjust for in the analysis (known as confounders).
Chance or random error in the measurements that we take means that each time we
take a measurement of the same thing we get a slightly different result each time. For
example, as we saw previously, if we took a person’s BP several times it is likely that
the reading will be slightly different each time we take it. We can reduce this type of
random measurement error by repeating the measurement several times and then
taking an average of the results.
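As a hedged illustration of this averaging effect (the true value and error size below are assumed, not taken from the notes), the spread of the averaged result falls roughly as the per-reading error divided by the square root of the number of readings:

```python
# A small sketch (true value and error size assumed): averaging k repeated
# readings shrinks the random error roughly in proportion to 1 / sqrt(k).
import random
import statistics

random.seed(3)
TRUE_BP, SIGMA = 120.0, 8.0  # true value and per-reading random error (mmHg)

def measured_bp(k):
    """Return the average of k repeated readings of the same true value."""
    return statistics.mean(random.gauss(TRUE_BP, SIGMA) for _ in range(k))

for k in (1, 4, 16):
    results = [measured_bp(k) for _ in range(5000)]
    print(f"k={k:2d} readings averaged -> spread (SD) = {statistics.stdev(results):.2f}")
```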
Repeatability
We assess the degree to which chance or random error affects the measurements by looking at the repeatability of the measure, that is, how likely we are to get the same result when we repeat the measurement. Repeatability can also be referred to as reliability, reproducibility or the precision of the measurement, so you might come across any of these terms when you read a paper, but they all mean the same thing.
There are a few different ways that we can compare the repeatability of
measurements. For example, if we wanted to know how measurements taken at
different times by the one person or observer compare, we would call this the intra-
observer repeatability. Alternatively if we wanted to know how measurements taken by
different observers compare we would call this the inter-observer repeatability. Papers
will often use these terms when describing the measurements used in the methods
section of a paper.
For categorical measurements, agreement between observers or between repeated measurements is commonly summarised with the kappa (κ) statistic, which measures agreement beyond that expected by chance. The κ value can be interpreted as follows (Altman DG. Practical Statistics for Medical Research. London: Chapman and Hall, 1991):

Value of κ      Strength of agreement
< 0.20          Poor
0.21–0.40       Fair
0.41–0.60       Moderate
0.61–0.80       Good
0.81–1.00       Very good
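As an illustration, here is a minimal sketch of calculating κ for two observers making a yes/no assessment; the agreement counts are invented for the example:

```python
# A minimal sketch of Cohen's kappa for two observers making a yes/no rating.
# The agreement counts below are invented for illustration.

# 2x2 agreement table: rows = observer 1, columns = observer 2
a, b = 40, 10  # observer 1 says "yes": observer 2 says "yes" (a) or "no" (b)
c, d = 5, 45   # observer 1 says "no":  observer 2 says "yes" (c) or "no" (d)
n = a + b + c + d

p_observed = (a + d) / n                 # proportion of ratings that agree
p_yes = ((a + b) / n) * ((a + c) / n)    # chance agreement on "yes"
p_no = ((c + d) / n) * ((b + d) / n)     # chance agreement on "no"
p_expected = p_yes + p_no                # total agreement expected by chance

kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"observed agreement = {p_observed:.2f}, kappa = {kappa:.2f}")
```

With these counts the observers agree 85% of the time, but κ is 0.70, i.e. "good" rather than "very good" agreement once chance agreement is removed.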
Measurements can also be affected by systematic error, or bias. For example, as we saw earlier, if we are taking someone's BP but the cuff has not been correctly calibrated, so that each measurement overestimates blood pressure by 5 mmHg, the measurement of blood pressure will be biased. We assess the degree to which systematic error affects the measurements by talking about the accuracy of the measure.
When assessing the accuracy of a measure, we can ask questions such as:
1. Does the measure include all the important aspects of what we are trying to measure?
2. Is the measure the gold standard or reference standard for this outcome, or, if not, how well does it compare to the gold standard?
3. Does the measure seem reasonable? For example, a 10-point scale for measuring pain.
Many clinical measurements are what are known as dichotomous measurements, i.e. the process of measurement, whether it involves a questionnaire, assessment of symptoms or use of a blood test or X-ray, allows us to put patients into one of two categories: for example, smokers versus non-smokers, or dead versus alive.
Sometimes the measurement itself might not actually be dichotomous but we still end
up grouping the results in a dichotomous way. For example, when we diagnose
someone with diabetes we are measuring their blood glucose level which is a
continuous measurement, in other words it is a measure that involves a continuous
scale of values. However, we use a cut-off on this continuous scale to put people in
one category (those with diabetes) or the other (those without diabetes). This way of
grouping results is important to keep in mind when we are talking about the effect of
measurement error on the results.
If we have a continuous measure that we are not grouping with cut-points, then
random error will not have an impact on the average of these measures in the study. If
however we had a systematic error this would result in the average for the sample or
each group being an over or underestimation of the truth.
For example, if we were measuring the BP of 100 people in a study, random error would mean that we over-estimate the blood pressure of some and under-estimate the blood pressure of others, but the mean blood pressure of the 100 people should still be close to the true mean. If, however, there were an error in the calibration of the instrument, the mean blood pressure in the study would be under- or overestimated by that amount.
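A short sketch (with assumed values) makes the contrast concrete: random error roughly cancels out in the group mean, while a 5 mmHg calibration error shifts the mean by 5 mmHg:

```python
# A quick sketch (all values assumed): random error roughly cancels out in the
# group mean, while a calibration error shifts every reading, and so the mean,
# by the same amount.
import random
import statistics

random.seed(11)
true_bps = [random.gauss(130, 15) for _ in range(100)]  # true BPs of 100 people

with_random_err = [bp + random.gauss(0, 5) for bp in true_bps]  # random error only
with_systematic = [bp + 5 for bp in true_bps]                   # cuff reads 5 mmHg high

print(f"true mean:             {statistics.mean(true_bps):.1f}")
print(f"with random error:     {statistics.mean(with_random_err):.1f}")  # ~ unchanged
print(f"with systematic error: {statistics.mean(with_systematic):.1f}")  # ~ +5 mmHg
```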
If, however, we dichotomise the readings using a cut-off, random error starts to matter: some people with normal blood pressure might get a higher reading by chance and end up in the hypertensive category (a false positive). Similarly, some of those with hypertension might get a lower reading by chance and end up in the normotensive category (a false negative).
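Here is a small simulation of this idea (the cut-off, true BP distribution and error size are assumptions for illustration):

```python
# A small simulation (cut-off, BP distribution and error size all assumed):
# random error plus a dichotomising cut-off produces false positives and
# false negatives even though the average reading is unbiased.
import random

random.seed(5)
CUTOFF = 140.0  # hypertension cut-off, mmHg

false_pos = false_neg = 0
for _ in range(10_000):
    true_bp = random.gauss(135, 15)         # person's true systolic BP
    reading = true_bp + random.gauss(0, 8)  # single reading with random error
    if true_bp < CUTOFF <= reading:
        false_pos += 1                      # normotensive labelled hypertensive
    elif true_bp >= CUTOFF > reading:
        false_neg += 1                      # hypertensive labelled normotensive

print(f"false positives: {false_pos}, false negatives: {false_neg}")
```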
Misclassification
This incorrect grouping of participants in a study is called misclassification, and it can bias the study results. In an analytic study that makes comparisons between groups, the most important thing to assess is whether these classification errors are the same or different between the two study groups.
Non-differential
When the measurement error that results in misclassification occurs equally in the two study groups we are comparing, we call this non-differential misclassification. For example, suppose we were doing an RCT looking at the effect of a particular exercise program on weight loss, and the scales were incorrectly calibrated so that weight was overestimated by 5 kg in all participants. We would misclassify some participants as being overweight when they are not, but this would happen equally in those participating in the exercise program and those not participating.
Non-differential misclassification will still bias the results, but it usually biases the
results towards no effect or in other words it means that you are likely to
underestimate the effect that you are trying to measure. So if the exercise program
actually works, it might look like the exercise program is less effective than it really is.
One exception is measurement error in confounders. Measurement error in a confounder means that you cannot completely adjust for its effect on the relationship between your factor of interest and the outcome, and since adjustment for a confounder can either increase or decrease the size of the effect, under-adjustment for confounding due to measurement error can bias your result in either direction.
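Returning to the usual case, here is a numerical sketch (the risks, sensitivity and specificity are invented) of non-differential misclassification pulling a true risk ratio of 0.5 towards 1:

```python
# A numerical sketch (risks, sensitivity and specificity invented): with the
# same outcome measurement error in both arms (non-differential), a true risk
# ratio of 0.5 is observed as a value closer to 1 (no effect).
def observed_risk(true_risk, sens, spec):
    """Apparent outcome risk after imperfect outcome measurement."""
    return true_risk * sens + (1 - true_risk) * (1 - spec)

TRUE_RISK_EXERCISE, TRUE_RISK_CONTROL = 0.2, 0.4  # true RR = 0.5
SENS, SPEC = 0.8, 0.9                             # identical error in both groups

rr_observed = (observed_risk(TRUE_RISK_EXERCISE, SENS, SPEC)
               / observed_risk(TRUE_RISK_CONTROL, SENS, SPEC))
print(f"true RR = 0.50, observed RR = {rr_observed:.2f}")  # pulled towards 1
```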
Differential
If misclassification is more likely to occur in one group than the other, this is called differential misclassification. For example, imagine that those measuring participants' weight knew whether each participant was in the exercise or the control group; this could influence how the measurements were taken in the two groups.
Differential misclassification will also bias the results, but it can actually bias the results
in any direction, so it might make an effect look bigger or smaller than it really is.
Because the impact of this kind of misclassification on the results is harder to
determine and can lead to an overestimation as well as an underestimation of the
effect, it is differential misclassification that we are particularly concerned about when
appraising a study for measurement bias. The most common reason for differences in
the amount of measurement error between groups that results in differential
misclassification is lack of blinding. This lack of blinding can be in those taking the
measurements, those interpreting the measurements or the participants themselves.
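Extending the earlier sketch (again with invented error rates), if the measurement error differs between the groups, the observed risk ratio can end up either further from or closer to the null than the truth:

```python
# A companion sketch (error rates invented): when measurement error differs
# between the groups, the observed risk ratio can be pushed in either direction.
def observed_risk(true_risk, sens, spec):
    """Apparent outcome risk after imperfect outcome measurement."""
    return true_risk * sens + (1 - true_risk) * (1 - spec)

TRUE_RISK_EXERCISE, TRUE_RISK_CONTROL = 0.2, 0.4  # true RR = 0.5

# Outcome under-detected in the exercise group only -> effect exaggerated.
rr_exaggerated = (observed_risk(TRUE_RISK_EXERCISE, sens=0.6, spec=0.95)
                  / observed_risk(TRUE_RISK_CONTROL, sens=0.9, spec=0.95))
# Outcome over-detected in the exercise group only -> effect hidden entirely.
rr_hidden = (observed_risk(TRUE_RISK_EXERCISE, sens=0.9, spec=0.7)
             / observed_risk(TRUE_RISK_CONTROL, sens=0.9, spec=0.95))

print(f"true RR = 0.50, biased RRs = {rr_exaggerated:.2f} and {rr_hidden:.2f}")
```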
Confounding
Another important type of bias that can occur in clinical research is confounding. Confounding occurs when the risk factor or exposure we are interested in is associated with, or travels together with, some other factor that is also associated with the outcome of interest. This results in a confusion or distortion of the apparent effect of the risk factor on the outcome.
For example, imagine we wanted to compare the incidence of cancer between people
taking anti-oxidant vitamins and those not taking anti-oxidant vitamins in a cohort
study. We do the study and find that the rate of cancer is much lower in those taking
the vitamins.
We note however that people taking the vitamins are much less likely to be smokers
than those not taking the vitamins. So if we found that the vitamin takers had a lower
incidence of cancer can we be sure that the vitamins have lowered the risk of cancer?
Or is it just that this group has a lower risk of cancer because they are less likely to
smoke? In other words, we want to know if the observed association between the
vitamins and cancer is being confounded by smoking.
Imagine now that we are doing another cohort study, investigating the relationship between obesity and myocardial infarction. We have a group of people in the overweight or obese weight range and a group in the normal weight range, and we are following them over time to compare the rate of heart attack between the two groups.
At the start of the study participants have a variety of measurements taken including
blood pressure and it is found that the BPs are much higher in the overweight/obese
group than in the normal weight group. This means that BP is associated with the risk
factor and we also know that high BP is associated with an increased risk of heart
attack. Does this mean that we should adjust for blood pressure as a confounder?
It is likely that part of the effect of obesity on myocardial infarction is acting through the
effect that obesity has on raising blood pressure.
In other words, blood pressure is on the causal pathway between obesity and MI so it
doesn’t make sense to adjust for the effect of blood pressure, as if we did this we
would just be removing some of the real effect of obesity.
Recall the three criteria for a confounder: it must be associated with the exposure, it must be independently associated with the outcome, and it must not lie on the causal pathway between the exposure and the outcome. In this example, blood pressure meets the first two criteria but not the third, so it should not be adjusted for as a confounder.
We can try to reduce the impact of bias from confounding on the results in a number of
different ways, either in the design or the analysis stages of the study.
Design stage
Techniques include:
Randomisation
Restriction
Matching
Techniques at the design stage include randomisation, that is, doing a randomised controlled trial; this is the best approach to reducing confounding where this study type is possible.
Two other approaches in the design stage are restriction and matching. In restriction, we restrict the sample to a particular subgroup of the population; for example, if we thought there might be confounding by gender, we might restrict the study to women only. However, this would mean that we couldn't apply the results to men, nor could we examine the effect of gender on the outcome, so this approach is not commonly used. Restriction can also only deal with one or a very small number of characteristics, so it can't be used to address all of the confounding in a study. In matching, we select participants so that the comparison groups have the same distribution of potential confounders; for example, each exposed participant might be matched to an unexposed participant of the same age and sex.
Analysis stage
Techniques include:
Multivariate analysis
Stratification
Standardisation
The other approach to reducing bias from confounding is to adjust for it at the analysis stage. The most common way to do this is multivariate analysis, for example multivariable regression, which estimates the effect of the factor of interest while adjusting for several potential confounders at once. Stratification involves examining the association separately within levels (strata) of the confounder, and standardisation adjusts rates to a common reference distribution of the confounder.
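As a hedged sketch of adjustment by multivariable analysis (this uses the statsmodels library and invented data based on the vitamins/smoking example above; it is not taken from the course notes), the crude odds ratio for vitamins looks protective, but adjusting for smoking moves it back towards 1:

```python
# A hedged sketch of adjustment by multivariable (logistic) regression, using
# the statsmodels library and invented data based on the vitamins/smoking
# example above; it is not taken from the course notes.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
smoking = rng.binomial(1, 0.3, n)
# Vitamin use is negatively associated with smoking (the confounder travels
# with the exposure) ...
vitamins = rng.binomial(1, np.where(smoking == 1, 0.2, 0.5))
# ... and cancer risk depends on smoking only, not on vitamins.
cancer = rng.binomial(1, 0.05 + 0.10 * smoking)

crude = sm.Logit(cancer, sm.add_constant(vitamins.astype(float))).fit(disp=0)
adj = sm.Logit(cancer, sm.add_constant(
    np.column_stack([vitamins, smoking]).astype(float))).fit(disp=0)

print(f"crude OR for vitamins:    {np.exp(crude.params[1]):.2f}")  # looks protective
print(f"adjusted OR for vitamins: {np.exp(adj.params[1]):.2f}")    # moves towards 1
```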
Effect modification
Effect modification occurs when the size (or direction) of the effect of a risk factor or intervention on an outcome differs according to the level of some third factor. Here are some examples of effect modification for risk factors for disease and for interventions against disease:
- Sun, or more accurately UV exposure, increases the risk of melanoma, but the
risk of disease with the same exposure would be higher in those with fair skin
compared to those with darker skin. So the effect of UV exposure is being
modified by skin type.
- Taking NSAIDs can cause the side effect of gastrointestinal (GI) bleeding, but the risk of developing this side effect is higher in those with a past history of peptic ulcer disease than in those without. So the risk of GI bleeding is modified by a positive past history of peptic ulcer disease.
Effect modification is not something that we want to adjust for like confounding as it is
not a bias. Effect modification often tells us something about the biological process of
the disease or intervention we are interested in. Whenever effect modification is found,
it is not appropriate to combine the estimates of the different subgroups to give an
overall effect as this would be misleading. Rather, the effects in the different
subgroups should be presented separately. Effect modification is an important concept
to consider when applying results to individual patients and so this will be discussed in
more detail in the later module on applicability.
Accuracy, as discussed above, is about how close the results are to the truth and is affected by bias. Precision, in contrast, has nothing to do with bias: it is how close two or more measurements are to each other and is determined by random error. Confidence intervals are a measure of the precision of the estimates.
As a final example, smoking confounds the association between alcohol consumption and heart attack. Smoking is a risk factor for heart attack, and alcohol consumption may appear to be associated with an increased risk of heart attack simply because smokers tend to drink more than non-smokers (the high alcohol consumption group may include a higher proportion of smokers than the low alcohol consumption group).
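To close, here is a worked toy version of this example using stratification (all counts are invented): the crude risk ratio suggests alcohol doubles the risk, but within each smoking stratum the risk ratio is 1.0.

```python
# A worked toy version of this example (all counts invented): the crude risk
# ratio suggests alcohol doubles the risk of heart attack, but stratifying by
# smoking shows no effect of alcohol within either stratum.
strata = {
    # stratum: (MI cases in high-alcohol, n high-alcohol,
    #           MI cases in low-alcohol,  n low-alcohol)
    "smokers":     (30, 300, 10, 100),
    "non-smokers": (2, 100, 6, 300),
}

totals = [0, 0, 0, 0]
for name, (a, n1, b, n0) in strata.items():
    rr = (a / n1) / (b / n0)
    print(f"{name:12s} RR = {rr:.2f}")                 # 1.00 in each stratum
    totals = [t + v for t, v in zip(totals, (a, n1, b, n0))]

crude_rr = (totals[0] / totals[1]) / (totals[2] / totals[3])
print(f"crude RR = {crude_rr:.2f}")                    # 2.00: distorted by smoking
```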