
Measures of Disease Occurrence

We will be discussing measures of disease occurrence in this module. These include terms that are used to describe the who, what, when, and where of disease and other kinds of health outcomes in the population. After you have reviewed all of the week two lectures, you should be able to complete these learning objectives. They include differentiating among the following measures: prevalence, risk, rate, and odds. You'll also learn how to calculate each of these measures: prevalence, risks, rates, and odds. You'll learn how to define the concept of person-time and be able to apply it to calculations of rates. You'll also be able to choose the measure of frequency most appropriate for a given situation. And you'll be able to interpret prevalence, odds, risks, and rates within the context of public health research. Epidemiologists study diseases in the population. Here is an example of an epidemiologist
collecting data with in-person interviews. In addition to diseases, epidemiologists also study
health outcomes in the population. What do we mean by health outcomes? It's a broad term.
Health outcomes can include diseases, illnesses, conditions, disorders, symptoms, behaviors, risk
factors and injuries. Health outcomes can also include healthy behaviors such as consumption of
fruits and vegetables or the benefits of moderate physical activity. You may commonly come
across articles in the news that use measures of disease occurrence. However, in this course, in
order to become more comprehensive and inclusive, we will frequently refer to measures of
health outcome occurrences instead of disease occurrence. These occurrence measures can be
applied to many health outcomes and not just diseases. So, in order to describe the distribution of health outcomes, we first need to define the population at risk and then measure the occurrence of one or more health outcomes in that population. We need to be able to measure the occurrence
of health outcomes in a population in order to monitor changes and plan interventions. For
example, this map illustrates differences in malaria transmission among various regions of the
world, with parts of Africa and South America having the highest risk. We can use this data to
plan and target malaria interventions. The ability to quantify the number of people with some health outcome of interest at a given point in time, or over a period of time, is the foundation for comparing why, where, and how health problems affect different populations. We need accurate
statistics on the occurrence of diseases and health outcomes in order to identify new health trends
and to evaluate whether specific health programs are making a difference. Now we will see four
examples illustrating the terms: prevalence, risks, rates and odds. Let's start with the first
measure of health outcome occurrence, prevalence. Malaria is a mosquito-borne infectious
disease of humans and other animals caused by parasitic protozoa. Malaria infects 10% of the
world's population. This statistic about malaria is an example of a prevalence. Human
immunodeficiency virus or HIV is a retrovirus that causes acquired immunodeficiency syndrome
or AIDS. Infection with HIV gradually destroys the immune system which makes it harder for
the body to fight infections. Approximately 50,000 people are newly infected with HIV each
year in the United States. Dividing the number of newly infected people by the at risk population
yields a statistic which is an example of risk. Cardiovascular disease includes diseases of the
blood vessels, heart rhythm problems, heart infections, and heart defects someone is born with.
Cardiovascular diseases are the number one cause of death globally. More people die annually
from cardiovascular disease than from any other cause. The overall rate of death from
cardiovascular disease was 236.1 per 100,000 person-years in 2009 in the United States. This
statistic is an example of a rate. You might have also heard of the term odds before. For example, you might ask the doctor, "What are the odds of having a baby boy?" Out of a hundred births, the probability of having a boy is 51%, while the probability of having a girl is 49%. So the odds of having a boy are 51 to 49. Dividing 51 by 49, you get odds of 1.04. In this segment
we have introduced the following terms, prevalence, risk, rates, and odds. These measures are
used to describe patterns of health outcomes or disease in the population. These measures are
instrumental in describing patterns of disease distribution and changes in health trends, and in evaluating whether specific programs, such as interventions to reduce teen smoking, are effective at changing smoking prevalence in the population, for example. In the next segments, we'll delve into each of these terms in more detail.

Incident vs Prevalent
Let's talk about incident and prevalent cases. It's important to understand the difference
between incident and prevalent cases so that you can better understand the definitions of
prevalence, risks, and rates. Once you've completed this module, you should be able to
distinguish between prevalent and incident cases, and also to know when to use prevalent versus
incident cases. There is a simple distinction between incident and prevalent cases. Incident cases
are new cases. Incident cases include all individuals who change in status from not having a
disease to having a disease in a specific time period. More generally, incident cases change from
not having the health outcome of interest to having the health outcome of interest over a specific period of time. In contrast, prevalent cases are existing cases of disease, not just new cases of
disease. Generally prevalent cases include all individuals living with a disease or health outcome
of interest within a specified time frame, regardless of when that person was diagnosed or
developed the health outcome. You can use incident or new cases of disease or health outcomes
in calculating risks and rates. You use prevalent or existing cases of disease in calculating
prevalence. To understand incident cases versus prevalent cases, imagine a bucket partially filled
with water. Each drop of water represents an existing case of the health outcome or the disease.
These are prevalent cases. If we add more water to the bucket, then these new droplets are the
incident cases. Any water that drains out or leaves the bucket would represent death or recovery.
So, to briefly summarize the difference between incident and prevalent cases: incident cases are new cases of disease, and prevalent cases include both existing and new cases of disease.
Prevalence
In this segment, we're going to discuss the measure of disease occurrence known as
prevalence. Prevalence measures are one of the most common statistics you'll hear in the news.
The learning objectives for this segment are as follows: to define and calculate the measure of prevalence, and to be able to interpret prevalence within the context of public health
research. Prevalence is one of the most common epidemiologic measures you see in our
everyday news. For example, you may see a news headline based on prevalence, such as the
proportion of people in the population that are overweight. The measure prevalence helps us
quantify the proportion of the population with the specific health outcome. For example,
approximately 12% of the world's total population is obese. Prevalence is the proportion of a
defined population that has a particular disease or health outcome of interest. Prevalent cases are
existing cases of disease. These are cases whose disease developed or was diagnosed before they
were identified for the study. Prevalence is useful to quantify the burden of a health outcome or
disease in the population at a given point or period of time. Prevalence can also be useful for
planning health services. 11% of people 65 and older in the United States have Alzheimer's
disease. This statistic is an example of a prevalence. Here's another example. Malaria is a life-
threatening disease caused by parasites that are transmitted to people through the bites of
infected mosquitoes. Malaria infects 10% of the world's population. This statistic about Malaria
is an example of a prevalence. To understand prevalence, imagine a bucket partially filled with
water. Each drop represents an existing case of the disease or health outcome. The capacity of
the bucket represents the total population at risk for the disease or health outcome. If we look at
the bucket at one point in time, the number of drops or existing cases in the bucket divided by the
total number of drops the bucket could hold is the prevalence. Prevalence is a proportion. The
numerator is the number of people with the disease or health outcome and the denominator is the
number of people in the total study population. Note that people with the disease are also
counted in the denominator. Here's a formula for a prevalence. The numerator is the prevalent
cases, all the existing cases at a given point in time and the denominator is the total study
population. For all measures of disease occurrence, it is important to think about who should be
in the denominator of your calculation. You want the denominator to represent the people who
could have the disease or health outcome in your study population. Think about who would be in
the denominator when calculating the prevalence of pregnancy among women in North Carolina.
The denominator would be women of reproductive age in North Carolina who are able to
become pregnant. Now, think about who would be the denominator when calculating the
prevalence of prostate cancer in the United States? This would be men susceptible to prostate
cancer in the United States. The other important component of the measure of prevalence is the
time period specified. It is important to specify the time point or period over which the
prevalence measure is calculated. For example, a year is a common time period used for
prevalence. Next, we will work through an example of how to calculate prevalence. On
December 1, 2009, all residents in one town were surveyed, so the total population at this time was 14,000. The residents were asked if they currently had a cold. 200 of the 14,000 reported
having a cold. What is the prevalence of cold on December 1, 2009, in this population? So,
looking at the formula for prevalence, we note that we need to find the number of prevalent cases
and then divide it by the total number of individuals in the study population. In this example, we
look for the number of prevalent cases of cold. In the town, there were 200 prevalent cases of
cold reported. The denominator in the population of residents is 14,000, thus the prevalence of
cold in the town on December 1st, 2009 was 1.4%. Prevalence is often referred to as a cross-
sectional measure, because it tells us about the number of people with the health outcome at one
slice in time. Now, we will discuss two types of prevalence calculations. Remember, prevalence
can be viewed as a slice through the population at a point in time in which it is determined who
has the disease or health outcome, and who does not. The two types of prevalent measures are
point prevalence, often just called prevalence, and period prevalence. Point prevalence is the
prevalence at one point in time. Period prevalence is the number of people with a health outcome
over a specified period of time, divided by the number of people in the population during that
time period. In this example, it's the summer of 2012. An example of point prevalence is, do you
currently have asthma? In contrast, a period prevalence example would be, have you had asthma
during the past three years? The time period is the past three years. So, you can see here, the only
difference between point and period prevalence is the time period specified. We have seen in the
previous slides how important the time period is when determining a prevalence. In the example
on this slide, we are presented with two prevalence statistics. Think about which prevalence
statistic is of greater concern. What is the time frame? Are we talking about new or old cases? Is
all the needed information provided? Maybe the Alamance County cases occurred over the past
century. These cases may be anyone who ever had rabies over the past 100 years. So, the concern
for today may be minimal. What if the global measure is 10 new cases of rabies in a day? This
number isn't really high, but we would be more concerned about new cases of disease than a
historical record of existing cases of disease. So, we're comparing apples and oranges in this
example. We need to be specific about what information we're reporting to avoid this confusion
in the future. We will go through another example to illustrate how to calculate the measure of
prevalence. In this example, 500 men with lung cancer are asked to report on their smoking
habits. Of the 500 men, 461 reported smoking at least one pack of cigarettes a day. How do you
calculate the prevalence of smoking among men with lung cancer enrolled in this study? The
numerator is the number of smokers, or 461. The denominator is the total number of men with
lung cancer in the study, or 500. The prevalence is equal to 461 divided by 500, or 0.922, or
92.2%. The interpretation is the prevalence of smoking among these 500 men with lung cancer is
92.2%. Now, we'll give you the opportunity to calculate a prevalence. We have now covered the
basic definition of prevalence. The most important thing to remember about the definition of
prevalence is what's in the numerator and what's in the denominator. So, for prevalence, the
numerator is all existing cases of disease. That includes both new and old cases, and in the
denominator is the total population.
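Before moving on, here is a minimal sketch in Python of the prevalence arithmetic from the two worked examples above; the helper name prevalence is ours for illustration, not something from the lecture.

```python
def prevalence(existing_cases, total_population):
    """Prevalence: existing (prevalent) cases divided by the total study population."""
    return existing_cases / total_population

# Worked example 1: 200 of 14,000 town residents reported a cold on December 1, 2009.
print(f"{prevalence(200, 14_000):.1%}")  # 1.4%

# Worked example 2: 461 of 500 men with lung cancer reported smoking.
print(f"{prevalence(461, 500):.1%}")     # 92.2%
```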
Risk
In this lecture, we are going to cover the definition of risk. An important thing to
remember about a risk is that it's a probability of getting either a disease or a health outcome.
And that it's for a population and not an individual. The learning
objectives for this segment are as follows. Define and calculate the measure of risk. Choose the
measure of risk for the appropriate situation and interpret risk within the context of public health
research. But let's first note that risk is a common word with a broader meaning. For this course we will have a very specific definition. The numerator for a risk is incident or new cases. These are
new cases that are identified during the study follow up. The denominator is the study population
at risk of getting the disease or health outcome at the beginning of follow-up. So the formula
looks like this. It's number of incident or new cases divided by the total number of at-risk
individuals. Risk measures the number of new or incident cases of the health outcome that
develop among people in the population at risk over a specified time period. Risk refers to the
probability that a health outcome will occur. Risk can be expressed as a proportion, and ranges
from zero to 100%. You may be wondering why we use the risk as a measure. We use risk as a
measure for several reasons. First, it's fairly easy to calculate and interpret. It also has clear
meaning to clinicians and lay-people, and patients understand basic percentages. So let's go
through an example. Risk can be used to make various health decisions both at a population and
individual level. At an individual level risk can be used to help patients decide whether to accept
a drug intervention. Note there's a caveat that this risk estimate is not specifically for that
individual but for a population. For example, alendronate, a specific inhibitor of osteoclast-mediated bone resorption, can increase bone mineral density, or BMD, and prevent radiographically defined vertebral fractures. This figure is from a study that examined radiographs of women who received alendronate versus a placebo group. The figure shows the risk of
hip fractures in placebo and treatment groups over time. It shows new cases of hip fractures
developing during the 36-month time period. In the three-year study, fractures of the hip occurred in 22, or 2.2%, of 1,005 patients on the placebo, and 11, or 1.1%, of 1,022 patients on alendronate sodium, which is commonly known as Fosamax. The figure displays the risk of hip
fractures in this study. When calculating risk, we generally assume that the entire population at
risk at the beginning of the study period has been followed to determine who develops the health
outcome of interest. A closed study population means no new individuals are entering the study
once the study has started. So no individuals could join halfway through the study for example.
In a closed study population, people in the study do not leave or enter the population due to birth,
death, migration, loss to follow up, etcetera. Next we will discuss how a risk is actually
calculated. In order to calculate risk we must define our study population, or the population at
risk. And then we determine the number of incident or new cases of the health outcome or
disease. Next, we specify the time period. So this is what the formula looks like. Risk is the
number of new cases of the health outcome divided by the population at risk during a specified
time period. So let's try an example calculation. Here we have 11,000 people in an area around a large nuclear power plant who were followed for 7 years, or until the development of any cancer of the blood. 30 cases were identified over the 7-year period. What is
the risk of developing the outcome of interest? So in this case, we take the numerator as 30 new cases and divide by the population at risk, which is given to us as 11,000. Then we calculate the risk as 0.0027, or 0.27%. We can then convert this to a more easily understood statistic by multiplying by 1,000, so the risk is interpreted as 2.7 people per 1,000 over a 7-year time period. So next we'll have a short in-video quiz so you can try calculating a risk yourself. This concludes our segment on risk, which included the definition, some example calculations, and how risk is used in the population.
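Here is a minimal sketch in Python of the risk calculation from the nuclear power plant example above; the helper name risk and the output formatting are our illustration.

```python
def risk(new_cases, population_at_risk):
    """Risk: incident (new) cases divided by the population at risk at the start of follow-up."""
    return new_cases / population_at_risk

# Worked example: 30 new cases of blood cancer among 11,000 people followed for 7 years.
r = risk(30, 11_000)
print(f"{r:.4f}")                                 # 0.0027
print(f"{r * 1_000:.1f} per 1,000 over 7 years")  # 2.7 per 1,000 over 7 years
```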

Rates
Welcome, in this segment, we're going to talk about the measure of rates. The learning objectives we will cover in this segment include defining and calculating the measure of rates; defining the concept of person-time and being able to apply it to calculations of rates; and lastly, interpreting rates within the context of public health research. A rate measures the occurrence of new cases of a health outcome in a population. A rate is not a proportion because
the denominator is not fixed. Instead, a rate accounts well for the realistic situation in which a
population is dynamic and changing over time. Populations at risk change due to births, deaths, and migration. In a study population, a person can decide to no longer participate
in a study. Thus, some people may be lost to follow-up during the course of the study. A rate takes
into account the sum of time, called person time, that each person remains at risk for that disease
or health outcome under study observation. In our previous lectures or segments, we have
already learned about prevalence and risks. Now we will discuss why a rate is a preferred
measure to use. There are important advantages to using a rate rather than a risk or a prevalence
measure. Rates are more flexible, more exact, and capture the reality of often having a dynamic
changing population. Rates can also be used to study repeated events, where a person develops the health outcome, no longer has the disease or health outcome for a period of time, and then develops the same disease or health outcome again. The reason we don't use them
all the time is that rate data can be more costly and challenging to collect. In order to calculate a
rate, these are the following steps we use. First, we must define our study population. Then
determine the number of news cases of the disease or health outcome. And then finally, specify
our denominator, which is the person time at risk. The formula for rate is as follows. The rate is
the number of new or incident cases divided by person-time. Now let's discuss person-time in
more detail. In order to understand how to calculate a rate, you will need to understand the
concept of person-time. Person-time is the sum of time that each person remains at risk for the
disease or health outcome and under study observation. Person-time may be expressed in units of
person-years, person-months, person-days, or some other scale. A person in the study can stop contributing person-time for a variety of reasons: death, leaving the study, moving to a different country, developing the disease or health outcome during the study, or because the researcher is unable to follow up with or locate the person. The use
of person-time, as opposed to just time, enables you to handle situations in which people die or migrate out of the study population, where there are dropouts in a study, and where you have not been able to follow your entire study population at risk to watch for the development of the disease under investigation. Thus, the follow-up period does not have to be uniform for all
participants. That's an important point to remember. Person-time for a group is the sum of the
times of follow-up for each participant in that group. Now we'll show you how to actually
calculate person-time. Here is a simple example of calculating person-time. Each of the
horizontal lines represents the person-time experienced by one person. Note that there are five
persons depicted here, subjects 1 through 5. Each notch represents one year of completed
observation. So for example, subjects 1, 2, 3, and 5 have lines that start at year 1, indicating that
they have completed year 1. In this depiction, an X represents death. D represents the disease or
health outcome of interest. And L represents lost to follow-up. A subject's person-time is the
amount of time they are at risk. So events like death or X, developing the outcome of interest or
D, and lost to follow up, L, mean that the person is no longer at risk for the following time
period. We will add up the total person-time for the subjects in this study. We will follow each
subject's person timeline across horizontally to count up each person's person-time contributed.
Subject 1 contributed four years of person-time before dying. Subject 2 contributed eight years
of person-time before the end of this observation period. Subject 3 contributed an initial four
years of person-time, then had a gap when they were not under observation, and then contributed
one more year of person-time before getting the disease under study. So all in all, subject 3
contributes five person years. Subject 4 contributed five years of person-time. Subject 5
contributed six years of person-time before becoming lost to follow-up. If we sum all of this
person-time, we get 28 total person years. We can now find the rate over this eight year time
period. Since only one subject developed the disease under study, our rate is one case per 28
person-years. We often rewrite a rate to refer to a more standard number of person-years, such as 100 or 1,000 person-years. So the rate of one case per 28 person-years is equivalent to approximately 3.6 cases per 100 person-years. Note that since this graph is on a scale of years, we can very easily
calculate the amount of person years contributed by each subject. Now we'll give you the
opportunity to calculate person-time in this video quiz. Now that you understand how to
calculate person-time, let's use this information to calculate a rate. Remember, in order to
calculate a rate, we must define our study population, determine the number of new cases of the
health outcome, and specify our denominator, which is person-time at risk. Let's look at this
example of calculating the rate of viral infection among women undergoing cancer treatment at
several large medical centers. We have 5,031 female cancer patients. These women contributed 128,557 person-days of observation. Among the group, 609 patients developed a
viral infection while in the hospital or within 48 hours of discharge. So let's answer the question,
what is the rate of viral infection among this population? Let's start with the numerator. This is
609. The denominator is 128,557 person-days. When we do the division, the rate equals 0.0047.
We can then convert this to a more easily interpretable statistic to get 4.7 cases per 1,000 person-
days. Now let's give you the opportunity to calculate a rate. This concludes the segment on the
measure of health outcome occurrence known as rates. In this segment we have learned how to
define and calculate rates. We've also defined the concept of person-time. And you learned how
to interpret rates within the context of public health research.
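As a rough sketch of the person-time bookkeeping and rate arithmetic from the two worked examples above (the variable names are ours, not the lecture's):

```python
# Years at risk contributed by each subject in the five-person example above.
person_years = {"subject 1": 4, "subject 2": 8, "subject 3": 5,
                "subject 4": 5, "subject 5": 6}
total_person_years = sum(person_years.values())  # 28 person-years
cases = 1                                        # one subject developed the disease

rate = cases / total_person_years
print(f"{rate * 100:.1f} cases per 100 person-years")  # 3.6 cases per 100 person-years

# Second worked example: 609 viral infections over 128,557 person-days of observation.
rate2 = 609 / 128_557
print(f"{rate2 * 1_000:.1f} cases per 1,000 person-days")  # 4.7 cases per 1,000 person-days
```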

Odds
In this segment, we are going to talk about the measure of health outcome occurrence or
disease known as odds. In this segment we will have the following learning objectives.
They include both defining and calculating the measure of odds. Knowing when to choose the
measure of odds for the appropriate situation. And interpreting odds within the context of public
health research. In statistics, we refer to odds as the ratio of the probability that an event, such as
a disease, will occur, to the probability that the event, or disease, will not occur. Odds are
sometimes used in epidemiology to their convenient mathematical properties. We will use p as
the symbol for a probability. The mathematical formula for odds is p divided by the quantity 1
minus p. You may be wondering why we use the odds as a measure, since we already have other measures to use in epidemiology, such as prevalence, risk, and rate. Odds are easy to calculate
and interpret. Odds tend to have more meaning to clinicians and lay-people compared to rates.
Odds can be used to provide information to patients in clinical settings since odds can
be easily understood. In addition, later in this course, you will learn about why the measure of
odds is important in certain studies, such as case control studies. Sometimes we are not able to
access or collect risk or rate data, and odds data are our only feasible option. Now that you've
been given a definition of odds, let's go through a few examples. Here again is the formula for odds, and you've been given that the probability of an event is 0.20. Let's calculate the odds using this formula. The numerator will be 0.20, and you will divide it by the quantity 1 minus 0.20, which gives you 0.25, or, expressed as a ratio, 1:4. Let's
try another example. If the probability of diabetes in a patient is 5%, then the odds of diabetes
are, let's plug 0.05 into our formula. So you get p, or 0.05, divided by 1 minus 0.05, or 0.052632.
To get a more easily understood ratio, you can then divide both sides by 0.05 to get a 1 to 19
ratio. Let's try a third example. Out of 100 births, the probability of having a boy is 51%, while
the probability of having a girl is 49%. So to calculate the odds, you would take p, which is 0.51, and divide it by the probability of having a girl, which is 1 minus p, or 0.49. And this gives you odds of 1.04. Now we'll give you the opportunity to try it on your own. This concludes this
segment on the measure of health outcome occurrence or disease occurrence known as odds. So
to summarize what we covered in this segment. We learned how to define and calculate the
measure of odds, choose the measure of odds for the appropriate situation, and interpret the odds
within the context of public health research.
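A minimal sketch in Python of the odds formula and the three worked examples above (the helper name odds is ours for illustration):

```python
def odds(p):
    """Odds: the probability an event occurs divided by the probability it does not."""
    return p / (1 - p)

print(round(odds(0.20), 2))  # 0.25, i.e., a 1:4 ratio
print(round(odds(0.05), 6))  # 0.052632, i.e., a 1:19 ratio
print(round(odds(0.51), 2))  # 1.04, the odds of having a boy
```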
Epidemiologic Study Design 1
Welcome. In this module we're going to discuss different types of epidemiologic study
designs. These include both experimental study designs and observational study designs. Sometimes we need to use observational study designs because some types of exposures cannot be randomized; it would not be ethical. So we'll learn about both experimental and
observational study designs. In this segment, we will introduce you to experimental
study designs. We are going to cover several different types of experimental study designs: both randomized control trials and crossover clinical trials. After you have reviewed this lecture, you
should be able to complete these learning objectives. They include: describe the differences
between experimental and non-experimental or observational study designs. List two different
units of analysis, and explain the purpose of randomization. Characterize randomized control trial and crossover clinical trial study designs. Identify the advantages and disadvantages of randomized
control trials. And define the following terms: blinding, equipoise, placebo, and intention-to-treat.
If you learn about different study designs and their advantages and disadvantages, you're in a
better position to interpret and evaluate the results from various research designs. The most
important distinction of an experimental study design compared with a non-experimental or
observational study design is the exposure assignment. One example of this would be
randomizing one group of patients to get the new breast cancer chemotherapy drug and the other
group of patients to receive the current standard chemotherapy drug. Recall that exposure is the
intervention. For example, a new drug treatment. And in the experimental study the investigator
usually determines who is exposed and who is not exposed. The exposure or intervention is
randomly allocated to study participants. To test whether bed nets reduced infants contracting and dying from malaria, researchers randomized a group of infants to receive bed nets from birth
onward. The control group of infants received bed nets after six months. Researchers found that
the use of the bed nets in infants reduced the rate of both developing malaria and dying from
malaria. In a non-experimental or observational study, the investigator does not assign exposure
status. For example in an observational study on the health effects of living near nuclear reactors
such as Fukushima. The investigator does not assign some people to live near the nuclear reactor
and others to live far away from the reactor. Another example of a non-experimental study would be to follow a group of diabetes patients over time, some of whom smoke tobacco and some of whom don't, and then look at what their rate of cardiovascular disease is. Examples of experimental
study types include randomized control trials. We will be discussing this in greater detail in this
video. Examples of some non-experimental or observational study designs are case-control,
cohort, ecologic and cross-sectional studies. You will learn more about these designs in other
segments in this MOOC. An important difference between experimental and non-experimental
studies is the randomly assigned exposure. Randomization is important because it minimizes
differences in key characteristics between the group that gets the exposure and the group that
does not get the exposure. But it's not ethical to randomly expose people to serious hazards such
as radiation, toxic chemicals, or inadequate health care. Therefore, experimental study designs don't work for everything. Instead, researchers use observational studies in these situations. In experimental studies, study subjects are assigned by a formal, usually chance, mechanism to two or more exposures or interventions. Experimental studies are the gold standard for
inferring causality. Commonly, experimental studies provide participants with an exposure, such
as a drug, that may be either therapeutic or preventative. The provided intervention is usually, but not always, randomly allocated to study subjects by the researcher. Next we will cover two
types of experimental studies: randomized control trials and crossover clinical trials. In a
randomized control trial, the treatment of interest, such as a new drug, would be randomly
allocated to half of the study subjects. And the other half would receive a placebo or the current
standard of care or medication for the disease. An example of a randomized control trial would
be a study comparing two different treatments for arthritis. Subjects would be randomized to one
of the two arthritis treatments. In a crossover clinical trial, subjects switch from one treatment to another after a certain period of time. They "cross over" to the other treatment or
exposure. We'll go into more depth on this type of study later in this module. Experimental
studies commonly provide participants with exposure such as a drug that may be therapeutic or
preventative as I said earlier. There are both individual and community experimental studies.
An example of an individual experimental study is one where one group receives an experimental
drug aimed at preventing Alzheimer's disease in an at risk group while another at risk group
receives a placebo drug. Experimental studies can also be targeted to communities rather than to
individuals. An example of a community intervention study is a colon cancer screening program that was implemented in nine counties, while seven control counties did not receive the screening program. This relates to the unit of analysis. Let's talk about units of analysis in these
studies. Let's remember that there are individual and community-level experimental studies. In
an individual-level experimental study, some study participants are assigned an exposure, and the remaining participants are assigned to be unexposed or exposed to a different factor. In a community-level experimental study, one or more communities are assigned to an exposure, and one or more other communities are assigned to be unexposed, or exposed to a different factor.
Thus the unit of analysis for an individual experimental study is the individual, whereas for the
community level study the unit of analysis is the community.
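To make the chance mechanism concrete, here is a minimal sketch of simple 1:1 individual-level randomization; the function name and the use of Python's random module are our illustration, not any particular trial's procedure.

```python
import random

def randomize(participants, seed=None):
    """Allocate participants 1:1 to a treatment group and a control group purely by chance."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (treatment group, control group)

# With a large enough sample, chance assignment tends to balance characteristics
# such as age and sex across the two groups.
treatment, control = randomize(range(1000), seed=42)
```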

Epidemiologic Study Design 2


Randomized control trials are often used to test new drugs or medical treatments. The key
feature is that the drug or medical treatment is randomized. Another example of a randomized
control trial would be a study comparing two different treatments for asthma. Subjects would be
randomized to one of the two asthma treatments. Also, the intervention group and the control
group should be comparable in all aspects except the intervention itself. For example, the
proportion female and the age distribution would be very similar in both the intervention and the
control groups. Randomization provides the strongest evidence for causal inference.
Randomization basically allows us to say if everything else is the same, what is the effect of just
the exposure on the outcome? For example, if assignment of a treatment was not random, then researchers might select which patients got which treatment based on patient characteristics
such as severity of illness. This could bias the study results. When randomization is done, a
computer or some other method is used to assign groups to receive one exposure, or treatment, or
another. Randomization is done by chance and helps to reduce or prevent bias in a study. It is the
most important component of the experimental study design. Now let's look at an example of a
randomized control trial. Researchers designed a randomized control trial to answer the research
question: Do U.S. postal service mailmen who receive a specific sun safety education intervention subsequently wear wide-brimmed hats and use sunscreen more than U.S. postal service mailmen who did not receive the sun safety education intervention? The intervention
included six educational sessions, wide-brimmed hats, sunscreen, and reminders. Workers were
randomized to sun safety promotion or delayed sun safety promotion education. Postal workers
in the intervention group were provided with wide-brimmed hats, sunscreen, reminders, and six educational sessions. The postmen were then followed for two years to assess the outcome. The researchers found that the postmen receiving the intervention had increased use of sunscreen and
hats compared with the control group. Okay, now let's talk about the key advantages to
performing randomized control trials. First, randomization reduces the influence of other determinants of exposure and outcomes, i.e., confounding. This study design provides strong evidence for causality or causal inference. Since investigators assign the exposure, or medical
treatment, the time, or temporal relationship between exposure and outcome is clear. So while
there are advantages to randomized control trials, it is important to note that there are also
disadvantages. Randomized control trials can be costly. Sometimes RCTs, or randomized control trials, have issues with external validity or generalizability. People who participate in RCTs may be very different from the rest of the population; thus the effects seen in the participants may not be generalizable to the population at large. Randomized control trials usually focus on a specific,
narrow research question related to a certain treatment or medication and a specific comparison
with another treatment or exposure. In addition, there are ethical considerations when
randomizing treatments or exposures. For example, it would be unethical to randomize people to
be exposed to a known toxic substance, such as water containing high levels of arsenic, mercury,
or lead. So to review, in the randomized control trial, researchers will follow the treatment and
the comparison group to see who develops the health outcome or disease of interest. Now we
will go on to discuss another type of experimental study design, known as the crossover clinical
trial, in more depth. In a clinical crossover study design, subjects switch from one treatment to
another after certain period of time. They crossover to the other treatment or exposure, as we
don't want the effect of the first treatment to carry over when a person switches to another
treatment, there is usually a period in between the two exposures called a wash-out period, where
no treatment or exposure is given. The order that the exposure or treatment would be given is
randomized but the same participants are involved in each part of the trial. Note that if a person
changes in some meaningful way over time for example if a women were to get pregnant during
the study this may effect the study results. For this reason, shorter intervention effects are
preferred for studies in crossover clinical trials. Let's look now at an example of crossover
clinical trial. In this study, researchers wanted to compare the effect aged garlic has on blood
lipids. Study participants were men ages 32 to 68 with moderately high cholesterol. The men in
the study were randomized to take a dietary supplement containing either garlic or a placebo for
a six-month period. Blood tests were recorded; then each participant switched to the other
supplement for a time, after which blood tests were recorded again. The test results showed that
garlic supplements appeared to reduce cholesterol as well as blood pressure.
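Here is a minimal sketch of how the randomized treatment orders in a two-treatment crossover trial might be generated, with a washout period between treatments; the schedule strings and the function name are our invention for illustration.

```python
import random

def crossover_schedule(participants, seed=None):
    """Randomly assign each participant an order of the two treatments, with a washout between."""
    rng = random.Random(seed)
    orders = ("garlic -> washout -> placebo", "placebo -> washout -> garlic")
    return {p: rng.choice(orders) for p in participants}

# Every participant receives both treatments; only the order is randomized.
schedule = crossover_schedule(["subject %d" % i for i in range(1, 7)], seed=7)
```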

Epidemiologic Study Design 3


So now let's talk about blinding, in the context of experimental study designs. Blinding is
a technique used in experimental studies to conceal certain information from different people
involved with an experimental study. Why would you want to conceal information? Knowing
certain information could lead to biases in the study. For example, you might want to conceal the information from the researcher. This person might be very invested in a cancer-related medication working, and the results may show that the medication doesn't work. Or consider the subject: if subjects believe that they got the medication and think that the medication is working, they may report their symptoms, activity levels, or level of depression differently than if they didn't know whether they got the medication or not. Or you might want to blind both the researcher and the subject until the research is complete. For the purpose of this course, let's
consider four different types of blinding in studies. These include: non-blinded, where everyone knows who received the interventions, both the subjects themselves and the researcher and the statistical analyst; a single-blinded study, where one category of person is blinded, either the subject, the researcher, or the analyst; a double-blinded study, where both the tester and the subject are blinded; and a triple-blinded study, where everyone is blinded, including the subject, the researcher, and the statistical analyst. Phew, that's a lot of
blinding, right? If this topic is of interest to you, check out the related resources for this video or a
paper by Schultz and Grimes. It gives a great description of why we do blinding and all the types
of blinding. Finally, we should discuss some other important considerations that apply to
experimental studies. This may be new vocabulary for you, but it's important to know if you pursue further study in epidemiology, and we hope you will. The first term is equipoise, which
refers to a genuine uncertainty about the benefits or harms of a possible treatment or exposure. If
we are sure that one treatment is much better than another, then we should not be randomizing
subjects to a treatment that is known to be inferior. Placebos are sham treatments that appear identical to the real treatment but lack that treatment's active agent. Placebos are used in order for
the different groups to not realize whether they are exposed or unexposed, which could affect
participants behavior or health outcome. Compliance and adherence refers to whether or not
participants follow the treatment, medications, or recommendations as some participants may not
stick with their assigned treatment or exposure. And lastly, intention-to-treat analysis refers to when subjects are analyzed according to their randomized treatment, regardless of whether they
actually got or took their treatment. This concludes our segment on experimental study designs.
The most important thing to remember is that randomization of exposure is a key component of
an experimental study design.
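As a small illustration of the intention-to-treat idea just defined, using made-up toy data: subjects are grouped by the arm they were randomized to, even when they did not adhere to the assigned treatment.

```python
# Toy records: (assigned arm, adhered to assigned treatment?, developed the outcome?)
subjects = [
    ("treatment", True,  False),
    ("treatment", False, True),   # non-adherent, but still counted as "treatment" under ITT
    ("control",   True,  True),
    ("control",   True,  False),
]

# Intention-to-treat analysis: group by randomized assignment, ignoring adherence.
for arm in ("treatment", "control"):
    in_arm = [s for s in subjects if s[0] == arm]
    outcome_risk = sum(1 for s in in_arm if s[2]) / len(in_arm)
    print(arm, outcome_risk)
```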

Cohort
Cohort studies are a type of observational study. We would like you to learn this study
design particularly well. The cohort study provides the foundation for understanding other types
of observational study designs. With the cohort study design, researchers follow an at-risk study
population over time, and evaluate exposures over time, and determine the subsequent risk or
rate of disease or health outcome. In this segment, we will cover the following learning
objectives. First, to explain the definition of a cohort. Next, to recognize which measures of
disease or health outcome occurrence are used in a cohort study design. You will also learn how
to distinguish between open and closed cohorts, as well as prospective and retrospective cohorts.
And lastly, you'll be able to list the advantages and disadvantages of cohort studies. Historically,
the word cohort was used to describe a subunit of a Roman legion of soldiers. I mention this
historic reference as it might, may help you remember the image of a group of people marching
through time. A cohort is typically defined as a group of persons sharing a common
characteristic. Epidemiologists may define their cohort by any number of shared factors. Some
examples of a common characteristic are geographic location, occupation, socioeconomic status,
age, gender, or race or ethnicity. A famous example of a cohort study was initiated in the early
1950s by Richard Doll and Sir Austin Bradford Hill. Doll and Hill began a 20-year study
following a cohort of British physicians. The common characteristic defining this cohort was that
they were male doctors whose names were in the 1951 British Medical Register. Thus, the
characteristics used to assemble the cohort were occupation, physicians, and geographic location,
Britain. At the outset of the investigation, the relevant demographic exposure or other factors
were determined for each subject. The main baseline exposure of interest in this study was
smoking. Specifically, tobacco cigarette smoking. Smoking was determined by questionnaires at
baseline. The study subjects were asked if they were current smokers, past smokers, or had never
smoked before. The outcome of interest was mortality or death. The mortality data was obtained from the Registrar General of the United Kingdom and complemented with records from the British Medical Association. In the United States, you may have heard of cohorts such as the
Women's Health Initiative. This cohort comprised 93,676 postmenopausal women
between the ages of 50 and 79, and followed for approximately eight years. Other cohorts
include the National Children's Study, a cohort of pregnant women in the United States. Or the
Framingham Heart Study. What do all these cohorts have in common? They were assembled
with the common characteristics, such as age and gender in mind. These cohorts then had key
exposure characteristics assessed at baseline. And then the people participating were followed
over time. The basic design of a cohort study ideally begins with a well-designated source population, either studied as a whole or as a randomly selected sample. For example, all
physicians included in the 1951 British Medical Register. Or, a random sample of male
physicians whose names were included in the 1951 British Medical Register. This population, or
random sample, is then assessed, and subjects are removed who already have the outcome or
disease of interest, such as lung cancer or cardiovascular disease. Or who don't meet whatever
predesignated inclusion criteria that investigators have decided upon. The goal here is that the
eligible study population both accurately and efficiently represents the source population. For
example, in a cohort study of prostate cancer, you would not likely want to include women. One
criterion of our source population for this example would therefore be gender. So let's go over the
cohort study basics. Cohort studies track participants over time. The subjects in a cohort study
are selected to be free of the outcome of interest at the study onset. So it is clear that the
exposure precedes the outcome. The exposure of interest is measured in all subjects at baseline,
and/or at regular time points during the course of the study. Once the cohort is assembled
and baseline exposures are measured, then study subjects are followed over time. The occurrence
of this specific disease or how the outcome of interest is followed closely. New outcome events,
such as incident cases of disease, death or a health status change accounted for all measures of
the cohort throughout the follow-up period. New cases of the outcome are used to calculate
whatever measures of incidents are relevant to the study, usually a risk or a rate. In the cohort of
British physicians we mentioned earlier, the main exposure of interest in the study was smoking.
It was determined by questionnaire at baseline. The study subjects were asked if they were
current smokers, past smokers, or had never smoked before. The outcome of interest was
mortality or death. An investigator may select a cohort specifically to study certain uncommon
or rare exposures. In some cohort studies, population groups with known exposures to a
suspected hazardous substance or environment are first identified and recruited for study. And
then another population or group without that exposure is identified. And the risk or rate of that
outcome over time is compared in the two groups. For this reason, cohort studies can be
particularly useful for studying uncommon or rare exposures, because usually it's possible to
identify and assemble groups of persons who have that uncommon exposure. Other studies may
create categories, such as an amalgam of risk factors for the disease or outcome under study. The
famous Framingham Cardiovascular Cohort Study provided much of the evidence for what is
known today regarding the risks of heart disease. The study subjects were initially categorized
according to suspected risk factors, creating risk groups for comparison. These risk groups were
then followed for 20 to 30 years. The development of cardiovascular disease among the various
risk groups was then compared with statistical analysis. A cohort study of US Air Force veterans
from the Vietnam War was set up to examine the effects of exposure to Agent Orange, a
defoliant dropped by planes during the campaign. This group of veterans was compared to Air Force pilots active at the time with no involvement in the Agent Orange campaign. Attempting to conduct this study in the general population would not have been possible, as exposure to Agent
Orange is too rare. A common measure of health outcome occurrence in a cohort study is a risk
or a rate. Since cohort study subjects are chosen to be free of the outcome of interest at the outset, only
new health outcome events, such as diseases, behavior changes, injuries, or even improvement in
health status are considered. Note that some cohort studies of diseased persons have been
conducted. For example, persons with arthritis. The outcome of these studies is not the
development of the disease, but rather the consequences of having the disease, such as
development of heart disease in persons with different types of arthritis. Or mortality differences
between people with different types of arthritis, or quality of life. In these cases, one could consider the disease type the exposure. For risk, the total number of disease-free persons in
the cohort is the denominator. For a rate, we only count person time at risk in the denominator.
This is estimated by calculating the amount of time each person contributes to the study free of
disease. When a subject develops the outcome of interest or disease, he or she is no longer
contributing to person time. Similarly, those subjects lost to follow-up do not contribute person
time as the investigator is unable to determine their health outcome status. We will now discuss
the type of study population followed in a cohort study. Cohort study populations can be open or
closed. In an open cohort, individuals are allowed to join the study at any point in time, from the
beginning to the end within limitations. In a closed cohort, the entire cohort is formed at the
beginning of the study, and the cohort is closed to new participants. An open study population accrues person-time. An open study population is also less prone to problems with sample size,
because study subjects can contribute person time, even if they are in the study only for a short
time. There are two types of cohort studies, retrospective and prospective. And they are
classified according to their temporal sequence. Retrospective and prospective refer to the time
the investigator initiates the study and starts collecting data. Both designs assemble cohorts on
the basis of exposure first. In the retrospective study, the cohort is formed in the past. The
prospective study starts now and goes into the future. In a prospective study, the investigators
obtain baseline exposure data in real time, and then follow the cohort members during the time
after baseline exposure to measure the occurrence of the health outcome or disease. The
retrospective or historical cohort study is often used to evaluate occupational exposure, such as
cancer and other chronic diseases in workers exposed to potentially hazardous substances. An
example might be deaths from lung cancer among asbestos exposed workers. Retrospective
cohort studies are possible when historical records exist to identify the important baseline
characteristics of study subjects from prior years, i.e., the list of workers employed at an asbestos
mine between 1930 and 1940. The mortality experience of these workers can be traced through
vital statistics and medical records from the baseline years to the present, and then compared
with a similar non-asbestos-mining cohort, or with the general population. Using the British
Doctors study example again, we have a prospective cohort study. Prospective or concurrent
cohort studies assess the baseline exposure in real time, and then cohorts are followed into the
future. In this study, baseline measurements were made in real time of physicians' smoking
habits as of 1951 when they received their initial questionnaire. Physicians were followed
over time by a mailed questionnaire sent out every five or ten years. And their mortality was
tracked by extensive records kept on physicians in the United Kingdom. Does smoking exposure
precede mortality? Because exposure status was determined at baseline among living members
of the British Medical Register in the study, we are certain that exposure preceded the outcome.
Now let's talk about the advantages of cohort studies. We will discuss measures of association,
such as risk ratios and risk differences in next week's module. So let's summarize by considering
the advantages and disadvantages of cohort studies. A big advantage of cohort studies is that they
allow direct estimations of risks or rates. Investigators may specifically seek out individuals for
study with an exposure that is not typical among the general population, as you remember from our
Agent Orange example. The ability to assess the effects of rare exposures is an advantage of the
cohort study. Cohort studies can also be useful for assessing multiple outcomes. The various
causes of mortality assessed in the British physicians study illustrates the ability to assess
multiple outcomes of a single exposure. This study examined all reported causes of mortality as
per the current International List of Causes of Death. The researchers summarized their
conclusions by identifying excess mortality among smokers by cause. A big disadvantage of
cohort studies is that they are expensive. They are also time consuming. Our ability to detect
relatively small differences in risks and rates between exposed and unexposed groups is
primarily influenced by the number of health outcomes in each group, rather than the
number of persons in each exposed group. Thus, if there are relatively few persons in an
exposure category, we may need a very long period of follow-up to observe sufficient numbers
of rarer outcomes in order to detect differences across levels of exposure. This accounts for the
considerable cost and time needed to properly conduct a cohort study. If outcomes are very rare, then the size of the cohort groups needed to effectively detect a difference between study groups may be too large to be practical. Examples of rare outcomes include certain cancers, such as acute leukemia or kidney cancer. Losses to follow-up occur when we cannot determine the outcome for some members of the cohort during the entire course of follow-up. If losses are greater in the exposed
versus the unexposed group or vice versa, we may obtain a biased estimate of the risk ratio or rate ratio. This concludes our segment on cohort study designs. Of all the study designs, we
recommend that you learn this one particularly well because it's the basis for learning about all
the other study designs.
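To connect this back to earlier segments, here is a rough sketch of how the risk and rate denominators differ in a closed cohort, using made-up follow-up records (the numbers and names are ours):

```python
# Toy cohort records: (years of follow-up contributed, developed the outcome?)
# Person-time stops when a subject develops the outcome or is lost to follow-up.
cohort = [(4, False), (8, False), (5, True), (5, False), (6, True)]

cases = sum(1 for _, got_outcome in cohort if got_outcome)

# Risk: denominator is the number of disease-free persons at the start of follow-up.
risk = cases / len(cohort)

# Rate: denominator is the person-time at risk actually contributed.
person_years = sum(years for years, _ in cohort)
rate = cases / person_years

print(f"risk = {risk:.2f}, rate = {rate:.3f} cases per person-year")
```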

Case Control
Welcome. In this segment, we're going to learn about the case control study design. A
good way to remember the case control study design is actually the name itself. With the case
control study design, we start off with cases and controls, and then we look back in time to see
what the exposures were. So let's delve in. After completing this segment, you should be able to
complete the following learning objectives. List the basic characteristics of case-control studies,
and identify the advantages and disadvantages of case-control studies. Let's review the types of
study design briefly before we go forward in discussing case control studies. This is a more in-
depth graphic showing the study design types for both experimental and observational studies.
Now just to remind you, experimental designs include randomized controlled trials and crossover
clinical trials. Basic observational study designs include cohort, case control, cross-sectional, and
ecologic. The choice of study design to address a specific research question will be driven by the
nature of the disease or health outcome being studied, the exposure of interest, and cost, time,
and feasibility issues. Let's now discuss case control studies in more depth. Case control studies
are an efficient and common epidemiologic study design to study rare diseases. The rule of
thumb we will use in this MOOC is that rare means a prevalence of less than 10%.
In a case control study, researchers begin by selecting diseased individuals or individuals with a
health outcome of interest. These are known as cases. Researchers also select a group of
individuals without the disease, known as controls. In contrast to the cohort study design we learned about in the previous segment, in case-control studies, subjects are selected for study because they either have the disease of interest, i.e., a case, or they do not, i.e., they're a control.
Case control studies proceed from effect, i.e., the disease or health outcome, to cause, the exposure of interest, as the researchers look back in time to see what the exposure was in both the case group and the control group. There are three key steps in conducting a case
control study. In step 1, you define and select the cases. Cases are selected from a group that has the disease or health outcome of interest. In step 2, you define and select the
controls. Controls are the non-cases that are representative of the same source population that
gives rise to the cases. In step 3, we measure and compare the exposure prevalence in the controls versus the cases. Let's discuss case selection in more detail. Researchers first determine
the diagnostic criteria they will use to define a case. For example, if studying Rocky Mountain
Spotted Fever, which is a tick-borne disease, the diagnostic definition of the disease should be
clearly specified in order to classify people as cases or controls. In defining cases of Rocky
Mountain Spotted Fever, the diagnostic criteria should include the following symptoms.
Symptoms include fever, headache, nausea, vomiting, and abdominal pain. A very large
proportion of cases also have a rash within two to 14 days of a tick bite. So you could include
that symptom. Or, as a researcher, you could decide not to include rash as a symptom and make your definition slightly broader. Even though study cases should be representative of all
cases, it is not necessary to enroll every case of the disease in your study. You may end up with a
sample of cases that meet your diagnostic criteria from a specific population, such as a hospital,
clinic, or other resource. The type of case you select is also important. It is better to use incident
cases rather than prevalent cases. Prevalent cases are influenced by the duration of the disease. In
the next slides, we will learn how to select controls for a study. When selecting controls for a
study, researchers may include multiple controls per case or multiple control groups. Multiple
controls per case can be used to help add statistical power when cases are unduly difficult to
obtain. Statistical power refers to the size of your study and your ability to detect an association,
should one exist. Sometimes researchers use more than one control group to see if the
relationships they find are consistent across control groups. Consistency across control groups
gives more credibility to the results. Now that we have discussed selection of cases and controls,
we will consider how to compare exposure prevalence. Recall that the total number of exposed
persons in a case control study is not the same as the total number of exposed persons in the
source population. The same is true about the number of non-exposed in the case control study.
Thus, the denominators obtained in a case control study do not represent the total number of
exposed and non-exposed persons in the source population. The investigators arbitrarily decide
how many controls will be selected to compare with the cases. The consequence of this arbitrary
selection is that we cannot measure risks or rates in a case control study directly, because the
population at risk, the denominator, is not ascertained. Instead, we use a measure called an odds
ratio. The odds ratio is simply the odds of exposure for cases divided by the odds of exposure for
controls. The odds ratio represents the strength of the association between exposure and outcome.
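To make this concrete, here is a minimal sketch in Python; the 2 by 2 counts are entirely hypothetical, laid out with disease on the top and exposure on the side, as in this course's convention.

# Exposure odds ratio from a case control 2 by 2 table (hypothetical counts).
#                 cases   controls
# exposed           a        b
# non-exposed       c        d
a, b, c, d = 40, 20, 10, 30  # hypothetical counts, not from a real study

odds_exposure_cases = a / c       # odds of exposure among the cases
odds_exposure_controls = b / d    # odds of exposure among the controls
odds_ratio = odds_exposure_cases / odds_exposure_controls

# The cross product (a*d)/(b*c) is an algebraic shortcut for the same ratio.
assert abs(odds_ratio - (a * d) / (b * c)) < 1e-9
print(f"Odds ratio: {odds_ratio:.1f}")  # 6.0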
Let's examine the pros and cons, or advantages and disadvantages, of a cohort study versus a
case control study. Consider a hypothetical study designed to learn whether pesticide exposure
increases the risk of breast cancer. Imagine a prospective cohort of 89,949 women ages 34 to 59
who are avid gardeners. Blood samples are taken from all 89,949 women at the beginning of follow-up and then frozen. These samples are used to determine the pesticide levels
present in the blood. Over eight years of follow-up, 1,439 breast cancer cases were identified.
What would these data look like if we were doing a cohort study? Here's our cohort study data,
which would be great, but well, what's the problem with our cohort study? Quantifying pesticide
levels in the blood is very expensive. It's not practical to analyze all 89,949 blood samples. To be
efficient, analyze blood from all cases, 1,439, but only analyze blood from a small sample of the women who did not get breast cancer, say two times as many as the cases, or 2,878. Now let's imagine that these data were used instead for a case control study. Recall that to be efficient, the
researchers should analyze all blood from the cases, N equals 1,439, but just take a sample of the women who did not get breast cancer, say two times as many as the cases, or N equals 2,878. These data can be used to estimate the risk ratio or rate ratio, depending on how we sample the controls.
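To see that efficiency argument in numbers, here is a small sketch using only the figures quoted above; it simply counts the blood assays each design would require.

# Lab work required: full cohort versus nested case control sampling.
cohort_size = 89_949      # women with banked blood samples
cases = 1_439             # breast cancer cases over eight years of follow-up
controls = 2 * cases      # two controls sampled per case, as suggested above

assays_needed = cases + controls
print(f"Assays for a full cohort analysis: {cohort_size:,}")     # 89,949
print(f"Assays for the case control design: {assays_needed:,}")  # 4,317
print(f"Share of the full-cohort lab work: {assays_needed / cohort_size:.1%}")  # 4.8%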
Remember, the investigator selects the study population from the source population, and the
study population is divided into participants and non-participants. The case group and the control
groups are chosen from the study participants. Therefore, the total number of exposed persons in a case control study is not the same as the total number of exposed persons in the source population. The same holds true for the controls, or non-exposed, in a case control study. Thus, the denominators obtained in a case control study do not represent the total number of exposed
and non-exposed persons in the source population. So where do you get the denominator? The
denominator is the number of controls. We can use this information to create a two by two table
to help determine the odds ratio. Let's review when it is best to use a case-control study. Case-
control studies are best when the disease is rare. For example, studying risk factors for birth
defects. Or when exposure data are expensive or difficult to obtain, like our pesticide example, where the lab tests for pesticides in the blood are costly. Case control studies are also useful when the disease has a long induction and latent period, for example, cancer or cardiovascular disease. And lastly, they are useful when little is known about the disease, for example, in the studies of AIDS early in the epidemic. Now we will discuss
the underlying source population for our case control study. The case population does not have to
consist of all cases in a potential source population, but can be restricted to a specific age range,
sex, race, or socioeconomic status. For example, much of the U.S. population may be a source
population for Rocky Mountain Spotted Fever, but for a specific study, the researcher may
restrict the case population to be adults ages 18 to 35 in North Carolina. Population controls can
be obtained by probability sampling of the source population if the latter can be defined.
Probability sampling can be done by sampling from a complete census by random digit dialing,
or by having a roster of all members of the source population, for example, union members, members of a professional association, or voter registration lists if registration is mandatory, etc. Controls should represent the restricted source population from which cases arise,
not all non-cases in the total population. That's an important point to remember. Now, let's move
on to discuss matching in case control studies. We match to make sure that controls and cases are similar with respect to variables that may be related to the outcome we are studying. Matching means that
for every case, there is at least one control who has the same or similar values of the matching
variable. Matching may be by sex, age, race or ethnic group, etc. Sometimes there is more than
one control per case. Matching should be limited to one or more important and strong risk factors. Otherwise, it will be difficult to obtain matches for each case. Weak risk factors are not worth
considering for matching. They can be easily evaluated if they are simply measured and
considered in the statistical analysis. Matching on a variable prevents evaluating the effects of the matched variable, since this variable would be equal or similar between cases and controls. This
concludes the segment on case-control studies. The important thing to remember about a case-
control study is that, if it's done appropriately with the right kind of sampling, the information that
we determine from a case-control study can really mirror what can be found from a cohort study,
but with a great deal less cost and a lot less time. So, the advantages: first, it is the most efficient design for rare diseases. Case-control studies are good for rare outcomes. The design takes less time. It also uses fewer resources and money, and you can examine multiple exposures. It's likely
to be replicable in other populations. If sampled accurately, odds ratios provide an estimate of
the risk or rate ratios. Disadvantages include that there might be some possible biases in the
selection of the subjects, measurement of the exposure, or the analyses. Also, a case control
study does not provide a direct estimate for the risk or rate ratio. Also, they are not good for rare
exposures. The time sequence between exposure and outcome is uncertain.

Cross-sectional
Welcome. In this segment, we'll be talking about another type of study design called a
cross-sectional study. This is yet another type of observational study design. These are learning
objectives for this segment on cross-sectional studies. They are to list the basic characteristics and be able to explain the cross-sectional study design, and also to identify the
advantages and disadvantages of the cross-sectional study design. Cross sectional studies fall
into the observational study design type. They are one of the four types of observational study
designs we are covering in our course. Like cohort studies, cross sectional studies conceptually
begin with a population base within which the occurrence of disease or health outcome and
sometimes the simultaneous occurrence of the exposure will be studied. For example, the
population could be all individuals currently living in Harari, Ethiopia. Or it could be all children
ages five to six currently attending kindergarten in Seattle, Washington. Or it could be all taxi
drivers currently working in Beijing, China. A key aspect of a cross sectional study is that the
exposure and the outcome are assessed at the same point in time within the specified study
population. If you conducted a cross sectional study in 1990, you would first define your study
population. For example, all adolescents in high schools in the city of Rio de Janeiro, Brazil. You
would then survey all the high school students at school and ask them about their exposures to
traffic pollution. For example, how close they live to heavily used roads. At the same time, you
would also ask them if they currently had asthma, or asthma-like symptoms. In this example, you
obtained the information about both their exposure, traffic-related air pollution and their health
outcome, asthma, at one point in time. This is in contrast to the cohort study, in which you start by defining the population, measuring the exposure, and then ascertaining the disease at a later date. In this diagram the investigators start in 1970, measure exposure, and then assess the
health outcome in 2007. The cross-sectional study design is also different from the case-control
design. As you might remember, the case-control design starts with selected cases and controls,
and then looks back at exposures in the past. In this diagram, the investigators start in 2005, select cases and controls, and look back in the past at exposures in 1970. Here's another
diagram of the cross-sectional study design. You start with defining your source population or a
population base, then you define your study population. And next you sample study participants
from that study population. Among the study participants you assess both exposure and disease or health outcome status at the same point in time. An easy way to think of a cross sectional study design is as a snapshot of an exposure and/or health outcome at one point in time. Here's an
example: among all individuals living in the United States, what is the prevalence of type one
diabetes? Let's now apply the cross sectional study design to the topic of distracted driving by looking at a study done by Vera Lopez et al in 2013. Using a mobile phone for either talking or
texting while driving a car can lead to traffic accidents. The Vera Lopez study took place in 2011
and 2012 in Mexico. Mexico, like many other countries, has a public health problem with regard to high rates of deaths from traffic accidents. Several municipalities have passed laws restricting
mobile phone use by drivers. Researchers selected three cities in Mexico. Their goal was to
measure the prevalence of talking and texting on mobile phones among drivers in the three cities.
A sample of 3% of all intersections with functioning traffic lights in all three cities was randomly
selected. With this systematic sampling methodology, 7,940 drivers and their vehicles were
observed during 2011 to 2012. The overall prevalence of mobile phone use while driving was
10.8%. Now we will discuss the numerator and denominator of the prevalence measure used in
cross sectional studies. Cross-sectional studies are often used to describe the occurrence of a
health outcome or exposure in the population. The measure used to describe this occurrence is
prevalence. For the numerator, you include all existing cases of the health outcome or disease in a population group, i.e., prevalent cases, while for the denominator, you include all existing persons in the study population or among study participants, including both prevalent cases and non-cases. For example, among adults 50 and older living in Dallas, Texas, what is the prevalence of
high blood pressure? The numerator would be all people 50 years or older, with existing high
blood pressure. The denominator would be all people 50 and older living in Dallas, Texas. There
are several ways in which cross sectional studies may be used. Some cross sectional studies
characterize the prevalence of a health outcome or disease in a specified population in a defined
period of time, i.e., prevalence. Other cross-sectional studies obtain data on the prevalence of
exposure and the health outcome or disease for the purpose of examining the association of these
two variables. For example, is smoking prevalence among adolescents related to smoking
prevalence of their parents? Now, let's look at how to conduct a cross-sectional study. First, a cross-
sectional study begins with a defined study population, from which data on the presence or absence of the health outcome in individuals are gathered. Second, the researcher ascertains the
prevalence proportion of the health outcome in the study population. The prevalence odds ratio
and prevalence ratio are commonly used measures of association when data are obtained from
cross sectional studies. Some cross-sectional studies ascertain just the prevalence of a health
outcome, while other cross-sectional studies ascertain the prevalence proportion of a health
outcome, among the exposed and unexposed persons. For example, in this diagram, the prevalence proportion for the exposed is a over N1, and for the non-exposed it is c over N0. For example, let's think about the prevalence of type 1 diabetes in undergraduates. That would be a plus c, divided by N. The prevalence of type 1 diabetes in female undergrads is a divided by N1, and the
prevalence in male undergrads is c over N0. In comparing females and males, one sex is
considered exposed if there is some evidence from the literature of a difference in type 1 diabetes
between sexes. Here's the example with smoking. You may want to answer the question: what proportion of students with a parent who smokes are themselves smokers? On this slide we will calculate a prevalence proportion. The exposure in this example is having at least one parent who smokes, and the outcome is being a middle school student who smokes. P1 equals a divided by N1. In this example it is 50 divided by 220, or 22.7%. The interpretation is that 22.7% of students with at least one parent who smokes are themselves smokers. It is important to recall from a previous lecture that under
steady-state conditions, prevalence equals rate times average duration of the disease or health
outcome. Note that there are some limitations of cross-sectional studies. For instance, the
prevalence is influenced by the rate and duration of the health outcome. For example, persons
who survive longer with a health outcome or disease will be more likely to be counted in the
numerator of a prevalence proportion. Short term survivors are not as likely to be counted, as
they are by definition around for a shorter time. Sometimes there can be issues with interpreting
cross-sectional studies. Antecedent-consequent bias affects cross-sectional studies and case control studies, but not cohort studies. In cohort studies, persons are selected for study because they're exposed or not exposed while they're still at risk and thus disease free. For example, suppose you were investigating diet and arthritis. In a cohort study, we obtain data on diet at baseline, before any of the study subjects have evidence of arthritis. In a cross-sectional study, we
ascertain dietary patterns at the same time as we obtain data on the presence or absence of
arthritis. Thus, you cannot be sure that the exposure preceded the disease as they are both
ascertained at the same time. So, what are cross-sectional studies used for? Cross-sectional
studies can be used for different purposes. They are widely used to estimate the occurrence of
risk factors or health outcomes in the population. For example, a study to look at the prevalence
of elevated blood lead in toddlers or the prevalence of asthma in children. National examples of
cross-sectional studies of great importance are the decennial census, the National Health and Nutrition Examination Survey, or NHANES, and the prevalence of HIV-positive antibodies in military recruits.
Opinion polls and political polls are basically cross-sectional studies. Surveillance of changes in
smoking habits or of other behavioral risk factors are sequential cross-sectional studies.
Similarly, surveillance of long-lasting diseases such as AIDS is cross-sectional. Other cross-
sectional studies obtain data on the prevalence of exposure and the health outcome for the
purpose of comparing, or looking at the relationship between, these two variables. One example we discussed in this segment was the proportion of students with a smoking parent who themselves smoke. In this example we worked with both the prevalence of the exposure, parents who smoke, and the prevalence of the health outcome, students who smoke. This concludes the segment
on the cross-sectional study design. The most important thing to remember about a cross-sectional study design is that it's a cut, or a snapshot, at one point in time, in which you measure both the exposure and the health outcome at the same time.
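As a final recap, here is a minimal sketch of the two calculations from this segment: the prevalence proportion from the student-smoking example, and the steady-state relationship between prevalence, rate, and duration. The rate and duration values are hypothetical, chosen only to illustrate the formula.

# Prevalence proportion among the exposed, from the student-smoking example.
a = 50     # students who smoke and have at least one parent who smokes
n1 = 220   # all students with at least one parent who smokes
p1 = a / n1
print(f"Prevalence of smoking among exposed students: {p1:.1%}")  # 22.7%

# Under steady-state conditions, prevalence equals rate times average duration.
rate = 0.05      # hypothetical: 5 new cases per 100 person-years
duration = 4.5   # hypothetical: average duration of the outcome, in years
print(f"Implied prevalence: {rate * duration:.1%}")  # 22.5%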

Ecologic
Welcome. In this segment, we're going to talk about ecologic study designs. The key point to remember about an ecologic study design is that either the exposure or the health outcome, or both, are measured at a group level. Let's start, and I'll show you what I mean. The
learning objectives for the ecologic study design segment are to list the basic characteristics and
explain the ecologic study design. And also to identify the advantages and disadvantages of the
ecologic study design. Ecologic studies are a type of observational study design. They are one of
the four types of observational study designs that we are covering in our course. Let's first talk
about the unit of measurement. In each of the observational study designs we've covered so far, i.e., the cohort, case control, and cross-sectional study designs, exposure data and health outcome data are generally collected from each study participant. There are some exceptions, but they won't be discussed here. Study designs which collect data at the individual level include cohort,
case control, and cross sectional studies. In contrast, you can also make measurements at a larger
group level. Exposure and/or health outcome data are collected at a group level, not an individual level. Generally, ecologic studies use a group level of measurement. For example, the exposure measurement would be yearly average air pollution concentrations in five different cities. Sometimes the health outcome occurrence, proportions or rates, is only known at a group level. For example, the yearly mortality, or death rate, from chronic lung disease in these same
cities with measured air pollution levels. Let's compare group and individual level data. Group
level data averages the exposure of the group, not individuals. But individual level data provides
information on the exposure of each individual. With group level data, we only know the health
outcome of the group. We don't know the exposure of individuals who became diseased and
those who did not. But with individual level data we are able to link individual exposures to
those who became diseased and those who did not. Linking individual exposures is a critical
difference to note between individual and group level data. This leads us to the ecologic fallacy,
the major limitation of an ecologic study design. An ecologic fallacy is concluding that an
association between the exposure and the health outcome at a group level is true at an individual
level, when this may not be true. The reason for this fallacy is that we do not know the link between exposure and the health outcome among individuals within each group, i.e., we don't know the number of diseased persons who were exposed or non-exposed in the high exposure group nor in the low exposure group. What we find at a group level may not hold true at an
individual level. Let's consider the hypothetical example that air pollution is higher in Los
Angeles than in Denver. But mortality from lung disease is lower in Los Angeles than in Denver.
We might come to the fallacious conclusion that air pollution protects against lung disease
deaths. The explanation might be that persons dying of lung disease in Denver may have moved
from high air pollution cities. We don't know the cumulative exposures of cases and non-cases in
either city. Consider this example of an ecologic study question. Is the ranking of cities by air
pollution levels associated with the ranking of cities by mortality from cardiovascular disease,
adjusting for differences in average age, percent below poverty level, and occupation? Note that
in this example there are no data at the individual level, allowing us to link individual exposure
to air pollution with outcomes such as cardiovascular disease mortality. Here is another example
of an ecologic study question. Have seat belt laws made a difference in motor vehicle fatality
rates, comparing years before and after laws were passed? Note that again, there are no data at
the individual level allowing us to link individual compliance with seat belt laws to the outcome,
motor vehicle fatalities. Now we will discuss advantages of the ecologic study design. Group
level data on exposure and health outcomes are often publicly available in state and national databases, i.e., census data, mortality data, and cancer registries. So ecologic studies have lower cost and
are convenient. Ecologic studies are useful for evaluating the impact of community level
interventions, for example, fluoridation of water, seat-belt laws, mass media campaigns, etcetera.
We can compare outcomes at a community level before and after the intervention. In the United
States and many other countries, data are regularly obtained on air quality, water quality, and weather conditions, as well as the size of the population, the status of the economy, and the health of the population. For example, the US Environmental Protection Agency collects air pollution data at selected locations all around the country using the national air quality monitoring network. These monitors collect air pollution data at the group level. In contrast, to collect individual level
air pollution exposure data, a person would need to wear an exposure monitor. An example of
group level data on a health outcome would be obesity prevalence among low income preschool
children by state in the United States. State and county obesity prevalences can be mapped to
explore regional variations. Comparing obesity prevalence by state, we see that California and
North Carolina are two states with higher prevalence. If we look at obesity by county, you can
see there is a great deal more variation in obesity prevalence by county. In fact, there are some counties with an obesity prevalence above 20% even though the state average is only 10 to 15%, such as in Washington state. If we were planning educational interventions in California and
North Carolina, the county level data would allow us to use our limited public health resources wisely and target specific counties. These state and county obesity prevalences are examples of group level data that are used in ecologic studies. These publicly available records provide low
cost and convenient ways for researching variation in health outcomes at a group level, with
characteristics of the population, the environment, or the economy, at a group level. Now let's look at an example of an ecologic study conducted on household firearms, or gun ownership, in the United States and deaths. In this figure, from Freagler et al 2013, we see that by state, the group
level, as household firearm ownership increases, there seems to be an increase in firearm deaths
per 100,000. This is an example of an ecologic study in which both the exposure, household
firearm ownership, and outcome, firearm deaths, are measured at a group level. Another
advantage of an ecologic study is that this study design can maximize exposure differences between communities, where minimal within-community differences render individual risk studies impractical. Exposures may differ substantially between communities, such as cities, states, or countries, e.g., the effect of latitude on the risk of multiple sclerosis. Ecologic studies
are also useful for studying the effects of short-term variations in exposures within the same
community, for example, temperature and mortality. Examples of exposures with small differences within a community but large differences between communities include the quality of drinking water, the concentration of certain air pollutants such as ozone and fine particles, and the average fat content of the diet, where differences are larger between countries than between individuals within the same city of a country. Another example is cumulative exposure to sunlight, where there are larger differences by latitude, north or south, of residence than among individuals within the same latitude. Now we will discuss
limitations of the ecologic study design. We have already discussed the ecologic fallacy earlier in
this segment. The ecologic fallacy refers to concluding that associations at a group or aggregate level are true at the individual level when they may not be. Another limitation of ecologic studies
is that we cannot be confident that exposure preceded the outcome. Lastly, another limitation of
ecologic studies is that we do not know what happens to individual people. Thus, migration into
and out of communities can bias the interpretation of ecologic studies. This concludes the
segment on ecologic study design. What I'd like you to remember from this segment is that in an ecologic study, either the exposure or the health outcome, or both, are measured at a group level. One example to help you remember that is air pollution that's measured at a central site location and is used to determine the exposures for a population within a ten mile radius. That's an example of an exposure that's measured at a group level. We will end the lecture on ecologic studies with a practice question.
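Before that practice question, here is a minimal numeric sketch of a group-level analysis. All state-level numbers are hypothetical; the point is that the correlation is computed entirely from group summaries, so by itself it cannot tell us anything about individuals.

# Hypothetical state-level (group-level) data: percent of households owning a
# firearm, and firearm deaths per 100,000 population, for five states.
ownership_pct = [20.0, 30.0, 40.0, 50.0, 60.0]
deaths_per_100k = [6.0, 8.5, 9.0, 12.0, 14.5]

# Pearson correlation computed by hand from the group-level summaries.
n = len(ownership_pct)
mean_x = sum(ownership_pct) / n
mean_y = sum(deaths_per_100k) / n
cov = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(ownership_pct, deaths_per_100k))
var_x = sum((x - mean_x) ** 2 for x in ownership_pct)
var_y = sum((y - mean_y) ** 2 for y in deaths_per_100k)
r = cov / (var_x * var_y) ** 0.5
print(f"State-level correlation: {r:.2f}")  # about 0.98

# Even a correlation this strong says nothing about whether the individuals
# who own firearms are the ones who die -- that is the ecologic fallacy.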

Measures of Association
Welcome. In this module, we're going to talk about the measures of association. The most important things to remember about the measures of association are that they can either tell us something about the strength of the association between an exposure and a health outcome, or they can help us quantify the absolute excess of the disease that's related to the exposure of interest. Just a quick reminder before we get started. Please make sure that you
understand the measures of health outcome occurrence before you go forward with the measures
of association. Those measures of occurrence are the building blocks of the measures of
association. So if you understand those, then it'll be easy to understand the measures of
association. After you have reviewed all of the week four lectures you should be able to do the
items listed in the learning objectives. These include defining the different types of measures of
association, including risk ratio, rate ratio, odds ratio, and prevalence ratio. You should also be able to define risk differences, rate differences, odds differences, and prevalence differences.
In addition you should be able to recognize which measures of disease occurrence and
association are often used with various study designs, and interpret both statistically significant and not statistically significant measures of association and their related confidence intervals.
Let's start with a few examples of measures of association. These examples may be similar to
ones you might have heard of in the news. Epidemiologic research on smoking and lung cancer
has found that people who smoke are 15 to 30 times as likely to get lung cancer, or die from lung cancer, as people who do not smoke. This is an example of a measure of association. We will learn how to understand and interpret statistics such as this one during this week's lectures.
Another example of a measure of association is from Malaria researchers who conducted a study
of mosquito nets in Mozambique. One of the measures of association the researchers calculated, a rate ratio, was 0.16. How is this interpreted? Houses with insecticide-treated mosquito nets had a rate of mosquito entry that was 0.16 times the rate of mosquito entry in houses without treated nets. This ratio can also be interpreted as: insecticide-treated mosquito netting reduced entry rates of Anopheles gambiae mosquitoes by 84%. Thus, the research provides evidence that the use of insecticide-treated mosquito nets reduces exposure to mosquitoes in the home.
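As a quick sketch of that arithmetic, using the rate ratio reported above:

# Converting a protective rate ratio into a percent reduction.
rate_ratio = 0.16  # rate of mosquito entry, treated versus untreated houses
percent_reduction = (1 - rate_ratio) * 100
print(f"Reduction in mosquito entry rate: {percent_reduction:.0f}%")  # 84%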
Another example is injury prevention and motor vehicle safety, an important topic. This slide includes a measure of association, in this case, an odds ratio from a study done on common driving distractions. Drivers who were composing or sending a text message had 23 times the odds
of a safety critical event compared with drivers who were not composing or sending a text
message. Let's talk now about the definitions and formulas for measures of association. So, up to
this point in the MOOC we have covered measures of disease occurrence. You may recall these
include prevalence, risks, rates, and odds. Here are the actual formulas for these measures of
disease occurrence. I want to review them with you now because they are the building blocks of
the measures of association. Before we go any further, I'd like to highlight the differences and
similarities among these various measures. Most importantly, risks and rates both use incident, or new, cases, whereas prevalence uses prevalent, i.e., existing, cases. And odds can use either incident or prevalent cases. Note the denominators: the denominators for each formula differ
from each other. Risk has the at risk study population, at the start of the study period, while rate
uses person time at risk during the study period. Prevalence uses the entire study population as
the denominator. And for odds, the non-cases form the denominator. Measures of association
compare the measure of disease occurrence in two different groups. We compare the measure of disease occurrence in the exposed group with the measure of disease occurrence in the unexposed group. Our goal is to see if the disease occurrence is different in the two groups. This comparison can be made by division, i.e., ratio measures of effect, or by subtraction, i.e., difference measures of effect. For division, you are comparing relative measures of effect. One key question for these ratio measures is: which group are you comparing relative to which other group? For example, the groups being compared in a ratio measure are often exposed versus unexposed, or one population versus another. For difference measures, in which you use subtraction, you are comparing absolute differences in effect. In the following lectures or
segments we will discuss each ratio and difference measure in more detail. The ratio measures
indicate the relative strength of the association between the exposure and a disease or health
outcome compared with the absence of exposure or less exposure. This strength of an association
between an exposure and a health outcome or disease is of greater interest when we are trying to understand causes of a disease or a health outcome. In contrast, difference measures, sometimes called attributable risk measures, place the magnitude of the association between an exposure and a health outcome in a public health perspective. Difference measures tell us whether the
exposure or risk factor is associated with a large number of disease cases or small number of
disease cases. Consider this example. The exposure of smoking has a risk ratio of about 10 for
lung cancer mortality, but a risk ratio of only approximately 1.7 for coronary heart disease
mortality. However, the risk difference for coronary heart disease is much higher, 125, compared with the risk difference for lung cancer, 43.8. Why is this? This table shows us that the base rates of
death from cardiovascular disease, which are 294.67 in smokers and 169.54 in non-smokers in the population, are much higher than the base rates of death from lung cancer, 49.33 and 4.49, in the population. The risk difference for smoking and coronary heart disease is also considerably larger, 125, than the risk difference for smoking and lung cancer, 43.84. The risk
difference can be used to compare the preventative impact of a smoking cessation program on
the rates of coronary heart disease deaths compared with lung cancer deaths. From this table, we see that a smoking cessation program would have a bigger impact on rates of coronary heart disease deaths than on lung cancer deaths. Before we go any further, we'd like to add a cautionary note on the term relative risk. Please note that the term relative risk is an older, commonly used term for any ratio measure of effect that approximates risk. This term is not
precise, and we recommend using more precise and specific terms such as risk ratio, rate ratio, or
odds ratio. However, you may see this term regularly when reading epidemiologic articles or
studies. Here are the equations for risk and rate ratios. Note that the risk ratio is abbreviated RR
and the rate ratio is abbreviated IRR. The I stands for incidence. Here are the equations for odds
and prevalence ratios. Note that the odds ratio is abbreviated OR and the prevalence ratio is
abbreviated PR. Here are the formulas for the risk and rate differences. These formulas express
the risk or rate among exposed in excess of that among the unexposed or less exposed.
Difference measures consider the risk or rate among the unexposed as a background risk or rate.
In other words a risk or rate that occurs in the absence of the exposure of interest. This
background risk or rate may not be a true absolute risk or rate because the unexposed are not necessarily without some risk or rate associated with different population-based factors that are
not the focus of the study. Difference measures are sometimes called attributable risk measures.
In our previous example about smoking, we showed that the coronary heart disease mortality rate
in smokers was 125 in excess of the rate among non-smokers. Now let's talk about two by two
tables. Two by two tables are commonly used to teach the concepts of measures of association.
The two refers to two columns and two rows. These tables are also known as contingency tables.
The two by two table starts with a square divided into four spaces. With the two-by-two table
we can show exposures as two categories, exposed and non-exposed. And the disease of interest
or health outcome in two categories, usually diseased and non-diseased. For this MOOC, our two-by-two convention will be disease on the top and exposure on the left-hand side. However, you
may find it the other way around in various textbooks or in the published literature. Each of these
sub squares is labeled with a letter, A, B, C, and D. We'll present the formulas for measures of
association in this context. Let's start with Risk. We can simplify the table by using E for
exposed and E with a line above it for non-exposed. And D for diseased and D with a line above
it for non-diseased. Now, applying our risk difference and risk ratio formulas: the risk difference is a/(a + b) minus c/(c + d), and the risk ratio is a/(a + b) divided by c/(c + d). Next, applying our rate difference and rate ratio formulas: the rate difference is a divided by the person-time (PT) in the exposed, minus c divided by the person-time in the unexposed, and the rate ratio is a divided by PT in the exposed, divided by c divided by PT in the unexposed. Note that for the following
texting while driving example, all data are fictitious and not from a published study. However,
there have been studies that show texting while driving to be associated with traffic accidents.
Here is the data in a two by two table. Note, the texting exposure is on the left-hand side and the
disease outcome, traffic accidents, is on the top. We calculate both the risk in the exposed,
9.09%, and the risk in the unexposed, 1.10%, with the formulas on the slide. Now to calculate
the risk difference you subtract the risk in the exposed minus that in the unexposed and get
7.99%. For the risk ratio you divide the risk in the exposed by the risk in the unexposed to get
8.27. Interpreting the risk for the exposed and the risk for the unexposed in the texting while driving example looks like this. Among those who texted while driving, 9.09% reported a traffic accident in a one year time period. And among those who did not text while driving, 1.10% reported a traffic accident in a one year time period. So you can see that traffic accidents were more common among those who texted while driving. And here is how you
interpret the risk ratio and risk difference for this example. For the risk ratio, those who texted while driving were approximately eight times as likely to have a traffic accident compared to those who did not text while driving over a one year time period. And for the risk difference: among those who texted while driving, the risk of traffic accidents was 7.99% higher than among those who did not text while driving over a one year time period. Now to calculate the rate in the exposed
group, the texters, you divide 30 by the total number of person-years, i.e., 400, to get 7.5 cases per 100 person-years. The rate in the unexposed group is 5 divided by 337 person-years, which gives you 1.48 cases per 100 person-years. Then to calculate the rate difference you subtract the rate in the exposed minus the rate in the unexposed to get 6.02 cases per 100 person-years. To get the rate ratio you divide the rate in the exposed by the rate in the unexposed to get 5.06.
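Here is a minimal sketch of those rate calculations, using the fictitious counts and person-time given above. (Full precision gives a rate ratio of about 5.05; the 5.06 quoted above reflects intermediate rounding.)

# Rate calculations for the fictitious texting-while-driving example.
events_exposed, pt_exposed = 30, 400      # accidents and person-years, texters
events_unexposed, pt_unexposed = 5, 337   # accidents and person-years, non-texters

rate_exposed = events_exposed / pt_exposed * 100        # per 100 person-years
rate_unexposed = events_unexposed / pt_unexposed * 100  # per 100 person-years

print(f"Rate, exposed:   {rate_exposed:.2f} per 100 person-years")    # 7.50
print(f"Rate, unexposed: {rate_unexposed:.2f} per 100 person-years")  # 1.48
print(f"Rate difference: {rate_exposed - rate_unexposed:.2f} per 100 person-years")  # 6.02
print(f"Rate ratio:      {rate_exposed / rate_unexposed:.2f}")        # 5.05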
And here is how you would interpret the rate in the exposed and the rate in the unexposed. Among those who texted while driving, the rate of traffic accidents was 7.5 per 100 person-years. Whereas for the rate in the unexposed, among those who did not text while driving, the rate of traffic accidents was 1.48 per 100 person-years. And here's how you interpret
the rate ratio and rate difference for this texting example. For the rate ratio, those who texted while driving had five times the rate of traffic accidents compared to those who did not text while driving. For the rate difference, among those who texted while driving, the rate of traffic accidents was 6.02 cases per 100 person-years higher than the rate among those who did not text while driving. In this segment we have learned about the different types of measures of
association including: Risk Ratios, Rate Ratios, Odds Ratios and Prevalence Ratios. We've also
learned about Rate Differences, Odds Differences and Prevalence Differences, and Risk
Differences. This concludes our segment about the definitions, formulas, and interpretations of the different types of measures of association.

Odds ratio
Welcome, in this segment we will discuss the odds ratio. In this segment, you will learn
how to define, calculate and interpret the odds ratio. In a case-control study, we calculate the
exposure odds ratio. The odds ratio approximates the incidence rate ratio or risk ratio under certain conditions. Remember that odds is p divided by 1 minus p, the probability of an event occurring divided by the probability of it not occurring. When the odds ratio is equal to 1, there is no
association between the exposure and the outcome of interest. When the odds ratio is greater than 1, there is a positive association. And when the odds ratio is less than 1, there is a negative association. Be careful of how you set up your 2 by 2 table here; the cross product formula only works if the table is set up correctly. For this epidemiology MOOC, we will use the
convention of disease on the top, and exposure on the side. However, outside this course, you
may see these switched. As I said before, in a case control study, the odds ratio is the exposure odds ratio: the odds of being exposed among the cases, a divided by c, divided by the odds of being exposed among the controls, b divided by d. Mathematically, this is the same as the cross product, which is equal to (a times d) divided by (b times c). The odds ratio is the ratio of the odds of the health outcome or disease in the exposed relative to the odds of the disease or health outcome in the non-exposed or less exposed group. Odds ratios can be calculated in cohort
studies and in case control studies. Prevalence odds ratios can be calculated for cross sectional
studies. There are different ways that you can interpret a measure of association in words, as illustrated here. You could say, those in a traffic accident were 1.62 times as likely to have been texting while driving compared with those who were not in a traffic accident in the past year. Or, those in a traffic accident were 62% more likely to have been texting while driving than those who were not in a traffic accident in the past year. But the most precise interpretation is as follows: the odds of a traffic accident among those who texted while driving were 1.62 times the odds of a traffic accident among people who did not text while driving. Be careful: you
cannot calculate a risk or rate directly from case control data. The denominators obtained in a
case control study do not represent the total number of exposed and non-exposed persons in the
source population. The investigators arbitrarily decide how many controls will be selected to
compare with the cases. We cannot directly measure risks or rates because the population at risk
in the denominator is not ascertained. Under certain conditions, we can obtain a valid estimate of
the rate ratio or risk ratio using the odds ratio, but we won't go into those conditions in this MOOC. So let's review. We can't estimate a risk or rate directly from a case control study because we, the researchers, decide on the number of diseased people, the cases, and the number of non-diseased people, the controls, when we design our study. So the ratio of controls to cases is not biologically or substantively meaningful. Instead, we estimate the risk ratio or the rate ratio in a case control study using the odds ratio. Let's look at an example of calculating and interpreting
the odds ratio, using childhood vaccines and human papillomavirus, or HPV. Researchers conducted a case control study to examine whether childhood vaccines protected children against HPV in real world conditions, i.e., rural areas with little access to regular health care. Note this is a hypothetical example. A total of 25 cases and 19 controls were identified. Data obtained from the cases and controls found that 15 cases and 12 controls had received the vaccine. That means 10 cases and 7 controls did not receive it, so the cross product gives an odds ratio of (15 times 7) divided by (12 times 10), or 0.875. Another way to
interpret the effect of the vaccine is to compare the odds of those who did not get the vaccine to
those who did. To do so, you take the reciprocal of the odds ratio, i.e., 1 divided by 0.875, which gives
you 1.14. So children who did not receive childhood vaccines were 1.14 times as likely to have
HPV compared to children who did receive childhood vaccines. Okay, let's try another example.
You are investigating an outbreak of paralytic shellfish poisoning among patrons of an Alaskan
restaurant. You conduct a case control study to identify food associated with the illness. A total
of 240 cases and 134 controls were identified. Data obtained from the cases and controls found
that 218 cases and 45 controls consumed scallops. Now I'd like you to create a two by two table
and calculate the odds ratio for this example and interpret your results, and then check back in a
minute. This is what your table would look like. 218 cases of paralytic shellfish poisoning and 45
controls who consumed scallops. And then we see there were 22 cases and 89 controls who did not
consume scallops. So to calculate the odds ratio, you take the cross product, which is 218 times 89, divided by 45 times 22, and you get 19.6. To interpret this, you can use the following
interpretation. There are other permutations, but here's a short one: cases were 19.6 times as likely as controls to have eaten the scallops.
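Here is a minimal sketch checking that cross-product arithmetic:

# Checking the scallop odds ratio with the cross product.
# 2 by 2 layout: cases and controls on the top, scallop exposure on the side.
a = 218  # cases who ate scallops
b = 45   # controls who ate scallops
c = 22   # cases who did not eat scallops
d = 89   # controls who did not eat scallops

odds_ratio = (a * d) / (b * c)
print(f"Odds ratio: {odds_ratio:.1f}")  # 19.6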

Interpretation of Measures of Association


  Now so far we've covered both the definition and calculations for Measures of
Association. Next, let's learn how to correctly interpret the measures of association. Once you've
learned these, you might actually find yourself critiquing news reports or discussions in the news
that are incorrectly using the measures of association. After you have reviewed this segment you
should be able to do the items listed in the learning objectives. These include interpreting the
measures of association, and recognizing which measures of disease occurrence or frequency,
and association are commonly used with different study designs. How do I know if an exposure
has a positive or negative effect on the disease that I'm interested in? Or how do you know if the
exposure doesn't have any effect at all? Here are the guidelines for relative measures of
association, i.e., ratios. If the risk ratio or rate ratio is equal to one, then there is no association between exposure and the disease or health outcome. If the risk ratio or rate ratio is greater than one, then the risk or rate in the exposed is greater than in the unexposed. If the risk ratio or rate ratio is less than one, then the risk in the exposed is lower than in the unexposed. For absolute
measures of association, i.e., differences, if the risk difference is equal to zero, then there is no
association, i.e., the risk is the same in both groups. If the risk difference is greater than zero,
then the risk in the exposed is greater than in the unexposed. And if the risk difference is less
than zero, then the risk in the exposed is less than in the unexposed. Note that the null value for
differences is zero while for the ratios the null value is one. Let's preface the topic of which
measures of association are commonly found with different types of study design by first noting
that all the measures of association we have covered can be estimated in the cohort study design.
However, some of the other study designs are not able to directly calculate risks and rates as you
can in the cohort. So, here is a table illustrating the measures of association that can be
commonly used for different types of study designs. Note that prevalence and odds ratios and
differences are more commonly found with cross-sectional and case-control studies, while risk
and rate ratios and differences are more commonly used with cohort studies. Risk and rate ratios cannot be directly calculated from case control and cross sectional studies. This concludes the
segment on interpreting measures of association.
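To tie these guidelines together, here is a tiny sketch that classifies a measure of association against its null value. The example values echo earlier segments (8.27 from the texting example, 0.16 from the mosquito net study); the zero difference is hypothetical.

def interpret(value: float, kind: str) -> str:
    """Classify a measure of association against its null value.

    kind is "ratio" (null value 1) or "difference" (null value 0).
    """
    null = 1.0 if kind == "ratio" else 0.0
    if value == null:
        return "no association"
    if value > null:
        return "occurrence is higher in the exposed group"
    return "occurrence is lower in the exposed group"

print(interpret(8.27, "ratio"))       # higher in the exposed
print(interpret(0.16, "ratio"))       # lower in the exposed
print(interpret(0.0, "difference"))   # no association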

Confidence Intervals
In this segment, we're going to talk about confidence intervals. Confidence intervals help
us understand the range of variability or uncertainty in either our measure of association or our measure of disease occurrence. After you have reviewed this segment, you should be able to interpret both statistically significant and non-statistically-significant measures of association, or measures of disease occurrence, and their confidence intervals. Sometimes you might see a
measure of disease occurrence or a measure of association accompanied by a confidence interval.
What is a confidence interval? Confidence intervals are a statistical construct that provide us
with information about a range in which the true value lies with a certain degree of probability, as well as information about the direction and strength of the effect. Since we don't know the true value of, say, a risk ratio or an odds ratio, we calculate their estimates. Confidence intervals
let us know how much our estimates of these measures of association might vary. We can answer
the question, what is the range of uncertainty about our estimate? If we perform an experiment 100 times and calculate an estimated risk ratio each time, the 95% confidence interval is expected to contain the true value of the risk ratio 95 out of 100 times. 95% is a commonly used confidence level. However, sometimes you might also see 90% or 99% confidence intervals.
A quick clarification on interpretation. When interpreting the 95% confidence interval, is it correct to say that there is a 95% probability that the true value lies within the interval? The answer is no, that is not correct. A probability is relevant to a process, not a specific interval.
Here is the mathematical formula for a 95% confidence interval. The measure of association could be a risk ratio, a prevalence odds ratio, a rate ratio, etcetera. You take that estimate and then subtract 1.96 times the standard error of the point estimate to get the lower 95% confidence bound. To get the upper 95% confidence bound, you add 1.96 times the standard error. Note that
the 1.96 is specific to the 95% aspect of the confidence interval. If you wanted to calculate the
99% confidence interval, you would use the number 2.575. And for a 90% confidence interval,
you would use the number 1.645. Since this is an introductory epidemiology MOOC, we are not
going to get into the details of how you calculate the confidence intervals by hand,
mathematically. But it is possible to do so. What does the confidence interval look like? It has a
lower bound and an upper bound. In this example the 95% confidence interval is 1.9 to 4.1. In this example, if you conducted the study 100 times, approximately 95% of those times the true value would be contained within the interval of 1.9 to 4.1. You might ask when looking at
this diagram, why is the estimate not an equal distance from the lower and upper bounds of the 95% confidence interval? The answer is that confidence intervals for ratio measures of effect, such as the odds ratio, rate ratio, or risk ratio, are computed using a logarithmic scale. If you take the logarithm, you will see that the point estimate is equidistant from the lower and upper
bounds. Here's an important point I would like you to remember. The measure of association, or point estimate, i.e., the risk ratio, odds ratio, etcetera, will always be somewhere between your lower and upper confidence bounds. If it isn't, this is a good indicator that something went wrong in your calculation. For beginning epidemiology students, there's free software available
that you can use to calculate 95% confidence intervals. These include Open Epi and EpiSheet.
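If you are curious what such tools do under the hood, here is a rough sketch of one common large-sample approximation, the log-based confidence interval for a risk ratio. The 2 by 2 counts are hypothetical, chosen to reproduce the 8.27 risk ratio from the texting example.

import math

# Hypothetical 2 by 2 counts: disease on the top, exposure on the side.
a, b = 30, 300   # exposed: diseased, non-diseased
c, d = 5, 450    # unexposed: diseased, non-diseased

risk_ratio = (a / (a + b)) / (c / (c + d))

# Standard error of the log risk ratio (a common large-sample approximation).
se_log_rr = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))

lower = math.exp(math.log(risk_ratio) - 1.96 * se_log_rr)
upper = math.exp(math.log(risk_ratio) + 1.96 * se_log_rr)
print(f"RR = {risk_ratio:.2f}, 95% CI: ({lower:.2f}, {upper:.2f})")
# RR = 8.27, 95% CI roughly (3.24, 21.09). Note the point estimate is not
# midway between the bounds on the natural scale, matching the earlier point
# about the logarithmic scale.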
Using either of these spreadsheets, you can plug in the numbers for each of the four cells in a 2 by 2 table. These spreadsheets then calculate the standard error and the related 95% confidence intervals. So what is a p-value? Study results are a combination of real effects and chance. The p-
value is a probability that tells you whether the study results are consistent with being due to
chance. The p-value does not tell you if the study result was due to chance. P-values alone do not let us directly say anything about the direction or size of a difference or measure of association
between different groups. So what do the p-value and 95% confidence interval tell us? Well first
we know that the 95% confidence interval has a relationship with the p-value. And if the 95%
confidence interval does not include the null value, it is called statistically significant. When a p-
value is less than alpha, which is usually chosen as 0.05, it may be called statistically significant.
When you have a statistically significant result, it means that you can reject the null hypothesis,
that there's no association between the exposure and the health outcome. Confidence intervals
contain more information than a p-value. A confidence interval also tells us the magnitude of the
association between the exposure and a disease. And it also tells us about the precision of the
estimate we obtained. The narrower the confidence interval, the more precise the estimate. A
clear distinction must be made between statistical significance and clinical relevance in
epidemiologic studies. The same numerical value for the results may be statistically significant if a large sample size was used, and not significant if the sample size was smaller. However,
study results of clinical relevance are not automatically unimportant just because there's no
statistical significance. So now let's look at these 95% confidence interval examples. Which of
these confidence intervals is, or are, statistically significant? Which is the most precise? And another question to think about: are narrower confidence intervals more significant? So, if the
confidence interval does not cross the null value, in this case 1.0, because we're talking about a ratio measure, then the confidence interval is statistically significant. Of the examples listed, A, B, and C, only C is statistically significant, because it does not cross the null value; A and B do. B is a more precise confidence interval compared with A and C. Why is
that? Well, B is more precise because the confidence interval is more narrow or smaller
compared with A and C. As A is the widest interval of these three examples, it is the least precise confidence interval. As I pointed out in one of the previous slides, statistical significance of the confidence interval depends on whether the confidence interval includes the null value. So
while B is a more precise estimate, it is not statistically significant, because it includes the null value of one. So here's a quick example for you to test your understanding. In this example, is the
risk ratio estimate of 2.8, with its 95% confidence interval of 1.9 to 4.1, statistically significant?
And the answer is yes. This risk ratio estimate is statistically significant as the 95% confidence
interval of 1.9 to 4.1 does not include the null value of 1.0. This concludes the segment on
confidence intervals. The most important thing to take away from this segment, and from
understanding how to interpret confidence intervals and their statistical significance, is that if the
confidence interval does not include the null value, then the result is statistically significant. The
null value for ratio measures is 1, and for difference measures it is 0. So, for example, if a
rate ratio confidence interval does not include the value one, then it is statistically significant.
And if we were talking about a difference measure, such as a risk difference, and the confidence
interval around the risk difference did not include the null value of 0, then it would be
statistically significant. So hopefully you can use this information when you're reading journal
or newspaper articles that include a confidence interval, and it will help
you understand the uncertainty around that measure of association or measure of disease
occurrence.
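As a minimal sketch of this takeaway rule, with illustrative numbers rather than values from any
particular study:

def is_significant(lower, upper, null_value):
    # A result is statistically significant when the confidence
    # interval excludes the null value.
    return not (lower <= null_value <= upper)

# Ratio measures (risk ratio, rate ratio, odds ratio): null value is 1
print(is_significant(1.9, 4.1, null_value=1.0))    # True: significant
print(is_significant(0.8, 1.9, null_value=1.0))    # False: not significant

# Difference measures (risk difference, rate difference): null value is 0
print(is_significant(0.02, 0.11, null_value=0.0))  # True: significant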

Confidence Intervals Example


Welcome to Week 4, Confidence Intervals Example. This segment will focus on an
example of calculating the measure of risk and the associated 95% confidence intervals. Many of
the world's used electronics such as cell phones, TVs and computers get shipped to cities in
China for recycling. Both children and adults work to separate out rare metals in the electronic
devices. Some of the children have very high exposures to metals such as lead and cadmium. Our
example is adapted from a study by Yang and Ahn published in 2013 entitled, Effects of Lead
and Cadmium Exposure from Electronic Waste on Child Physical Growth. In our fictitious
example, we will look at a cohort of children in Guiyu, China, who recycle metal and electronics.
This is a diagram of the cohort study. We start with a cohort of children, ages 5 years old, who
are working on recycling electronics, with an n equal to 1,725. Then we measure the concentrations
of the metals in their blood at baseline. Based on their blood concentrations of metals, we then
classify them into children with high metal concentrations in their blood, where n equals 589,
and children with low metal concentrations, where n equals 1,136. We then follow both groups of children
over 10 years and assess their physical growth, as measured by height, at age 15. This is
how you would then set up a 2 by 2 table to calculate the risk of decreased growth, or shorter
height, in children with high concentrations of metals in their blood compared to children with
low concentrations of metals in their blood at baseline. Now, I would like you to
calculate the risk ratio of decreased growth or shorter height, potentially related to heavy metal
exposure. The answer is 0.139219 divided by 0.025528, which is equal to 5.45. The interpretation
of this risk ratio is: children with high metal exposures at age 5 were 5.45 times as likely to have
decreased physical growth, or shorter height, over a 10-year period, compared with children with
low metal exposure. Now, how do you calculate the 95% confidence intervals associated with
this risk ratio? One possibility is with free software: you could use either Open Epi or EpiSheet,
and the two links are provided here. So, using the free software, we plug in the numbers from
our 2 by 2 table and calculate the 95% confidence interval. In this example, for the risk ratio of
5.45, the 95% confidence interval is 3.614 to 8.23. Is this risk ratio estimate statistically
significant? What do you think? The answer is yes. This risk ratio is statistically significant
because the 95% confidence interval does not include the null value of 1.
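The individual cell counts are not read out in the narration, but counts of 82 children with
decreased growth among the 589 with high metal exposure, and 29 among the 1,136 with low
metal exposure, reproduce the quoted risks (82/589 is approximately 0.139219 and 29/1136 is
approximately 0.025528), the risk ratio of 5.45, and the 95% confidence interval of 3.614 to
8.23. Under that assumption, the 2 by 2 table would be:

                          Decreased growth   Normal growth   Total
High metal exposure              82               507          589
Low metal exposure               29             1,107        1,136

And a minimal Python sketch of the whole calculation, using a Wald-type interval on the log
scale, would be:

import math

def risk_ratio_ci(a, b, c, d, z=1.96):
    # Wald-type 95% CI for a risk ratio, computed on the log scale
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)

# Inferred cell counts, consistent with the risks quoted above
a, b = 82, 589 - 82     # high exposure: decreased growth / normal growth
c, d = 29, 1136 - 29    # low exposure:  decreased growth / normal growth

rr, lower, upper = risk_ratio_ci(a, b, c, d)
print(round(rr, 2), round(lower, 2), round(upper, 2))   # 5.45 3.61 8.23

The interval excludes the null value of 1, which matches the conclusion above.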

Determining Causality
Witches, demons, evil spirits, the wrath of gods, miasmas, or bad air were all once used
by people before the modern era of science to explain the cause of disease outbreaks and other
calamities. In this lecture, we will talk about modern ways of determining causality. After you
have listened to all the lectures for this week, you should be able to complete these learning
objectives. They include: define causality and causal inference. State the guidelines for assessing
whether an association is causal. Distinguish between real and spurious associations. Describe
the nine Bradford Hill Criteria and give examples of each. List other more recent models for
understanding causality. In ancient times, people believed that outbreaks and plagues were the
result of the will of a god or evil spirits. However, some people wanted to have a more
reasonable explanation for these occurrences. People have always tried to give meaning to what
they see around them and what affects them. Often, witchcraft was blamed for things such as
infant deaths and crop failures as a way to explain their occurrence. The outcome for those
accused of witchcraft was a witch trial, which most often ended in the death of the accused. For
some time, people also believed in miasma, or bad air -- the idea that diseases such as cholera
and the Black Death, or the Great Plague were caused by bad air. Over time, the germ theory was
developed to explain how some diseases are caused by microorganisms, and the field of
epidemiology began with scientific observations of epidemics and other health outcomes. A
large part of the field of epidemiology is investigating the causes of disease. A formal definition
of causality may be, quote, an event, condition, or characteristic that preceded the outcome or
disease event and without which the event either would not have occurred at all or would not
have occurred until some later time, end quote. And this is from Rothman and Greenland,
American Journal of Public Health, 2005. Causality is not observed, but often inferred. This is
known as causal inference. Let's think about what is a causal relationship, and why we care about
causality. The primary goal of the epidemiologist is to identify those factors that have a causal
impact on disease or health outcome development. For example, the causes of malaria.
Determining causal relationships can provide a target for prevention and intervention, such as
insecticide treated nets to prevent malaria transmission. It is important to note that sometimes no
specific event, condition, or characteristic is sufficient on its own to produce a health outcome or
disease. Epidemiologists often use the term risk factor to indicate a factor that is associated with
a given health outcome. For example, some risk factors for heart disease include, high blood
pressure, a fatty diet, smoking or genetic makeup. If a person has any of these risk factors, they
should be regularly monitored by a medical professional. Again, a big part of epidemiology is
understanding what causes diseases. So, let's look at some recent headlines as an example of
determining causality. What really causes cancer and heart disease? Does one thing such as red
meat consumption really cause cancer or heart disease, even when so many other factors may
also play a role? For example, what about the role of overall diet, exercise, genetics and stress?
Imagine how hard it would be to conduct a randomized controlled trial, to study the effects of
eating red meat. Some study participants would be randomized to a very restrictive diet of red
meat over a long time period. For another example, let's consider smoking and its link to lung
cancer. There are some people who, due to their genetic makeup or previous experience, are
susceptible to the effects of smoking, and others who are not susceptible, or not as susceptible. These
susceptibility factors are part of the causal mechanisms through which smoking may cause lung
cancer. Remember, when studying causality, the causation is not observed, but is often inferred
based on data and health outcomes. Epidemiologists often employ the counterfactual model.
Meaning they ask, what would have been the experience of the exposed if the exposure had not
occurred? For example, what would have been the risk of lung cancer if smoking had not
occurred? If we determine that an exposure is associated with a health outcome, the next
question is whether the observed association reflects a causal relationship. Even if an exposure
precedes a health outcome, it does not always mean causality, even if it is strongly associated.
Let's look at a classic example. We can say carrying a lighter is associated with lung cancer.
Carrying a lighter precedes lung cancer. So does carrying a lighter cause lung cancer? No. It is
important to distinguish between causal associations and spurious associations. When you have a
causal association, it means that the occurrence of an event depends upon the occurrence of one
or more other events. The event will not happen unless the other events or variables have
occurred. When you have a spurious association, it means that bias, failure to control for
extraneous variables, such as when there is confounding, misapplied statistics or models, etcetera
have played a role. There are a series of criteria that have been developed and refined over the
years that now serve as a guideline for causal inference. We will discuss these in another lecture.
But the most important point to remember is that causality is not determined by any one factor.
Rather, it is a conclusion built on the body of evidence. A cause is something that must precede
the health outcome, and must be necessary for the health outcome to occur. A given health
outcome or disease can be caused by more than one causal mechanism, and every causal
mechanism involves the joint action of a number of component causes. There are events that
directly cause a health outcome, such as being bitten by a mosquito carrying the malaria parasite
and contracting malaria. There are also events that indirectly cause a health outcome as part of a
larger process, such as the combined role that genetics, smoking, and diet play in developing
cancer. It is reasonably safe to say that there are nearly always some genetic and some
environmental causes in every causal mechanism. So, why is it important to distinguish between
causal and noncausal associations? The reason is we want to know what causes disease or health
outcomes, but also causal relationships are used to make public health decisions and design
interventions. This concludes our lecture on causality.

Bradford Hill Criteria


In this lecture, you'll learn about the Bradford Hill criteria for causality. After you have
listened to this lecture, you should be able to describe, the nine Bradford Hill criteria for
causality, and give examples of each. You should also be able to list modern models of causality.
In 1965, English epidemiologist and statistician, Sir Austin Bradford Hill identified the nine
factors that constitute the current standards for determining causality. Hill's conclusions
expanded upon criteria that had previously been set forth in the U.S. Surgeon General's 1964
Smoking and Health Report, and were developed to answer the question of whether cigarettes
cause disease, especially lung cancer. It is important to note that satisfying these criteria may
lend support for causality, but failing to meet some criteria does not necessarily provide
evidence against causality. Hill's causal criteria should be viewed as a guideline, not as a check
list that must be satisfied for a causal relationship to exist. Bradford Hill himself was even
critical of using them for determining causality. Hill stated in 1965, quote, what I do not believe,
and this has been suggested, is that we can usefully lay down some hard and fast rules of
evidence that must be obeyed before we can accept cause and effect. None of my nine
viewpoints can bring indisputable evidence for or against the cause and effect hypotheses. And
none can be required as a sine qua non. What they can do, with greater or less strength, is to help us
to make up our minds on the fundamental question, is there any other way of explaining the set
of facts before us? Is there any other answer equally or more likely than cause and effect? End
quote. This is from The Environment and Disease: Association or Causation? in the Proceedings
of the Royal Society of Medicine, May 1965. Hill's criteria outline the minimal conditions
needed to establish a causal relationship. These criteria were developed as a research tool for the
medical field, but may also be used in other fields. Hill stated in 1965 that quote, the cause of
illness may be immediate and direct. It may be remote and indirect, underlying the observed
association, end quote. This is from The Environment and Disease: Association or Causation? in
the proceedings of the Royal Society of Medicine from May 1965. The first criterion is strength
of association. Strength of association between the exposure of interest and the outcome is most
commonly measured via risk ratios, rate ratios or odds ratios. Hill believed that causal
relationships were more likely to demonstrate strong associations than were non-causal agents.
Strong associations occur when an exposure is a strong risk factor, and there are few other risk
factors for the disease. For example, Bradford Hill pointed out that smoking is a strong risk
factor for lung cancer. Smokers are 15 to 30 times more likely to have lung cancer or die due to
lung cancer when compared with people who do not smoke. In addition, studies have shown that
the risk of lung cancer may be increased 20-fold or more when heavy smokers are compared
with non-smokers. There are certainly examples of weak but causal associations, such as
smoking and heart disease, where smokers are two to four times more likely to develop heart
disease than non-smokers. In the case of heart disease, there are a number of other risk factors,
including diet, sedentary lifestyle, and genetic predisposition that are as strong, or stronger, than
smoking as risk factors. Another weak, but causal association, is exposure to environmental
tobacco smoke, which has a risk ratio for lung cancer of 1.2. In this case, the risk ratio for
exposure to smoke carcinogens is much lower than the risk ratio for exposure to active smoking.
One should not assume that a strong association alone is indicative of causality, as the presence
of strong confounding may produce a strong association that is erroneously taken to be causal.
The next tenet,
consistency, refers to the reproducibility of study results in various populations and situations.
Consistency is generally utilized to rule out other explanations for the development of a given
outcome. However, the lack of consistency does not rule out a causal association, because some
effects are only produced under specific combinations of causal components. These conditions
may not have been met in some studies of other populations. For example, only 10% of heavy
smokers develop lung cancer. The other causal components are still being investigated. In
general, the greater the consistency, the more likely a causal association. Another criterion is
specificity of association. This simply states that if a single risk factor consistently relates to a
single effect, then it likely plays a causal role. For example, this one-to-one relationship exists
with certain bacteria and the disease they cause. Tuberculosis is a good example. It is important,
however, to note that few diseases have only one causal agent; most diseases, even tuberculosis,
are caused by a constellation of factors, including poverty, crowding, low immunity, inadequate
therapy, and the tubercle bacilli. The specificity of association
criterion has also been proven to be invalid in a number of instances, with smoking being the
classic example. Evidence clearly demonstrates that smoking does not lead solely to lung
carcinogenesis, but to a myriad of other clinical disorders ranging from emphysema to heart
disease. So, keeping all this in mind, some feel that it is the weakest of all guidelines in the list
and may even be misleading. Temporality has been identified as the most likely essential element
or condition for causality. For an exposure to be causal, its presence must precede the
development of the outcome. Lack of temporality rules out causality. One example is
the relationship between atrial fibrillation and pulmonary embolism. It is widely thought that
atrial fibrillation causes pulmonary embolism. However, more recent evidence, and plausible
biological hypotheses, suggest that the reverse could be true. Determining the proper course of
care may hinge upon discovering whether pulmonary emboli can indeed precede, and thus
perhaps cause, the development of atrial fibrillation. Temporality is the only necessary criterion for
causality. And finally, it is easier to establish a temporal relationship in a concurrent cohort study
than in a case control study or retrospective cohort study. The next Bradford Hill criterion, the
biological gradient criterion, relies on dose response, suggesting that as the dose of the exposure
increases, the risk of disease increases. The presence of the dose-response relationship between
an exposure and outcome provides good evidence for a causal relationship. However, its absence
should not be taken as evidence against such a relationship. Some diseases do not display a dose
response relationship with a causal exposure. They may demonstrate a threshold association
where a given level of exposure is required for disease initiation, and any additional exposure
does not affect the outcome. As an example of an exposure-response gradient, there is the gradient
of lung cancer by current amount smoked. However, some exposures do not cause disease until
the exposure threshold is reached. For example, skin burns from UV radiation, and cataracts
from ionizing radiation, require that a certain threshold level of UV or ionizing radiation be
reached before disease initiation. The dose-response relationship is one of the strongest
guidelines, because a confounder is unlikely to cause the same disease gradient as a primary
exposure. Support for the next Bradford Hill criterion, plausibility, generally comes from basic
laboratory science. It is not unusual for epidemiologic conclusions to be reached in the absence
of evidence from a laboratory, particularly in situations where the epidemiologic results are the
first evidence of a relationship between an exposure and an outcome. However, one can further
support a causal relationship with the addition of a reasonable biological mode of action, even
though hard data may not yet be available. Laboratory experimental evidence increases our
confidence in drawing causal conclusions, but is not essential. Arguments about the biologic
plausibility of an observed exposure-response association are too often based only on prior
beliefs and the experience of laboratory scientists. For example, arguments that environmental
tobacco smoke cannot cause lung cancer because the doses are much below those causing cancer
in animals. Some associations lacking laboratory experimental support, for example, that some
viruses can cause cancer, as observed more than 30 years ago, have subsequently been confirmed
in epidemiologic studies. Coherence represents the idea that for a causal association to
be supported, any new data should not be in opposition to the current evidence, that is, should
not provide evidence against causality. However, one should be cautious in making definite
conclusions regarding causation, since it is possible that conflicting information is incorrect or
highly biased. The guideline is also interpreted to be satisfied when exposure is shown to result
in a cluster of related health events. As an example of a cluster of related health events consider
that smoking causes inflammation of the respiratory tract, release of damaging free radicals,
conversion of cells to pre-neoplastic states, transformation of cultured cells to cancer, activation
of oncogenes, and lung cancer in humans. The coherence guideline is more demanding than mere
biologic plausibility in that the evidence here must be extensive, cutting across disciplinary lines,
all of which mutually support a causal association between exposure and health outcome.
Experimental evidence is another Bradford Hill criterion. Today's understanding of Hill's
criterion of experimental evidence results from many areas: the laboratory, epidemiologic
studies, and preventive and clinical trials. Ideally, epidemiologists would like experimental evidence
obtained from well-controlled studies, specifically randomized trials. These types of studies can
support causality by demonstrating that altering the cause alters the effect. For example, we can
control sun exposure to examine effects on skin cancer. We could randomize some individuals to
high sun exposure and some to low sun exposure. By altering the cause, sun exposure, we could
then examine the potential effects on skin cancer development. Randomized trials are the
most persuasive studies to establish causality, as they tend to balance unmeasured confounders
between exposed and unexposed. But their use is limited to risk factors that can ethically be
randomized among subjects. Even randomized interventions involving a complex exposure may
result in difficulty in pinpointing the specific causal agent, for example, a change in diet. Or, in
our example on sun exposure and skin cancer, we could control sun exposure to examine the
effects on skin cancer, but what about those over age 40 who have had a lot of prior sun
exposure? Or those who have gotten second-degree burns every summer for years? The final
Bradford Hill criterion is analogy. When a factor is suspected of causing an effect, then other
factors similar or analogous to the supposed cause should also be considered and identified as a
possible cause, or otherwise eliminated from the investigation. Analogy is perhaps one of the
weaker of the criteria, in that analogy is speculative in nature, and is dependent upon the
subjective opinion of the researcher. Absence of analogies should not be taken as evidence
against causation. In addition to assessing the components of Hill's list, it is also critically
important to have a thorough understanding of the literature to determine if any other plausible
explanations have been considered and tested previously. Note that this is not one of the
guidelines cited by Hill, but considering alternative explanations is important, because we have
greater confidence that potential confounders were adequately controlled when multiple studies
address the confounders and still agree in finding an exposure-response association. Often, a
single study cannot provide this assurance, even if it is well-designed and conducted. For
example, the first studies of smoking and lung cancer were viewed skeptically until they were
confirmed by many subsequent studies. Thus, in the case of smoking and lung cancer, there was
a need for the Surgeon General's Committee on Smoking and Health, in 1964, to critically
review all of the evidence. Some researchers consider these five guidelines to be the most
important ones. These guidelines are to be used to evaluate the body of knowledge on an
exposure-response relationship, not merely applied to one or two studies. Temporality is
generally considered to be necessary, but the absence of one or more of the other four guidelines
does not negate the possibility of a causal relationship. Even well established causes may be later
shown to be inaccurately characterized, or further evidence may seriously challenge the
judgement that causality exists. By their nature, scientific conclusions, however valid at one
time, are subject to challenge when new methods of generating data present new evidence
contrary to these established conclusions. Causal criteria like Bradford Hill's are just one approach
to assessing causal relationships. Just remember that causal criteria should be viewed as a set of
guidelines, and not a check list. The consequences of defining an exposure as causal need to be
considered before taking any action. You have to ask yourself: what if the evidence was
misleading? In addition to Bradford Hill's guidelines for causality, several more recent models
for understanding causality have been developed. These include Causal Pies, Counterfactual
Models, and Directed Acyclic Graphs. For the purposes of this class, we will only mention these
models here, but we have provided some suggested readings for those students interested in
further exploring these models. This concludes the lecture on the Bradford Hill criteria for
causality.
