
Study Designs in Epidemiology

Kennedy Muthoka,
JKUAT
Study design: Definition
A study design is a specific plan or
protocol for conducting the study,
which allows the investigator to
translate the conceptual hypothesis
into an operational one
Study Designs: Types

 Qualitative
 Quantitative
 Experimental

 Observational
Quantitative designs

 Epidemiologic studies are broadly classified as:
 Observational - Non-Interventional
 Experimental - Interventional
OBSERVATIONAL
 Here nature is allowed to take its course
 Changes and differences in one variable are studied in relation to changes and differences in another
 Investigator DOES NOT intervene
- (e.g. Smoking v/s lung cancer: Investigator does not decide who smokes and who does not)
EXPERIMENTAL:
 Investigator deliberately introduces change
in one variable and measures effect of this
on outcome
 Studies that entail manipulation of the study
factor (exposure) and randomization of
subjects to treatment (exposure) groups
- These have ethical problems in
humans as well as costs involved
- Therefore most studies are
OBSERVATIONAL
Observational studies

 These range from
- Relatively weak studies, e.g. Case reports and Ecological studies
to
- Strong studies, e.g. Case-Control and Cohort studies
Observational studies cont..:
 Observational studies are either:
- Descriptive
- Analytical
Descriptive studies:

 Weakest epidemiological design
 Investigator merely describes the health status of a population, or the characteristics of a number of patients, by time, place and person
 It offers only limited information about a group of patients, their clinical characteristics and outcomes
Descriptive Studies cont…:

 These studies are weak because they make no attempt to link cause and effect, thus no causal association can be determined
 However, they are often the first step to a well designed epidemiological study
 Can define a good hypothesis to be tested using a stronger design
DESCRIPTIVE STUDIES cont…
 Description by
- Person - stating which populations and sub-populations do or do not develop disease
- Place - its geographical location
- Time - how the frequency of disease varies with time
DESCRIPTIVE STUDIES cont…

 Indices of person include demographic e.g.:


- Sex, age, life style e.g. alcohol use and

medicines.
 Characteristics of place e.g.:

- Variation between regions, or countries

 Time characteristics e.g.:

- Season variation, Frequency with time


DESCRIPTIVE STUDIES cont…:

 Examples of Descriptive studies:
- Case reports and Case series
 Case reports
- Consider a single patient or a group of patients with similar diagnoses
- Clinician identifies an unusual feature related to the disease and formulates a hypothesis
- They represent an important interface between clinical medicine and epidemiology
DESCRIPTIVE STUDIES cont…:

 Case reports and series are among most


published in journals (Over 1/3 of
articles)

 Case series are collection of individual


case reports which may occur within a
fairly short time period
Case-Series and Case Reports…:

 This study type has important historical


place:
- Was used as early means to identify
presence or beginning of epidemic

- Even now, in routine surveillance,


case reports can suggest new
disease or epidemic
Case-Series and Case Reports cont…:

e.g. Early epidemiology of AIDS:
- Between October 1980 and May 1981, 5 previously healthy homosexual men in the U.S. were found to have Pneumocystis carinii pneumonia (PCP)
- PCP was known to occur in older cancer patients with immunosuppression due to chemotherapy
Case-Series and Case Reports cont…:

 In 1981 young homosexual men were seen with Kaposi’s sarcoma
- Again, this was known to occur mostly in the elderly, affecting men and women equally
- Thus a hypothesis was formulated
 While case reports and case series are very useful for hypothesis formulation:
- They can’t be used to test for a valid statistical association
Case-Series and Case Reports…:

 Investigating affected individuals can lead to hypothesis formulation
 An analytic study may then be conducted to compare the experiences of individuals with and without the disease, to identify possible causal factors
Case-Series and Case Reports cont…:

 Limitations include:
- Based on experience of one person. Presence of any risk factors may be coincidental
- Frequently not large enough to permit quantification of frequency of an exposure
- Interpretation limited for lack of an appropriate control group
Ecological (Correlational)…:

 Are also weak designs
- Units of study are populations rather than individuals
- Useful also in generating hypotheses, but no causal inference can be drawn
- An apparent ecological link may not be a true link
- May be confounded by several factors
Ecological (Correlational)….:

 Investigator measures characteristics of an entire population to describe disease in relation to age, sex etc.
E.g.
 Correlation of Pap smear screening with mortality from cancer of the cervix
 Examined the % ↓ in cervical cancer mortality between 2 periods
- 1950-1954 - time Pap smears started to be used widely
- 1960-1964 - time a notable ↓ in mortality could have started
Ecological (Correlational)…..:

 % of women with Pap screening was noted in different states
 There was a strong and significant correlation: states with the highest % of women screened had the largest % ↓ in mortality
 Conversely, those with the lowest % screened showed the smallest % ↓ in mortality
ECOLOGICAL (CORRELATIONAL):

 Example raises question (hypothesis) that:


- Screening ↓mortality from Ca. Cervix

- Hypothesis can’t be tested (answered) with


this data since:
a) Don’t know whether women screened are
those who experienced lower mortality i.e.
The survey involved ALL women NOT only
those screened
ECOLOGICAL (CORRELATIONAL):

b) Can’t control for potential confounding factors
 E.g. in a study of per capita average daily intake of pork in relation to breast cancer
- A strong +ve correlation between death from breast cancer and pork eating was noted (hypothesis)
- However ↑ pork eating may be a marker for other factors that ↑ risk of breast cancer, e.g.
- ↑ fat in body
- ↓ vegetable eating
- ↑ socio-economic status
Ecological (Correlational)…:

 Can’t separate confounders using correlat. data


- More dramatic illustration of this limitation is:
-Very strong +ve correlation between per capita
number of color TV sets and CHD mortality in
different nations
 Hypothesis – there is association between color

TVs and CHD mortality


- However color TVs are related with other lifestyles
and these ↑risk of CHD e.g. smoking, ↑cholesterol,
inactivity
Ecological (Correlational)…:

- Again, absence of correlation does not mean absence of valid statistical associations
E.g. in the early 70s use of oral contraceptives (OCs) ↑ in the USA
At the same time mortality rates from CHD among all women of childbearing age ↓ by 30%
- This information does not, however, support an inverse association between use of OCs and risk of fatal CHD
ECOLOGICAL (CORRELATIONAL)…:

 Later, Cohort and Case-Control studies


have shown a two fold ↑ in risk of fatal
CHD in women using OCs than those
who don’t
 This is difficult to get or perceive by
correlational or ecological data
ECOLOGICAL (CORRELATIONAL)…:

 Correlation data provides average


exposure levels rather than actual
individual levels

 So an overall +ve or –ve linear


association may be shown but may be
masking more complicated relation
between exposure and disease
Ecological (Correlational)….:

E.g. In various countries the per capita alcohol


consumption showed a striking simple inverse
linear relationship with CHD mortality
 This meant that:-

- Countries with ↑per capita alcohol consumption had

↓CHD mortality risk

- Countries with ↓per capita alcohol consumption had


↑ CHD mortality.
ECOLOGICAL (CORRELATIONAL)…:

 Later it was shown relationship is not as


simple
 Actually the association is best represented
by a J-shaped curve
 Those consuming largest amounts (C) have
highest risk
 Those drinking small to moderate amounts
(B) have lowest risk
[Figure: J-shaped curve of CHD rate (vertical axis) against alcohol consumption (horizontal axis), with points A, B and C]
Ecological (Correlational)…:

 Those who don’t drink (A) have slightly higher risk than (B)
 Such a non-linear relationship cannot be identified easily from correlation studies, in which:
Exposure represents an average consumption for the population rather than the actual consumption patterns of individuals
Ecological fallacy
 Is a logical fallacy in the interpretation of
statistical data where inferences about the
nature of individuals are deduced from
inference for the group to which those
individuals belong
 Occurs when you make conclusions about
individuals based only on analyses of group
data
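A minimal sketch of the ecological fallacy, using made-up numbers (the three populations and their exposure/outcome values are hypothetical and not taken from any study). Across the group averages the correlation is perfectly positive, yet inside every group the individual-level correlation is perfectly negative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mean(values):
    return sum(values) / len(values)

# Hypothetical (exposure, outcome) pairs for individuals in three populations.
populations = {
    "A": [(0, 2), (1, 1), (2, 0)],
    "B": [(1, 3), (2, 2), (3, 1)],
    "C": [(2, 4), (3, 3), (4, 2)],
}

# Group-level (ecological) analysis: one average point per population.
group_x = [mean([x for x, _ in people]) for people in populations.values()]
group_y = [mean([y for _, y in people]) for people in populations.values()]
group_r = pearson(group_x, group_y)

# Individual-level analysis inside each population.
within_r = {
    name: pearson([x for x, _ in people], [y for _, y in people])
    for name, people in populations.items()
}

print("correlation of group averages:", round(group_r, 3))
for name, r in within_r.items():
    print("correlation within population", name, ":", round(r, 3))
```

Concluding from the group-level result that more exposed individuals have higher outcome values would be wrong for every individual group in this example.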
Confounding variable
 Is an extraneous variable in a statistical
model that correlates (directly or inversely)
with both the dependent variable and the
independent variable
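A small illustration of confounding, with hypothetical counts chosen only for this sketch. The third factor is more common among the exposed and also raises disease risk, so the crude risk ratio shows an association even though the risk ratio is 1 inside each stratum of the confounder.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

# Hypothetical counts per stratum of the confounder:
# (cases among exposed, total exposed, cases among unexposed, total unexposed)
strata = {
    "confounder present": (80, 800, 20, 200),  # 10% risk in both exposure groups
    "confounder absent": (4, 200, 16, 800),    # 2% risk in both exposure groups
}

# Stratum-specific risk ratios (exposure has no effect within a stratum).
stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

# Crude analysis: add the strata together and ignore the confounder.
crude_counts = [sum(column) for column in zip(*strata.values())]
crude_rr = risk_ratio(*crude_counts)

for name, rr in stratum_rr.items():
    print(name, "risk ratio:", round(rr, 2))
print("crude risk ratio:", round(crude_rr, 2))
```

Stratifying on the suspected confounder, as done here, is one way to check whether a crude association survives once that factor is held constant.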
CROSS-SECTIONAL Studies

 Also descriptive studies, sometimes called prevalence surveys
- Here exposure and disease are assessed simultaneously in individuals of a well defined population (e.g. diarrhoea/bacterial infection)
- It is like a specific window, e.g. a calendar year, in which a community wide survey is conducted, or
Cross-sectional studies….:

 A fixed point in the course of events that varies in real time from person to person, e.g.
 Pre-employment exam
 Pre-school entrance exam
 These provide information about:
- Frequency and characteristics of disease (e.g. that it affects men only), by giving a “snapshot” of the health experience of a population at a specific time
E.g. KDHS (Kenya Demographic and Health Survey) - family planning, morbidity etc.
CROSS-SECTIONAL Studies….:

 X-sectional studies also useful for disease


prevalence etc in certain occupations
- But since exposure and disease are assessed
at single point in time, can’t determine
whether exposure preceded or resulted from
the disease
- This “chicken/egg” dilemma is common in
virtually all X-sectional studies
X-Sectional Studies Cont…:

 Example:
- Investigators compared prevalence rates of
CHD among white farmers between;

a) Those who did their own labor


V/S
b) Those who did not do their own labor
X – Sectional studies..:
 Prevalence rates of CHD among (b) were
about 5 times higher than those in (a), i.e. 157.2 versus 33.3/1000 (even after adjusting for age).

- It is not possible to know whether physical labor is truly protective or whether those with CHD are likely to reduce their physical labor
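The "about 5 times" figure can be checked directly from the two prevalence rates quoted above. This is only arithmetic on the slide's numbers; the variable names are made up for the sketch.

```python
# Prevalence of CHD per 1000 farmers, as quoted in the example.
prevalence_no_own_labor = 157.2 / 1000   # group (b)
prevalence_own_labor = 33.3 / 1000       # group (a)

# Prevalence ratio: how many times more common CHD is in group (b).
prevalence_ratio = prevalence_no_own_labor / prevalence_own_labor

# Prevalence difference, expressed per 1000 farmers.
difference_per_1000 = (prevalence_no_own_labor - prevalence_own_labor) * 1000

print("prevalence ratio:", round(prevalence_ratio, 2))
print("prevalence difference per 1000:", round(difference_per_1000, 1))
```

The ratio describes the strength of the cross-sectional association only; it says nothing about which came first, the reduced labor or the CHD.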
X-Sectional studies cont….:

 X-sectional studies consider prevalence rather than incidence.
- They therefore reflect determinants of survival as well as of aetiology
 E.g.
- A study shows lower prevalence of CHD among blacks than whites
- But there is nothing to show that CHD develops less often in blacks than in whites
X-Sectional studies cont….:

 It could be :
 True that whites develop more CHD or
 Blacks have higher rates of CHD but die at higher rates.
 So X-sectional data can not distinguish between
the two
 In most instances X-sectional data is used to
describe characteristic of individuals with disease
and formulate hypotheses, but not test them
Cross-sectional Studies (Summary)

 Characteristics: detects point prevalence; suited to relatively common conditions; allows for stratification
 Merits: feasible; quick; economic; allows study of
several diseases / exposures; useful for estimation of
the population burden, health planning and priority
setting of health problems
 Limitations: temporal ambiguity (cannot determine
whether the exposure preceded outcome); possible
measurement error; not suitable for rare conditions;
liable to survivor bias
 Effect measure: Odds Ratio
COHORT STUDY
DESIGN
COHORT STUDIES

 One of the major types of observational analytic designs
 Also called follow-up or longitudinal studies
 All persons in the cohort must be free of disease to start with
TYPICAL COHORT DESIGN

EXPOSURE    -> FOLLOW UP -> DISEASE / NO DISEASE
NO EXPOSURE -> FOLLOW UP -> DISEASE / NO DISEASE
Advantages

 All subjects are disease free at the beginning
- Thus the temporal (time related) sequence between exposure and outcome can be established
 Suited for assessing effects of rare exposures, mainly occupational. Allows the investigator to identify adequate numbers of exposed and non-exposed
Advantages cont..:

 Allows for examination of multiple


effects of one exposure
 They minimize potential for selection
bias which is a concern in case control
studies
Disadvantages

 Time consuming because follow up is for many years. Also expensive
 Thus, must be conducted after a good hypothesis is formulated
 They have potential for loss to follow up because of lengthy follow ups
TYPES OF COHORT STUDIES

 Prospective (Concurrent)
 Retrospective (Historical, Non Concurrent)

 Retrospective Cohort Studies:


All relevant events (Exposure and
outcomes of interest) have already
occurred
RETROSPECTIVE, Example:

E.g. Retrospective Cohort Study of


asbestos exposure and lung cancer
 Used tax registers to establish Cohort

 Followed in retrospect for mortality of

lung cancer
 Excess in mortality from lung cancer

was noted
 Then another investigation done using

prospective cohort design


TYPES OF COHORT STUDIES

 Prospective Cohort Study:


Classify subjects according to presence
or absence of exposure
 Exposure may or may not have
occurred
 But outcome is yet to occur

 Subjects are followed and incidence


calculated
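Once incidence has been calculated in the exposed and non-exposed groups, the two are usually compared as a risk ratio (relative risk). The sketch below uses hypothetical counts, and the confidence interval uses the common log-based approximation, which is an addition here and not something the slides specify.

```python
from math import exp, log, sqrt

def cumulative_incidence(new_cases, persons_at_risk):
    """Proportion of an initially disease-free group that develops disease."""
    return new_cases / persons_at_risk

# Hypothetical cohort followed for a fixed period.
exposed_cases, exposed_total = 30, 1000
unexposed_cases, unexposed_total = 10, 1000

risk_exposed = cumulative_incidence(exposed_cases, exposed_total)
risk_unexposed = cumulative_incidence(unexposed_cases, unexposed_total)

# Risk ratio (relative risk) and risk difference.
rr = risk_exposed / risk_unexposed
risk_difference = risk_exposed - risk_unexposed

# Approximate 95% confidence interval on the log scale.
se_log_rr = sqrt(1 / exposed_cases - 1 / exposed_total
                 + 1 / unexposed_cases - 1 / unexposed_total)
ci_lower = exp(log(rr) - 1.96 * se_log_rr)
ci_upper = exp(log(rr) + 1.96 * se_log_rr)

print("risk ratio:", round(rr, 2))
print("95% CI:", round(ci_lower, 2), "to", round(ci_upper, 2))
print("risk difference per 1000:", round(risk_difference * 1000, 1))
```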
Ambidirectional Studies:

 In some studies, data can be collected both retrospectively and prospectively on the same cohort, i.e. an “ambidirectional” design
 This design is best for exposures with both short term and long term effects, e.g. a chemical causing birth defects (within a few years) and cancer (after many years)
ISSUES IN DESIGN OF COHORT STUDIES

 Nature of questions being evaluated, e.g.
- Smokers are easy to pick and follow prospectively or retrospectively
- However others are not so easy, e.g. chemical exposure (occupational or environmental)
ISSUES IN DESIGN OF COHORT STUDIES

 Advantage of selecting special exposure


group is that:
- It allows accrual of sufficient exposed
individuals in reasonable period of
time.
ISSUES IN DESIGN OF COHORT STUDIES

 Allows evaluation of rare outcomes that would otherwise require prohibitively large numbers to test
 They provide an efficient means of identifying risk factors in the general population (the exposed have ↑ levels of exposure than the general pop.)
 E.g. Mesothelioma incidence 5/million/yr
- 20,000 cases for 5 years
- Common among those exposed to asbestos
SELECTING COMPARISON GROUP

Non-exposed:
 Group must be as similar to the exposed as possible except for exposure
(So that, in the absence of exposure, disease rates in the populations being compared would be the same)
Cohort Studies (Summary)
 Characteristics: follow-up period (prospective;
retrospective)
 Merits: no temporal ambiguity; several outcomes
could be studied at the same time; suitable for
incidence estimation
 Limitations (of prospective type): expensive; time-
consuming; inefficient for rare diseases; may not be
feasible
 Effect measure: Risk Ratio (Relative Risk)
CASE-CONTROL
STUDY DESIGN
CASE-CONTROL STUDIES…:

 Another of the observational analytic study designs (also the commonest).
 Groups of individuals are defined on the basis of disease or no disease and compared with respect to a suspected exposure factor.
 All cases therefore must already have developed disease at study start.
TYPICAL CASE-CONTROL STUDY

EXPOSED / NOT EXPOSED <- CASES (DISEASE)
EXPOSED / NOT EXPOSED <- CONTROLS (NO DISEASE)
CASE-CONTROL STUDY DESIGN

 Determine what proportion were


exposed and what proportion were not
in both the cases and the controls.
 We trust that if exposure is related to
the disease,
- “Prevalence” of history of exposure
among the cases will be greater
than that in controls
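The comparison of exposure history between cases and controls is usually summarised as an odds ratio. The sketch below uses hypothetical counts; the confidence interval uses Woolf's log-based approximation, which is an assumption of this example and is not taken from the slides.

```python
from math import exp, log, sqrt

# Hypothetical 2x2 table from a case-control study.
cases_exposed, cases_unexposed = 40, 60
controls_exposed, controls_unexposed = 20, 80

# Proportion with a history of exposure in each group.
exposed_among_cases = cases_exposed / (cases_exposed + cases_unexposed)
exposed_among_controls = controls_exposed / (controls_exposed + controls_unexposed)

# Odds ratio: odds of exposure in cases divided by odds of exposure in controls.
odds_ratio = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)

# Approximate 95% confidence interval (Woolf's method, log scale).
se_log_or = sqrt(1 / cases_exposed + 1 / cases_unexposed
                 + 1 / controls_exposed + 1 / controls_unexposed)
ci_lower = exp(log(odds_ratio) - 1.96 * se_log_or)
ci_upper = exp(log(odds_ratio) + 1.96 * se_log_or)

print("exposed among cases:", exposed_among_cases)
print("exposed among controls:", exposed_among_controls)
print("odds ratio:", round(odds_ratio, 2))
print("95% CI:", round(ci_lower, 2), "to", round(ci_upper, 2))
```

Risks cannot be computed from this table because the investigator fixes the numbers of cases and controls, which is why the odds ratio is used instead of the risk ratio.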
Problems to keep in mind in selecting
cases:-

 If selected from single hospital, risk


factors that are identified may be
unique to that hospital as a result of
referral policies. So we may be
investigating hospital related risk
factors.
Problems to keep in mind in selecting
cases:-

 E.g. if only severe cases are in hospital, the risk factors identified relate to severe disease, so it is best to select from several hospitals in the community.
 Incident or prevalent cases: case-control studies more often use prevalent cases so that many cases can be obtained quickly without waiting.
Selecting controls:-

 Probability sample of the total population may be impossible.
 Insurance company lists, school rosters.
 Neighbour to each case.
 Random digit dialing (developed countries).
 Best-friend control, approached through the case.
 Spouse or sibling.
Selecting controls:-

 Hospitalized controls.
 A captive population, so they are easy to recruit.
 However, they are not easy to characterize because they come from various populations.
 They also differ from the general population, e.g. prevalence of smoking is higher than in the general population (Berkson’s bias).
Selecting controls:-

 Many diagnoses for which patients are


admitted relate to smoking.
 So both cases and controls will be exposed

which weakens the association.


 In hospital choose control (from a number of
hospitals) that don’t have the disease
(cases).
Selecting controls:-

 However, if the cases have lung cancer from smoking, then selecting controls with another smoking-related disease, e.g. emphysema or even CHD, dilutes the association.
 Can exclude these during recruitment or stratify during analysis.
Selecting controls:-

 Another example is coffee drinking and pancreatic cancer.
- Coffee drinkers also smoke. It could be smoking (a known risk factor) that is associated with the cancer.
- So we would be indirectly examining cigarettes through coffee. So we stratify.
Selecting controls:

 Again, cancer of the pancreas (cases) and other chronic G.I.T. problems (controls) against coffee (exposure).
- G.I.T. patients may reduce coffee because of their problem, which distorts the association.
- Level of exposure in controls must be close to that of the general pop.
Matching:

 Matching is a process of selecting controls so that they are similar to cases in certain characteristics, e.g. age, sex, socio-economic status, race etc.
- If cases are poor and most controls affluent, then exposure comparisons become a problem
- So match the distribution by socio-economic status.
Matching:

 Group and Individual matching.


Group:
If 25% of cases are married then 25%
controls will be married.
Individual:
Matched pairs; Control selected that is
similar to case in terms of specific
variables.
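Individual matching can be sketched as a simple search through a pool of potential controls. The records, the matching variables (same sex, age within 5 years) and the function name below are all hypothetical choices for illustration.

```python
def match_controls(cases, pool, age_tolerance=5):
    """Pair each case with one unused control of the same sex and similar age."""
    available = list(pool)  # copy so the caller's pool is not modified
    pairs = []
    for case in cases:
        for control in available:
            same_sex = control["sex"] == case["sex"]
            close_age = abs(control["age"] - case["age"]) <= age_tolerance
            if same_sex and close_age:
                pairs.append((case["id"], control["id"]))
                available.remove(control)  # each control is used only once
                break
    return pairs

# Hypothetical cases and potential controls.
cases = [
    {"id": "case1", "sex": "F", "age": 52},
    {"id": "case2", "sex": "M", "age": 40},
]
pool = [
    {"id": "ctrl1", "sex": "M", "age": 43},
    {"id": "ctrl2", "sex": "F", "age": 60},
    {"id": "ctrl3", "sex": "F", "age": 49},
]

pairs = match_controls(cases, pool)
print(pairs)
```

A case with no eligible control is simply left unmatched in this sketch, which illustrates why matching on many characteristics at once quickly exhausts the pool.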
Matching:

Problems with matching:
- Matching for too many characteristics makes it difficult to get controls
- A characteristic matched for cannot be studied as a risk factor.
Multiple Controls

 More than one control for each case increases the power of the study and makes the study more valid.
 However, a noticeable power increase is gained only up to a ratio of 1:4 (case:control). Beyond this there is little further gain in power.
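One way to see the diminishing return is a commonly quoted approximation, not stated in the slides: with r controls per case, efficiency relative to an unlimited number of controls is roughly r / (r + 1). The function name is made up for this sketch.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of a 1:r design relative to unlimited controls."""
    return controls_per_case / (controls_per_case + 1)

# Tabulate the efficiency and the gain from adding one more control per case.
previous = 0.0
for r in range(1, 7):
    efficiency = relative_efficiency(r)
    gain = efficiency - previous
    print(f"1:{r}  efficiency {efficiency:.3f}  gain {gain:.3f}")
    previous = efficiency
```

Under this approximation, going from 1:1 to 1:4 raises efficiency from 0.5 to 0.8, while each further control adds only a few percentage points.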
Nested Case-Control Studies:

 Hybrid study in which Case-Control


study is nested in a Cohort study.
- Recall bias eliminated through initial
interviews
- If biochemical abnormalities occur,
the change will be notable against
earlier baselines
Example:
 e.g.
- Take blood from everybody at start of study
- Freeze this and follow up for cancer
- Separate into cases and controls
- Analyse their blood
 This approach saves analysing blood from the many thousands that have to be recruited for the study
NESTED CASE-CONTROL STUDIES

Population (collect blood and keep) -> Disease:    Exposure / No exposure
                                    -> No Disease: Exposure / No exposure
Case - Control Studies (summary)
 Characteristics: two source populations;
assumption that non-cases are representative of
the source population of cases
 Merits: least expensive; least time-consuming;
suitable for study of rare diseases (especially
NCDs)
 Limitations: not suitable for rare exposures;
liable to selection bias and recall bias; not
suitable for calculation of frequency measures
 Effect measure: Odds Ratio
INTERVENTIONAL
STUDIES

CLINICAL TRIALS
Clinical Trials

 Also referred to as Randomized Clinical Trials (RCT)
 These studies are used to evaluate both effectiveness and side effects of new forms of intervention
 They are also comparative studies, like analytic studies
 One group is assigned the new intervention and another the old, and both are followed up
Clinical Trials

 Are epidemiological studies that provide high quality data, like the controlled experiments of basic science researchers
 As in a cohort study, enrolment is based on exposure, but the difference is that the investigators allocate the exposure
Basic form of Clinical Trial

Reference Pop -> Study Pop -> Treatment grp
                           -> Non treatment grp
Basic form of Clinical Trial

A: Exposure (Treatment Grp)                  -> Improved / Not Improved
B: No Exposure (Non Treatment Grp, Placebo)  -> Improved / Not Improved
Advantage of Clinical Trials

 If allocations are at random in large enough


sample, these studies have potential of
assuring the validity of results not seen in
observational studies
Why Clinical Trials:

 Observational studies cannot establish small to moderate changes reliably because confounding cannot be fully controlled
 Randomized trials (RCTs) yield the strongest direct epidemiological evidence that an association is one of cause and effect
Designing, Conducting and Analyzing
intervention studies

Types of intervention studies


 Therapeutic or

 Preventive
Therapeutic (or Secondary prevention)

 These trials are conducted on patients with disease to determine if an agent or procedure reduces symptoms, prevents recurrence of disease or reduces risk of death
E.g. - Radical mastectomy for breast cancer, introduced by William Halsted in the late 19th century.
Therapeutic

 He felt that removing surrounding lymph


nodes, muscle and the tumor reduces risk of
recurrence and spread of cancer
 This became standard treatment for years
 In 70s scientists found that radical and less
extensive surgery were similar in mortality and
the 5 year survival
Problems with Interventional Trials

 In an observational analytic study, the investigator is a passive observer
 In an interventional study there is active assignment of participants into groups (Bias!!!!)
 So there must be sufficient belief in the agent’s potential, on ethical and feasibility grounds, to justify exposing one group to the agent or withholding it from another
Problems with Trials cont…

 Thus harm must not be allowed and benefits must not be withheld
 It can be difficult for people to forgo, for the duration of the trial, a treatment that they believe is beneficial, even if there is no evidence
- E.g. vitamin supplements in cancer etc. Scientific evidence is not conclusive, but people are buying large amounts
Problems with Trials cont…

 If controlled trials are not conducted soon, use will become so widespread that it will be impossible to conduct them
 Trials are more costly than observational studies because of extensive laboratory and hospital based tests
Selection of Study Population

 Trial design aims primarily at a valid result, not at generalizability
 Must check whether the experimental population is large enough to give the required sample size for the trial
 Choose an experimental population that gives a sufficient number of outcomes for comparison between the treatments in a reasonable period of time
Selection of Study Population cont..

 Must get complete and accurate subject follow up information for the duration of the trial, especially if frequent clinic visits are needed
 May need a pilot study to see if the experimental population will be cooperative
E.g. doctors may fill in a questionnaire by e-mail, which cuts down on costs
Selection of Study Population cont…

 In trials, participants are very likely to differ from non-participants in ways that may affect the rate of development of the study outcomes
 Eligible people who participate in trials tend to show lower morbidity and mortality rates than those who do not
 Volunteerism must be considered with care
Randomization

 To maximize the probability that groups receiving different treatments are comparable, assign subjects to groups randomly
 ‘Assign randomly’ implies that each person has the same chance of getting each of the possible treatments
 The probability that one subject will get a given treatment is independent of the probability that any other subject will receive the same treatment
Randomization cont…

 Randomization can be done by computer generated random numbers
 Randomization can also be done in blocks (block randomization), i.e. people are grouped by a given variable and then randomized within the group (often called stratified randomization). This is good for small samples; in large samples, simple randomization alone tends to make the groups comparable.
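A minimal sketch of computer generated allocation within groups. It assumes two arms, a block size of 4 and a single grouping variable (sex); these choices, the function name and the subject list are hypothetical. Within each group, subjects are allocated in shuffled blocks so that the arms stay balanced.

```python
import random
from collections import defaultdict

def stratified_block_randomize(subjects, block_size=4, seed=0):
    """Allocate (subject_id, stratum) pairs to two arms in shuffled blocks."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    by_stratum = defaultdict(list)
    for subject_id, stratum in subjects:
        by_stratum[stratum].append(subject_id)

    allocation = {}
    for stratum, ids in by_stratum.items():
        for start in range(0, len(ids), block_size):
            block = ids[start:start + block_size]
            # Each full block holds equal numbers of the two arms, in random order.
            arms = ["treatment", "control"] * (block_size // 2)
            rng.shuffle(arms)
            # zip stops early for an incomplete final block.
            for subject_id, arm in zip(block, arms):
                allocation[subject_id] = arm
    return allocation

# Hypothetical subjects: 8 men and 8 women.
subjects = [(i, "male" if i % 2 else "female") for i in range(16)]
allocation = stratified_block_randomize(subjects)

for stratum in ("male", "female"):
    treated = sum(1 for sid, s in subjects
                  if s == stratum and allocation[sid] == "treatment")
    print(stratum, "treated:", treated, "of 8")
```

In a real trial the seed and the allocation list would be kept away from the people enrolling subjects, so that the next assignment cannot be predicted.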
Randomization cont…

 If randomization is well done, nobody can know in advance the group to which a subject will be assigned
 So potential for bias is removed and one can be confident that differences seen at the end of the study are not due to the selection of subjects
 If a system is predictable, then there is potential for bias
Randomization cont…
 If allocation is by day of the week, this may be suspect, especially when patients are brought in close to midnight (or during the weekend)
 With randomization, ‘on average’, study groups tend to be comparable in all variables except for the factor under study
 ‘On average’ means that the larger the sample size, the more successful the randomization process is in allocating these factors equally between the groups
Randomization cont…

 A crucial fact is that, ‘on average’, all known confounders will be equally distributed, as will all unknown potential confounders about which there is limited knowledge
 Randomization provides a degree of assurance about the comparability of the study groups not found in any observational study
Allocation of study regimens cont…

 If exposure is not allocated by randomization, one must show that possible biases in allocation of people to groups, and known or unknown confounders that may differ between the groups, don’t account for the observed result
 A well designed and conducted randomized trial carries an inherent confidence in its results that isn’t there with other allocation schemes
Maintaining & Assessing Compliance

 Subjects may not take the treatment, may take the alternative treatment on their own, or their condition may deteriorate
 Duration of trial contributes to non-compliance
 Must monitor compliance because non-compliance reduces the statistical power to detect any true effect of treatment
 Consider non-compliance in interpreting results
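A rough sketch of how non-compliance dilutes an observed effect. It rests on simplifying assumptions that are not in the slides: the control arm is fully compliant, and subjects in the treatment arm who do not take the treatment have the same risk as controls. All numbers are hypothetical.

```python
def observed_risk_difference(risk_control, risk_treated, compliance):
    """Risk difference seen when only a fraction of the treatment arm complies."""
    # Non-compliers in the treatment arm are assumed to have control-level risk.
    risk_treatment_arm = compliance * risk_treated + (1 - compliance) * risk_control
    return risk_control - risk_treatment_arm

# Hypothetical risks: 20% without treatment, 10% with treatment fully taken.
full = observed_risk_difference(0.20, 0.10, compliance=1.0)
partial = observed_risk_difference(0.20, 0.10, compliance=0.7)

print("risk difference with full compliance:", round(full, 3))
print("risk difference with 70% compliance:", round(partial, 3))
```

A smaller observed difference is harder to detect with the same sample size, which is the loss of power the slide refers to.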
Masking or Blinding

 This helps ensure that results are not biased
 The person investigating should not know the group a subject belongs to, as this introduces bias
 In a double blind trial neither the investigator nor the subject knows the group the subject belongs to
 Placebos are usually used for blinding, e.g. Double Blind Placebo Controlled Clinical Trial
Masking or Blinding cont..

 The primary strength of a double blind design is to eliminate the potential for observation bias
 Must have a procedure for unblinding if side effects occur
 Some studies, e.g. of exercise, life style or diet, are not easy to blind. So one may either do single blinding or none at all
~the end~
