
Macario F. Reandelar, Jr., M.D., MSPH, FPAFP
Objectives:
 At the end of this session, the participants
will be able to:
 Identify the variables as to relationship given
a particular research topic.
 Give a definition/description of the variables
identified.
 It is important that the data used in the
study are operationalized in measurable
terms.
 There are two data types that researchers
collect – variables and concepts.
 Data are capable of taking on different
values.
 Usually easy to define, and the level of
measurement is objective, i.e., devoid of
any subjective interpretation.
 They can be either qualitative or
quantitative.
 Qualitative variables take categories for different levels of
measurement.
 For example, sex is a qualitative variable
and takes two levels of categories, male and
female.
 Civil status is another qualitative variable
and may be categorized as, single, married
or widowed.
 Quantitative variables take quantitative measures.
 Can either be counts or measurements.
 Age, for example, is a quantitative variable when
measured in years or in months.
 Weight is another one when measured in kilograms
or in pounds.
 Number of patients consulting in an Out Patient
Department is a countable quantitative variable.
 Concepts are mental images or perceptions.
 Their meaning may vary from individual to
individual.
 Unlike a variable, a concept cannot be
measured objectively.
 It is therefore important for a concept to be
converted into a variable.
 Quality of hospital service is an example of
a concept.
 Stress is another one.
 Overeating is still another one.
 One must be able to formulate criteria to be
able to measure the concept and transform
it to a variable.
 Converting a concept to a variable needs a
validated questionnaire or a set of criteria
to make it usable in research that is devoid
of various interpretations and
measurements.
 Setting an indicator or indicators or a set of
criteria to define a concept is
operationalizing the concept.
 Below is an example of an
operationalization of a concept ‘Burnout’
 Burnout - syndrome of exhaustion,
cynicism and low professional efficacy
characterized by emotional exhaustion (EE),
depersonalization (DP) and diminished
feelings of personal accomplishment (PA)
 Burnout – measured by Maslach Burnout
Inventory (MBI) – a 22-item internationally
validated measure of burnout that examines
three domains including emotional
exhaustion, depersonalization symptoms
and level of personal accomplishment at
work, (see Questionnaire in the Appendix)
 EE subscale score ranges from 0 to 54, high and
low scores are ≥27 and ≤18, respectively.
 DP subscale score ranges from 0 to 30, high and
low scores are ≥10 and ≤5, respectively.
 PA subscale score ranges from 0 to 48; high and
low scores are ≥40 and ≤33, respectively.
 High Burnout: High scores on EE and DP
with a low score on PA
 Low Burnout: Low scores on EE and DP and
a high score on PA
 Average Burnout: Average scores on all
scales or not fulfilling high or low burnout
criteria
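The scoring rules above can be sketched in code. This is a minimal illustration, not the official MBI scoring procedure: the cut-offs are copied from the subscale bullets above, and the names `CUTOFFS`, `subscale_level` and `burnout_level` are hypothetical.

```python
# Cut-offs copied from the subscale descriptions above:
# (highest "low" score, lowest "high" score) for each subscale.
CUTOFFS = {
    "EE": (18, 27),  # low <= 18, high >= 27
    "DP": (5, 10),   # low <= 5,  high >= 10
    "PA": (33, 40),  # low <= 33, high >= 40
}


def subscale_level(subscale, score):
    """Classify one subscale score as 'low', 'average' or 'high'."""
    low_max, high_min = CUTOFFS[subscale]
    if score >= high_min:
        return "high"
    if score <= low_max:
        return "low"
    return "average"


def burnout_level(ee, dp, pa):
    """Combine the three subscale levels into an overall burnout category."""
    levels = (
        subscale_level("EE", ee),
        subscale_level("DP", dp),
        subscale_level("PA", pa),
    )
    if levels == ("high", "high", "low"):
        return "High Burnout"
    if levels == ("low", "low", "high"):
        return "Low Burnout"
    # Everything else does not fulfil the high or low criteria.
    return "Average Burnout"


print(burnout_level(ee=30, dp=12, pa=30))
print(burnout_level(ee=10, dp=3, pa=45))
print(burnout_level(ee=20, dp=7, pa=36))
```

Once the criteria are written down this explicitly, two researchers scoring the same questionnaire arrive at the same category, which is the point of operationalizing a concept.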
 To determine the risk factors for candidemia
among critically ill neonates in terms of age,
gender, manner of delivery, birth weight and age of
gestation
 In research, when we want to examine the
relationship between an independent variable
and a dependent variable, between an
exposure and an outcome, or between a risk
factor and a disease, major threats to the
validity of the study can arise.
 These threats should always be considered as
alternative explanations in the interpretation
of the results
 Two types of error constitute these major
threats to a study:
(1) systematic error (bias), and
(2) random or chance error (sampling
variability).
 Any trend in the collection, analysis,
interpretation, publication or review of data
that can lead to conclusions that are
systematically different from the truth.
 A systematic distortion of the
association between a determinant
and an outcome due to a deficiency in
the study design.
May arise due to:
(1) poor measurement (information bias),
(2) the sample being unrepresentative of the
target population (selection bias),
(3) the effects of other determinants on the
association of interest (confounding).
Sources of information bias:
 Faulty measuring instruments
 Different standards in different biochemical
laboratories
 Errors by clinicians in diagnosing diseases
 Bias by investigators consciously or
unconsciously reporting more favorable
results for treatments they believe in
 Biased reporting of symptoms by patients
wishing to please doctors treating them
 Memory lapses by subjects in case-control
studies asked to recall past exposure to a risk
factor (recall bias)
 No validated definition of a variable
 Unvalidated questionnaire
 May be reduced or eliminated by good study
design.
 Blinding of both investigators and subjects,
until after the response has been evaluated,
(double blind)
 If the evaluators know the treatment
allocation but the subjects do not, the study
is said to be single blind.
 Occurs when there is a systematic difference
between the characteristics of those selected
for the study and those who are not.
 Occurs when the exposure status influences
the likelihood of subjects being enrolled in
the study.
Sources of selection bias
 Survival and losses to follow-up differ in the
two exposure groups being compared
 Volunteer and non-response bias – individuals
who volunteer for a study may possess
different characteristics than the average
individual in the target population.
 Bias will be introduced if the association
between exposure and disease differs
between study volunteers and nonresponders
 Hospital patient bias (Berkson’s Bias) – bias may
occur when hospital controls are used in a
case-control study.
 Healthy worker effect – Generally, working
individuals are healthier than individuals who
are not working.
 In intervention studies, it occurs when there
are systematic differences between
comparison groups in response to treatment
or prognosis.
Example
 In a case-control study of smoking and chronic
lung disease, the association of exposure with
disease will tend to be weaker if controls are
selected from a hospital population (because
smoking causes many diseases resulting in
hospitalization) than if controls are selected
from the community
 In this example, hospital controls do not
represent the prevalence of exposure
(smoking) in the community from which cases
of chronic lung disease arise. The exposure-
disease association has been distorted by
selection of hospital controls
 Confounding is one of the most intriguing
biases that can occur in epidemiologic
research.
 It arises whenever an outcome has two
determinants which are themselves
associated, and one is omitted from
consideration.
 Bias involves error in the measurement of a
variable.
 Confounding involves error in the
interpretation of what may be an accurate
measurement.
 A distortion in the estimated exposure effect
that results from differences in risk between
the exposed and the unexposed that are not
due to exposure.
 Two necessary criteria for a variable to
explain such differences in risk (and hence
explain confounding) are:
 1) It must be a risk factor for the disease
among the unexposed. That is, the
association between the confounding factor
and the disease is independent of the
exposure.
 2) It must be associated with the exposure
variable.
[Diagram: Main Determinant Variable, Outcome Variable, and Confounding Variable]
 Control of confounders can be done in either of
two ways:
 1. In the design stage
 2. In the analysis stage.
In the design stage:
 Restriction
 Matching
 Randomization
In the Analysis stage:
 Stratified analysis
 Multivariate analysis
Example of Confounding

                 Lung CA    No Lung CA
Alcoholic           60          40
Non-alcoholic       40          60
 Smoking can be a confounder in the
supposed relationship between alcohol
consumption and lung CA.
 Smoking is both related to alcoholism and
Lung CA.
 Stratified Analysis – Among Smokers

                 LCA    No LCA
Alcoholic         26      24
Non-alcoholic     24      26
 Stratified Analysis – Among Non-smokers

                 LCA    No LCA
Alcoholic         26      24
Non-alcoholic     23      27
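The effect of stratifying can be checked by computing the odds ratio for the crude table and for each smoking stratum. This is a minimal sketch using the cell counts as they appear in the tables; the function name `odds_ratio` is hypothetical. The stratum counts as printed do not sum to the crude table, so the figures are treated as purely illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b = exposed with / without the disease
    c, d = unexposed with / without the disease
    """
    return (a * d) / (b * c)


# Crude table: alcohol use versus lung CA, ignoring smoking.
crude = odds_ratio(60, 40, 40, 60)

# Stratum-specific tables: the same comparison within each smoking group.
among_smokers = odds_ratio(26, 24, 24, 26)
among_non_smokers = odds_ratio(26, 24, 23, 27)

print(f"Crude OR:             {crude:.2f}")
print(f"OR among smokers:     {among_smokers:.2f}")
print(f"OR among non-smokers: {among_non_smokers:.2f}")
```

When the stratum-specific odds ratios sit much closer to 1 than the crude odds ratio, as they do here, the crude association is largely explained by the stratifying variable (smoking).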
 Multivariate Analysis – Its concept is like that
of stratified analysis across multiple variables.
 The difference between the estimate and the
parameter.
 Also known as chance error.
 Stratified Analysis – Among Non-smokers

                 LCA    No LCA
Alcoholic         26      24
Non-alcoholic     23      27
 Odds Ratio = (26 × 27) / (24 × 23) ≈ 1.27
 Odds ratio is greater than 1 but may not be
statistically significant.
 The reason for the supposed relationship may
be due to chance error.
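Whether an odds ratio above 1 could be due to chance can be checked with a confidence interval. The sketch below computes the odds ratio directly from the four cell counts of the non-smoker table and adds a 95% interval using the log (Woolf) method. The choice of method is an assumption, since the slides do not say which one is intended, and the name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI (log / Woolf method).

    a, b = exposed with / without the disease
    c, d = unexposed with / without the disease
    """
    or_ = (a * d) / (b * c)
    # Standard error of ln(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


or_, lower, upper = odds_ratio_ci(26, 24, 23, 27)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print("CI includes the null value 1:", lower <= 1 <= upper)
```

With only 100 subjects in the stratum the interval is wide and spans 1, so the apparent relationship is compatible with chance error.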
At the end of the session, the learner should be able
to:
1. Differentiate the various types of research design.
2. Identify the major components of a research
design.
3. Determine the suitable/ appropriate research
design for a chosen problem.
4. Determine the appropriate statistical tools to use
given a particular research design.
 A plan or course of action the researcher takes to
solve the research problem
 NOT the method of collecting data
 Strategy or approach by which the research
questions can be answered
 How to go about solving the problem
… describes what will be done to answer the
research question.
PLANNING AND IMPLEMENTATION OF RESEARCH DESIGN
1. Choice of study population
2. Selection and
classification/assignment of subjects
3. Assessment or observation of
variables
4. Processing and analysis of data
 Descriptive
 Analytic
 An inquiry into the nature of an unknown
phenomenon or the occurrence of an
event
 Seeks to know the characteristics of the
phenomenon or event and categorize it
into some descriptive variables
 It does NOT explain relationships but
seeks knowledge for better understanding
of the nature of the subject of the study
 Describes/documents the distribution of
different diseases and the groups of
populations most affected in terms of
place, person and time characteristics.

 Measures the extent of relative importance
of the different health problems within
subgroups in a community and between
communities.
 Identifies possible determinants,
problems, and risk factors which could be
used as bases for hypothesis
formulation and subjected to further
analytic studies
1. Case study and case series
2. Cross-sectional studies (or
prevalence studies)
 Describe characteristics and
clinical features of a single patient
(case report) or a group of patients
with similar manifestations (case
series);
 Patients in a case series may occur
in a relatively short period of time;
 Cases of very rare conditions
 Cases which are
epidemiologically unusual
 As part of surveillance and
investigation activities
Designed to test a hypothesis of
relationship
1. Observational –
 Cross-sectional
 Case-Control
 Cohort
2. Experimental –
 Test the hypothesis of relationship
between at minimum one independent
variable (exposure) and one
dependent variable (disease)
 The relationship is normally
depicted in a two by two table,
also known as two way table, if
both the independent and
dependent variables are
qualitative and dichotomous
                     Disease (D)    No Disease (not D)
Exposed (E)               a                 b
Unexposed (not E)         c                 d
 We determine whether the exposure is
related to the disease.
 Exposure must precede the disease.
 Compare the number of subjects who
develop the disease among the
exposed versus the unexposed
 Also called “survey” or Prevalence studies
 Examines the relationship between
disease and exposure as they exist in a
defined population at one particular point
in time
 Measures the prevalence of the disease
[Figure: members of a defined population, each marked by exposure (E) and disease (D) status]
 Provide no direct estimate of risk
 Unable to establish temporality to events
(temporal ambiguity)
 We do not know which caused which, the
exposure or the disease?
 To examine the relationship between
the exposure and the disease, we
compare the prevalence of the
disease among the exposed vs the
unexposed.
                     Disease (D)    No Disease (not D)
Exposed (E)               a                 b
Unexposed (not E)         c                 d
 Prevalence of the disease among
the exposed:
$$\frac{a}{a + b}$$
 Prevalence of the disease among
the unexposed:
$$\frac{c}{c + d}$$
 We then compare the prevalence
of the disease between the 2
exposure groups by getting the
ratio
 The disease measure is called the Prevalence
Ratio (PR).
$$PR = \frac{a / (a + b)}{c / (c + d)}$$
 If the numerator is equal
to the denominator, the
ratio = 1: no relationship.
 1 is the null value.
 If the prevalence of the
disease is higher among the
exposed than among the
unexposed, the ratio is > 1.
 The exposure variable is a risk
factor for the disease.
 If the prevalence of the
disease among the unexposed
is higher than among the
exposed, the ratio is < 1, and the
exposure variable is protective
against the disease.
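The prevalence ratio and its reading against the null value can be sketched as follows. The counts in the usage lines are invented for illustration only, and the function names are hypothetical. The interpretation looks at the point estimate alone and says nothing about statistical significance.

```python
def prevalence_ratio(a, b, c, d):
    """Prevalence ratio from a 2x2 table.

    a, b = exposed with / without the disease
    c, d = unexposed with / without the disease
    """
    prevalence_exposed = a / (a + b)
    prevalence_unexposed = c / (c + d)
    return prevalence_exposed / prevalence_unexposed


def interpret_ratio(ratio, tolerance=1e-9):
    """Read a ratio measure against the null value of 1."""
    if abs(ratio - 1) <= tolerance:
        return "no relationship (null value)"
    if ratio > 1:
        return "exposure is a risk factor"
    return "exposure is protective"


# Hypothetical counts, for illustration only.
pr = prevalence_ratio(a=30, b=70, c=15, d=85)
print(f"PR = {pr:.2f}: {interpret_ratio(pr)}")
```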
 Tests the hypothesis of relationship
between the exposure and the disease
 We select a study population and subject
the members to a preliminary screening.
 Those who already have the outcome (the
disease) are excluded
 Those who qualify are grouped into two:
a. the exposed
b. the unexposed
 The two groups are followed up for a
given period of time to identify members
who will develop the disease (re-
examination or surveillance)
[Figure: exposed and unexposed cohort members followed over time until some develop the disease (D)]
 Measures the incidence of the
disease.
 Compares the incidence of the
disease among the exposed vs
unexposed.
 Incidence of the disease among
the exposed:
$$\frac{a}{a + b}$$
 Incidence of the disease among
the unexposed:
$$\frac{c}{c + d}$$
 We then compare the incidence of
the disease between the 2
exposure groups by getting the
ratio
 The disease measure is called the
Incidence Ratio (IR).
$$IR = \frac{a / (a + b)}{c / (c + d)}$$
 Other names: Risk ratio, Relative Risk
(RR)
 Measures the risk of the disease among
the exposed vs the unexposed.
 Measures the risk of the disease among
the exposed relative to the unexposed
 Provides the possibility of estimating the
attributable risk (i.e. the difference in the
incidence of the disease between the
exposed and the unexposed).
 Attributable risk (AR) or risk difference is
the difference between the incidence of
the disease in the exposed and the
unexposed.
$$AR = \frac{a}{a + b} - \frac{c}{c + d}$$
 Used to quantify risk in the exposed
group that is attributable to the
exposure.
Medical History        Stroke          Without Stroke
of Hypertension     No.     %          No.     %
Yes                  67    54.9         55    45.1
No                   39    20.2        154    79.8
$$AR = 0.549 - 0.202 = 0.347$$
 The risk attributable to the medical
history of hypertension alone is 34.7%
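The same attributable risk can be reproduced from the raw counts in the table rather than from the rounded percentages. This is a minimal sketch; the function name `attributable_risk` is hypothetical.

```python
def attributable_risk(a, b, c, d):
    """Risk difference between the exposed and the unexposed.

    a, b = exposed with / without the disease
    c, d = unexposed with / without the disease
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return incidence_exposed - incidence_unexposed


# Counts from the hypertension and stroke table above.
ar = attributable_risk(a=67, b=55, c=39, d=154)
print(f"AR = {ar:.3f} ({ar * 100:.1f}%)")
```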
 A type of study that attempts to
capture the advantages of both the
cross-sectional study and the cohort
study.
 It tries to eliminate the temporal
ambiguity of the cross-sectional study
while at the same time shortening the
duration of the study.
 We select the cases from a target
population.
 Cases are a group of individuals with the
outcome or disease
 Then select another group of individuals
without the outcome or disease as
Controls
 Go back in time (retrospective) to determine
exposure in the cases and in the controls
 Compares the exposure status among the cases
and among the controls
[Figure: cases (D) and controls (not D) traced back in time to determine exposure (E)]
 Measures the odds of the disease.
 Compares the odds of the disease
among the exposed vs the
unexposed.
 Odds of the disease among the
exposed:
$$\frac{a}{b}$$
                     Disease (D)    No Disease (not D)
Exposed (E)               a                 b
Unexposed (not E)         c                 d
 Odds of the disease among the
unexposed:
$$\frac{c}{d}$$
 We then compare the odds of the
disease between the 2 exposure
groups by getting the ratio
 The disease measure is called the Odds
Ratio (OR).
$$OR = \frac{a / b}{c / d} = \frac{ad}{bc}$$
Medical History        Stroke            Control
of Hypertension     No.     %          No.     %
Yes                  67    54.9         55    45.1
No                   39    20.2        154    79.8
 If the study is a prevalence study or a
cross-sectional study:
Those who have a history of hypertension
have a prevalence of stroke more than 2
times (PR = 2.72) that of those
who do not have a history of
hypertension.
 If the study is a cohort study:
Those who have a history of
hypertension have more than 2
times (RR = 2.72) the risk of
developing stroke compared to
those who do not have a history of
hypertension.
 If the study is a case-control study:
Those who have a history of hypertension have
close to 5 times (OR = 4.81) the odds of
stroke compared to those who
do not have a history of hypertension.
 If the 95% Confidence Interval of
the measure excludes the null
value (1), the relationship is
statistically significant.
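The three readings of the hypertension and stroke table, and the confidence-interval rule, can be sketched together. The intervals use the usual log-based large-sample approximations. That choice is an assumption, since the slides do not name a method, and the function names are hypothetical.

```python
import math


def ratio_with_ci(estimate, se_log, z=1.96):
    """Approximate 95% CI for a ratio measure, computed on the log scale."""
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


def risk_or_prevalence_ratio(a, b, c, d):
    """PR (cross-sectional) or RR (cohort): the arithmetic is the same."""
    ratio = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return ratio_with_ci(ratio, se_log)


def odds_ratio(a, b, c, d):
    """OR (case-control) with its log-based interval."""
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ratio_with_ci(ratio, se_log)


def excludes_null(lower, upper, null_value=1.0):
    """True when the interval lies entirely on one side of the null value."""
    return lower > null_value or upper < null_value


# Counts from the hypertension and stroke table above.
rr, rr_low, rr_high = risk_or_prevalence_ratio(67, 55, 39, 154)
or_, or_low, or_high = odds_ratio(67, 55, 39, 154)

print(f"PR or RR = {rr:.2f} (95% CI {rr_low:.2f} to {rr_high:.2f})")
print(f"OR       = {or_:.2f} (95% CI {or_low:.2f} to {or_high:.2f})")
print("RR interval excludes 1:", excludes_null(rr_low, rr_high))
print("OR interval excludes 1:", excludes_null(or_low, or_high))
```

The same four counts give different measures depending on the design, which is why the study design has to be settled before the measure is chosen and interpreted.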
 Provides the strongest evidence of causal
relationship of all the study designs
 Affords the most control over the study
situation
 Enables the researcher to isolate the observed
effect of the exposure or intervention
 Researcher chooses a study
population without or free of the
outcome
 Assigns the subjects into two groups
by scientific technique
 Introduces the intervention to one
group and withholds it from the other
 Of two types:
◦ True experiments – there is randomization in the
allocation of treatment
◦ Quasi experiments – no randomization
 Randomized Controlled Trials – participants
are randomly assigned to groups, usually one
study group and one control group
 The study group receives a treatment, the
control group does not.
 Treatment can be an experimental preventive
or therapeutic procedure, maneuver, or
intervention.
 Results are assessed by comparison of the
outcome in the study group and in the
control group
 Preventive Clinical Trial - used to test the
effectiveness of a preventive measure
 Intervention Clinical Trial - manages risk factors to
prevent the occurrence of a disease that the factor
is known to cause
 Therapeutic Clinical Trial - used to test the
effectiveness of treatment, drugs and procedures
to arrest disease problems
 More commonly known as Randomized Clinical Trial
 Normally conducted in four phases: Phase 1 to Phase 4
 Each phase is treated as a separate clinical trial
 Drug-development process will normally proceed through
all four phases over many years.
 Single subtherapeutic doses of the study
drug are given to a small number of subjects
(10 to 15)
 Gathers preliminary data on the drug's
pharmacodynamics (what the drug does to
the body) and pharmacokinetics (what the
body does to the drug).
 Documents the absorption, distribution,
metabolism, and excretion (ADME) of the
drug
 Tested within a small group of people (20–80) to
evaluate safety
 Determines safe dosage ranges, and identifies side
effects
 A drug's side effects could be subtle or long term,
or may only happen with a few people
 Thus phase 2 trials are not expected to identify all
side effects
 Aimed at establishing the efficacy of the
drug, usually against a placebo.
 Tested with a larger group of people (100–
300) to see if it is effective and to further
evaluate its safety.
 The gradual increase in sample size allows for
less common side effects to be progressively
sought
 More commonly studied as a Randomized
Controlled Clinical Trial
 Randomization is done
 The study group receives the new treatment or drug
 The control group receives an alternative treatment, more
commonly a placebo
 The control group may also receive a standard treatment
 Blinding is usually done to prevent bias in the reporting of
outcomes
◦ Single blind – only the subjects are unaware of the treatment they
receive
◦ Double blind – both the subjects and the investigator are unaware
of the treatment the subjects are getting
 Post Marketing Studies
 Serves as final confirmation of safety and efficacy
 Tested with large groups of people (1,000–3,000) to
confirm its effectiveness, monitor side effects, compare it
to commonly used treatments, and collect information that
will allow it to be used safely.
 No randomization in the allocation of treatment
 Most community trials are done using this type of study
design
 Test of efficacy of fluoridation of drinking water in
preventing tooth decay
 Two comparable cities in New York State in the 1940s
(Newburgh and Kingston) were compared for the
occurrence of tooth decay and related dental problems in
children
 Newburgh received fluoride for about one decade and
Kingston did not
 In Newburgh the incidence of dental
problems decreased by about a half
compared to the period prior to fluoridation
 In Kingston dental problems slightly
increased
 This particular study utilized a quasi-
experimental study since subjects (cities)
were assigned the treatment arbitrarily and
not randomly
 By study design
◦ Parallel group – each participant is randomly assigned to a group,
and all the participants in the group either receive or do not
receive an intervention.
◦ Cross over – over time, each participant receives (or does not
receive) an intervention in a random sequence
 By outcome
◦ Can be classified as "explanatory" or "pragmatic."
◦ Explanatory RCTs test efficacy in a research setting with
highly selected participants and under highly controlled
conditions.
◦ Pragmatic RCTs test effectiveness in everyday practice
with relatively unselected participants and under flexible
conditions; in this way, pragmatic RCTs can "inform
decisions about practice".
 By hypothesis
◦ Can be categorized as "superiority trials" or "non-inferiority trials."
◦ Most RCTs are superiority trials, in which one intervention is
hypothesized to be superior to another
◦ Some RCTs are non-inferiority trials, which determine whether a
new treatment is no worse than a reference treatment
 Can be likened to a cohort study
 Measures are called Risk Ratio (RR), Absolute
Risk Reduction (ARR), Relative Risk Reduction
(RRR) and Number Needed to Treat (NNT).
 Calculation for RR is the same as that for the
cohort study
                     Disease (D)    No Disease (not D)
Exposed (E)               a                 b
Unexposed (not E)         c                 d
 Experimental Event Rate (EER) –
proportion of subjects with the event or disease (risk) in
the experimental group.
$$EER = \frac{a}{a + b}$$
 Control Event Rate (CER) – proportion of
subjects with the event or disease (risk) in the control
group.
$$CER = \frac{c}{c + d}$$
 Risk Ratio – ratio of the risk of the disease in
the experimental group (EER) to the risk in
the control group (CER).
$$RR = \frac{EER}{CER}$$
 Absolute Risk Reduction (ARR) – also called
the risk difference, is the difference between
the control event rate (CER) and the
experimental event rate (EER).
$$ARR = CER - EER$$
 Relative Risk Reduction - the difference in
event rates expressed in a proportional or
relative manner, in relation to the control
event rate.
 Often more impressive than the risk
difference.
 The lower the event rate in the control group,
the larger the difference between relative risk
reduction and risk difference.
$$RRR = \frac{CER - EER}{CER} = \frac{ARR}{CER}$$
 Number Needed to Treat (NNT) - the number
of patients who would have to receive the
treatment for one of them to benefit.
 Calculated as 1 divided by the absolute risk
reduction.
$$NNT = \frac{1}{ARR}$$
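The trial measures defined above chain together from the same 2×2 counts. This is a minimal sketch; the trial counts are invented for illustration and the function name `trial_measures` is hypothetical.

```python
def trial_measures(a, b, c, d):
    """EER, CER, RR, ARR, RRR and NNT from a 2x2 trial table.

    a, b = experimental group with / without the event
    c, d = control group with / without the event
    """
    eer = a / (a + b)   # experimental event rate
    cer = c / (c + d)   # control event rate
    arr = cer - eer     # absolute risk reduction
    return {
        "EER": eer,
        "CER": cer,
        "RR": eer / cer,
        "ARR": arr,
        "RRR": arr / cer,
        "NNT": 1 / arr,  # undefined when the two event rates are equal
    }


# Hypothetical trial: 10 of 100 treated and 20 of 100 control subjects
# experience the event.
results = trial_measures(a=10, b=90, c=20, d=80)
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

In this invented example the relative risk reduction is 50% while the absolute risk reduction is only 10 percentage points, which shows why the relative figure often looks more impressive than the risk difference.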
 At the end of the session, the participants will
be able to:
 Define Sampling and key terms
 Classify Sampling
 Give the different sampling methodologies
 Understand the concepts of the different
sampling methodologies
 An act of studying or
examining only a segment of
the population to represent
the whole.
 It is cheaper
 It is faster
 It has better quality of
information
 It can obtain more
comprehensive data
 It is the only possible method
for destructive procedures
 Evaluating health status of a
population
 Investigating the factors
affecting health
 Evaluating the effectiveness of
health measures
 Assessing specific aspects in
the administration of health
services
 Evaluating the reliability and
completeness of record
systems
 Population – refers to the entire group
of individuals or items of interest in
the study.
 Target Population – the group from
which representative information is
desired and to which inferences will
be made.
 Sampling Population – the population
from which a sample will actually be
taken.
 Elementary unit or Element – an object
or a person on which a measurement
is actually taken, or an observation is
made.
 Sampling unit – refers to the units
that are chosen in selecting the
sample and may be made up of
non-overlapping collections of
elements or elementary units.
 Sampling frame – a listing from
which one draws a sample;
actually a collection of all the
sampling units.
 Sampling error – the difference
between the value of the
parameter being estimated and
the estimate of this value based
on the different samples.
 The sample to be obtained should
be representative of the
population
 The sample size should be
adequate
 Practicality and feasibility of the
sampling procedure
 Economy and efficiency of the
sampling design
 There are a number of basic
sampling designs that a researcher
can choose from.
 The specific design that is best for a
particular study depends upon the
nature of the variables and the
population being studied, the
purpose for which the research is
undertaken, as well as the
availability of information relevant to
 Of two types:
◦ Probability and Non-probability
sampling designs
 The probability of each member
of the population to be selected in
the sample is difficult to
determine or cannot be specified.
 Types:
◦ Judgement or purposive
sampling
◦ Accident or haphazard sampling
◦ Quota sampling
◦ Snowball technique sampling
 The rules and procedures for
selecting the sample and
estimating the parameters are
explicitly and rigidly specified
 The most basic type
 Its main characteristic is that
every element in the population
has an equal chance of being
included in the sample
 A variation of SRS
 A sampling interval ‘K’ is first
determined, where ‘K’ is the ratio of
the population size (N) to the
sample size (n):
$$K = \frac{N}{n}$$
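The sampling interval can be put to work as follows. The slides only define K; picking a random start within the first interval and then taking every K-th unit is the usual way to apply it and is an assumption here. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(frame, n, rng=None):
    """Draw a systematic sample of size n from a sampling frame (a list)."""
    rng = rng or random.Random()
    population_size = len(frame)
    k = population_size // n      # sampling interval K = N / n (integer part)
    start = rng.randrange(k)      # random start within the first interval
    # Take every k-th unit, beginning at the random start.
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame: patient numbers 1 to 100, sample size 10, so K = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, n=10, rng=random.Random(0))
print(sample)
```

When N is not an exact multiple of n, using the integer part of K leaves the last few units of the frame with no chance of selection. This sketch accepts that simplification.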
 The population is first divided
into non-overlapping groups
called strata.
 A simple random sample is then
selected from each stratum.
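The two steps of stratified sampling can be sketched as below. Taking the same number of units from every stratum (equal allocation) is an assumption, since the slides do not specify how the sample is divided among strata. The strata and the function name `stratified_sample` are hypothetical.

```python
import random


def stratified_sample(strata, n_per_stratum, rng=None):
    """Draw a simple random sample of fixed size from each stratum.

    strata: dict mapping a stratum name to the list of its units.
    """
    rng = rng or random.Random()
    return {
        name: rng.sample(units, n_per_stratum)  # SRS without replacement
        for name, units in strata.items()
    }


# Hypothetical population already divided into non-overlapping strata.
strata = {
    "urban": list(range(0, 50)),
    "rural": list(range(50, 80)),
}
sample = stratified_sample(strata, n_per_stratum=5, rng=random.Random(1))
for name, units in sample.items():
    print(name, sorted(units))
```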
 When a sampling frame for the
elementary units is not readily
available, or when cost
considerations are important,
cluster sampling is often resorted
to.
◦ Steps:
 The population is first divided into
clusters, which serve as the sampling
units and a sample of units is
selected.
◦ Every element found in each sampling unit drawn
as a sample may or may not be included in the
study; if only a subset of the elements in
each selected cluster is taken, we have a multi-stage
sampling design.
◦ If all the elements in each selected cluster are taken,
we have a Single-stage Cluster Sampling Design.
 When the sample survey to be
conducted has a wide coverage, as
in nationwide surveys, a multi-
stage sampling design is generally
used.
◦ Steps:
◦ The population is first divided into a
set of primary or first-stage
sampling units. A sample of these units
is selected.
 Each primary sampling unit included in
the sample is further subdivided into
secondary or second-stage sampling
units from which a sample will again be
taken.
 The procedure continues until the
desired stage is reached
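The steps above can be sketched for the simplest case of two stages. Simple random sampling at both stages is an assumption, since the slides leave the selection method at each stage open. The village data and the function name `two_stage_sample` are hypothetical.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, rng=None):
    """Two-stage sample: select clusters first, then units within them.

    clusters: dict mapping a primary sampling unit to its list of
    secondary sampling units.
    """
    rng = rng or random.Random()
    # Stage 1: sample the primary sampling units.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: sample secondary units within each chosen primary unit.
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical frame: four villages, each listing its households.
villages = {
    "A": [f"A{i}" for i in range(1, 21)],
    "B": [f"B{i}" for i in range(1, 16)],
    "C": [f"C{i}" for i in range(1, 31)],
    "D": [f"D{i}" for i in range(1, 11)],
}
sample = two_stage_sample(villages, n_clusters=2, n_per_cluster=5,
                          rng=random.Random(7))
for village, households in sample.items():
    print(village, households)
```

Keeping every household in the chosen villages instead of sampling 5 would turn this into single-stage cluster sampling.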
