Macario F. Reandelar, Jr., MD, MSPH, FPAFP
Objectives:
At the end of this session, the participants
will be able to:
Identify the variables and how they relate to one
another, given a particular research topic.
Give a definition/description of the variables
identified.
It is important that the data used in the
study are operationalized in measurable
terms.
There are two data types that researchers
collect – variables and concepts.
Variables are data capable of taking on different
values.
They are usually easy to define, and their level of
measurement is objective, i.e., devoid of
any subjective interpretation.
They can be either qualitative or
quantitative.
Qualitative variables take categories for
different levels of measurement.
For example, sex is a qualitative variable
and takes two levels of categories, male and
female.
Civil status is another qualitative variable
and may be categorized as single, married
or widowed.
Quantitative variables take quantitative measures.
They can either be counts or measurements.
Age, for example, is a quantitative variable when
measured in years or in months.
Weight is another one when measured in kilograms
or in pounds.
Number of patients consulting in an Outpatient
Department is a countable quantitative variable.
Concepts are mental images or perceptions.
Their meaning may vary from individual to
individual.
Unlike a variable, a concept cannot be
measured objectively.
It is therefore important for a concept to be
converted into a variable.
Quality of hospital service is an example of
a concept.
Stress is another one.
Overeating is still another one.
One must be able to formulate criteria to be
able to measure the concept and transform
it to a variable.
Converting a concept to a variable needs a
validated questionnaire or a set of criteria
to make it usable in research that is devoid
of various interpretations and
measurements.
Setting an indicator or indicators or a set of
criteria to define a concept is
operationalizing the concept.
Below is an example of the
operationalization of the concept ‘Burnout’:
Burnout - syndrome of exhaustion,
cynicism and low professional efficacy
characterized by emotional exhaustion (EE),
depersonalization (DP) and diminished
feelings of personal accomplishment (PA)
Burnout – measured by Maslach Burnout
Inventory (MBI) – a 22-item internationally
validated measure of burnout that examines
three domains including emotional
exhaustion, depersonalization symptoms
and level of personal accomplishment at
work, (see Questionnaire in the Appendix)
EE subscale score ranges from 0 to 54, high and
low scores are ≥27 and ≤18, respectively.
DP subscale score ranges from 0 to 30, high and
low scores are ≥10 and ≤5, respectively.
PA subscale score ranges from 0 to 48; high and
low scores are ≥40 and ≤33, respectively.
High Burnout: High scores on EE and DP
with a low score on PA
Low Burnout: Low scores on EE and DP and
a high score on PA
Average Burnout: Average scores on all
scales or not fulfilling high or low burnout
criteria
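The classification rules above can be written as a small function. This is only a minimal sketch in Python: the function name `classify_burnout` and its argument names are illustrative assumptions, and the cut-offs are simply the ones quoted above, not an official MBI scoring routine.

```python
def classify_burnout(ee: int, dp: int, pa: int) -> str:
    """Classify burnout from MBI subscale scores using the cut-offs quoted above."""
    # Subscale ranges as stated in the text: EE 0-54, DP 0-30, PA 0-48.
    if not (0 <= ee <= 54 and 0 <= dp <= 30 and 0 <= pa <= 48):
        raise ValueError("subscale score outside its stated range")

    # High burnout: high EE (>= 27) and high DP (>= 10) with low PA (<= 33).
    if ee >= 27 and dp >= 10 and pa <= 33:
        return "High Burnout"
    # Low burnout: low EE (<= 18) and low DP (<= 5) with high PA (>= 40).
    if ee <= 18 and dp <= 5 and pa >= 40:
        return "Low Burnout"
    # Everything else: average scores, or high/low criteria not all fulfilled.
    return "Average Burnout"
```

A respondent with high EE and DP scores but also a high PA score is classed as average here, because the high-burnout criteria are not all met.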
To determine the risk factors for candidemia
among critically ill neonates in terms of age,
gender, manner of delivery, birth weight and age of
gestation
Macario F. Reandelar, Jr., MD, MSPH, FPAFP
In research, when we want to look into the
relationship between an independent variable
and a dependent variable, between an
exposure and an outcome, or between a risk
factor and a disease, major threats to the
validity of the study can arise.
These threats should always be considered as
alternative explanations in the interpretation
of the results.
Two error factors constitute these major
threats to a study:
(1) systematic error (bias), and
(2) random or chance error (sampling
variability).
Bias is any trend in the collection, analysis,
interpretation, publication or review of data
that can lead to conclusions that are
systematically different from the truth.
It is a systematic distortion of the
association between a determinant
and an outcome due to a deficiency in
the study design.
Bias may arise due to:
(1) poor measurement (information bias),
(2) the sample being unrepresentative of the
target population (selection bias),
(3) the effects of other determinants on the
association of interest (confounding).
Sources of information bias:
Faulty measuring instruments
Different standards in different biochemical
laboratories
Errors by clinicians in diagnosing diseases
Bias by investigators consciously or
unconsciously reporting more favorable
results for treatments they believe in
Biased reporting of symptoms by patients
wishing to please doctors treating them
Memory lapses by subjects in case-control
studies asked to recall past exposure to a risk
factor (recall bias)
Lack of a validated definition of a variable
Use of an unvalidated questionnaire
Information bias may be reduced or eliminated by
good study design.
One safeguard is blinding of both investigators and
subjects until after the response has been evaluated
(double blind).
If the evaluators know the treatment
allocation but the subjects do not, the study
is said to be single blind.
Selection bias occurs when there is a systematic
difference between the characteristics of those
selected for the study and those who are not.
It also occurs when the exposure status influences
the likelihood of subjects being enrolled in
the study.
Sources of selection bias
Survival and losses to follow-up differ in the
two exposure groups being compared
Volunteer and non-response bias – individuals
who volunteer for a study may possess
different characteristics than the average
individual in the target population.
Bias will be introduced if the association
between exposure and disease differs
between study volunteers and nonresponders.
Hospital patient bias (Berkson’s bias) – bias may
occur when hospital controls are used in a
case-control study.
Healthy worker effect – generally, working
individuals are healthier than individuals who
are not working.
In intervention studies, selection bias occurs when
there are systematic differences between
comparison groups in response to treatment
or prognosis.
Example
In a case-control study of smoking and chronic
lung disease, the association of exposure with
disease will tend to be weaker if controls are
selected from a hospital population (because
smoking causes many diseases resulting in
hospitalization) than if controls are selected
from the community
In this example, hospital controls do not
represent the prevalence of exposure
(smoking) in the community from which cases
of chronic lung disease arise. The exposure-
disease association has been distorted by
selection of hospital controls
Confounding is one of the most intriguing
biases that can occur in epidemiologic
research.
It arises whenever an outcome has two
determinants which are themselves
associated, and one is omitted from
consideration.
Bias involves error in the measurement of a
variable.
Confounding involves error in the
interpretation of what may be an accurate
measurement.
Confounding is a distortion in the estimated
exposure effect that results from differences in risk
between the exposed and the unexposed that are not
due to exposure.
Two necessary criteria for a variable to
explain such differences in risk (and hence
explain confounding) are:
1) It must be a risk factor for the disease
among the unexposed. That is, the
association between the confounding factor
and the disease is independent of the
exposure.
2) It must be associated with the exposure
variable.
[Diagram: the confounding variable is linked to both the main determinant variable and the outcome variable.]
Control of confounders can be done in either of
two ways:
1. In the design stage
2. In the analysis stage.
In the design stage:
Restriction
Matching
Randomization
In the Analysis stage:
Stratified analysis
Multivariate analysis
Example of Confounding
                  Lung CA    No Lung CA
Alcoholic            60          40
Non-alcoholic        40          60
Smoking can be a confounder in the
supposed relationship between alcohol
consumption and lung CA.
Smoking is both related to alcoholism and
Lung CA.
Stratified Analysis – Among Smokers
                  LCA    No LCA
Alcoholic          26      24
Non-alcoholic      24      26
Stratified Analysis – Among non-smokers
                  LCA    No LCA
Alcoholic          26      24
Non-alcoholic      23      27
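The comparison behind a stratified analysis can be sketched in a few lines of Python. The sketch assumes that the first column of each table is lung CA and the second is no lung CA, and it takes the cell counts exactly as shown. The helper names are illustrative, and the Mantel-Haenszel pooled estimate is a standard technique added here for illustration; it is not named in the slides.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table with rows (a, b) and (c, d)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata, each stratum given as (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Rows: alcoholic, non-alcoholic. Columns: lung CA, no lung CA (assumed order).
crude_or = odds_ratio(60, 40, 40, 60)
smokers = (26, 24, 24, 26)
non_smokers = (26, 24, 23, 27)
or_smokers = odds_ratio(*smokers)
or_non_smokers = odds_ratio(*non_smokers)
pooled_or = mantel_haenszel_or([smokers, non_smokers])

print(f"crude OR       = {crude_or:.2f}")
print(f"OR smokers     = {or_smokers:.2f}")
print(f"OR non-smokers = {or_non_smokers:.2f}")
print(f"pooled (MH) OR = {pooled_or:.2f}")
```

With these counts the stratum-specific odds ratios (about 1.17 and 1.27) are much closer to 1 than the crude odds ratio (2.25), which is the pattern expected when the stratifying variable is a confounder.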
Multivariate Analysis – Its concept is like that
of stratified analysis across multiple variables.
Random error is the difference between the
estimate and the parameter.
It is also known as chance error.
Stratified Analysis – Among non-smokers
                  LCA    No LCA
Alcoholic          26      24
Non-alcoholic      23      27
Odds Ratio = (26 × 27) / (24 × 23) ≈ 1.27
The odds ratio is greater than 1 but may not be
statistically significant.
The supposed relationship may
be due to chance error.
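One way to judge whether an odds ratio above 1 could be due to chance is to put a confidence interval around it. The sketch below uses the four cells of the non-smoker table (26, 24, 23, 27) and the log-based (Woolf) approximation for a 95% interval. That method is an assumption made for illustration, since the slides do not say which test or interval is intended, and the function name is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate confidence interval (Woolf / log method).

    z = 1.96 corresponds to a 95% interval under the normal approximation.
    """
    estimate = (a * d) / (b * c)
    # Standard error of ln(OR) for a 2x2 table with no empty cells.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


estimate, lower, upper = odds_ratio_ci(26, 24, 23, 27)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

For these counts the interval runs from below 1 to above 1, so the data are compatible with no association at all, which is the chance-error explanation described above.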
At the end of the session, the learner should be able
to:
1. Differentiate the various types of research design.
2. Identify the major components of a research
design.
3. Determine the suitable/ appropriate research
design for a chosen problem.
4. Determine the appropriate statistical tools to use
given a particular research design.
A research design is a plan or course of action the
researcher takes to solve the research problem.
It is NOT the method of collecting data.
It is the strategy or approach by which the research
questions can be answered:
how to go about solving the problem.
It describes what will be done to answer the
research question.
PLANNING AND
IMPLEMENTATION
OF RESEARCH DESIGN
1. Choice of study population
2. Selection and
classification/assignment of subjects
3. Assessment or observation of
variables
4. Processing and analysis of data
Types of research design:
Descriptive
Analytic
A descriptive study is an inquiry into the nature
of an unknown phenomenon or the occurrence of an
event.
It seeks to know the characteristics of the
phenomenon or event and categorize it
into some descriptive variables.
It does NOT explain relationships but
seeks knowledge for better understanding
of the nature of the subject of the study
Describes/documents the distribution of
different diseases and the groups of
populations most affected in terms of
place, person and time characteristics.
                     Disease (D)    No Disease (not D)
Exposed (E)               a                 b
Unexposed (not E)         c                 d
We determine whether the exposure is
related to the disease.
Exposure must precede the disease.
Compare the number of subjects who
develop the disease among the
exposed versus the unexposed
Cross-sectional studies are also called surveys or
prevalence studies.
They examine the relationship between
disease and exposure as they exist in a
defined population at one particular point
in time.
They measure the prevalence of the disease.
Provide no direct estimate of risk
Unable to establish the temporality of events
(temporal ambiguity)
We do not know which caused which, the
exposure or the disease?
To examine the relationship between
the exposure and the disease, we
compare the prevalence of the
disease among the exposed vs the
unexposed.
                     Disease (D)    No Disease (not D)
Exposed (E)               a                 b
Unexposed (not E)         c                 d
Prevalence of the disease among the exposed:
\[ \frac{a}{a+b} \]
Prevalence of the disease among the unexposed:
\[ \frac{c}{c+d} \]
We then compare the prevalence
of the disease between the 2
exposure groups by getting the
ratio
The disease measure is called the Prevalence
Ratio (PR).
\[ \text{PR} = \frac{a/(a+b)}{c/(c+d)} \]
If the numerator is equal
to the denominator, the
ratio = 1: no relationship.
1 is the null value.
If the prevalence of the
disease is higher among the
exposed than among the
unexposed, the ratio is > 1.
The exposure variable is a risk
factor for the disease.
If the prevalence of the
disease among the unexposed
is higher than among the
exposed, the ratio is < 1, and the
exposure variable is protective
against the disease.
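The prevalence ratio and its reading can be sketched as follows. The cell names a, b, c, d follow the 2x2 table above. The function names and the counts in the usage note are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def prevalence_ratio(a, b, c, d):
    """PR = [a / (a + b)] / [c / (c + d)] for a 2x2 exposure-by-disease table."""
    prevalence_exposed = a / (a + b)
    prevalence_unexposed = c / (c + d)
    return prevalence_exposed / prevalence_unexposed


def interpret_ratio(ratio):
    """Read a ratio against the null value of 1, as described in the text."""
    if math.isclose(ratio, 1.0):
        return "no relationship"
    return "risk factor" if ratio > 1 else "protective"
```

With hypothetical counts of 30 diseased among 100 exposed and 10 diseased among 100 unexposed, `prevalence_ratio(30, 70, 10, 90)` is about 3, which `interpret_ratio` reads as a risk factor. In practice the ratio would also be judged against chance error, not only against the value 1.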
A cohort study tests the hypothesis of a relationship
between the exposure and the disease.
We select a study population and subject
the members to a preliminary screening.
Those who already have the outcome (the
disease) are excluded
Those who qualify are grouped into two:
a. the exposed
b. the unexposed
The two groups are followed up for a
given period of time to identify members
who will develop the disease (re-
examination or surveillance)
Measures the incidence of the
disease.
Compares the incidence of the
disease among the exposed vs
unexposed.
Incidence of the disease among the exposed:
\[ \frac{a}{a+b} \]
Incidence of the disease among the unexposed:
\[ \frac{c}{c+d} \]
We then compare the incidence of
the disease between the 2
exposure groups by getting the
ratio
The disease measure is called the
Incidence Ratio (IR).
\[ \text{IR} = \frac{a/(a+b)}{c/(c+d)} \]
Other names: Risk ratio, Relative Risk
(RR)
Measures the risk of the disease among
the exposed relative to the unexposed.
Provides the possibility of estimating the
attributable risk (i.e., the difference in the
incidence of the disease between the
exposed and the unexposed).
Attributable risk (AR) or risk difference is
the difference between the incidence of
the disease in the exposed and the
unexposed.
\[ \text{AR} = \frac{a}{a+b} - \frac{c}{c+d} \]
Used to quantify risk in the exposed
group that is attributable to the
exposure.
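The relative risk and the attributable risk use the same two incidences, one as a ratio and one as a difference. A minimal sketch, with hypothetical function names and hypothetical counts in the usage note:

```python
def incidence(cases, non_cases):
    """Incidence in one exposure group: cases / (cases + non-cases)."""
    return cases / (cases + non_cases)


def relative_risk(a, b, c, d):
    """RR: incidence among the exposed divided by incidence among the unexposed."""
    return incidence(a, b) / incidence(c, d)


def attributable_risk(a, b, c, d):
    """AR: incidence among the exposed minus incidence among the unexposed."""
    return incidence(a, b) - incidence(c, d)
```

For a hypothetical cohort with 20 cases among 100 exposed and 5 cases among 100 unexposed, `relative_risk(20, 80, 5, 95)` is about 4 and `attributable_risk(20, 80, 5, 95)` is about 0.15, i.e. 15 extra cases per 100 exposed subjects.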
Medical History       Stroke          Without Stroke
(Hypertension)        No.     %       No.     %
Measures the odds of the disease.
Compares the odds of the disease
among the exposed vs
unexposed.
Odds of the disease among the
exposed:
\[ \frac{a}{b} \]
                     Disease (D)    No Disease (not D)
Exposed (E)               a                 b
Unexposed (not E)         c                 d
Odds of the disease among the
unexposed:
\[ \frac{c}{d} \]
We then compare the odds of the
disease between the 2 exposure
groups by getting the ratio.
The disease measure is called the Odds
Ratio (OR).
\[ \text{OR} = \frac{a/b}{c/d} = \frac{ad}{bc} \]
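The odds ratio can be computed either as a ratio of two odds or as the cross-product ad / bc; the two forms are algebraically the same. A short sketch with hypothetical function names and hypothetical counts:

```python
def odds(cases, non_cases):
    """Odds of disease in one exposure group: cases / non-cases."""
    return cases / non_cases


def odds_ratio(a, b, c, d):
    """OR as the ratio of the odds among the exposed to the odds among the unexposed."""
    return odds(a, b) / odds(c, d)


# Hypothetical 2x2 table: 40 and 60 among the exposed, 20 and 80 among the unexposed.
ratio_of_odds = odds_ratio(40, 60, 20, 80)
cross_product = (40 * 80) / (60 * 20)
print(ratio_of_odds, cross_product)
```

Both expressions give the same value up to floating-point rounding. The cross-product form is often easier to compute by hand.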
Medical History       Stroke          Control
(Hypertension)        No.     %       No.     %
                     Disease (D)    No Disease (not D)
Exposed (E)               a                 b
Unexposed (not E)         c                 d
Experimental Event Rate (EER) –
proportion of subjects with the event or disease
(risk) in the experimental group.
\[ \text{EER} = \frac{a}{a+b} \]
Control Event Rate (CER) – proportion of subjects
with the event or disease (risk) in the control
group.
\[ \text{CER} = \frac{c}{c+d} \]
Risk Ratio – ratio of the risk of the disease in
the experimental group (EER) and the risk in
the control group (CER).
\[ \text{RR} = \frac{\text{EER}}{\text{CER}} \]
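Absolute Risk Reduction (ARR) – also called
the risk difference, is the difference between the
event rates of the control and experimental groups.
\[ \text{ARR} = \text{CER} - \text{EER} \]
Number Needed to Treat (NNT) – the reciprocal of
the ARR.
\[ \text{NNT} = \frac{1}{\text{ARR}} \]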
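The trial measures above chain together, so they can be computed in one pass. In this sketch the experimental group is taken as the first row of the 2x2 table (a, b) and the control group as the second row (c, d). ARR is taken as CER minus EER, the usual convention when the treatment is expected to lower the event rate. The function name and the counts in the usage note are hypothetical.

```python
def trial_measures(a, b, c, d):
    """Event-rate measures for a trial laid out as a 2x2 table.

    a, b: experimental group with / without the event
    c, d: control group with / without the event
    Assumes at least one event in the control group (CER > 0).
    """
    eer = a / (a + b)   # experimental event rate
    cer = c / (c + d)   # control event rate
    rr = eer / cer      # risk ratio
    arr = cer - eer     # absolute risk reduction (assumed sign: CER - EER)
    # NNT is undefined when the two event rates are equal.
    nnt = 1 / arr if arr != 0 else float("inf")
    return {"EER": eer, "CER": cer, "RR": rr, "ARR": arr, "NNT": nnt}
```

For a hypothetical trial with 10 events among 100 treated subjects and 20 events among 100 controls, `trial_measures(10, 90, 20, 80)` gives an EER of 0.1, a CER of 0.2, an RR of 0.5, an ARR of 0.1 and an NNT of 10.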
At the end of the session, the participants will
be able to:
Define Sampling and key terms
Classify Sampling
Give the different sampling methodologies
Understand the concepts behind the different
sampling methodologies
Sampling is an act of studying or
examining only a segment of
the population to represent
the whole.
It is cheaper
It is faster
It has better quality of
information
It can obtain more
comprehensive data
It is the only possible method
for destructive procedures
Evaluating health status of a
population
Investigating the factors
affecting health
Evaluating the effectiveness of
health measures
Assessing specific aspects in
the administration of health
services
Evaluating the reliability and
completeness of record
systems
Population – refers to the entire group
of individuals or items of interest in
the study.
Target Population – the group from
which representative information is
desired and to which inferences will
be made.
Sampling Population – the population
from which a sample will actually be
taken.
Elementary unit or Element – an object
or a person on which a measurement
is actually taken, or an observation is
made.
Sampling unit – refers to the units
that are chosen in selecting the
sample; each may be made up of a
non-overlapping collection of
elements or elementary units.
Sampling frame – a listing from
which one draws a sample;
actually a collection of all the
sampling units.
Sampling error – the difference
between the value of the
parameter being estimated and
the estimate of this value based
on the different samples.
The sample to be obtained should
be representative of the
population
The sample size should be
adequate
Practicality and feasibility of the
sampling procedure
Economy and efficiency of the
sampling design
There are a number of basic
sampling designs that a researcher
can choose from.
The specific design that is best for a
particular study depends upon the
nature of the variables and the
population being studied, the
purpose for which the research is
undertaken, as well as the
availability of information relevant to
Sampling designs are of two types:
◦ Probability and non-probability
sampling designs
In non-probability sampling, the probability
of each member of the population being selected in
the sample is difficult to
determine or cannot be specified.
Types:
◦ Judgement or purposive
sampling
◦ Accidental or haphazard sampling
◦ Quota sampling
◦ Snowball technique sampling
In probability sampling, the rules and procedures
for selecting the sample and
estimating the parameters are
explicitly and rigidly specified.
Simple random sampling (SRS) is the most basic type.
Its main characteristic is that
every element in the population
has an equal chance of being
included in the sample.
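Simple random sampling can be illustrated with Python's standard `random` module, where `Random.sample` draws without replacement so that every element has the same chance of selection. The function name, the numbered sampling frame and the seed below are illustrative assumptions.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n elements from the sampling frame, each with equal chance, without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame: elements numbered 1 to 100.
frame = list(range(1, 101))
srs = simple_random_sample(frame, 10, seed=1)
print(srs)
```

A complete sampling frame is needed for this design, since every element must be listed before it can be given an equal chance of selection.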
Systematic sampling is a variation of SRS.
A sampling interval ‘K’ is first
determined, where ‘K’ is the ratio of
the population size (N) to the
sample size (n):
\[ K = \frac{N}{n} \]
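A sketch of how the sampling interval is used: after K is computed, a random start is chosen within the first interval and every K-th element is taken from there. The random start and the rounding of K down to a whole number are common conventions assumed here, not steps stated in the slides. The function name is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select n elements at a fixed interval K = N // n after a random start."""
    population_size = len(frame)
    if not 0 < n <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    k = population_size // n       # sampling interval, rounded down
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start within the first interval
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame of 100 numbered elements, sample of 10, so K = 10.
frame = list(range(1, 101))
systematic = systematic_sample(frame, 10, seed=1)
print(systematic)
```

Only one random number is drawn, so the design is easy to apply in the field, but it can mislead when the frame has a periodic pattern that coincides with the interval.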
In stratified sampling, the population is first
divided into non-overlapping groups
called strata.
A simple random sample is then
selected from each stratum.
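The two steps, dividing the population into strata and drawing a simple random sample within each, can be sketched as below. The slides do not say how many elements to take per stratum, so the same sampling fraction in every stratum (proportional allocation) is assumed here. The function name and the strata are hypothetical.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum.

    strata: dict mapping stratum name -> list of elements (non-overlapping groups)
    """
    rng = random.Random(seed)
    sample = {}
    for name, elements in strata.items():
        # Proportional allocation (an assumption); at least one element per stratum.
        size = max(1, round(fraction * len(elements)))
        sample[name] = rng.sample(elements, size)
    return sample


# Hypothetical strata of 60 and 40 elements, sampled at 10%.
strata = {"urban": list(range(60)), "rural": list(range(100, 140))}
stratified = stratified_sample(strata, 0.10, seed=1)
print({name: len(chosen) for name, chosen in stratified.items()})
```

Because each stratum is sampled separately, every stratum is guaranteed to appear in the sample, which a single simple random sample of the whole population cannot promise.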
When a sampling frame for the
elementary units is not readily
available, or when cost
considerations are important,
cluster sampling is often resorted
to.
◦ Steps:
The population is first divided into
clusters, which serve as the sampling
units, and a sample of clusters is
selected.
◦ The elements found in each cluster drawn
as a sample may or may not all be included in the
study; if only a subset of the elements in
each selected cluster is taken, we have a multi-stage
sampling design.
◦ If all the elements in each selected cluster are included,
we have a single-stage cluster sampling design.
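The difference between the single-stage and multi-stage variants can be sketched with one function: clusters are drawn first, then either every element of each chosen cluster is kept or only a subsample is taken. Simple random sampling at both stages is an assumption made for illustration, and the function name and village data are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Select clusters at random, then take all or some of their elements.

    clusters: dict mapping cluster name -> list of elements
    n_per_cluster: None keeps every element (single-stage);
                   an integer subsamples within each cluster (two-stage)
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # first stage: pick clusters
    sample = {}
    for name in chosen:
        elements = clusters[name]
        if n_per_cluster is None:
            sample[name] = list(elements)              # single-stage: take all
        else:
            sample[name] = rng.sample(elements, n_per_cluster)  # second stage
    return sample


# Hypothetical population: 5 villages of 20 residents each.
villages = {f"village_{i}": [f"v{i}_resident_{j}" for j in range(20)] for i in range(5)}
single_stage = cluster_sample(villages, 2, seed=1)
two_stage = cluster_sample(villages, 2, n_per_cluster=5, seed=1)
```

Only a list of clusters is needed up front; element-level lists are required just for the clusters that are actually drawn, which is why the design is used when no sampling frame of elementary units is available.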
When the sample survey to be
conducted has a wide coverage, as
in nationwide surveys, a multi-stage
sampling design is generally
used.
◦ Steps:
◦ The population is first divided into a
set of primary or first-stage
sampling units. A sample of these units
is selected.
◦ Each primary sampling unit included in
the sample is further subdivided into
secondary or second-stage sampling
units from which a sample will again be
taken.
◦ The procedure continues until the
desired stage is reached.