

Chapter 1: Introduction

• The study of the distribution and determinants of health-related
states or events in specified populations and the
application of this study to the control of health problems
• It refers to the patterns of disease and the factors that
influence those patterns

• Focuses on the health and disease status of a population

• The study of how disease is distributed in populations and the

factors that influence or determine this distribution

• Is a scientific discipline that involves the study of the
frequency and distribution of health and disease in order
to find risk factors in populations for prevention and control
• Epidemiology is the study of the determinants, distribution,
and frequency of disease (who gets the disease and why)
• Epidemiologists study sick people
• Epidemiologists study healthy people
• To determine the crucial difference between those who get the
disease and those who are spared
• Epidemiologists study exposed people

• Epidemiologists study non-exposed people

• To determine the crucial effect of the exposure

Principles of epidemiology

• Distribution
• Epidemiology is concerned with the frequency and pattern of
health events in a population.
• Determinants
• Epidemiology is also used to search for causes and other
factors that influence the occurrence of health-related events.
Clinical questions and epidemiology

• Frequency
How often does a disease occur?

• Risk factors
What factors are associated with an increased risk of disease?

• Cause
What conditions lead to disease?

What are the origins of the disease?


• Prevention

Does an intervention on well people keep disease from arising?

Does early detection and treatment improve the course of disease?

Health outcomes

• The 5 Ds
Clinical epidemiology

• The science of making predictions about individual patients by

counting clinical events (the 5 Ds) in groups of similar patients
and using strong scientific methods to ensure that the
predictions are accurate.
Purpose of epidemiology

• To provide a basis for developing disease control and

prevention measures for groups at risk

• This translates into developing measures to prevent or control

Uses of epidemiology

• To determine, describe, and report on the natural course of

disease, disability, injury, and death
• To aid in the planning and development of health services and
• To provide administrative and planning data
• To study the cause/etiology of disease(s), or conditions,
disorders, disabilities, etc.
• To determine the primary agent responsible or ascertain
causative factors
• To determine the characteristics of the agent or causative
factors
• To determine the mode of transmission
• To determine contributing factors
• To identify and determine geographic patterns
Epidemiology refers to the patterns of disease and the
factors that influence those patterns

• Endemic: belonging or native to a particular people or country

• The usual, expected rate of disease over time

• The disease is maintained without much variation within a


• Occurrence of disease in excess of the expected rate; usually

presents in a larger geographic span than endemics
• A widespread occurrence of an infectious disease in a
community at a particular time
• Epidemiology is the study of epidemics

• A pandemic is defined as an epidemic occurring worldwide, or

over a very wide area, crossing international boundaries and
usually affecting a large number of people.
Epidemic curve

• Visual description (a curve/graph) of disease cases plotted

against time
• Classic signature of an epidemic is a “spike” in time

• Any characteristic which is subject to change and can have

more than one value
• Age
• Gender
• Height, weight
• Intelligence
• Motivation

• Dependent

• Independent


• Variable affected by the independent variable

• It responds to the independent variable

• Variable that is presumed to influence another variable
• It is the presumed cause
• Whereas the dependent variable is the presumed effect

• A study of the effect of dietary habits on blood pressure (BP)
• Independent variable = dietary habits
• Dependent variable = blood pressure
• You can directly manipulate dietary components in your
subjects so as to measure the change in blood pressure
Population and sample

• Population
• All people in a defined setting or with certain defined
characteristics
• E.g.
• All people in Belize
• All people >65 years of age
• All people with hypertension
• General population
• Hospitalized population
• Population of patients with a specific disease

• Sample
• Subset of people in a defined population

• For practical reasons
• Clinical research is ordinarily carried out on a sample
• Inference based on the data is generalized to the whole population
• Sample must be representative of the whole population
“Epidemiology . . .

is a Greek word that means to put

people to sleep with charts and
• Clinical epidemiology is the science of making predictions about
individual patients by counting clinical events in groups of similar
patients and using strong scientific methods to ensure that the
predictions are accurate.

• The purpose of clinical epidemiology is to develop and apply

methods of clinical observation that will lead to valid conclusions
by avoiding being misled by systematic error and the play of
chance. It is to foster methods of clinical observation and
interpretation that lead to valid conclusions and better patient

• Researchers call the characteristics of patients and

clinical events variables—things that vary and can be
• 3 kinds of variables:
• Independent variable- One is a purported cause or
predictor variable
• Dependent variable - The possible effect or outcome
• Still, other variables may be part of the system
under study and may affect the relationship
between the independent and dependent
variables. These are called extraneous variables
because they are extraneous to the main question,
though perhaps very much a part of the
phenomenon under study.
Numbers and Probability
• In most clinical situations, the diagnosis, prognosis, and results of
treatment are uncertain for an individual patient. An individual will either
experience a clinical outcome or will not, and predictions can seldom be so
exact. Therefore, a prediction must be expressed as a probability. The
probability for an individual patient is best estimated by referring to past
experience with groups of similar patients—for example, that cigarette
smoking more than doubles the risk of dying among middle-aged adults
• Probability is the study of
randomness and uncertainty.

• In the early days, probability was

associated with games of chance
Simple Games Involving
Game: A fair die is rolled. If the
result is 2, 3, or 4, you win $1; if it
is 5, you win $2; but if it is 1 or 6,
you lose $3.

Should you play this game?
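One way to answer the question is to compute the expected winnings per play. The sketch below is illustrative only: it assumes each face of the die has probability 1/6 and uses the payoffs stated in the game.

```python
from fractions import Fraction

# Payoff in dollars for each face of a fair six-sided die, as stated in the game.
payoffs = {1: -3, 2: 1, 3: 1, 4: 1, 5: 2, 6: -3}

# Each face has probability 1/6; the expected value is the
# probability-weighted sum of the payoffs.
expected_value = sum(Fraction(1, 6) * payoff for payoff in payoffs.values())

print(expected_value)         # -1/6
print(float(expected_value))  # about -0.167 dollars per play
```

Under these assumptions the expected value is negative (a loss of about 17 cents per play on average), so in the long run the game favors the house.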

Populations and Samples
• Populations are all people in a defined setting (such as North
Carolina) or with certain defined characteristics (such as being
age >65 years)
• Unselected people in the community are the usual population
for epidemiologic studies of cause. On the other hand, clinical
populations include all patients with a clinical characteristic
such as all those with community acquired pneumonia or
aortic stenosis.
• Therefore, one speaks of the general population, a hospitalized
population, or a population of patients with a specific disease.

• Clinical research is ordinarily carried out on a sample

or subset of people in a defined population. One is
interested in the characteristics of the defined
population but must, for practical reasons, estimate
them by describing the characteristics of people in a
sample. One then makes an inference, a reasoned
judgment based on data, that the characteristics of
• The extent to which a sample represents its
population, and therefore is a fair substitute for
it, depends on how the sample was selected.
Methods in which every member of the
population has an equal (or known) chance of
being selected can produce samples that are
extraordinarily similar to the parent
population, at least in the long run and for
• On the other hand, samples taken
haphazardly or for convenience (i.e., by
selecting patients who are easy to work
with or happen to be visiting the clinic
when data are being collected) may
misrepresent their parent population and
be misleading.
Bias (Systematic Error)

• Bias is “a process at any stage of inference

tending to produce results that depart
systematically from the true values.” It is
“an error in the conception and design of a
study—or in the collection, analysis,
interpretation, publication, or review of
data—leading to results or conclusions that
are systematically (as opposed to
Selection Bias
• Occurs when comparisons are made between groups of
patients that differ in ways other than the main factors under
study, ones that affect the outcome of the study.
• Groups of patients often differ in many ways—age, sex, severity
of disease, the presence of other diseases, the care they
receive, and so on. If one compares the experience of two
groups that differ on a specific characteristic of interest (e.g., a
treatment or a suspected cause of disease) but are dissimilar in
these other ways and the differences are themselves related to
outcome, the comparison is biased and little can be concluded
about the independent effects of the characteristic of interest.
In the herniorrhaphy example, selection bias would have
Measurement Bias

• Occurs when the method of measurement leads to

systematically incorrect results.

• Confounding can occur when one is

trying to find out whether a factor, such
as a behavior or drug exposure, is a
cause of disease in and of itself. If the
factor of interest is associated or “travels
together” with another factor, which is
itself related to the outcome, the effect
of the factor under study can be
• Variables such as age, sex, and race are almost
always analyzed for confounding because so
many health outcomes vary according to
them. Studies that involve human behavior
(such as taking antioxidants regularly) are
especially prone to confounding because
human behavior is so complex that it is

• A given sample, even if selected without

bias, may misrepresent the situation in the
population as a whole because of chance.

• Chance can affect all the steps involved in

clinical observations
Internal and External Validity

• When making inferences about a population from

observations on a sample, clinicians need to make up
their minds about two fundamental questions.
• First, are the conclusions of the research correct for
the people in the sample?
• Second, if so, does the sample represent fairly the
patients the clinician is most interested in, the kind of
patients in his or her practice, or perhaps a specific
patient at hand?
Internal validity

• Is the degree to which the results of a study are correct for the
sample of patients being studied. It is “internal” because it
applies to the conditions of the particular group of patients
being observed and not necessarily to others.
• The internal validity of clinical research is determined by how
well the design, data collection, and analyses are carried out,
and it is threatened by all of the biases and random variation
discussed earlier. For a clinical observation to be useful,
internal validity is a necessary but not sufficient condition.
External validity

• Is the degree to which the results of an observation hold true in other

settings. Another term for this is generalizability. For the individual
clinician, it is an answer to the question, “Assuming that the results of a
study are true, do they apply to my patients as well?” Generalizability
expresses the validity of assuming that patients in a study are similar to
other patients.
• Every study that is internally valid is generalizable to patients very much
like the ones in the study. However, a flawless study, with high internal
validity, may be totally misleading if its results are generalized to the
wrong patients.
Measures of Disease Occurrence

• To study disease, need measures of its


• Some measures of disease occurrence

– Counts
– Prevalence
– Incidence
– Mortality
Occurrence of new cases of disease, injury, or other medical
conditions over a specified time period, typically calculated as a
rate or proportion.
Examples of incident cases or events include a person developing
diabetes, becoming infected with HIV, starting to smoke, or being
admitted to the hospital.
In each of those situations, individuals transition from an
occurrence-free state to an occurrence.
• The study of incident cases provides
information about the etiology (or cause) of
a disease and its outcome. It also allows
researchers to determine the risk factors for
a disease or other medical condition.
• Incidence is a measure of risk of developing disease

Incidence = (Number of NEW cases in population DURING specified time)
            / (Number of persons AT RISK of disease in population during that specified time)

• Often multiplied by 100,000 (or 1000 or 100) and reported as

“Incidence per 100,000”
If population size is 3.81 million, then
I × 100,000 ≈ 0.00017 × 100,000 ≈ 17.1 per 100,000
(0.00017 is the rounded value of I; the unrounded value gives 17.1)
Incidence Denominator – Who is at risk?
Incidence – Salmonella
Incidence of cases of infection with the outbreak strain as of July 15,
2008 9pm EDT
Incidence Proportion And Incidence Rate

• Incidence can be measured as a proportion or as a rate.

• Measured as a proportion, it quantifies the risk of an
occurrence in a given time period.
• Measured as a rate, it quantifies the number of new
cases in a population over time.
• Thus, to calculate incidence, three
elements must be defined:
• (1) the number of new cases
• (2) the population at risk
• (3) the time period.
Incidence proportion
• The numerator is the number of new cases of a disease or
condition that occur during a given time period, while the
denominator is the total population at risk during the defined
study period. To accurately measure incidence proportion, all
individuals at risk for the outcome under study must be
followed during the entire study period (or until experiencing
the outcome). Because complete follow-up is required to
directly compute incidence proportion, it is usually only
calculated for studies with a short follow-up period.

• On a seven-day cruise, 84 of 2,318

passengers report to the ship’s

infirmary with gastrointestinal illness.

• The incidence of disease on the ship
would equal 84 new cases of illness
divided by 2,318 total passengers at
risk, resulting in an incidence
proportion of about 3.6 percent (roughly
4 percent) over the seven-day period.
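The cruise-ship arithmetic can be checked with a short sketch. The function name `incidence_proportion` is a hypothetical helper introduced here for illustration; the case and passenger counts are those given in the example.

```python
def incidence_proportion(new_cases: int, population_at_risk: int) -> float:
    """New cases divided by the population at risk over the study period."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases / population_at_risk

# Cruise example: 84 new cases among 2,318 passengers over seven days.
cruise = incidence_proportion(84, 2318)

print(f"{cruise:.4f}")        # 0.0362
print(f"{cruise * 100:.1f}%")  # 3.6%
```

The result is a proportion over the stated seven-day period, so it should always be reported together with that period.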
Incidence Rate
• The incidence rate numerator is likewise the number of new
cases.
• The denominator, however, is the total person-time, or the
amount of time that all at-risk persons were observed.
• For example, consider a hypothetical incidence rate of breast
cancer among women age 40 or older: 32 women were newly
diagnosed with breast cancer in a population of 3,896 persons
followed for one year.
• The incidence rate equals 32 women with breast cancer
divided by 3,896 person-years of follow-up (persons multiplied
by years observed), which is equivalent to about 821 per
100,000 person-years.
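A minimal sketch of the same calculation follows. The helper `incidence_rate` and its `per` parameter are hypothetical names used only for illustration; the inputs are the 32 new cases and 3,896 person-years from the example.

```python
def incidence_rate(new_cases: int, person_time: float, per: int = 100_000) -> float:
    """New cases per unit of person-time, scaled to `per` units (e.g. person-years)."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases / person_time * per

# Breast cancer example: 32 new cases over 3,896 person-years of follow-up.
rate = incidence_rate(32, 3896)

print(round(rate))  # 821 (per 100,000 person-years)
```

Unlike the incidence proportion, the denominator here is time at risk, so people followed for different lengths of time can contribute unequal amounts to it.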
• The prevalence of a disease is the proportion of individuals in a
population with disease (cases):

Prevalence = (Number of cases in population at specified time)
             / (Number of persons in population at that specified time)

• Prevalence is a proportion – range of 0 to 1

• Removes the effect of total population size – makes estimates

from different populations or over time more comparable.
• Often expressed as a percent (%) – Prevalence *100

• Also often expressed as the prevalence per 1,000 or 10,000 or

• Prevalence * 1,000 = prevalence per 1,000.
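The scaling described above can be sketched as follows. The counts (150 cases among 10,000 people) are made-up numbers chosen for illustration, and `prevalence` is a hypothetical helper, not something defined in these notes.

```python
def prevalence(cases: int, population: int) -> float:
    """Proportion of the population that has the disease at a specified time."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population

# Hypothetical numbers: 150 existing cases among 10,000 people.
p = prevalence(150, 10_000)

print(f"{p:.3f}")                   # 0.015
print(f"{p * 100:.1f}%")            # 1.5%
print(f"{p * 1000:.0f} per 1,000")  # 15 per 1,000
```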
Prevalence – Salmonella
Cases infected with the outbreak strain of Salmonella Saintpaul,
as of July 15, 2008 9pm EDT
Incidence Vs. Prevalence
• Incidence contrasts with prevalence, which includes both new
and existing cases.
• For example, a person who is newly diagnosed with diabetes is
__________ case, whereas a person who has had diabetes for
10 years is _________ case.
• For chronic diseases, such as diabetes, a person can have an
incident case just once in a lifetime.
• For diseases that can be resolved (e.g., the common cold), a
person can have multiple incidences over his or her lifetime.
Prevalence and Incidence – Salmonella
Cases infected with the outbreak strain of Salmonella Saintpaul,
as of July 15, 2008 9pm EDT
Incidence and Prevalence

• Incidence and prevalence measure different aspects of disease

              Prevalence                  Incidence
Numerator:    All cases, no matter        Only NEW cases
              how long diseased
Denominator:  All persons in population   Only persons at risk
                                          of disease
Measures:     Presence of disease         Risk of disease
Most useful:  Resource allocation         Risk, etiology

• An empirical property that can take on two or more values or

• Any characteristic which is subject to change and can
have more than one value

• Age
• Gender
• Height, weight
• Intelligence
• Motivation

• Attributes
Specific values on a variable
• E.g., the variable sex has two attributes: male and female


• Types based on causal relationship

• Dependent
Variable affected by the independent variable
It responds to the independent variable
The presumed effect

• Independent
Variable that is presumed to influence another variable
The presumed cause

• Example
A study of the effect of dietary habits on blood pressure
• Independent variable = dietary habits
• Dependent variable = blood pressure
• You can directly manipulate dietary components in your
subjects so as to measure the change in blood pressure

• Types of variables (based on measurement)

• Categorical variable
• Usually unmeasurable
• Classified by some characteristics
• E.g.: Current alcohol drinking: Yes and No

• Numerical variable
• Measurable and can be expressed numerically
• E.g.: blood pressure

• Categorical variable

• Nominal variable
• Current alcohol drinking: Yes, No
• Gender: Male and Female
• Ethnicity
• Religion

• Ordinal variable
Variable expressed in order
• Income: Low, Medium, High
• Knowledge: Poor, Medium, Good
• Seriousness of disease: Severe, Moderate, Mild


• Numerical variable

• Types
Discrete variable
• Usually a whole unit, one that cannot be fractionated or
divided up into smaller parts
• E.g.: Number of drugs consumed

Continuous variable
• Can be divided into fractional amounts in large or small
• E.g.: weight, height, waiting time

• Measurement is the process of observing and
recording the observations that are collected as
part of a research effort.

• Scales of measurement
• Nominal: attributes are only named
• Ordinal: attributes can be ordered
• Interval: distance is meaningful
• Ratio: absolute zero
Nominal scale

• “Name” the attribute uniquely
• Puts people into categories without specifying the
relationship between the categories
• “It’s either this or that”
• No ordering of the cases is implied
• E.g.
Gender, with 2 groups (male and female)
Ordinal scale

• Attributes can be rank-ordered
• Puts people into categories and specifies the
relationship between them (quality)
• Distances between attributes do not have any meaning
• What is not known is how different the categories
are (quantity)

• E.g.
Mr. X is taller than Mr. Y
Class rank in medical school
Olympic medals
Education coded as
• 0=illiterate; 1=primary; 2=secondary; 3=higher
secondary; 4=college degree; 5=post college
• In this measure, higher numbers mean more education
Interval between values is not interpretable

Interval scale

• Distance between attributes does have meaning
• The interval between values is interpretable
• Uses a scale graded in equal increments
• Allows us to say not only that 2 things are different,
but also by how much

• E.g.
Scale of length: 1 inch is equal to any other inch
Temperature (in Fahrenheit): the distance from 30 to 40 is the
same as the distance from 70 to 80

Ratio scale

• There is always an absolute zero that is meaningful
• Orders things and contains equal intervals, but also
has a true zero point
• Zero is a floor, i.e., you can’t go any lower

• E.g.
Measuring temperature using Kelvin scale
Number of patients in past six months
• You can have zero patients, and it is
meaningful to say that "...we had twice as many
patients in the past six months as we did in the previous
six months."
• Central tendency

• Dispersion

Data that are measured on interval scales are often

presented as a figure, called a frequency distribution,
showing the number (or proportion) of a defined group of
people possessing the different values of the measurement
Two basic properties of distributions are used
to summarize them

Central tendency
The middle of the distribution

Dispersion
How spread out the values are
Measures of central tendency
Central tendency
• A single value that attempts to describe a set of
data by identifying the central (middle) value within that set

• Often called averages

There are several valid measures

Mean (X)

• Average

• Sum of the values of the observations divided by the number
of observations
Median (Md)

• Point on the scale which divides a group into 2 parts (upper
and lower half)

• The measurement below which half the observations fall: the
50th percentile
Mode

• Most frequently occurring value in a set of observations


Given the distribution of numbers:

• 6, 14, 9, 10, 7, 16, 7, 5, 9, 12, 10, 7, 13, 3, 6
• Sorted: 3 5 6 6 7 7 7 9 9 10 10 12 13 14 16

• Find the mode, median, and mean


• Mode = 7

• Median = 9

• Mean = 8.9
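These three answers can be reproduced with Python's standard `statistics` module. This is only a checking sketch; the variable names are chosen here for illustration.

```python
import statistics

# The 15 observations from the worked example above.
values = [6, 14, 9, 10, 7, 16, 7, 5, 9, 12, 10, 7, 13, 3, 6]

mode = statistics.mode(values)      # most frequent value
median = statistics.median(values)  # middle (8th) value of the 15 sorted values
mean = statistics.mean(values)      # sum (134) divided by count (15)

print(mode)            # 7
print(median)          # 9
print(round(mean, 1))  # 8.9
```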

• Most important distribution is the ‘normal’ or ‘Gaussian’ curve

• Symmetric “bell-shaped” curve, with one side the mirror image

of the other
Normal distribution
• Data near the mean are more frequent in
occurrence than data far from the mean


Skewed distributions
Not all curves are normal
• Sometimes skewed positively or negatively
Positive skew

• Tail to the right = right-skewed

• Mean towards the right of median

• Mean greater than the median

Negative skew

• Tail to the left = left-skewed

• Mean towards left of median

• Median greater than the mean

• For skewed distributions, the median is a better
measure of central tendency than the mean
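A small numerical illustration of positive skew, using made-up values (hypothetical lengths of hospital stay, not data from these notes): a single long stay pulls the mean to the right while the median stays close to the typical patient.

```python
import statistics

# Hypothetical lengths of hospital stay in days (illustrative values only).
# Most stays are short, but one long stay creates a tail to the right.
stays = [1, 2, 2, 3, 3, 3, 4, 50]

mean_stay = statistics.mean(stays)      # pulled toward the long tail
median_stay = statistics.median(stays)  # middle of the ordered values

print(mean_stay)    # 8.5
print(median_stay)  # 3.0
```

Here the mean (8.5 days) is greater than the median (3 days), which matches the pattern listed for a right-skewed distribution.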

Measures of variability
Range
• Simplest measure of variability in statistics
• Difference between the highest and the lowest score
• However, it is unstable and can change easily
Standard deviation (S or SD)
• A more stable and more useful measure of dispersion
• Standard measure of deviation from the mean value
• Measures the amount of variability/dispersion of the
individual data values around the mean
To calculate the SD

1. Subtract mean from each score to obtain deviations from the mean

2. Square the deviations to make them all positive

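3. Add squared deviations and divide by number of cases (or by number of cases minus 1 for a sample SD)

4. Take the square root of this average

• To calculate the Standard Deviation
• Scores: 82, 93, 98, 89, 88
• Mean = (82+93+98+89+88)/5 = 450/5 = 90
• Sum of squared deviations =
• (82-90)^2 + (93-90)^2 + (98-90)^2 + (89-90)^2 + (88-90)^2
• = (-8)^2 + (3)^2 + (8)^2 + (-1)^2 + (-2)^2
• = 64 + 9 + 64 + 1 + 4 = 142
• Sample variance = 142/4 = 35.5 (dividing by n - 1 = 4)
• SD = square root of 35.5 ≈ 5.96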
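The worked example can be checked with the standard `statistics` module, which provides both versions of the calculation: `stdev` divides by n - 1 (the sample SD, as in the example's division by 4) and `pstdev` divides by n (the number of cases). The sketch assumes the five scores are 82, 93, 98, 89, and 88.

```python
import statistics

scores = [82, 93, 98, 89, 88]

mean = statistics.mean(scores)                       # 450 / 5 = 90
squared_devs = [(x - mean) ** 2 for x in scores]     # 64, 9, 64, 1, 4
sum_sq = sum(squared_devs)                           # 142

sample_sd = statistics.stdev(scores)       # sqrt(142 / 4), divides by n - 1
population_sd = statistics.pstdev(scores)  # sqrt(142 / 5), divides by n

print(round(sample_sd, 2))      # 5.96
print(round(population_sd, 2))  # 5.33
```

Which divisor is appropriate depends on whether the five scores are treated as a sample from a larger population or as the whole population of interest.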
