Lecture 2

Descriptive Epidemiology
1
Lecturer
Silvia S. Martins, MD, PhD

Associate Professor of Epidemiology
Department of Epidemiology
Columbia University
2
Learning objectives
• Understand the fundamental elements of
descriptive epidemiology
• Define exposures and health indicators
• Estimate measures of association, including
the risk ratio and rate ratio, as well as the risk
difference and rate difference
3
Definition of epidemiology
Epidemiology is the study of the distribution and
determinants of health-related states or events
in specified populations, and the application of
this study to the control of health problems.
Last JM, editor. Dictionary of epidemiology. 4th ed. New York: Oxford University Press; 2001. p. 61
4
Simplified definition of epidemiology
Population thinking
Descriptive Epidemiology
Group comparison
Causal Inference Epidemiology
9
Fundamental elements of descriptive
epidemiology
• Descriptive epidemiology is the study of the
occurrence and distribution of disease
• Analysis of disease patterns according to the
characteristics of
– Person: Who is getting the disease?
– Place: Where is the disease occurring?
– Time: How is the disease changing over time?
11
Why descriptive epidemiology?
• Provide useful information for
– Understanding the health status of a population
– Formulating hypotheses about the causes of
disease
– Planning, implementing, and evaluating public
health programs to control and prevent adverse
health events
12
How do we study the occurrence and
distribution of disease?
• Step 1: Define the population of interest
• Step 2: Define cases
• Step 3: Calculate measures of disease occurrence
that allow you to understand what you want to
know and why
– A fundamental role of descriptive epidemiology is
measurement of the frequency of disease occurrence
and the frequency of death from disease.
– In many cases, this is a difficult endeavor!
13
Step 1: Define the population of
interest
• Epidemiology is concerned with understanding
the health of populations.
• Populations of interest can be defined by
geography, space, time or by characteristics of
participants or of events of interest.
• Example: Do we want to know what the
prevalence of diabetes is within New York City?
New York State? The United States? Among white
women over 40? Immigrant Mexicans? In 1980?
2000? 2015?
14
Step 1: Define the population of
interest
• Regardless of eligibility criteria, the population
may be dynamic or stationary.
– Dynamic
– Stationary
• Careful definition of the source population
from which we conduct an epidemiologic
study underlies many of the core methods in
epidemiology.
15
Sources of population data
• Census: a periodic survey of every person in a
population
– US: www.census.gov
– NYC: http://maps.nyc.gov/census/
• Vital statistics
– NYC: http://www.nyc.gov/html/doh/html/data/vs.shtml
• Health statistics: based on surveys of a
representative sample of the population and other
data collection systems
– US: http://www.cdc.gov/nchs/
16
17
Step 2: Define cases
• Case definition is often based on a combination of:
– Clinical criteria
• History: report of physical symptoms
• Physical examination: fever, high pulse rate
– Laboratory criteria
• Diagnostic test results
• Note that disease definitions may change over
time as we learn more about a disease and its
various manifestations, or laboratory diagnosis
improves.
18
Examples of health-related outcomes
• Death
• Disease/illness
– Physical signs
– Laboratory abnormalities
• Discomfort/symptoms
– Pain, nausea, itching
• Disability
– Impaired ability to perform usual activities
• Dissatisfaction/emotional reaction
– Sadness, anger, hopelessness
19
Step 3: Understand what you want to
know and why
• Main types of measures in disease frequency:
– Counts tell us the number of people with a
disease.
– Proportions tell us what fraction of the population
is affected (numerator is subset of denominator).
– Rates tell us how fast the disease is occurring in a
population (always has time in the denominator).
– Ratios give us information about which groups are
at higher risk of disease than other groups.
20
Example: Tuberculosis in New York City
• Tuberculosis is a reportable condition:
– All diagnosed cases must be reported to the
department of health.
• In 2011, there were 689 cases of tuberculosis in
New York City.
Is this information useful?

No! We need to more carefully qualify the
numerator, and we need a denominator.
21
Counts
• Counts provide an absolute number of the
burden of disease
• However, counts are of limited utility, for two
reasons:
– The burden of disease in the population is very
different if the population size is 100,000 versus
1,000,000.
– Some people are not at risk for developing a new
onset of tuberculosis in 2011 (due to pre-existing
infection), thus we need to know not only the size of
the total population, but the size of the total
population at risk.
22
Prevalence and Incidence
• Two measures overcome many of the
limitations of a simple count of cases:
incidence and prevalence.
• Prevalence tells us about the proportion of
cases among the total population at any given
time.
• Incidence tells us the probability of a new
onset of disease among those at risk for
developing the illness.
23
Prevalence (percentage)
Number of cases (existing and new) / Total population
Over a specified period of time
• The time period should be specified as much as
possible.
– For example, when we say “in Year 2” we mean over
the duration of time that spanned Year 2.
• As with all measures of occurrence, prevalence is
dependent upon the population of interest.
24
Prevalence of smoking in the
United States, 2012
Males Females
2012 22.2% 17.9%
Dwyer-Lindgren et al. Population Health Metrics 2014, 12:5

http://www.pophealthmetrics.com/content/12/1/5
25
Prevalence of smoking in the
United States, 2012
Males, range: 9.9-41.5% Females, range: 5.8-40.8%
Dwyer-Lindgren et al. Population Health Metrics 2014, 12:5

http://www.pophealthmetrics.com/content/12/1/5
26
Prevalence is dependent on a time period
Year 1, 5 individuals developed the outcome
Year 2, an additional 7 people developed the outcome
27
What is the prevalence of disease in
Year 2?
• What is the numerator?
– 5 cases in Year 1 + 7 cases in Year 2 = 12 cases
• What is the denominator?
– Total sample size = 30 people
• Prevalence = 12/30 = 0.40
The prevalence of disease in Year 2 is 40%.
28
What is the prevalence of disease in
Year 3?
– 5 cases in Year 1 + 7 cases in Year 2 + 4 cases in
Year 3 = 16 cases
– Total sample size = 30 people
• Prevalence = 16/30 = 0.533
The prevalence of disease in Year 3 is 53.3%.
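The Year 2 and Year 3 calculations can be reproduced with a short Python sketch. The function name `prevalence` and the variable names are illustrative assumptions, not part of the lecture; the counts (5, 7, and 4 cases in a sample of 30) come from the slides. Like the slide arithmetic, the sketch assumes that cases persist, so earlier cases are still counted in later years.

```python
def prevalence(existing_and_new_cases: int, total_population: int) -> float:
    """Proportion of the total population that is a case during the period."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return existing_and_new_cases / total_population


# Counts from the slides: new cases per year in a sample of 30 people.
new_cases_by_year = {1: 5, 2: 7, 3: 4}
sample_size = 30

# Assumption carried over from the slides: nobody recovers or leaves the sample,
# so cases from earlier years remain in the numerator.
prevalence_year_2 = prevalence(new_cases_by_year[1] + new_cases_by_year[2], sample_size)
prevalence_year_3 = prevalence(sum(new_cases_by_year.values()), sample_size)

print(f"Prevalence in Year 2: {prevalence_year_2:.1%}")
print(f"Prevalence in Year 3: {prevalence_year_3:.1%}")
```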
29
Point Prevalence vs. Period Prevalence
• Point prevalence: a “snapshot” of the
proportion of the population with existing
cases at a given point in time
• Period prevalence: the proportion of the
population with existing cases during a period
of time—includes existing cases + new cases
30
Incidence
Number of new cases / Total population at risk of becoming a new case
(Annotation: this should be for the year in question.)
• Perhaps the most widely used tool in
epidemiology
• Goes by many names. The most common alternative
names are “risk” and “cumulative incidence” and, less
commonly, “incidence proportion”
• The time period should again be specified as
much as possible
31
Incidence is also dependent on a time period
Year 1, 5 individuals developed the outcome
32
What is the incidence of disease in
Year 2?
– 7 new cases in Year 2
– 25 people at risk (5 people already developed the
disease in Year 1 and are thus not at risk)
• Incidence = 7/25 = 0.28
The incidence (risk) of disease in Year 2 is 28%.
33
What is the incidence of disease in
Years 2 and 3?
– 7 new cases in Year 2 + 4 new cases in Year 3 = 11 new
cases
– 25 people at risk (5 people already developed the
disease in Year 1 and are thus not at risk)
• Incidence = 11/25 = 0.44
The incidence (risk) of disease in Years 2 and 3 is 44%.
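The two incidence calculations differ from prevalence only in the denominator, which excludes people who are already cases. A minimal Python sketch, with the hypothetical function name `incidence_proportion` and counts taken from the slides:

```python
def incidence_proportion(new_cases: int, population_at_risk: int) -> float:
    """Risk: new cases divided by the people at risk of becoming a new case."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases / population_at_risk


sample_size = 30
cases_year_1 = 5  # already diseased, so no longer at risk in Year 2 or 3

# The 5 existing cases are removed from the denominator.
at_risk = sample_size - cases_year_1

risk_year_2 = incidence_proportion(7, at_risk)
risk_years_2_and_3 = incidence_proportion(7 + 4, at_risk)

print(f"Risk in Year 2: {risk_year_2:.0%}")
print(f"Risk in Years 2 and 3: {risk_years_2_and_3:.0%}")
```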
34
Other measures of occurrence—
don’t be tricked!
• Mortality rate
• Case fatality rate
• Attack rate
These are all called rates but are really risks, or incidence
proportions.
35
Mortality Rate
Number of deaths / Total population
• The mortality rate is sometimes referred to as the
crude death rate
• Example: In the US in 2014, the suicide rate among
those age 85 or older was 19.3 per 100,000 individuals.
• Related measures
– Cause-specific mortality rate: number of deaths from a
particular disease / total population
– Birth rate: number of births / total population
36
Case Fatality Rate
Number of deaths / Number of cases with disease
• Seldom accompanied by a specific time period
• Example: case fatality rate of measles in the US:
1.5/1000 cases
• Related measure
– Survival rate: number of living cases / number of cases
with disease
37
Attack rate
Number of new cases / Number of persons at risk
• Used for outbreaks of infectious disease

• Example: In an outbreak of salmonella food
poisoning, 27 of the 135 people who ate
potato salad became ill over a one-week
period. What is the attack rate? 20%
38
Synonyms for Incidence
• Cumulative incidence
• Incidence proportion
• Risk
• Attack rate
But, Incidence ≠ Incidence Rate
39
Incidence
• We have learned that “incidence” or “risk” is
calculated as the number of new cases over the
population at risk of becoming a new case.
• Incidence is an accurate representation of a
sample’s experience of health and disease when
we have complete follow-up of a sample; i.e., in a
stationary population.
• That is, each individual is observed at every
measurement time point from the beginning of
the study to the end.
40
Example: Alcohol consumption and
liver cirrhosis
• Suppose we conduct a study to estimate the
association between heavy alcohol
consumption and liver cirrhosis.
– We follow 20 people over time.
– 10 are heavy alcohol consumers.
• First, let’s imagine that we had complete
follow-up data on all people in the study.
41
Disease incidence over time by population exposure
Incidence over four time points = 13/20 = 0.65, or 65%
42
Example: Alcohol consumption and
liver cirrhosis
• Now, let us imagine that we lost some people
over time.
• Thus, we do not know whether these
individuals became diseased or not.
43
Loss to follow-up in a sample over time
44
Incidence when there is loss to follow-up
• We know that the true incidence is 65%
• If we only analyzed the data based on who was present
at the end of the study, we would estimate incidence
as:
– 9/15 = 0.60 or 60%
• If we assumed that individuals who dropped out did
not become diseased we would get:
– 9/20 = 0.45 or 45%
• If we assumed that individuals who dropped out did
become diseased we would get:
– 14/20 = 0.70 or 70%
• There is one more option: a rate
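The three estimates can be laid side by side in a small Python sketch. The variable names are illustrative assumptions. The figure of 5 people lost to follow-up is not stated outright on the slide; it is inferred from the denominators 15 and 20 and from the gap between 9 and 14 cases.

```python
enrolled = 20        # people who started the study
lost = 5             # inferred: 20 enrolled minus the 15 present at the end
observed_cases = 9   # cases seen among those who completed follow-up

# Option 1: analyze only the people still present at the end of the study.
complete_case = observed_cases / (enrolled - lost)

# Option 2: assume nobody who dropped out became diseased (lowest possible value).
assume_none_diseased = observed_cases / enrolled

# Option 3: assume everybody who dropped out became diseased (highest possible value).
assume_all_diseased = (observed_cases + lost) / enrolled

print(f"Completers only:       {complete_case:.0%}")
print(f"Dropouts all healthy:  {assume_none_diseased:.0%}")
print(f"Dropouts all diseased: {assume_all_diseased:.0%}")
```

Options 2 and 3 bracket the true incidence of 65%, but none of the three recovers it, which is the motivation for working with a rate.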
45
Incidence rate
• Incidence rates are commonly used in
prospective studies in which some people are lost
over time.
• To estimate a rate over the time frame of the
study, we need to know how much total time
each person contributed to the study follow-up
before they either developed the outcome or
dropped out.
• We term the total time that each person
contributed as “person-time.”
46
Incidence rate
Number of new cases / Total person-time at risk
• The incidence rate refers to the number of
new cases divided by the person-time at risk
contributed by members of the study.
47
Proportion vs. rate: What’s the
difference?
• A proportion can range from 0 to 1 (0% to 100%); the numerator and
denominator are both counts and the numerator is contained in
the denominator.
• A rate can range from 0 to infinity; the numerator is a count
while the denominator is measured in person-time.
• A rate can be conceptualized as a measure of speed.
– Example: Miles per hour
• Incidence rates can be conceptualized as the speed at which
disease is occurring in cases per unit of person-time.
• When we have complete follow-up of a sample or a population,
the rate can approximate the proportion (“risk”) of disease.
48
Person-time
• Individuals may be exposed to the risk of an
event for varying amounts of time during a total
time period if they
– Enter the time period earlier or later
– Experience the event of interest
• Person-time is the sum of the individual units of
time that people have been exposed to the risk of
an event
• Units of time can be anything (days, weeks,
months, years)
49
Understanding person-time
Person 2 stayed in the study all 40 years and did not develop the
outcome.
Person 10 dropped out of the study at Year 30.
Person 19 developed the outcome at Year 10.
50
Understanding person-time
Table: Person-time and disease status among 20 subjects followed for forty years
Subject  Years Contributed  Developed Disease? (1 = yes, 0 = no)
1 30 1
2 40 0
3 20 0
4 20 1
5 40 0
6 40 1
7 20 0
8 40 0
9 20 0
10 30 0
11 20 1
12 30 1
13 40 0
14 10 0
15 10 1
16 40 0
17 40 0
18 40 0
19 10 1
20 20 1
51
Calculating incidence rate
• The numerator is the number of new cases
• The denominator is the total person-time
• In our example:
Incidence rate = 8/560 = 0.014, or 14 cases per
1,000 person-years
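The totals behind 8/560 can be checked by summing the table directly. In this Python sketch the list `subjects` is copied from the person-time table in subject order; the variable names are assumptions made for illustration.

```python
# (years contributed, developed disease: 1 = yes, 0 = no) for subjects 1 to 20,
# copied row by row from the person-time table.
subjects = [
    (30, 1), (40, 0), (20, 0), (20, 1), (40, 0),
    (40, 1), (20, 0), (40, 0), (20, 0), (30, 0),
    (20, 1), (30, 1), (40, 0), (10, 0), (10, 1),
    (40, 0), (40, 0), (40, 0), (10, 1), (20, 1),
]

new_cases = sum(diseased for _, diseased in subjects)
person_years = sum(years for years, _ in subjects)

incidence_rate = new_cases / person_years      # cases per person-year
rate_per_1000 = incidence_rate * 1000          # cases per 1,000 person-years

print(f"{new_cases} cases / {person_years} person-years")
print(f"Incidence rate: {rate_per_1000:.1f} per 1,000 person-years")
```

The unrounded value is slightly above 14 per 1,000 person-years; the slide rounds 0.0143 down to 0.014.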
52
Calculating incidence rate
• The incidence rate can be interpreted as the number of
expected cases in every set of 1,000 person-years.
• That is, if we were to observe 1,000 people for 1 year, we
would expect 14 cases.
• If we were to observe 500 people for 2 years, we would still
expect 14 cases.
• The assumption underlying this is that the incidence rate is
constant over time, so for every year in which 1,000
person-years are observed an additional 14 cases will be
expected.
• Given this assumption, the incidence rate tells us the
average number of cases per a specified set of person-time.
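Under the constant-rate assumption, the expected number of cases depends only on the total person-time, not on how it is split between people and years. A short Python sketch, with the hypothetical helper `expected_cases` and the rounded rate of 14 per 1,000 person-years from the slides:

```python
def expected_cases(rate_per_person_year: float, people: int, years: float) -> float:
    """Expected cases = rate x person-time, assuming the rate is constant over time."""
    return rate_per_person_year * people * years


rate = 14 / 1000  # 14 cases per 1,000 person-years, as on the slide

cases_1000_people_1_year = expected_cases(rate, people=1000, years=1)
cases_500_people_2_years = expected_cases(rate, people=500, years=2)

# Both designs accumulate 1,000 person-years, so both expect the same number of cases.
print(round(cases_1000_people_1_year, 2), round(cases_500_people_2_years, 2))
```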
53
Understanding incidence rate and
prevalence: the bathtub example
54
Understanding incidence rate and
prevalence
Conceptually:
Prevalence ≈ Incidence Rate*Duration
Mathematically:
P/(1-P)=IR*D
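Solving P/(1-P) = IR*D for P gives P = IR*D / (1 + IR*D), which shows how close the conceptual approximation is. The Python sketch below uses the incidence rate of 0.014 per person-year from the lecture's worked example; the 10-year duration is a hypothetical value chosen only for illustration, and the function name is an assumption. The relation is usually stated for a population in steady state.

```python
def prevalence_from_rate(incidence_rate: float, mean_duration: float) -> float:
    """Solve P / (1 - P) = IR * D for P."""
    odds = incidence_rate * mean_duration
    return odds / (1 + odds)


incidence_rate = 0.014  # cases per person-year, from the lecture's worked example
mean_duration = 10.0    # years; hypothetical value, not from the lecture

approximate = incidence_rate * mean_duration            # Prevalence ~ IR * D
exact = prevalence_from_rate(incidence_rate, mean_duration)

print(f"Approximate prevalence: {approximate:.3f}")
print(f"Exact prevalence:       {exact:.3f}")
```

The two values stay close as long as the prevalence is small, because 1 - P is then near 1.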
55
Examples of the relation between
incidence rate and prevalence
• High incidence rate, steady prevalence
– Example: highly contagious infectious disease with
very short duration or a high case fatality rate
• Low incidence rate, high prevalence
– Examples: diseases with long duration such as
arthritis, diabetes, Crohn’s disease, and other
chronic illnesses
56
Example of the relation between
incidence rate and prevalence
Impact of a new treatment that prolongs life with the disease but does not cure it
Figure series: People Living with HIV; New HIV Infections
57
Summary of the relation between
incidence and prevalence
• Prevalence is affected by incidence rate and
duration
• If a disease has short duration,
– Prevalence = incidence rate
• If a disease has long duration, in general,
– Prevalence > incidence rate
58
Conditional risks
• We can “condition” a risk estimate by other factors to
begin to examine whether certain factors are
associated with increased or decreased risk.
• Let’s return to our earlier example of alcohol
consumption and liver cirrhosis.
• In order to estimate whether heavy drinkers have a
different incidence of cirrhosis compared with
non-heavy drinkers, we measure the conditional risk in each
subgroup.
59
Conditional vs. marginal risks
Marginal risk of cirrhosis among all study subjects = 13/20 = 65%

Conditional risk of cirrhosis among heavy drinkers = 8/10 = 80%
Conditional risk of cirrhosis among non-heavy drinkers = 5/10 = 50%
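The marginal and conditional risks come from the same counts, grouped differently. A minimal Python sketch using the cirrhosis counts from the slide; the dictionary layout and variable names are illustrative assumptions.

```python
# Cirrhosis cases and group sizes from the slide.
groups = {
    "heavy drinkers": {"cases": 8, "n": 10},
    "non-heavy drinkers": {"cases": 5, "n": 10},
}

# Conditional risk: risk within each exposure subgroup.
conditional_risk = {name: g["cases"] / g["n"] for name, g in groups.items()}

# Marginal risk: risk in the whole sample, ignoring exposure.
total_cases = sum(g["cases"] for g in groups.values())
total_n = sum(g["n"] for g in groups.values())
marginal_risk = total_cases / total_n

for name, risk in conditional_risk.items():
    print(f"Risk among {name}: {risk:.0%}")
print(f"Marginal risk: {marginal_risk:.0%}")
```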
60
Conditional risks
• It appears that heavy drinkers have a higher
incidence of cirrhosis compared with non-heavy
drinkers. Next week we will learn how to quantify this.
• Building these 2x2 tables crossing exposure
with disease and using these 2x2 tables to
estimate associations will become a building
block of epidemiology.
61
Measures of association
• Risk ratio
• Risk difference
• Rate ratio
• Rate difference
• Odds ratio (will learn in Lecture 6)
62
Risk ratio
Risk ratio = Risk of disease in exposed / Risk of disease in unexposed
= [a / (a + b)] / [c / (c + d)]
63
Risk ratio interpretation
• Ratios > 1.0 indicate risk is higher among
exposed than unexposed
• Ratios = 1.0 indicate no association
• Ratios < 1.0 indicate risk is lower among
exposed than unexposed
Example: The risk of CHD among women taking Hormone
Replacement Therapy (HRT) is 1.23 times the risk of CHD
among women who do not take HRT over 10 years.
64
Risk difference
Difference between two risks = Risk of disease in exposed - Risk of disease in unexposed
Interpretation: Excess risk due to the exposure

Example: There is an excess risk of 7 cases of CHD per 1000
women attributable to HRT use over 10 years.
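Both risk-based measures can be computed from a single 2x2 table. The slides do not give the counts behind the HRT figures, so this Python sketch reuses the alcohol and cirrhosis example instead. It assumes the usual cell labels: a and b are exposed people with and without disease, c and d are unexposed people with and without disease. The function names are hypothetical.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def risk_difference(a: int, b: int, c: int, d: int) -> float:
    """Risk in the exposed minus risk in the unexposed."""
    return a / (a + b) - c / (c + d)


# Cirrhosis example: 8 of 10 heavy drinkers and 5 of 10 non-heavy drinkers became cases.
a, b = 8, 2   # exposed (heavy drinkers): diseased, not diseased
c, d = 5, 5   # unexposed (non-heavy drinkers): diseased, not diseased

rr = risk_ratio(a, b, c, d)
rd = risk_difference(a, b, c, d)

print(f"Risk ratio: {rr:.2f}")
print(f"Risk difference: {rd:.2f} ({rd * 100:.0f} excess cases per 100 people)")
```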
65
Rate ratio
Rate ratio = Rate of disease in exposed / Rate of disease in unexposed
66
Rate ratio interpretation
Similar to risk ratio
• Ratios > 1.0 indicate rate is higher among
exposed than unexposed
• Ratios = 1.0 indicate no association
• Ratios < 1.0 indicate rate is lower among
exposed than unexposed
Example: The rate of HIV transmission within two years to
infants who were breastfed was 2.53 times the rate of HIV
transmission to infants who were formula-fed.
67
Rate difference
• Difference between two rates
Interpretation: Similar to risk difference; excess rate due to
the exposure
Example: The excess rate of seroconversion within two
years attributable to breastfeeding compared to formula
feeding was 25.6 per 100 person-years of observation.
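The rate-based measures follow the same pattern, with person-time in the denominators. The counts and person-years in this Python sketch are entirely hypothetical, chosen only to show the arithmetic; they are not the data behind the breastfeeding example, and the function name is an assumption.

```python
def incidence_rate(new_cases: int, person_time: float) -> float:
    """New cases divided by the total person-time at risk."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases / person_time


# Hypothetical data for illustration only (not from the lecture).
rate_exposed = incidence_rate(new_cases=12, person_time=200)    # per person-year
rate_unexposed = incidence_rate(new_cases=6, person_time=360)   # per person-year

rate_ratio = rate_exposed / rate_unexposed
rate_difference = rate_exposed - rate_unexposed

print(f"Rate ratio: {rate_ratio:.2f}")
print(f"Rate difference: {rate_difference * 100:.2f} per 100 person-years")
```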
68
Risk/rate differences
Risk/rate ratios
• Difference measures (risk / rate difference)
provide a measure of the potential direct public
health benefit of intervention:
– How many amputations can we prevent if we build a
new dialysis center?
– How many cases of flu can we prevent if we provide
vaccines in schools?
• Ratio measures (risk / rate / odds ratio) provide
an intuitive summary of the magnitude of the
difference in disease occurrence between the exposed
and unexposed, or the strength of the association
between the exposure and outcome.
69
What have we learned?
• Measures of disease occurrence and frequency in
epidemiology are the cornerstone of how we
build the science of population health.
• Today we have learned about:
– Incidence/risk, prevalence, incidence rates
– Incidence rate = prevalence when disease is short in
duration
– Incidence rates are more appropriate than incidence
proportions when there are losses to follow-up
70
What have we learned?
• Measures of occurrence:
– Prevalence (point and period)
– Incidence (a.k.a., incidence proportion, risk)
– Incidence rate
• Measures of association:
– Risk ratio
– Risk difference
– Rate ratio
– Rate difference
71
Questions?
72
Thank you!
73
