
Lecture 2

Descriptive Epidemiology

1
Lecturer

Silvia S. Martins, MD, PhD


Associate Professor of Epidemiology
Department of Epidemiology
Columbia University

2
Learning objectives
• Understand the fundamental elements of
descriptive epidemiology
• Define exposures and health indicators
• Estimate measures of association, including
the risk ratio and rate ratio, as well as the risk
and rate difference

3
Definition of epidemiology
Epidemiology is the study of the distribution and
determinants of health-related states or events
in specified populations, and the application of
this study to the control of health problems.

Last JM, editor. Dictionary of epidemiology. 4th ed. New York: Oxford University Press; 2001. p. 61

4
Simplified definition of epidemiology

Population thinking
Descriptive Epidemiology

Group comparison
Causal Inference Epidemiology

9
Fundamental elements of descriptive
epidemiology
• Descriptive epidemiology is the study of the
occurrence and distribution of disease
• Analysis of disease patterns according to the
characteristics of
– Person: Who is getting the disease?
– Place: Where is the disease occurring?
– Time: How is the disease changing over time?

11
Why descriptive epidemiology?
• Provide useful information for
– Understanding the health status of a population
– Formulating hypotheses about the causes of
disease
– Planning, implementing, and evaluating public
health programs to control and prevent adverse
health events

12
How do we study the occurrence and
distribution of disease?
• Step 1: Define the population of interest
• Step 2: Define cases
• Step 3: Calculate measures of disease occurrence
that allow you to understand what you want to
know and why
– A fundamental role of descriptive epidemiology is
measurement of the frequency of disease occurrence
and the frequency of death from disease.
– In many cases, this is a difficult endeavor!

13
Step 1: Define the population of
interest
• Epidemiology is concerned with understanding
the health of populations.
• Populations of interest can be defined by
geography, space, time or by characteristics of
participants or of events of interest.
• Example: Do we want to know what the
prevalence of diabetes is within New York City?
New York State? The United States? Among white
women over 40? Immigrant Mexicans? In 1980?
2000? 2015?

14
Step 1: Define the population of
interest
• Regardless of eligibility criteria, population
may be dynamic or stationary.
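– Dynamic: people enter and leave the population over time
– Stationary: membership is fixed, so everyone can be followed for the whole study period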
• Careful definition of the source population
from which we conduct an epidemiologic
study underlies many of the core methods in
epidemiology.

15
Sources of population data
• Census: a periodic survey of every person in a
population
– US: www.census.gov
– NYC: http://maps.nyc.gov/census/
• Vital statistics
– NYC: http://www.nyc.gov/html/doh/html/data/vs.shtml
• Health statistics: based on surveys of a
representative sample of the population and other
data collection systems
– US: http://www.cdc.gov/nchs/

16
17
Step 2: Define cases
• Case definition is often based on a combination of:
– Clinical criteria
• History: report of physical symptoms
• Physical examination: fever, high pulse rate
– Laboratory criteria
• Diagnostic test results
• Note that disease definitions may change over
time as we learn more about a disease and its
various manifestations, or laboratory diagnosis
improves.

18
Examples of health-related outcomes
• Death
• Disease/illness
– Physical signs
– Laboratory abnormalities
• Discomfort/symptoms
– Pain, nausea, itching
• Disability
– Impaired ability to perform usual activities
• Dissatisfaction/emotional reaction
– Sadness, anger, hopelessness

19
Step 3: Understand what you want to
know and why
• Main types of measures of disease frequency:
– Counts tell us the number of people with a
disease.
– Proportions tell us what fraction of the population
is affected (numerator is subset of denominator).
– Rates tell us how fast the disease is occurring in a
population (always has time in the denominator).
– Ratios give us information about which groups are
at higher risk of disease than other groups.

20
Example: Tuberculosis in New York City
• Tuberculosis is a reportable condition:
– All diagnosed cases must be reported to the
department of health.
• In 2011, there were 689 cases of tuberculosis in
New York City.

Is this information useful?


No! We need to more carefully qualify the
numerator, and we need a denominator.
21
Counts
• Counts provide an absolute number of the
burden of disease
• However, counts are of limited utility, for two
reasons:
– The burden of disease in the population is very
different if the population size is 100,000 versus
1,000,000.
– Some people are not at risk for developing a new
onset of tuberculosis in 2011 (due to pre-existing
infection), thus we need to know not only the size of
the total population, but the size of the total
population at risk.

22
Prevalence and Incidence
• Two measures overcome many of the
limitations of a simple count of cases:
incidence and prevalence.
• Prevalence tells us about the proportion of
cases among the total population at any given
time.
• Incidence tells us the probability of a new
onset of disease among those at risk for
developing the illness.

23
Prevalence

Prevalence = Number of cases (existing and new) / Total population,
over a specified period of time; usually reported as a percentage
• The time period should be specified as much as
possible.
– For example, when we say “in Year 2” we mean over
the duration of time that spanned Year 2.
• As with all measures of occurrence, prevalence is
dependent upon the population of interest.

24
Prevalence of smoking in the
United States, 2012

Males Females
2012 22.2% 17.9%

Dwyer-Lindgren et al. Population Health Metrics 2014, 12:5


http://www.pophealthmetrics.com/content/12/1/5

25
Prevalence of smoking in the
United States, 2012

Males, range: 9.9-41.5% Females, range: 5.8-40.8%

Dwyer-Lindgren et al. Population Health Metrics 2014, 12:5


http://www.pophealthmetrics.com/content/12/1/5

26
Prevalence is dependent on a time period

Year 1, 5 individuals
developed the outcome

Year 2, an additional 7
people developed the
outcome

Year 3, an additional 4
people developed the
outcome

27
What is the prevalence of disease in
Year 2?
• What is the numerator?
– 5 cases in Year 1 + 7 cases in Year 2 = 12 cases
• What is the denominator?
– Total sample size = 30 people
• Prevalence = 12/30 = 0.40

The prevalence of disease in Year 2 is 40%.

28
What is the prevalence of disease in
Year 3?
• What is the numerator?
– 5 cases in Year 1 + 7 cases in Year 2 + 4 cases in
Year 3 = 16 cases
• What is the denominator?
– Total sample size = 30 people
• Prevalence = 16/30 = 0.533

The prevalence of disease in Year 3 is 53.3%.
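The two prevalence calculations above can be reproduced with a short sketch. The counts (5, 7 and 4 new cases, 30 people) come from the slides; the function name is ours, and the sketch assumes that nobody recovers, dies, or leaves the sample, which the example implies but does not state.

```python
# New cases per year and sample size, taken from the slide example.
new_cases_by_year = {1: 5, 2: 7, 3: 4}
sample_size = 30


def prevalence_through(year):
    """Prevalence by the end of `year`: all cases so far / total sample.

    Assumption (ours): cases never recover or leave, so every earlier
    case still counts as an existing case.
    """
    cases_so_far = sum(n for y, n in new_cases_by_year.items() if y <= year)
    return cases_so_far / sample_size


for year in (2, 3):
    print(f"Year {year}: prevalence = {prevalence_through(year):.3f}")
# Year 2: prevalence = 0.400
# Year 3: prevalence = 0.533
```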

29
Point Prevalence vs. Period Prevalence
• Point prevalence: a “snapshot” of the
proportion of the population with existing
cases at a given point in time
• Period prevalence: the proportion of the
population with existing cases during a period
of time—includes existing cases + new cases

30
Incidence
Incidence = Number of new cases / Total population at risk of becoming a new case,
over a specified period of time (the population at risk should be the one for that period)
• Perhaps the most widely used tool in
epidemiology
• Goes by many names. The most common alternative
names are “risk” and “cumulative incidence” and, less
commonly, “incidence proportion”
• The time period should again be specified as
much as possible
31
Incidence is also dependent on a time period

Year 1, 5 individuals
developed the outcome

Year 2, an additional 7
people developed the
outcome

Year 3, an additional 4
people developed the
outcome

32
What is the incidence of disease in
Year 2?
• What is the numerator?
– 7 new cases in Year 2
• What is the denominator?
– 25 people at risk (5 people already developed the
disease in Year 1 and are thus not at risk)
• Incidence = 7/25 = 0.28

The incidence (risk) of disease in Year 2 is 28%.

33
What is the incidence of disease in
Years 2 and 3?
• What is the numerator?
– 7 new cases in Year 2 + 4 new cases in Year 3 = 11 new
cases
• What is the denominator?
– 25 people at risk (5 people already developed the
disease in Year 1 and are thus not at risk)
• Incidence = 11/25 = 0.44

The incidence (risk) of disease in Years 2 and 3 is 44%.
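The key difference from prevalence is the denominator: people who were already cases before the period are removed from the population at risk. A minimal sketch of that bookkeeping, using the same counts as the slides (the function name is ours):

```python
# New cases per year and sample size, taken from the slide example.
new_cases_by_year = {1: 5, 2: 7, 3: 4}
sample_size = 30


def incidence(first_year, last_year):
    """Risk of becoming a new case between first_year and last_year (inclusive)."""
    # People who became cases before the period are no longer at risk.
    prior_cases = sum(n for y, n in new_cases_by_year.items() if y < first_year)
    at_risk = sample_size - prior_cases
    new_cases = sum(
        n for y, n in new_cases_by_year.items() if first_year <= y <= last_year
    )
    return new_cases / at_risk


print(f"Year 2:        {incidence(2, 2):.2f}")  # 7 / 25
print(f"Years 2 and 3: {incidence(2, 3):.2f}")  # 11 / 25
```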

34
Other measures of occurrence—
don’t be tricked!
• Mortality rate
• Case fatality rate
• Attack rate
These are all called rates but are really risks, or incidence
proportions.

35
Mortality Rate
Mortality rate = Number of deaths / Total population,
over a specified period of time
• The mortality rate is sometimes referred to as the
crude death rate
• Example: In the US in 2014, the suicide rate among
those age 85 or older was 19.3 per 100,000 individuals.
• Related measures
– Cause-specific mortality rate: number of deaths from a
particular disease / total population
– Birth rate: number of births / total population

36
Case Fatality Rate
Case fatality rate = Number of deaths / Number of cases with disease
• Seldom accompanied by a specific time period
• Example: case fatality rate of measles in the US:
1.5/1000 cases
• Related measure
– Survival rate: number of living cases / number of cases
with disease

37
Attack rate
Attack rate = Number of new cases / Number of persons at risk,
over a specified period of time

• Used for outbreaks of infectious disease


• Example: In an outbreak of salmonella food
poisoning, 27 of the 135 people who ate
potato salad became ill over a one-week
period. What is the attack rate? 20%
38
Synonyms for Incidence
• Cumulative incidence
• Incidence proportion
• Risk
• Attack rate

But, Incidence ≠ Incidence Rate

39
Incidence
• We have learned that “incidence” or “risk” is
calculated as the number of new cases over the
population at risk of becoming a new case.
• Incidence is an accurate representation of a
sample’s experience of health and disease when
we have complete follow-up of a sample; i.e., in a
stationary population.
• That is, each individual is observed at every
measurement time point from the beginning of
the study to the end.

40
Example: Alcohol consumption and
liver cirrhosis
• Suppose we conduct a study to estimate the
association between heavy alcohol
consumption and liver cirrhosis.
– We follow 20 people over time.
– 10 are heavy alcohol consumers.
• First, let’s imagine that we had complete
follow-up data on all people in the study.

41
Disease incidence over time by population exposure
Incidence over four time points = 13/20 = 0.65 or 65%

42
Example: Alcohol consumption and
liver cirrhosis
• Now, let us imagine that we lost some people
over time.
• Thus, we do not know whether these
individuals became diseased or not.

43
Loss to follow-up in a sample over time

44
Incidence when there is loss to follow-up
• We know that the true incidence is 65%
• If we only analyzed the data based on who was present
at the end of the study, we would estimate incidence
as:
– 9/15 = 0.60 or 60%
• If we assumed that individuals who dropped out did
not become diseased we would get:
– 9/20 = 0.45 or 45%
• If we assumed that individuals who dropped out did
become diseased we would get:
– 14/20 = 0.70 or 70%
• There is one more option: a rate
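The three estimates above differ only in how the five people lost to follow-up are treated. The sketch below lays the three choices side by side; the counts (20 enrolled, 15 remaining, 9 observed cases) are read off the slide, and the variable names are ours.

```python
enrolled = 20        # people who started the study
lost = 5             # people lost to follow-up (20 - 15)
observed_cases = 9   # cases seen among those still in the study

completers = enrolled - lost

# 1. Analyze only the people present at the end of the study.
completers_only = observed_cases / completers

# 2. Assume nobody who dropped out became diseased.
dropouts_healthy = observed_cases / enrolled

# 3. Assume everybody who dropped out became diseased.
dropouts_diseased = (observed_cases + lost) / enrolled

for label, value in [
    ("Completers only", completers_only),
    ("Dropouts disease-free", dropouts_healthy),
    ("Dropouts all diseased", dropouts_diseased),
]:
    print(f"{label}: {value:.2f}")
```

None of the three recovers the true 65%, which is the motivation for moving to person-time.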
45
Incidence rate
• Incidence rates are commonly used in
prospective studies in which some people are lost
over time.
• To estimate a rate over the time frame of the
study, we need to know how much total time
each person contributed to the study follow-up
before they either developed the outcome or
dropped out.
• We term the total time that each person
contributed as “person-time.”

46
Incidence rate
Incidence rate = Number of new cases / Total person-time at risk

• The incidence rate refers to the number of new cases divided by
the person-time at risk contributed by members of the study.

47
Proportion vs. rate: What’s the
difference?
• A proportion can range from 0 to 1 (0% to 100%); the numerator and
denominator are both counts and the numerator is contained in
the denominator.
• A rate can range from 0 to infinity; the numerator is a count
while the denominator is measured in time (person-time for incidence rates).
• A rate can be conceptualized as a measure of speed.
– Example: Miles per hour
• Incidence rates can be conceptualized as the speed at which
disease is occurring in cases per unit of person-time.
• When we have complete follow-up of a sample or a population,
the rate can approximate the proportion (“risk”) of disease.

48
Person-time
• Individuals may be exposed to the risk of an
event for varying amounts of time during a total
time period if they
– Enter the time period earlier or later
– Experience the event of interest
• Person-time is the sum of the individual units of
time that people have been exposed to the risk of
an event
• Units of time can be anything (days, weeks,
months, years)
49
Understanding person-time
Person 2 stayed in the study all
40 years and did not develop the
outcome.

Person 10 dropped out of the study at Year 30.

Person 19 developed the outcome at Year 10.

50
Understanding person-time
Table: Person-time and disease status among 20 subjects followed for forty years
Subject   Years contributed   Developed disease? (1 = yes, 0 = no)
1         30                  1
2         40                  0
3         20                  0
4         20                  1
5         40                  0
6         40                  1
7         20                  0
8         40                  0
9         20                  0
10        30                  0
11        20                  1
12        30                  1
13        40                  0
14        10                  0
15        10                  1
16        40                  0
17        40                  0
18        40                  0
19        10                  1
20        20                  1
51
Calculating incidence rate
• The numerator is the number of new cases
• The denominator is the total person-time

• In our example:
Incidence rate = 8/560 = 0.014, or 14 cases per
1,000 person-years
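As a check on the arithmetic, the numerator and denominator can be summed directly from the person-time table. This is only a sketch: the tuples are copied from the table, and the variable names are ours. The unrounded result is about 14.3 per 1,000 person-years, which the slide reports as 14.

```python
# (years contributed, developed disease) for subjects 1-20, copied from the table.
subjects = [
    (30, 1), (40, 0), (20, 0), (20, 1), (40, 0),
    (40, 1), (20, 0), (40, 0), (20, 0), (30, 0),
    (20, 1), (30, 1), (40, 0), (10, 0), (10, 1),
    (40, 0), (40, 0), (40, 0), (10, 1), (20, 1),
]

new_cases = sum(diseased for _, diseased in subjects)   # numerator
person_years = sum(years for years, _ in subjects)      # denominator
incidence_rate = new_cases / person_years

print(new_cases, person_years)  # 8 560
print(f"{incidence_rate * 1000:.1f} cases per 1,000 person-years")  # 14.3
```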

52
Calculating incidence rate
• The incidence rate can be interpreted as the number of
expected cases in every set of 1,000 person-years.
• That is, if we were to observe 1,000 people for 1 year, we
would expect 14 cases.
• If we were to observe 500 people for 2 years, we would still
expect 14 cases.
• The assumption underlying this is that the incidence rate is
constant over time, so for every year in which 1,000
person-years are observed an additional 14 cases will be
expected.
• Given this assumption, the incidence rate tells us the
average number of cases per a specified set of person-time.
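Under the constant-rate assumption, expected cases are simply rate times person-time, so only the product of people and years matters. A small sketch of that idea, using the rounded rate from the slides (the function name is ours):

```python
def expected_cases(rate_per_person_year, people, years):
    """Expected cases = rate x person-time, assuming the rate is constant."""
    return rate_per_person_year * people * years


rate = 0.014  # rounded incidence rate from the slides, per person-year

# 1,000 people for 1 year and 500 people for 2 years are both
# 1,000 person-years, so the expected number of cases is the same.
print(f"{expected_cases(rate, 1000, 1):.0f}")
print(f"{expected_cases(rate, 500, 2):.0f}")
```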

53
Understanding incidence rate and
prevalence: the bathtub example

54
Understanding incidence rate and
prevalence

Conceptually:
Prevalence ≈ Incidence Rate*Duration

Mathematically:
P/(1-P)=IR*D
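Solving P/(1-P) = IR*D for P gives P = IR*D / (1 + IR*D). The sketch below applies that rearrangement. It assumes a steady state (constant incidence rate and average duration), which is our reading of when the formula holds; the durations are hypothetical values chosen only to contrast short and long illnesses.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """Solve P / (1 - P) = IR * D for P.

    incidence_rate and mean_duration must use the same time unit.
    """
    odds = incidence_rate * mean_duration  # prevalence odds, P / (1 - P)
    return odds / (1 + odds)


ir = 0.014  # cases per person-year (rounded rate from the earlier example)

# Hypothetical durations, in years: a short illness versus a chronic one.
for duration in (0.1, 10):
    p = steady_state_prevalence(ir, duration)
    print(f"duration {duration} years: prevalence = {p:.4f}, IR*D = {ir * duration:.4f}")
```

When IR*D is small, the exact prevalence and the approximation IR*D are nearly identical; they drift apart as duration grows.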
55
Examples of the relation between
incidence rate and prevalence
• High incidence rate, low prevalence
– Example: highly contagious infectious disease with
very short duration or a high case fatality rate
• Low incidence rate, high prevalence
– Examples: diseases with long duration such as
arthritis, diabetes, Crohn’s disease, and other
chronic illnesses

56
Example of the relation between
incidence rate and prevalence
Impact of a new treatment that prolongs life with the disease but does not cure it

People Living with HIV


New HIV Infections

57
Summary of the relation between
incidence and prevalence
• Prevalence is affected by incidence rate and
duration
• If a disease has short duration,
– Prevalence is close to the incidence over the same period, because few cases persist
• If a disease has long duration, in general,
– Prevalence exceeds incidence, because existing cases accumulate

58
Conditional risks
• We can “condition” a risk estimate by other factors to
begin to examine whether certain factors are
associated with increased or decreased risk.

• Let’s return to our earlier example of alcohol consumption and
liver cirrhosis.

• In order to estimate whether heavy drinkers have a different
incidence of cirrhosis compared with non-heavy drinkers, we
measure the conditional risk in each subgroup.
59
Conditional vs. marginal risks

Marginal risk of cirrhosis among all study subjects = 13/20 = 65%


Conditional risk of cirrhosis among heavy drinkers = 8/10 = 80%
Conditional risk of cirrhosis among non-heavy drinkers = 5/10 = 50%
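The marginal risk pools everyone, while each conditional risk restricts both numerator and denominator to one exposure group. A minimal sketch with the counts from the slide (the dictionary layout and names are ours):

```python
# Cirrhosis cases and group sizes from the slide.
groups = {
    "heavy drinkers": {"cases": 8, "n": 10},
    "non-heavy drinkers": {"cases": 5, "n": 10},
}

# Conditional risk: cases / people within each exposure group.
conditional_risk = {name: g["cases"] / g["n"] for name, g in groups.items()}

# Marginal risk: all cases / all people, ignoring exposure.
total_cases = sum(g["cases"] for g in groups.values())
total_people = sum(g["n"] for g in groups.values())
marginal_risk = total_cases / total_people

print(f"Marginal risk: {marginal_risk:.0%}")
for name, risk in conditional_risk.items():
    print(f"Risk among {name}: {risk:.0%}")
```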
60
Conditional risks
• It appears that heavy drinkers have a higher incidence of
cirrhosis compared with non-heavy drinkers. Next week we will
learn how to quantify this.

• Building these 2x2 tables crossing exposure with disease and
using these 2x2 tables to estimate associations will become a
building block of epidemiology.

61
Measures of association
• Risk ratio
• Risk difference
• Rate ratio
• Rate difference
• Odds ratio (will learn in Lecture 6)

62
Risk ratio

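Risk ratio = Risk of disease in exposed / Risk of disease in unexposed
           = [a / (a + b)] / [c / (c + d)]

In the 2x2 table, a and b are the exposed with and without disease;
c and d are the unexposed with and without disease.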

63
Risk ratio interpretation
• Ratios > 1.0 indicate risk is higher among
exposed than unexposed
• Ratios = 1.0 indicate no association
• Ratios < 1.0 indicate risk is lower among
exposed than unexposed
Example: The risk of CHD among women taking Hormone
Replacement Therapy (HRT) is 1.23 times the risk of CHD
among women who do not take HRT over 10 years.

64
Risk difference

Difference between two risks = Risk of disease in exposed - Risk of disease in unexposed

Interpretation: Excess risk due to the exposure


Example: There is an excess risk of 7 cases of CHD per 1000
women attributable to HRT use over 10 years.
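The HRT figures above are quoted without their underlying counts, so the sketch below instead applies the risk ratio and risk difference to the earlier cirrhosis data (8 of 10 heavy drinkers, 5 of 10 non-heavy drinkers). Applying the measures to that example is our illustration, not a result stated in the lecture; the 2x2 cell names follow the usual a, b, c, d convention.

```python
# 2x2 table cells, filled from the cirrhosis example:
a, b = 8, 2   # exposed (heavy drinkers): with disease, without disease
c, d = 5, 5   # unexposed (non-heavy drinkers): with disease, without disease

risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)

risk_ratio = risk_exposed / risk_unexposed
risk_difference = risk_exposed - risk_unexposed

print(f"Risk ratio:      {risk_ratio:.2f}")
print(f"Risk difference: {risk_difference:.2f}")
```

Read in the same way as the HRT examples: the risk among heavy drinkers is 1.6 times the risk among non-heavy drinkers, an excess of 30 cases per 100 people over the study period.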

65
Rate ratio

Rate ratio = Rate of disease in exposed / Rate of disease in unexposed

66
Rate ratio interpretation
Similar to risk ratio
• Ratios > 1.0 indicate rate is higher among
exposed than unexposed
• Ratios = 1.0 indicate no association
• Ratios < 1.0 indicate rate is lower among
exposed than unexposed
Example: The rate of HIV transmission within two years to
infants who were breastfed was 2.53 times the rate of HIV
transmission to infants who were formula-fed.
67
Rate difference
• Difference between two rates

Interpretation: Similar to risk difference; excess rate due to
the exposure
Example: The excess rate of seroconversion within two
years attributable to breastfeeding compared to formula
feeding was 25.6 per 100 person-years of observation.
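Rate ratios and rate differences are computed the same way as their risk counterparts, but with person-time in the denominators. The breastfeeding study's counts are not given in the slides, so the numbers below are hypothetical and chosen only to show the mechanics; the function name is ours.

```python
def incidence_rate(new_cases, person_years):
    """New cases per person-year at risk."""
    return new_cases / person_years


# Hypothetical data (not from the lecture): cases and person-years by exposure.
rate_exposed = incidence_rate(new_cases=30, person_years=1000)
rate_unexposed = incidence_rate(new_cases=10, person_years=1000)

rate_ratio = rate_exposed / rate_unexposed
rate_difference = rate_exposed - rate_unexposed

print(f"Rate ratio:      {rate_ratio:.1f}")
print(f"Rate difference: {rate_difference * 1000:.0f} per 1,000 person-years")
```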

68
Risk/rate differences
Risk/rate ratios
• Difference measures (risk / rate difference)
provide a measure of the potential direct public
health benefit of intervention:
– How many amputations can we prevent if we build a
new dialysis center?
– How many cases of flu can we prevent if we provide
vaccines in schools?
• Ratio measures (risk / rate / odds ratio) provide
an intuitive summary of the magnitude of the
difference in disease occurrence between the exposed
and unexposed, or the strength of the association
between the exposure and outcome.
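One way to see why both kinds of measure are reported is that the same ratio can correspond to very different differences. The sketch below compares two hypothetical scenarios (the numbers are invented for illustration): a rare outcome and a common one, each with twice the risk among the exposed.

```python
# Hypothetical risks, as cases per 1,000 people: (exposed, unexposed).
scenarios = {
    "rare outcome": (2, 1),
    "common outcome": (200, 100),
}

results = {}
for name, (exposed_per_1000, unexposed_per_1000) in scenarios.items():
    risk_exposed = exposed_per_1000 / 1000
    risk_unexposed = unexposed_per_1000 / 1000
    results[name] = {
        "risk_ratio": risk_exposed / risk_unexposed,
        # Excess cases per 1,000 people attributable to the exposure.
        "excess_per_1000": exposed_per_1000 - unexposed_per_1000,
    }

for name, r in results.items():
    print(f"{name}: risk ratio = {r['risk_ratio']:.1f}, "
          f"excess cases per 1,000 = {r['excess_per_1000']}")
```

Both scenarios have a risk ratio of 2, but removing the exposure would prevent 1 case per 1,000 people in the first and 100 per 1,000 in the second.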
69
What have we learned?
• Measures of disease occurrence and frequency in
epidemiology are the cornerstone of how we
build the science of population health.
• Today we have learned about:
– Incidence/risk, prevalence, incidence rates
– Prevalence is approximately equal to incidence when disease is
short in duration
– Incidence rates are more appropriate than incidence
proportions when there are losses to follow-up

70
What have we learned?
• Measures of occurrence:
– Prevalence (point and period)
– Incidence (a.k.a., incidence proportion, risk)
– Incidence rate
• Measures of association:
– Risk ratio
– Risk difference
– Rate ratio
– Rate difference
71
Questions?

72
Thank you!

73
