# Estimating and Comparing Rates

♦ Incidence Density
♦ Incidence Rate Difference and Ratio
♦ Confidence Intervals
♦ Standardized Rates and Their Comparison

A Definition
♦ Kleinbaum, Kupper and Morgenstern, Epidemiologic Research: Principles and Quantitative Methods (1982), p. 97: “A true rate is a potential for change in one quantity per unit change in another quantity, where the latter quantity is usually time. (…) Thus, a rate is not dimensionless and has no finite upper bound – i.e., theoretically, a rate can approach infinity.”

Rates
♦ A well-known example of a rate is velocity, i.e., change of distance per unit of time (given, e.g., in km/h).
• In practice, it does (should?) have an upper bound.

♦ We can talk about instantaneous and average rates.
• Example (instantaneous): your car's velocity at a particular time-point (it can depend on the time-point, e.g., city vs. highway).
• Example (average): your average speed after travelling a particular distance (assumed constant across the whole trip).

♦ In epidemiology, we usually talk about average rates.


Incidence/Mortality Rate
♦ Kleinbaum, Kupper and Morgenstern, Epidemiologic Research: Principles and Quantitative Methods (1982), p. 100: “The incidence rate of disease occurrence is the instantaneous potential for change in disease status (i.e., the occurrence of new cases) per unit of time at time t, or the occurrence of disease per unit of time, relative to the size of the candidate (i.e., disease-free) population at time t”.
♦ We could similarly define the mortality rate.


Incidence Rate
♦ Other terms:
• an “instantaneous risk” (or probability);
• a “hazard” (especially for mortality rates);
• a “person-time incidence rate”;
• a “force of morbidity”.

♦ It is expressed in units of 1/time.
♦ It is sometimes confused with risk.

Rates and Risks
♦ Assume that the incidence rate is constant over time (= λ), and the same for all individuals.
♦ The risk (probability) of developing disease in time $T$ will then be equal to $1 - e^{-\lambda T}$.
• Risk is sometimes called a cumulative incidence.
• In a disease-free (at time 0) cohort of $N$ individuals, you would thus expect $N(1 - e^{-\lambda T})$ new cases after time $T$.
• Similarly, we could talk about the risk of death.

♦ Thus, formally these are two different quantities.
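
The relation between a constant rate and the resulting risk can be sketched in a few lines of Python. The function names and the numerical values (a rate of 0.02 per person-year, a cohort of 1000) are hypothetical and chosen only for illustration. The output shows that rate × time and risk nearly coincide when their product is small, and that the risk stays below 1 while rate × time does not.

```python
import math


def risk_from_rate(rate: float, duration: float) -> float:
    """Risk (cumulative incidence) over `duration` for a constant rate."""
    return 1.0 - math.exp(-rate * duration)


def expected_new_cases(cohort_size: int, rate: float, duration: float) -> float:
    """Expected number of new cases in a cohort that is disease-free at time 0."""
    return cohort_size * risk_from_rate(rate, duration)


# Hypothetical values, chosen only for illustration.
rate_per_year = 0.02
for years in (1, 10, 100):
    risk = risk_from_rate(rate_per_year, years)
    print(f"T = {years:>3} years: rate*T = {rate_per_year * years:.2f}, risk = {risk:.4f}")

print(f"Expected cases among 1000 people in 10 years: "
      f"{expected_new_cases(1000, rate_per_year, 10):.1f}")
```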

Estimating Rates
♦ Rates require observations of incidence in time. Thus, they are estimated from cohort studies.
♦ Instantaneous rates are seldom obtained. Rather, the average rates are computed.
♦ The most basic estimator is the incidence density (ID):

$$ID = \frac{I}{PT} = \frac{\text{no. of new cases in the calendar period } (t_0, t_1)}{\text{accrued population time}}$$

♦ PT is expressed in person-years, person-days etc.

Incidence Density
♦ A hypothetical cohort of 12 subjects.
♦ Followed for the period of 5.5 years.
♦ 7 withdrawals among non-cases:
• three (7, 8, 12) lost to follow-up;
• two (3, 4) due to death;
• two (5, 10) due to study termination.

♦ PT = 2.5+3.5+…+1.5 = 26.
♦ ID = 5/26 = 0.192 per (person-)year or 1.92 per 10 (person-)years.
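
As a minimal sketch of the same bookkeeping, the snippet below sums individual follow-up times and counts cases. The five records are hypothetical and are not the 12 subjects of the cohort above; the class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    years_at_risk: float  # time from entry until disease onset or withdrawal
    became_case: bool     # True if follow-up ended with disease onset


def incidence_density(records):
    """Return (number of cases, person-time, ID = cases / person-time)."""
    cases = sum(1 for r in records if r.became_case)
    person_time = sum(r.years_at_risk for r in records)
    return cases, person_time, cases / person_time


# Hypothetical cohort of five subjects (illustration only).
cohort = [
    FollowUp(2.0, True),
    FollowUp(4.5, False),  # e.g., lost to follow-up
    FollowUp(1.0, True),
    FollowUp(5.5, False),  # e.g., study termination
    FollowUp(3.0, False),  # e.g., death from another cause
]

cases, person_time, rate = incidence_density(cohort)
print(f"I = {cases}, PT = {person_time} person-years, ID = {rate:.3f} per person-year")
```

Withdrawals contribute person-time up to the moment they leave, but no case.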

Population-Time Without Individual Data
♦ E.g., population-based registries.
♦ Person-years computed using the mid-year population.
♦ For rare events, periods of several years may be used.
• Ideally, one would like to use mid-year populations for each year.
• Alternatively, one can use information for several time-points, or the mid-period population (these are less accurate solutions).
♦ One may face the problem of removing those not at risk (e.g., women for prostate cancer incidence).

Population-Time Without Individual Data: Example


Incidence Density: Remarks
♦ It is an estimate of an average rate.
• So we will sometimes refer to it as an “incidence rate”.

♦ Any fluctuations in the instantaneous rate are obscured, and
• 1000 persons followed for 1 year
• 100 persons followed for 10 years

produce the same number of person-years. If the average time to disease onset is 5 years, ID in the first cohort will be lower.

Incidence Density: Remarks
♦ If applied to the whole cohort/population, it is sometimes called the crude rate.
♦ However, sex, age, race etc. can have substantial influence on the incidence of disease.
♦ Comparing crude rates for two populations which differ w.r.t., e.g., age, can be misleading (confounding!).
♦ Therefore, usually standardized rates are compared.
• E.g., for cancer, age- and sex-standardized rates are used.
• They will be discussed later.

Confidence Interval for Incidence Density
♦ By using a Poisson model, the standard error of ID = I / PT can be estimated by:

$$SE(ID) = \sqrt{\frac{I}{PT^2}}$$

♦ Thus, an approximate 95% CI for ID is given by:

ID ± 1.96∙SE(ID).
• 99% CI: ID ± 2.58∙SE(ID).
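
A minimal Python sketch of this interval is given below. The function name and the input (25 new cases over 10,000 person-years) are hypothetical; only the formula SE(ID) = √(I / PT²) comes from the slide.

```python
import math


def incidence_density_ci(cases: int, person_time: float, z: float = 1.96):
    """Return (ID, SE, lower, upper) using SE(ID) = sqrt(I / PT^2)."""
    rate = cases / person_time
    se = math.sqrt(cases) / person_time  # algebraically equal to sqrt(I / PT^2)
    return rate, se, rate - z * se, rate + z * se


# Hypothetical input: 25 new cases over 10,000 person-years.
rate, se, lower, upper = incidence_density_ci(25, 10_000)
print(f"ID = {rate:.5f}, SE = {se:.5f}, 95% CI = ({lower:.5f}, {upper:.5f})")

# Passing z=2.58 gives the 99% interval instead.
_, _, lower99, upper99 = incidence_density_ci(25, 10_000, z=2.58)
```

Because the interval is symmetric around ID, its lower limit can fall below zero when the number of cases is small; that is a limitation of the normal approximation, not of the code.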


Estimating Incidence Densities: Example
♦ Postmenopausal Hormone and Coronary Heart Disease Cohort Study:
• Stampfer et al., NEJM (1985).
• Involving female nurses:

| Hormone use | CHD | Person-years |
|---|---|---|
| Yes | 30 | 54308.7 |
| No | 60 | 51477.5 |
| Total | 90 | 105786.2 |

• $ID_1 = 30/54308.7 = 0.00055$; $SE(ID_1) = \sqrt{30/54308.7^2} = 0.00010$
• 95% CI for $ID_1$ = 0.00055 ± 1.96∙0.00010 = (0.00035, 0.00075)
• $ID_0 = 60/51477.5 = 0.00116$; $SE(ID_0) = \sqrt{60/51477.5^2} = 0.00015$
• 95% CI for $ID_0$ = 0.00116 ± 1.96∙0.00015 = (0.00086, 0.00145)

Comparing Two Incidence Densities
♦ Assume data from a cohort study:

| | Exposed | Unexposed | Total |
|---|---|---|---|
| Cases | I1 | I0 | I |
| Pop.-time | PT1 | PT0 | PT |

♦ We get two estimates, for unexposed and exposed subjects:

ID0 = I0/PT0 and ID1 = I1/PT1.

♦ To compare them, we can look at
• Incidence rate difference: IRD = ID1 - ID0.
• Incidence rate ratio: IRR = ID1 / ID0.

Comparing Two Incidence Densities: Example
♦ Postmenopausal Hormone and Coronary Heart Disease Cohort Study:
• Stampfer et al., NEJM (1985).
• Involving female nurses:

| Hormone use | CHD | Person-years |
|---|---|---|
| Yes | 30 | 54308.7 |
| No | 60 | 51477.5 |
| Total | 90 | 105786.2 |

• ID1 = 30/54308.7 = 0.00055; ID0 = 60/51477.5 = 0.00116
• IRD = ID1 - ID0 = -0.00061
• IRR = ID1 / ID0 = 0.474

Comparing Two Incidence Densities: Poisson Model Method
| | Exposed | Unexposed | Total |
|---|---|---|---|
| Cases | I1 | I0 | I |
| Pop.-time | PT1 | PT0 | PT |

♦ By using a Poisson model, the standard error of IRD can be estimated by:

$$SE(IRD) = \sqrt{\frac{I_0}{PT_0^2} + \frac{I_1}{PT_1^2}}$$

♦ Thus, an approximate 95% CI for IRD is given by: IRD ± 1.96∙SE(IRD).
• 99% CI: IRD ± 2.58∙SE(IRD)

♦ Standard error of ln IRR can be estimated by:

$$SE(\ln IRR) = \sqrt{\frac{1}{I_0} + \frac{1}{I_1}}$$

♦ Thus, an approximate 95% CI for IRR is given by:

exp{ ln IRR ± 1.96∙SE(ln IRR) }
• 99% CI: exp{ ln IRR ± 2.58∙SE(ln IRR) }
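
The two intervals can be sketched in Python as follows. The function names are illustrative; the input is the hormone-use table shown earlier (30 cases in 54308.7 person-years among users, 60 cases in 51477.5 person-years among non-users). The code works with unrounded intermediate values, so its limits can differ in the last digit from hand calculations based on rounded figures.

```python
import math


def ird_ci(i1, pt1, i0, pt0, z=1.96):
    """IRD with a normal-approximation CI, SE = sqrt(I0/PT0^2 + I1/PT1^2)."""
    ird = i1 / pt1 - i0 / pt0
    se = math.sqrt(i0 / pt0**2 + i1 / pt1**2)
    return ird, ird - z * se, ird + z * se


def irr_ci(i1, pt1, i0, pt0, z=1.96):
    """IRR with a CI built on the log scale, SE(ln IRR) = sqrt(1/I0 + 1/I1)."""
    irr = (i1 / pt1) / (i0 / pt0)
    se_log = math.sqrt(1 / i0 + 1 / i1)
    log_irr = math.log(irr)
    return irr, math.exp(log_irr - z * se_log), math.exp(log_irr + z * se_log)


# Hormone users (exposed) vs. non-users (unexposed), from the table above.
ird, ird_lo, ird_hi = ird_ci(30, 54308.7, 60, 51477.5)
irr, irr_lo, irr_hi = irr_ci(30, 54308.7, 60, 51477.5)
print(f"IRD = {ird:.5f}, 95% CI = ({ird_lo:.5f}, {ird_hi:.5f})")
print(f"IRR = {irr:.3f}, 95% CI = ({irr_lo:.3f}, {irr_hi:.3f})")
```

Building the IRR interval on the log scale and exponentiating keeps both limits positive, which a symmetric interval around IRR itself would not guarantee.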

Comparing Two Incidence Densities: Example
| Hormone use | CHD | Person-years |
|---|---|---|
| Yes | 30 | 54308.7 |
| No | 60 | 51477.5 |
| Total | 90 | 105786.2 |

$$SE(IRD) = \sqrt{\frac{60}{51477.5^2} + \frac{30}{54308.7^2}} = 0.00018$$

$$SE(\ln IRR) = \sqrt{\frac{1}{60} + \frac{1}{30}} = 0.224$$

♦ 95% CI for:
• IRD: -0.00061 ± 1.96∙0.00018 = (-0.00096, -0.00025)
• ln IRR: ln(0.474) ± 1.96∙0.22 = (-1.178, -0.315)
• IRR: $(e^{-1.178}, e^{-0.315})$ = (0.308, 0.729)

♦ Both CIs allow us to reject the null hypothesis of no difference.

Comparing Two Incidence Densities: “Test-Based” Method
| | Exposed | Unexposed | Total |
|---|---|---|---|
| Cases | I1 | I0 | I |
| Pop.-time | PT1 | PT0 | PT |

♦ 95% “test-based” CI for IRD can be computed as IRD ± 1.96 ∙ SE(IRD), where SE(IRD) = IRD / χ and

$$\chi = \frac{I_1 - I \cdot \frac{PT_1}{PT}}{\sqrt{\frac{I \cdot PT_0 \cdot PT_1}{PT^2}}}$$

♦ Can be re-expressed as

(1 ± 1.96 / χ) ∙ IRD
• 99% CI: (1 ± 2.58 / χ) ∙ IRD

♦ Similarly, SE(ln IRR) = (ln IRR) / χ.
♦ 95% “test-based” CI for ln IRR is

ln IRR ± 1.96 ∙ (ln IRR) / χ

♦ Can be written as

(1 ± 1.96 / χ) ∙ ln IRR

♦ 95% CI for IRR is thus

exp{ (1 ± 1.96 / χ) ∙ ln IRR }
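
A minimal Python sketch of the test-based limits follows, again using the hormone-use data shown earlier. The function name is illustrative. Since χ takes the sign of I1 − I·PT1/PT, the two candidate limits are sorted before being returned.

```python
import math


def chi_based_cis(i1, pt1, i0, pt0, z=1.96):
    """Test-based CIs for IRD and IRR; returns (chi, ird_ci, irr_ci).

    Not defined when chi is exactly 0 (observed equals expected).
    """
    i, pt = i1 + i0, pt1 + pt0
    chi = (i1 - i * pt1 / pt) / math.sqrt(i * pt0 * pt1 / pt**2)
    ird = i1 / pt1 - i0 / pt0
    log_irr = math.log((i1 / pt1) / (i0 / pt0))
    # (1 ± z/chi) * estimate; sorting handles negative chi or negative estimates.
    ird_ci = tuple(sorted(((1 - z / chi) * ird, (1 + z / chi) * ird)))
    log_ci = sorted(((1 - z / chi) * log_irr, (1 + z / chi) * log_irr))
    irr_ci = (math.exp(log_ci[0]), math.exp(log_ci[1]))
    return chi, ird_ci, irr_ci


chi, ird_ci, irr_ci = chi_based_cis(30, 54308.7, 60, 51477.5)
print(f"chi = {chi:.2f}")
print(f"IRD 95% CI = ({ird_ci[0]:.5f}, {ird_ci[1]:.5f})")
print(f"IRR 95% CI = ({irr_ci[0]:.3f}, {irr_ci[1]:.3f})")
```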

Comparing Two Incidence Densities: Example
| Hormone use | CHD | Person-years |
|---|---|---|
| Yes | 30 | 54308.7 |
| No | 60 | 51477.5 |
| Total | 90 | 105786.2 |

$$\chi = \frac{I_1 - I \cdot \frac{PT_1}{PT}}{\sqrt{\frac{I \cdot PT_0 \cdot PT_1}{PT^2}}} = \frac{30 - 90 \cdot \frac{54308.7}{105786.2}}{\sqrt{\frac{90 \cdot 54308.7 \cdot 51477.5}{105786.2^2}}} = -3.41$$

Only the magnitude, |χ| = 3.41, matters below, because the ± covers both signs.

♦ 95% “test-based” CI for
• IRD: (1 ± 1.96/3.41) ∙ (-0.00061) = (-0.001, -0.0002)
• Close to the one based on the Poisson approximation (not in general).
• ln IRR: (1 ± 1.96/3.41) ∙ ln(0.474) = (-1.176, -0.317)
• IRR: $(e^{-1.176}, e^{-0.317})$ = (0.309, 0.728)

“Exact” Confidence Interval for IRR
♦ The presented CIs for ln IRR (and IRD) assume that the estimates of ln IRR vary according to the normal distribution.
• Hence their form, e.g., ln(IRR) ± 1.96 ∙ SE(ln IRR).

♦ The use of the normal distribution is an approximation.
• Can be problematic, especially in small samples.

♦ It is possible to construct a CI for ln IRR using the “exact” distribution (i.e., without approximating it by the normal).
• The CI is valid in all samples; in large samples, it is close to the approximate CIs.
• Computation is a bit more difficult (but easily handled by computers).
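
The slides do not say which “exact” construction is meant. One common choice, assumed here, conditions on the total number of cases I = I1 + I0: then I1 follows a binomial distribution with probability p = IRR·PT1 / (IRR·PT1 + PT0), an exact (Clopper-Pearson type) interval for p is found numerically, and its limits are transformed back to the IRR scale. The sketch below uses only the standard library; all function names are illustrative.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def solve_decreasing(f, target: float, iterations: int = 100) -> float:
    """Bisection on [0, 1] for f(p) = target, where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def exact_irr_ci(i1: int, pt1: float, i0: int, pt0: float, alpha: float = 0.05):
    """Conditional exact CI for IRR (assumed construction, see lead-in)."""
    n = i1 + i0
    # Lower limit of p: P(X >= i1) = alpha/2, i.e. P(X <= i1 - 1) = 1 - alpha/2.
    p_lo = 0.0 if i1 == 0 else solve_decreasing(
        lambda p: binom_cdf(i1 - 1, n, p), 1 - alpha / 2)
    # Upper limit of p: P(X <= i1) = alpha/2.
    p_hi = 1.0 if i1 == n else solve_decreasing(
        lambda p: binom_cdf(i1, n, p), alpha / 2)

    def to_irr(p: float) -> float:
        # Invert p = IRR*PT1 / (IRR*PT1 + PT0).
        return math.inf if p >= 1.0 else (p / (1 - p)) * (pt0 / pt1)

    return to_irr(p_lo), to_irr(p_hi)


# Hormone-use data from the earlier example.
exact_lo, exact_hi = exact_irr_ci(30, 54308.7, 60, 51477.5)
print(f"Exact 95% CI for IRR: ({exact_lo:.3f}, {exact_hi:.3f})")
```

With 90 events the exact limits should lie close to the approximate ones, in line with the large-sample remark above.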

Standardized Rates
♦ We will introduce the standardization w.r.t. age.
♦ We will assume that our population is stratified by age (i.e., subdivided into age-groups).
• One needs to define age-groups (e.g., 0-4, 5-9,…).

♦ One needs to compute age-specific rates (ID).
• Population-time and no. of cases for each age-group are required.

♦ There are two methods of standardization:
• Direct;
• Indirect.

Standardization
♦ Direct method
• Age-specific rates of the study population are applied to the age-distribution of the standard population (rates study → age standard).
• Theoretical rate that would have occurred if the rates observed in the study population applied to the standard population.

♦ Indirect method
• Age-specific rates from the standard population are applied to the age-distribution of the study population (rates standard → age study).

Direct Standardization
Standard population: e.g., USA 1990.

| Age group | Study: observed | Study: person-years | Study: rate | Standard: observed | Standard: population | Standard: rate |
|---|---|---|---|---|---|---|
| <40 | I1 | PT1 | I1/PT1 | B1 | N1 | B1/N1 |
| 40-64 | I2 | PT2 | I2/PT2 | B2 | N2 | B2/N2 |
| 65+ | I3 | PT3 | I3/PT3 | B3 | N3 | B3/N3 |
| Total | It | PTt | It/PTt | Bt | Nt | Bt/Nt |

♦ Crude rate in study population = It / PTt.
♦ Directly Standardized Rate (DSR):

DSR = { (I1/PT1)N1 + (I2/PT2)N2 + (I3/PT3)N3 } / Nt = (I1/PT1)(N1/Nt) + (I2/PT2)(N2/Nt) + (I3/PT3)(N3/Nt).

♦ Make sure units are consistent!

Direct Standardization
♦ If there is no confounding, the crude rate is adequate.
♦ DSR by itself is not meaningful; it makes sense only when comparing two or more populations.
• If possible, compare age-specific rates.
• The rates should exhibit more or less similar trends (also in the standard).

♦ DSR depends on the choice of the standard population.
• The age-distribution of the latter should not be radically different from the compared populations.
• There are several standard populations (e.g., for the world, continents etc.).

Indirect Standardization
♦ Direct standardization requires age-specific rates for all compared populations.
♦ If these are not available, or they are imprecise, the indirect method is preferred.
♦ Both should lead to similar conclusions; if they do not, the reason should be investigated.

Indirect Standardization
Standard population: e.g., USA 1990.

| Age group | Study: observed | Study: person-years | Study: rate | Standard: observed | Standard: population | Standard: rate | Expected |
|---|---|---|---|---|---|---|---|
| <40 | I1 | PT1 | I1/PT1 | B1 | N1 | B1/N1 | E1 = PT1 ∙ (B1/N1) |
| 40-64 | I2 | PT2 | I2/PT2 | B2 | N2 | B2/N2 | E2 = PT2 ∙ (B2/N2) |
| 65+ | I3 | PT3 | I3/PT3 | B3 | N3 | B3/N3 | E3 = PT3 ∙ (B3/N3) |
| Total | It | | | | | | E1 + E2 + E3 = ∑Ej |

♦ Standardized (Incidence or Mortality) Ratio (SIR or SMR):

SIR = It / ∑Ej = Observed / Expected.

♦ Take Indirectly Standardized Rate (ISR) as:

ISR = SIR ∙ (crude rate for the standard population).

♦ Make sure units are consistent!

Standardization of Rates: Example
♦ Infant deaths (for children less than 1 year of age) in Colorado and Louisiana in 1987.
• Colorado: 527 deaths out of 53808 live births; crude rate = 9.8 per 1000.
• Louisiana: 872 deaths out of 73967 live births; crude rate = 11.8 per 1000.
♦ Crude infant mortality rate for Colorado is lower than for Louisiana.
♦ In the US, infant mortality depends on race.

USA, 1987:

| Race | Live births | % live births | Infant deaths | Rate (x1000) |
|---|---|---|---|---|
| Black | 641567 | 16.8 | 11461 | 17.9 |
| White | 2992488 | 78.6 | 25810 | 8.6 |
| Other | 175339 | 4.6 | 1137 | 6.5 |
| Total | 3809394 | 100 | 38408 | 10.1 |

Standardization of Rates: Example
♦ The distribution of race of new-born children is different in the two states.

| Race | Colorado: live births | Colorado: % | Louisiana: live births | Louisiana: % |
|---|---|---|---|---|
| Black | 3166 | 5.9 | 29670 | 40.1 |
| White | 48805 | 90.7 | 42749 | 57.8 |
| Other | 1837 | 3.4 | 1548 | 2.1 |
| Total | 53808 | 100 | 73967 | 100 |

♦ Infant mortality rates depend on race.
• Race is a confounder.

♦ Compare race-specific infant mortality rates.
• Unclear (differences in various directions).

| Race | Colorado rate (x1000) | Louisiana rate (x1000) |
|---|---|---|
| Black | 16.4 | 17.7 |
| White | 9.6 | 8.0 |
| Other | 3.3 | 1.9 |
| Total | 9.8 | 11.8 |

Standardization of Rates: Example
♦ Direct standardization: apply state- and race-specific rates to the standard race distribution (US, 1987).

| Race | US 1987: Ni (births) | Ni / Nt | Colorado: rate (x1000) | Colorado: Rate × Ni | Colorado: Rate × Ni / Nt (x1000) | Louisiana: rate (x1000) | Louisiana: Rate × Ni | Louisiana: Rate × Ni / Nt (x1000) |
|---|---|---|---|---|---|---|---|---|
| Black | 641567 | 0.168 | 16.4 | 10521.7 | 2.76 | 17.7 | 11355.7 | 2.98 |
| White | 2992488 | 0.786 | 9.6 | 28727.9 | 7.54 | 8.0 | 23939.9 | 6.28 |
| Other | 175339 | 0.046 | 3.3 | 578.6 | 0.15 | 1.9 | 333.1 | 0.09 |
| Total | 3809394 | 1 | | 39828.2 | 10.45 | | 35628.7 | 9.35 |

♦ DSR for Colorado: 10.45 (per 1000 live births; crude: 9.8).
♦ DSR for Louisiana: 9.35 (per 1000 live births; crude: 11.8).
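
The weighted sum behind these two figures can be sketched in Python. The function and variable names are illustrative; the inputs are the rounded race-specific rates and the US 1987 birth counts from the tables above.

```python
def directly_standardized_rate(stratum_rates, standard_sizes):
    """DSR = sum over strata of rate_k * N_k / N_t."""
    total = sum(standard_sizes)
    return sum(rate * size / total for rate, size in zip(stratum_rates, standard_sizes))


# Strata in the order Black, White, Other.
us_births = [641567, 2992488, 175339]        # standard population (US, 1987)
colorado_rates = [16.4e-3, 9.6e-3, 3.3e-3]   # deaths per live birth
louisiana_rates = [17.7e-3, 8.0e-3, 1.9e-3]

dsr_colorado = directly_standardized_rate(colorado_rates, us_births)
dsr_louisiana = directly_standardized_rate(louisiana_rates, us_births)
print(f"DSR Colorado:  {1000 * dsr_colorado:.2f} per 1000 live births")
print(f"DSR Louisiana: {1000 * dsr_louisiana:.2f} per 1000 live births")
```

Rates are kept per live birth inside the function and scaled to "per 1000" only for printing, which is one simple way to keep units consistent.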


Standardization of Rates: Example
♦ Indirect standardization: apply race-specific rates of a standard population (US, 1987) to the race-distribution of the states.

| Race | US rate (x1000) | Colorado: live births (PTi) | Colorado: deaths (Obs.) | Colorado: Rate × PTi (Exp. deaths) | Louisiana: live births (PTi) | Louisiana: deaths (Obs.) | Louisiana: Rate × PTi (Exp. deaths) |
|---|---|---|---|---|---|---|---|
| Black | 17.9 | 3166 | 52 | 56.7 | 29670 | 525 | 531.1 |
| White | 8.6 | 48805 | 469 | 419.7 | 42749 | 344 | 367.6 |
| Other | 6.5 | 1837 | 6 | 11.9 | 1548 | 3 | 10.1 |
| Total | 10.1 | 53808 | 527 | 488.3 | 73967 | 872 | 908.8 |

♦ SMR for Colorado: 527/488.3 = 1.08 (8% higher than the US).
• ISR = SMR x 10.1 = 10.9 (race-adjusted infant mortality rate).
♦ SMR for Louisiana: 872/908.8 = 0.96 (4% lower than the US).
• ISR = SMR x 10.1 = 9.7 (race-adjusted infant mortality rate).
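
The same calculation as a short Python sketch, with illustrative names and the figures from the table above (US rates per 1000 converted to rates per live birth):

```python
def standardized_ratio(observed_total, person_time, standard_rates):
    """Return (expected events, observed / expected)."""
    expected = sum(pt * rate for pt, rate in zip(person_time, standard_rates))
    return expected, observed_total / expected


# Strata in the order Black, White, Other.
us_rates = [17.9e-3, 8.6e-3, 6.5e-3]  # US 1987 race-specific rates per live birth
us_crude_rate_per_1000 = 10.1

expected_co, smr_co = standardized_ratio(527, [3166, 48805, 1837], us_rates)
expected_la, smr_la = standardized_ratio(872, [29670, 42749, 1548], us_rates)

for state, expected, smr in (("Colorado", expected_co, smr_co),
                             ("Louisiana", expected_la, smr_la)):
    isr = smr * us_crude_rate_per_1000
    print(f"{state}: expected = {expected:.1f}, SMR = {smr:.2f}, ISR = {isr:.1f} per 1000")
```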

Standardization of Rates: Example
♦ Is it reasonable to use the

• The plot of race-specific rates shows a similar trend (black > white > other).
• The distribution of race in the US is similar to the two states (white > black > other).
• Results for both standardization methods are similar.

Comparison of Directly Standardized Rates
♦ If we have two standardized rates, we may want to compare them.
♦ For the direct method, assume we have DSR1 and DSR2.
♦ 95% CI can then be obtained using the normal approximation:

(DSR1 - DSR2) ± 1.96 ∙ SE(DSR1 - DSR2).
• 99% CI: (DSR1 - DSR2) ± 2.58 ∙ SE(DSR1 - DSR2).
♦ The standard error is given by

$$SE(DSR_1 - DSR_2) = \sqrt{\sum_k \left( \frac{N_k}{N_t} \cdot SE(IRD_k) \right)^2}$$

where IRDk is the stratum-specific incidence rate difference.

Comparison of Directly Standardized Rates
♦ Alternatively, we might look at the standardized rate ratio:

SRR = DSR1/DSR2.

♦ 95% CI for SRR can be written as: $SRR^{1 \pm (1.96 / Z)}$, where

$$Z = \frac{DSR_1 - DSR_2}{SE(DSR_1 - DSR_2)}$$

• 99% CI can be written as: $SRR^{1 \pm (2.58 / Z)}$.

Comparison of Directly Standardized Rates: Example
♦ DSR1 (Colorado): 0.01045 (10.45 per 1000 live births).
♦ DSR2 (Louisiana): 0.00935 (9.35 per 1000 live births).

| Race | US: % births (Ni/Nt) | Colorado: births (PTi) | Colorado: deaths (Ii) | Colorado: rate (IDi, x1000) | Louisiana: births (PTi) | Louisiana: deaths (Ii) | Louisiana: rate (IDi, x1000) | IRDi (x1000) | SEi (x1000) |
|---|---|---|---|---|---|---|---|---|---|
| Black | 16.8 | 3166 | 52 | 16.4 | 29670 | 525 | 17.7 | -1.3 | 2.4 |
| White | 78.6 | 48805 | 469 | 9.6 | 42749 | 344 | 8.0 | 1.6 | 0.6 |
| Other | 4.6 | 1837 | 6 | 3.3 | 1548 | 3 | 1.9 | 1.4 | 1.7 |
| Total | 100 | 53808 | 527 | 9.8 | 73967 | 872 | 11.8 | | |

$$SE(DSR_1 - DSR_2) = \sqrt{(0.168 \cdot 0.0024)^2 + (0.786 \cdot 0.0006)^2 + (0.046 \cdot 0.0017)^2} = 0.0006$$

Comparison of Directly Standardized Rates: Example
$$SE(DSR_1 - DSR_2) = \sqrt{(0.168 \cdot 0.0024)^2 + (0.786 \cdot 0.0006)^2 + (0.046 \cdot 0.0017)^2} = 0.0006$$

♦ DSR1 = 0.01045; DSR2 = 0.00935.
♦ DSR1 - DSR2 = 0.0011.
• 95% CI: 0.0011 ± 1.96∙0.0006 = (-0.0002, 0.002).
• CI includes 0, so we cannot reject H0 of no difference.
♦ SRR = DSR1 / DSR2 = 1.12.
• Z = (DSR1 - DSR2) / SE = 1.83.
• 95% CI: $1.12^{1 \pm (1.96 / 1.83)}$ = (0.99, 1.26).
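
The whole comparison can be sketched in Python from the raw counts. The function name and the returned dictionary keys are illustrative. Because the code uses unrounded stratum rates and standard errors, its Z statistic and limits differ slightly from the hand calculation above, which starts from rounded figures; the conclusion is the same.

```python
import math


def compare_dsr(cases1, pt1, cases2, pt2, standard_sizes, z=1.96):
    """Difference and ratio of two directly standardized rates with CIs."""
    total = sum(standard_sizes)
    weights = [n / total for n in standard_sizes]
    dsr1 = sum(w * i / pt for w, i, pt in zip(weights, cases1, pt1))
    dsr2 = sum(w * i / pt for w, i, pt in zip(weights, cases2, pt2))
    # Stratum-specific SE(IRD_k) = sqrt(I1k/PT1k^2 + I2k/PT2k^2), weighted by N_k/N_t.
    variance = sum(
        (w * math.sqrt(i1 / p1**2 + i2 / p2**2)) ** 2
        for w, i1, p1, i2, p2 in zip(weights, cases1, pt1, cases2, pt2)
    )
    se = math.sqrt(variance)
    diff = dsr1 - dsr2
    z_stat = diff / se
    srr = dsr1 / dsr2
    srr_ci = tuple(sorted((srr ** (1 - z / z_stat), srr ** (1 + z / z_stat))))
    return {"diff": diff, "se": se, "diff_ci": (diff - z * se, diff + z * se),
            "z": z_stat, "srr": srr, "srr_ci": srr_ci}


# Strata in the order Black, White, Other; counts from the tables above.
result = compare_dsr(
    cases1=[52, 469, 6], pt1=[3166, 48805, 1837],       # Colorado
    cases2=[525, 344, 3], pt2=[29670, 42749, 1548],     # Louisiana
    standard_sizes=[641567, 2992488, 175339],           # US 1987 births
)
for key, value in result.items():
    print(key, value)
```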

Comparison of Indirectly Standardized Rates
♦ In directly standardized rates, stratum-specific rates for different study populations are combined using the same weights (relative stratum sizes in the standard population).
♦ In indirectly standardized rates, the weights (PTi / “expected Ii”) differ.
♦ Thus, technically speaking, ISRs (SIRs) should not be compared.
♦ On the other hand, it is valid to ask whether SIR (or SMR) is different from 1.
♦ To do that, one can construct a 95% CI, e.g., as follows:

SIR ± 1.96∙(√observed events)/(expected events).
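
A minimal Python sketch of this interval, applied to the observed and expected infant deaths from the indirect standardization example (527 vs. 488.3 for Colorado, 872 vs. 908.8 for Louisiana). The function name is illustrative.

```python
import math


def sir_ci(observed: float, expected: float, z: float = 1.96):
    """Return (SIR, lower, upper) with half-width z * sqrt(observed) / expected."""
    sir = observed / expected
    half_width = z * math.sqrt(observed) / expected
    return sir, sir - half_width, sir + half_width


smr_colorado = sir_ci(527, 488.3)
smr_louisiana = sir_ci(872, 908.8)
for state, (sir, lower, upper) in (("Colorado", smr_colorado),
                                   ("Louisiana", smr_louisiana)):
    verdict = "includes 1" if lower <= 1 <= upper else "excludes 1"
    print(f"{state}: SMR = {sir:.2f}, 95% CI = ({lower:.2f}, {upper:.2f}), {verdict}")
```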

Standardization of Rates
♦ Standardization is a simple way to remove the effect of confounding.
♦ It can be extended to more than one confounder.
♦ Similar techniques can be used for differences or ratios of rates.
♦ An alternative is a stratified analysis (later).