
Binomial Distribution

Binomial variable X = X1 + X2 + … + Xn, where:

• Each Xi is either 1 or 0, with probabilities p and 1-p respectively
• The Xi are independent
• Probability p is the same for all Xi

$$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n-k}, \quad \text{where } k = 0, 1, 2, \ldots, n$$

• Expected value E(X) = np
• Variance Var(X) = np(1-p)
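
The probability formula and the mean/variance results above can be checked numerically. The following is a minimal R sketch; the values n = 10 and p = 0.3 are hypothetical and chosen only for illustration.

```r
# Hypothetical parameters, chosen only for illustration.
n <- 10
p <- 0.3
k <- 0:n

# Binomial probabilities P(X = k): written out from the formula,
# and from R's built-in dbinom().
pmf_formula <- choose(n, k) * p^k * (1 - p)^(n - k)
pmf_builtin <- dbinom(k, size = n, prob = p)

# Mean and variance computed directly from the probabilities.
mean_x <- sum(k * pmf_builtin)
var_x  <- sum((k - mean_x)^2 * pmf_builtin)

all.equal(pmf_formula, pmf_builtin)           # formula agrees with dbinom()
c(mean = mean_x, np = n * p)                  # both equal np
c(variance = var_x, np1p = n * p * (1 - p))   # both equal np(1-p)
```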
Poisson Distribution
And, if n is large and p is small, we can show:
$$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n-k} \approx \frac{(np)^{k} e^{-np}}{k!} = \frac{\lambda^{k} e^{-\lambda}}{k!}$$
• Which is the Poisson distribution, with λ=np
• Expected value = variance = λ
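
To see how close the approximation is, the binomial and Poisson probabilities can be compared side by side. This is a sketch under assumed values: n = 10000 and p = 0.0007 are hypothetical, picked so that λ = np = 7.

```r
# Hypothetical "large n, small p" setting; lambda = n * p = 7.
n <- 10000
p <- 0.0007
lambda <- n * p

k <- 0:20
binom_p <- dbinom(k, size = n, prob = p)   # exact binomial probabilities
pois_p  <- dpois(k, lambda)                # Poisson approximation

# Largest pointwise difference between the two sets of probabilities.
max_abs_diff <- max(abs(binom_p - pois_p))
max_abs_diff

# Side-by-side view for a few values of k.
round(cbind(k, binomial = binom_p, poisson = pois_p)[c(1, 8, 19), ], 6)
```

Making n larger and p smaller while holding np fixed should shrink the difference further.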
Poisson Distribution

• So, just need to know λ (i.e. don’t need to know n or p) for statistical analysis
• useful in disease surveillance and other epi; just
need count of cases, not size of population or
proportion with outcome
• mean = variance → basis for “square root” rule of
thumb (i.e. cases +/- 2*sqrt(cases) is approx 95% CI,
but the exact interval is not symmetric)
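
The rule of thumb can be compared with an exact Poisson interval. A minimal sketch in R follows; the count of 18 is used only as an illustration, and poisson.test() from base R's stats package supplies the exact interval.

```r
# Illustrative count of cases.
cases <- 18

# "Square root" rule of thumb: cases +/- 2 * sqrt(cases).
rule_ci <- cases + c(-2, 2) * sqrt(cases)

# Exact 95% confidence interval for a Poisson mean.
exact_ci <- as.numeric(poisson.test(cases)$conf.int)

rbind(rule_of_thumb = rule_ci, exact = exact_ci)

# The exact interval reaches further above the count than below it.
c(below = cases - exact_ci[1], above = exact_ci[2] - cases)
```

The rule-of-thumb interval is symmetric by construction, while the exact interval is shifted upward, which is the asymmetry the bullet refers to.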
Density/Probability Example Using R

• 18 cases of GC (gonorrhea) in Santa Cruz in Jan/Feb of 2005, 7 in 2004: interesting?

• dpois(18, 7)   # P(X = 18) when 7 cases are expected
• [1] 0.0002319329
• ppois(18-1, 7, lower.tail=FALSE)   # P(X >= 18) when 7 cases are expected
• [1] 0.0003617843

• OTHER BINOMIAL/POISSON EXAMPLES IN R
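
One further example, offered as a sketch rather than as the analysis used in the talk: the calculation above treats the 2004 count of 7 as a known expected value. An alternative, assumed here for illustration, treats both years' counts as Poisson. Under equal rates, and conditional on the total of 25 cases, the 2005 count is Binomial(25, 0.5).

```r
cases_2005 <- 18
cases_2004 <- 7

# (a) As in the slide: 7 is treated as a fixed, known expected count.
p_fixed <- ppois(cases_2005 - 1, lambda = cases_2004, lower.tail = FALSE)

# (b) Assumption for illustration: both counts are Poisson with a common
# rate. Conditional on the total, the 2005 count is Binomial(total, 0.5).
total  <- cases_2005 + cases_2004
p_cond <- pbinom(cases_2005 - 1, size = total, prob = 0.5, lower.tail = FALSE)

c(fixed_expected = p_fixed, conditional_binomial = p_cond)
```

Approach (b) gives a larger one-sided p-value because it allows for random variation in the 2004 count as well.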


Poisson Regression

• Disease counts

• Assessing disease in different groups:
– Probability, Risk, Rate, Incidence, Prevalence
– Need epi course to motivate & review all these
“Risk Ratio”/“Relative Risk”/“Cumulative Incidence Ratio”

Group   Gender   “Cases”   Not “Cases”
1       M        a         b
0       F        c         d

estimated risk (rate) among males is a/(a + b) = r1
estimated risk (rate) among females is c/(c + d) = r0
estimated relative risk (rate ratio) = rr = r1/r0
variance, 95% CIs, etc. are all well worked out
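
The calculation above can be sketched in R. The cell counts below are hypothetical, and the confidence interval uses the standard large-sample standard error of log(rr) for cumulative incidence data, which is one common choice and is not specified in the slides.

```r
# Hypothetical 2x2 table (cells a, b, c, d in the layout above).
m_cases <- 30; m_noncases <- 70   # a, b: males
f_cases <- 15; f_noncases <- 85   # c, d: females

r1 <- m_cases / (m_cases + m_noncases)   # risk among males
r0 <- f_cases / (f_cases + f_noncases)   # risk among females
rr <- r1 / r0                            # relative risk

# Large-sample standard error of log(rr) (assumed method).
se_log_rr <- sqrt(1 / m_cases - 1 / (m_cases + m_noncases) +
                  1 / f_cases - 1 / (f_cases + f_noncases))

# Approximate 95% CI, computed on the log scale and back-transformed.
ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)

c(rr = rr, lower = ci[1], upper = ci[2])
```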
• Selvin “Most measures of association can be estimated,
assessed, and interpreted in context of a linear relationship”
(Steve Selvin, Epidemiologic Analysis. Oxford. 2001)

• Logistic model for estimation of “odds ratio” is well known, etc

• “Poisson model for the relative risk (and rate ratios etc)
postulates that the logarithm of the probability of disease is
a linear function of the risk factors”
Poisson Model
Group   Gender   “Cases”   Not “Cases”
1       M        a         b
0       F        c         d

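$$\log(\text{probability of disease}_i) = \log(r_i) = a + b \cdot \text{Group}_i$$

(here a and b are the model intercept and slope, not the cell counts in the table)

$$\log(\text{rate ratio}) = \log\left(\frac{r_1}{r_0}\right) = \log(r_1) - \log(r_0)$$

then $\log(rr) = (a + b \cdot 1) - (a + b \cdot 0) = b$

and $e^{b} = rr$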
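lastly, need to note

$$\log(\text{rate}_i) = a + b \cdot \text{Exposure}_i$$

$$\log\left(\frac{\text{cases}_i}{\text{total}_i}\right) = a + b \cdot \text{Exposure}_i$$

$$\log(\text{cases}_i) - \log(\text{total}_i) = a + b \cdot \text{Exposure}_i$$

$$\log(\text{cases}_i) = a + b \cdot \text{Exposure}_i + \log(\text{total}_i)$$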

• This idea extends to diseased/total; cases/population; count/person-years
• The log(total) term on the right hand side of the
equation is called an “offset”; it is treated as
fixed, with no coefficient fit to this term
• SIMPLE POISSON REG EXAMPLE IN R
• Concepts we’ve learned elsewhere extend
the simple “bivariate model” to the
multivariate situation
• With Poisson regression we can estimate
rate ratios (etc.) adjusted (like weighted)
for other factors
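
A simple two-group Poisson regression with an offset might look like the following in R. This is a sketch with hypothetical counts, not the example shown in the talk; glm(), poisson(), offset() and confint.default() are from base R's stats package.

```r
# Hypothetical two-group data: cases and totals by group.
tab <- data.frame(
  group = c(1, 0),      # 1 = exposed group, 0 = reference group
  cases = c(30, 15),
  total = c(100, 100)
)

# log(cases) = a + b * group + log(total); log(total) enters as an offset,
# so no coefficient is estimated for it.
fit <- glm(cases ~ group + offset(log(total)),
           family = poisson(link = "log"), data = tab)

rr_model <- unname(exp(coef(fit)["group"]))            # e^b
rr_crude <- (30 / 100) / (15 / 100)                    # r1 / r0 by hand
rr_ci    <- exp(confint.default(fit)["group", ])       # Wald 95% CI

c(rr_model = rr_model, rr_crude = rr_crude)
rr_ci
```

With one binary exposure and no other terms, the model reproduces the crude rate ratio exactly; the offset is what turns a model for counts into a model for rates.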
Exposure 1   Exposure 2   Exposure N   “Cases”    “Total”
1            0            0            n_ij…k     t_ij…k
2            0            0            n_ij…k     t_ij…k
3            0            0            n_ij…k     t_ij…k
1            1            0            n_ij…k     t_ij…k
2            1            0            n_ij…k     t_ij…k
3            1            0            n_ij…k     t_ij…k
1            0            1            n_ij…k     t_ij…k
2            0            1            n_ij…k     t_ij…k
3            0            1            n_ij…k     t_ij…k
1            1            1            n_ij…k     t_ij…k
2            1            1            n_ij…k     t_ij…k
3            1            1            n_ij…k     t_ij…k

$$\log(\text{probability of disease}_{ijk}) = \log(r_{ijk}) = a + b_1 \cdot \text{Var1}_i + b_2 \cdot \text{Var2}_j + \ldots + b_n \cdot \text{VarN}_k$$

and, with no interaction, rr for Exposure N = $e^{b_n}$
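
The multivariable model can be sketched in R as follows. The data are hypothetical and were constructed so that the rates follow a no-interaction model exactly, with group sizes unbalanced so that the crude and adjusted rate ratios differ; the variable names are assumptions for illustration.

```r
# Hypothetical data: a 3-level exposure, a binary exposure, cases, totals.
dat <- data.frame(
  exposure1 = factor(c(1, 2, 3, 1, 2, 3)),
  exposure2 = c(0, 0, 0, 1, 1, 1),
  cases     = c(100, 75, 40, 40, 150, 400),
  total     = c(10000, 5000, 2000, 2000, 5000, 10000)
)

# No-interaction Poisson model with log(total) as the offset.
fit <- glm(cases ~ exposure1 + exposure2 + offset(log(total)),
           family = poisson, data = dat)

# Adjusted rate ratios: exponentiated coefficients (the intercept gives the
# baseline rate rather than a ratio).
adj_rr <- exp(coef(fit))
adj_rr

# Crude rate ratio for exposure1 level 3 vs level 1, ignoring exposure2.
crude_rate <- with(dat, tapply(cases, exposure1, sum) /
                        tapply(total, exposure1, sum))
crude_rr_3_vs_1 <- as.numeric(crude_rate["3"] / crude_rate["1"])
crude_rr_3_vs_1
```

In this constructed example the crude ratio is larger than the adjusted one because exposure2 is more common among those at level 3 of exposure1.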
Other Poisson Regression “Topics”

• Poisson regression model can be used simply/directly to calculate indirect (age)
adjusted rates
• Can also model multilevel “contingency” tables
– “log linear” models
– no “denominator”/no offset in model
– interaction terms test for independence
– model produces estimates of “cell” frequencies
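
One way to read the indirect adjustment point is through the standardized ratio of observed to expected cases. The sketch below assumes that interpretation; the age groups, standard rates, populations, and observed counts are all hypothetical.

```r
# Hypothetical age strata: standard (reference) rates, study population
# sizes, and observed cases in the study population.
strata <- data.frame(
  age_group  = c("0-19", "20-39", "40+"),
  std_rate   = c(0.001, 0.004, 0.002),
  population = c(5000, 8000, 6000),
  observed   = c(8, 40, 10)
)

# Expected cases if the standard rates applied to the study population.
strata$expected <- strata$std_rate * strata$population

# Observed / expected ratio computed by hand.
smr_direct <- sum(strata$observed) / sum(strata$expected)

# Same ratio from an intercept-only Poisson model with log(expected)
# as the offset: exp(intercept) = observed / expected.
fit <- glm(observed ~ 1 + offset(log(expected)),
           family = poisson, data = strata)
smr_model <- unname(exp(coef(fit)))

c(by_hand = smr_direct, from_model = smr_model)
```

Multiplying this ratio by the crude rate in the standard population is one common way to express the result as an indirectly adjusted rate.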
“Log Linear Model”
Variable 1   Variable 2   Variable N   “Cases/Events”
1            0            0            n_ij…k
2            0            0            n_ij…k
3            0            0            n_ij…k
1            1            0            n_ij…k
2            1            0            n_ij…k
3            1            0            n_ij…k
1            0            1            n_ij…k
2            0            1            n_ij…k
3            0            1            n_ij…k
1            1            1            n_ij…k
2            1            1            n_ij…k
3            1            1            n_ij…k

• Variable can be “Case” or outcome status


“Log Linear Model”
Variable 1   Variable 2   “Case” Status   “Cases/Events”
1            0            0               n_ij…k
2            0            0               n_ij…k
3            0            0               n_ij…k
1            1            0               n_ij…k
2            1            0               n_ij…k
3            1            0               n_ij…k
1            0            1               n_ij…k
2            0            1               n_ij…k
3            0            1               n_ij…k
1            1            1               n_ij…k
2            1            1               n_ij…k
3            1            1               n_ij…k
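
A log-linear model of this kind can be sketched in R. To keep it short, the example uses only two variables (a 3-level variable and case status) rather than the three in the table, and the cell counts are hypothetical.

```r
# Hypothetical cell counts for a 3 x 2 table, one row per cell.
tab <- data.frame(
  variable1   = factor(rep(1:3, times = 2)),
  case_status = factor(rep(0:1, each = 3)),
  count       = c(50, 30, 20, 10, 15, 25)
)

# Independence model: main effects only, no offset and no interaction.
indep <- glm(count ~ variable1 + case_status, family = poisson, data = tab)

# Fitted values are the expected cell frequencies under independence,
# i.e. (row total * column total) / grand total.
expected_by_hand <- with(tab,
  ave(count, variable1, FUN = sum) * ave(count, case_status, FUN = sum) /
    sum(count))
cbind(tab, fitted = fitted(indep), by_hand = expected_by_hand)

# The residual deviance tests the omitted interaction, i.e. independence.
p_independence <- pchisq(deviance(indep), df = df.residual(indep),
                         lower.tail = FALSE)
p_independence
```

A small p-value here suggests the interaction term is needed, that is, that case status is associated with the other variable.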
Quantifying the Drip Rate: Statistical Assessment of Trends in Gonorrhea in California

Michael C. Samuel, Denise Gilson, Gail Bolan


California Department of Health Services
STD Control Branch
