POISSON,
POISSON-GAMMA AND
ZERO-INFLATED REGRESSION MODELS OF MOTOR
VEHICLE CRASHES
STATISTICAL DISTRIBUTIONS IN TRAFFIC

• Discrete probability distributions, e.g. the number of vehicles that will enter a car park through a particular gate during the next five minutes.
• Continuous probability distributions, e.g. the time a particular vehicle will spend in the queue waiting to enter the car park via that gate.
BINOMIAL FREQUENCY DISTRIBUTION

Suppose that, at a certain time of day, the car park is being accessed by 300 veh/h and that 35% of these, on average, use Gate A. Assume also that we need to know the probability of more than 6 vehicles using Gate A over a two-minute period.
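The Gate A question can be worked through numerically. The sketch below is one possible reading of the example: it assumes that exactly the average number of vehicles (300 veh/h over two minutes, i.e. 10 vehicles) arrives in the period and that each vehicle independently uses Gate A with probability 0.35. The variable and function names are illustrative only.

```python
from math import comb

flow_veh_per_hour = 300   # from the example
period_minutes = 2        # from the example
p_gate_a = 0.35           # share of vehicles using Gate A

# Assumption: the number of arrivals in the period equals its average.
n = round(flow_veh_per_hour * period_minutes / 60)   # 10 vehicles

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# "More than 6" means 7, 8, 9 or 10 vehicles.
p_more_than_6 = sum(binomial_pmf(k, n, p_gate_a) for k in range(7, n + 1))
print(f"n = {n}, P(X > 6) = {p_more_than_6:.4f}")
```

Under these assumptions the probability comes out at roughly 0.026, so more than 6 of the 10 vehicles choosing Gate A is a fairly rare event.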
BERNOULLI TRIAL

Independence among crashes and unequal crash probabilities across drivers, vehicles, roadways, and environmental conditions

BINOMIAL DISTRIBUTION
POISSON DISTRIBUTION

The Poisson distribution is the limit of the binomial distribution as n approaches infinity and p approaches zero while the product n·p remains constant.
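The limit can be checked numerically. The sketch below holds the product n·p fixed at an illustrative value (lam = 2.0, chosen here only for the demonstration) and shows the gap between the binomial and Poisson probabilities of one particular count shrinking as n grows.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam = 2.0   # illustrative value of the constant product n*p
k = 3       # illustrative count at which the two pmfs are compared

errors = []
for n in (10, 100, 1000, 10000):
    p = lam / n                      # p shrinks as n grows, n*p stays fixed
    err = abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
    errors.append(err)
    print(f"n={n:>6}  p={p:.4f}  |binomial - Poisson| = {err:.6f}")
```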
POISSON EXAMPLE

If the average volume is 720 veh/h, find the probability that no vehicles will pass during the next 10 s.
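A short worked version of this example, assuming arrivals follow a Poisson process with the stated average volume: 720 veh/h corresponds to 0.2 veh/s, so the expected number of arrivals in 10 s is 2. The names used are illustrative.

```python
from math import exp, factorial

volume_veh_per_hour = 720   # from the example
interval_s = 10             # from the example

# Expected number of arrivals in the interval.
lam = volume_veh_per_hour / 3600 * interval_s

def poisson_pmf(k, lam):
    """Probability of exactly k arrivals when the expected number is lam."""
    return exp(-lam) * lam**k / factorial(k)

p_zero = poisson_pmf(0, lam)
print(f"lambda = {lam:.1f}, P(no vehicles in {interval_s} s) = {p_zero:.4f}")
```

With an expected count of 2, the probability of no vehicle is e^(-2), about 0.135.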
THE BINOMIAL DISTRIBUTION
APPROXIMATED BY A POISSON DISTRIBUTION

Because of the small probability of a crash and the large number of trials, these Bernoulli trials can be well approximated as Poisson trials.
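To illustrate the approximation, the sketch below computes the exact distribution of the number of crashes over many independent Bernoulli trials with unequal, small probabilities and compares it with a Poisson distribution whose mean is the sum of those probabilities. The 2000 trial probabilities are hypothetical values made up for the demonstration, not data from the study.

```python
from math import exp, factorial

def poisson_binomial_pmf(probs):
    """Exact pmf of the number of successes in independent Bernoulli
    trials with (possibly unequal) success probabilities."""
    pmf = [1.0]                       # zero trials: certainly zero successes
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)  # this trial fails
            nxt[k + 1] += mass * p    # this trial succeeds
        pmf = nxt
    return pmf

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# Hypothetical crash probabilities: unequal across trials, all small.
probs = [0.0002 + 0.0006 * (i % 5) for i in range(2000)]
lam = sum(probs)

exact = poisson_binomial_pmf(probs)
max_gap = max(abs(exact[k] - poisson_pmf(k, lam)) for k in range(10))
print(f"lambda = {lam:.2f}, largest pmf gap for k = 0..9: {max_gap:.5f}")
```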
NEGATIVE BINOMIAL DISTRIBUTION

The probability that the k-th ‘success’ will occur in the n-th trial.
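For reference, this probability has the standard closed form C(n-1, k-1) · p^k · (1-p)^(n-k): the first n-1 trials must contain exactly k-1 successes and the n-th trial must be a success. A minimal sketch, with an illustrative success probability p = 0.35 and a hypothetical function name:

```python
from math import comb

def prob_kth_success_on_nth_trial(k, n, p):
    """Negative binomial probability that the k-th success occurs
    exactly on trial n, with success probability p per trial."""
    if n < k:
        return 0.0                    # cannot have k successes in fewer than k trials
    return comb(n - 1, k - 1) * p**k * (1 - p)**(n - k)

p = 0.35   # illustrative success probability
k = 3      # illustrative number of successes

p_3rd_on_5th = prob_kth_success_on_nth_trial(k, 5, p)
print(f"P(3rd success on 5th trial) = {p_3rd_on_5th:.4f}")

# Sanity check: summed over all possible n, the probabilities approach 1.
total = sum(prob_kth_success_on_nth_trial(k, n, p) for n in range(k, 400))
print(f"sum over n = {total:.6f}")
```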
CONCAVE RELATIONSHIPS

• Bernoulli and hypergeometric distributions

CONVEX VARIANCE-TO-MEAN RELATIONSHIPS

• negative binomial (NB)
• exponential
• uniform

POISSON DISTRIBUTION: BOTH CONVEX AND CONCAVE

E(Z) = Var(Z) = λ.
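The shapes named above can be checked with the textbook variance formulas written as functions of the mean. The sketch below is a rough illustration under stated assumptions: the NB form mean + mean²/phi uses an assumed dispersion parameter phi, the uniform case assumes a Uniform(0, 2·mean) distribution, and the hypergeometric case is left out. A second difference near zero corresponds to the linear Poisson case, which is both convex and concave.

```python
def var_bernoulli(mean):
    return mean * (1 - mean)          # mean = p

def var_poisson(mean):
    return mean                       # E(Z) = Var(Z) = lambda

def var_negative_binomial(mean, phi=1.0):
    return mean + mean**2 / phi       # phi: assumed dispersion parameter

def var_exponential(mean):
    return mean**2

def var_uniform(mean):
    return mean**2 / 3                # Uniform(0, 2*mean)

def second_difference(f, m, h=0.05):
    """Discrete curvature of f at m: > 0 convex, < 0 concave, ~0 linear."""
    return f(m + h) - 2 * f(m) + f(m - h)

shapes = {}
for name, f in [("Bernoulli", var_bernoulli), ("Poisson", var_poisson),
                ("negative binomial", var_negative_binomial),
                ("exponential", var_exponential), ("uniform", var_uniform)]:
    d = second_difference(f, 0.5)
    shapes[name] = "convex" if d > 1e-12 else "concave" if d < -1e-12 else "linear"
    print(f"{name:>18}: {shapes[name]}")
```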
OVERDISPERSION,
REPRESENTING A CONVEX RELATIONSHIP

Var(W) > E(W)

• The conditional variance is greater than the conditional mean.
• A result of Bernoulli trials with non-equal success probabilities.
• “Excess” zeros
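A small simulation can make overdispersion concrete. The sketch below compares counts from sites that all share one Poisson mean with counts from sites whose means vary according to a gamma distribution (a Poisson-gamma mixture). The mean of 2.0, the gamma shape of 1.0, the number of sites and the random seed are all arbitrary choices for illustration.

```python
import random
from math import exp
from statistics import mean, variance

def sample_poisson(lam, rng):
    """Draw one Poisson variate (Knuth's method; fine for small lam)."""
    limit = exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(42)      # fixed seed so the run is repeatable
mu, shape = 2.0, 1.0         # hypothetical mean and gamma shape
n_sites = 20000

# Homogeneous sites: every site has the same Poisson mean.
homogeneous = [sample_poisson(mu, rng) for _ in range(n_sites)]

# Heterogeneous sites: each site's mean is gamma distributed with mean mu.
heterogeneous = [sample_poisson(rng.gammavariate(shape, mu / shape), rng)
                 for _ in range(n_sites)]

for label, counts in [("Poisson", homogeneous), ("Poisson-gamma", heterogeneous)]:
    print(f"{label:>14}: mean = {mean(counts):.2f}, variance = {variance(counts):.2f}")
```

In the homogeneous case the sample variance stays close to the mean, while in the mixed case it is clearly larger, which is the Var(W) > E(W) pattern.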
WHERE TO USE?
• Poisson models serve well under nearly homogeneous conditions.
• Negative binomial models serve better in other conditions.
MIXED POISSON DISTRIBUTIONS

The counts n = 0, 1, 2, ..., K are inflated, while the rest of the distribution, K + 1, K + 2, ..., N, follows a Poisson process.
DUAL-STATE PROCESS
How many times have you taken mass transit to work during the past week?

perfect state
a normal count-process state
vanpool instead of mass transit

imperfect state
a zero-count state
a respondent may never take transit
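A dual-state process of this kind is commonly written as a zero-inflated Poisson: with some probability an observation comes from the zero-count state, and otherwise it comes from the normal count-process state, which can also produce zeros. The sketch below uses hypothetical parameter values (lam = 1.5 trips per week, pi_zero = 0.3) purely for illustration.

```python
from math import exp, factorial

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson pmf.

    pi_zero: probability of the zero-count state
    lam:     Poisson mean in the normal count-process state
    """
    poisson = exp(-lam) * lam**k / factorial(k)
    if k == 0:
        # Zeros come from both states.
        return pi_zero + (1 - pi_zero) * poisson
    return (1 - pi_zero) * poisson

lam, pi_zero = 1.5, 0.3   # hypothetical values

p0_zip = zip_pmf(0, lam, pi_zero)
p0_poisson = exp(-lam)
print(f"P(0) zero-inflated = {p0_zip:.3f}, P(0) plain Poisson = {p0_poisson:.3f}")
```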
CHARACTERISTICS OF DATA USED FOR
ZERO-INFLATED MODELS
OBSERVATIONS COMPARED TO PREDICTIONS

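| Crashes per year (mean) | Zero crashes per year (mean) | Observations: 0 | Observations: 1+ | Total | Zero-to-total ratio | Delta |
| 0.20 | 0.80 | 7,610 | 653 | 8,263 | 0.92 | 0.12 |
| 0.68 | 0.32 | 19,480 | 10,320 | 29,800 | 0.65 | 0.33 |
| 0.15 | 0.85 | 26,640 | 3,160 | 29,800 | 0.89 | 0.04 |
| 0.04 | 0.96 | 28,609 | 1,191 | 29,800 | 0.96 | 0 |
| 0.08 | 0.92 | 28,068 | 1,732 | 29,800 | 0.94 | 0.02 |
| 0.29 | 0.71 | 2,238 | 542 | 2,780 | 0.80 | 0.09 |

Delta is the difference between the zero-to-total ratio of the observations and the Poisson-estimated mean.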
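The totals and zero-to-total ratios can be recomputed from the observation counts. The sketch below does that and, as an additional check that is an assumption of this example rather than the method behind the Delta column, compares each observed share of zeros with the Poisson probability of zero crashes, e^(-mean).

```python
from math import exp

# (mean crashes per year, sites with 0 crashes, sites with 1+ crashes),
# copied from the observations above.
datasets = [
    (0.20, 7610, 653),
    (0.68, 19480, 10320),
    (0.15, 26640, 3160),
    (0.04, 28609, 1191),
    (0.08, 28068, 1732),
    (0.29, 2238, 542),
]

results = []
for mean_crashes, zeros, nonzeros in datasets:
    total = zeros + nonzeros
    observed_zero_share = zeros / total
    poisson_zero_share = exp(-mean_crashes)   # P(0) under a Poisson model
    results.append((total, observed_zero_share, poisson_zero_share))
    print(f"mean={mean_crashes:.2f}  total={total:>6}  "
          f"observed zeros={observed_zero_share:.3f}  "
          f"Poisson zeros={poisson_zero_share:.3f}")
```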
DEDUCED FACTS 1
• The mean number of crashes per year is very low for each dataset.
• The datasets are shown to have between 2% and 12% more zeros than would be expected.
RURAL HIGHWAYS
DEDUCED FACTS 2
• Arterial and collector rural segments are more dangerous than interstate highways.
• Given the exposure, a person is about five times more likely to be involved in a crash on a minor collector than on a freeway.
• Given the exposure, a person is about two times more likely to be involved in a crash on a principal arterial than on a freeway.
HIGH HETEROGENEITY AND HIGH RISK

• The number of events follows a Poisson distribution for sites with very low exposure, with about 92% of the sites having zero crashes.
• For low exposure, in contrast, the simulation generated more zeros than would be expected from the Poisson and Poisson-gamma (NB) distributions.
• The Poisson-gamma (NB) distribution provides a better statistical fit than the Poisson distribution for sites with medium exposure.
LOW HETEROGENEITY AND LOW RISK
• Low exposure: the Poisson distribution offers a good statistical fit.
• The Poisson-gamma distribution provides a better fit than the Poisson distribution for medium exposure.
• The only situation where a site may be inherently safe is when the exposure tends towards zero.
VERY LOW HETEROGENEITY
• The Poisson distribution can be used to approximate data generated by a nearly homogeneous Bernoulli process, both for high and low crash risks.
A BETTER FIT
• multinomial-Poisson process
• multilogit-Poisson for modeling multi-state processes: perfect safety, above-average safety, average safety, less-than-average safety, and least safe
• The ultimate statistical fit: non-parametric (including spline functions) or semi-parametric models
• disadvantage: overfitting
THE ROOT OF EXCESS ZEROS

(1) spatial or time scales that are too small
(2) under- or misreporting of crashes
(3) sites characterized by low exposure and high risk
(4) important omitted variables describing the crash process
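The first cause in the list can be illustrated with the Poisson probability of zero. Assuming crashes at a site follow a Poisson process with a constant yearly rate, the share of zero-count observations is e^(-rate × period), so shorter observation periods produce more zeros. The rate of 0.2 crashes per site-year and the function name are illustrative assumptions.

```python
from math import exp

def poisson_zero_share(rate_per_year, years):
    """Expected share of sites with zero crashes over the given period,
    assuming a Poisson process with a constant yearly rate."""
    return exp(-rate_per_year * years)

rate = 0.2                      # hypothetical crashes per site-year
periods = (0.25, 1, 3, 5, 10)   # observation periods in years

shares = [poisson_zero_share(rate, years) for years in periods]
for years, share in zip(periods, shares):
    print(f"{years:>5} years: expected share of zeros = {share:.3f}")
```

Aggregating over a longer period, or over longer road segments, lowers the expected share of zeros without changing the underlying process.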
SOLUTIONS FOR MODELING CRASH DATA
WITH EXCESS ZEROS

• Changing the spatial or time scale of analysis.
• Applying small-area statistical methods: small-area statistics (SAS), also known as small-area estimation (SAE), are tools that can be used for data characterized by low exposure.
• Improving the set of explanatory variables.
• Modeling solutions: one option is to estimate NB and Poisson models with an added term that captures unobserved heterogeneity, under- or misreporting of crashes, and important omitted variables in the crash process. Such models yield a fit similar to that of zero-inflated models.
FUTURE APPROACHES
• Begin to develop models that consider the fundamental process of a crash and avoid striving for “best fit” models in isolation.
• Model Bernoulli trials with unequal probability of events.
• Estimate the individual probability of crash risk for different time periods and driving conditions.
