Chapter 13
Trip Generation
Background
In this chapter, the theory and mechanics of the trip generation stage will be explained. Trip generation is a model of the number of trips that originate and end in each zone for a given jurisdiction. Given a set of N destination zones and M origin zones (which include all the destination zones and, possibly, zones from adjacent jurisdictions), separate models are produced of the number of crimes originating and ending in each of these zones. That is, one model is produced of the number of crimes originating in each of the M origin zones, and another model is produced of the number of crimes ending in each of the N destination zones. The first is a crime production model while the second is a crime attraction model.
Two points should be emphasized. First, the models are predictive. That is, the result of the models is a prediction of both the number of crime trips originating in each zone and the number of crime trips ending in each zone (i.e., crimes occurring in a zone). Because the models are predictions, there is always error between the actual number and the predicted number. As long as the error is not too large, the model can be a useful tool for analyzing the correlates of crime as well as for forecasting or simulating policy interventions.
Second, because the number of crimes attracted to the study jurisdiction will usually be greater than the number of crimes predicted for the origin zones, due primarily to crime trips coming from outside the origin areas, it is necessary to balance the productions and attractions. This is done in two steps. One, an estimate of trips coming from outside the study area (external trips) is added to the predicted origins as an ‘external zone’. Two, a statistical adjustment is made to ensure that the total number of origins equals the total number of destinations. This is called balancing and is essential as an input into the second stage of crime travel demand modeling: trip distribution.
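The two balancing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `balance_origins`, the zone values, and the choice to hold destinations fixed while scaling the internal origins by a single factor are all hypothetical, since the text does not specify which side is adjusted.

```python
def balance_origins(predicted_origins, predicted_destinations, external_trips):
    """Return origin totals (internal zones plus one external zone)
    that sum to the total number of predicted destinations.

    Assumption: destinations are held fixed and the internal origins
    are scaled by one common factor.
    """
    total_destinations = sum(predicted_destinations)
    total_origins = sum(predicted_origins)
    # Trips that must be supplied by the internal origin zones.
    internal_target = total_destinations - external_trips
    if total_origins <= 0 or internal_target < 0:
        raise ValueError("cannot balance: check origin and external trip totals")
    factor = internal_target / total_origins
    balanced = [o * factor for o in predicted_origins]
    balanced.append(external_trips)  # the 'external zone' is the last entry
    return balanced


# Hypothetical predictions for three zones.
origins = [120.0, 80.0, 50.0]
destinations = [150.0, 90.0, 60.0]
balanced = balance_origins(origins, destinations, external_trips=30.0)
print(balanced, sum(balanced))
```

After the adjustment, the origins (including the external zone) and the destinations have the same total, which is what the trip distribution stage requires.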
In the following discussion, first, the logic behind trip generation modeling is presented, including the calibration of a model, the addition of external trips in making a model, and the balancing of predicted origins and predicted destinations. Second, the mechanics of conducting the trip generation model with CrimeStat are discussed and illustrated with data from Baltimore County.
Modeling Trip Generation
The process of modeling trip generation is fairly well developed, at least with respect to ordinary trips. It proceeds through a series of logical steps that make up the aggregate trip generation model.
Trip Purpose
Trip generation modeling starts with the reasons behind travel. At an individual level, people make trips for a reason - to go to work, to go shopping, to go to a medical appointment, to go for recreation, or, in the case of offenders, to commit a crime. These are called trip purposes. Since there are a very large number of trip purposes, they are usually categorized into a few major groupings. In the usual travel demand forecasting, the distinctions are home-to/from-work (or home-based work trips), home-to/from-non-work (or home-based non-work trips, e.g., shopping), and non-home trips, where neither the origin nor the destination is at the traveler’s residence location (non-home-based trips).
Since the model has aggregated trips to a zone, the trip purposes are collections of trips from each origin zone to each destination zone. Thus, each zone produces a certain number of home-work trips, home-non-work trips, and non-home trips and each zone attracts a certain number of home-work trips, home-non-work trips, and non-home trips. This is the usual distinction that most transportation modeling organizations make. The trip purposes are documented during a large travel survey that asks individuals to fill out travel diaries for one or two days of travel. In the travel diaries, detailed information about each trip is documented - time of day, destination of trip, purpose of trip, travel modes used in making the trip, accompanying passengers, route taken, and time to complete the trip.
Crime Trip Groupings
For crime trips, however, these distinctions are not very meaningful. There is very little information on how offenders make trips. One cannot just take a sample of offenders and ask them to complete a travel diary about how, when, and where the trip took place. With arrested offenders, it might be possible to produce such a diary, but both memory problems and legal concerns quickly make this an unreliable source of information. Therefore, as indicated in chapter 11, a decision has been made to reference all trips with respect to the residential home location. All crime trips are analyzed as home-crime trips.
However, other distinctions can be made. The most obvious is by type of crime. There are robbery trips, burglary trips, vehicle theft trips, and so forth. Similarly, distinctions can be made by travel time such as afternoon trips or evening trips. As mentioned in chapter 12, though, the sample size will decrease with greater distinctions. Logically, one can divide a sample into a very large number of important distinctions (e.g., afternoon burglary trips involving two or more offenders). However, this reduces the sample size and increases the error in estimation, particularly at the trip distribution and subsequent stages.
An important point that distinguishes the aggregate demand type of travel demand model, as is being implemented here, from the newer generation of activity-based models is that there are no linked trips with the aggregate approach (FHWA, 2001a). If an offender first steals a car, then uses the car to rob a grocery store, followed by a burglary, the aggregate approach models this as three separate trips, rather than as a series of three linked crime trips (which the activity-based models do). This is a deficiency with the aggregate travel demand model. In order to make the aggregate models work, each trip is considered independent of any other trip. This is not realistic behaviorally, since we know that many crimes are committed in sequence as part of a single journey (or tour), and so the zonal approach does limit the underlying logic of crime trips. Nevertheless, the aggregate approach can be very useful as long as it is implemented consistently. With the current state of activity-based modeling, there is not yet any evidence that activity-based models produce more accurate predictions than the cruder, aggregate approach (FHWA, 2001a).
Correlates of Crime
Any trip has contextual correlates associated with it. It is well documented that the likelihood of making a trip (crime or otherwise) is not equal across areas of a metropolitan region. There are age correlates of travel, socioeconomic correlates of travel, and land use correlates of travel; the latter are usually associated with trip purposes (e.g., retail areas attract shopping trips).
The trip generation model being implemented in this version of CrimeStat is an aggregate model. Thus, the predictors are aggregate, rather than behavioral, in nature, as discussed in chapter 11. They are correlates of trips, not necessarily the reasons for the trips. For example, population is typically the best predictor of trips. Zones with many persons will produce, on average, more crime trips than zones with fewer persons. The observation is not a reason, but is simply a by-product of the size of the zone. Similarly, low-income zones will tend to produce, on average, more crime trips than wealthier zones; again, this is not a reason, but a correlate of the characteristics that might contribute to individual likelihoods for committing crimes.
As mentioned in chapter 12, there are a number of different variables that could be used for prediction, although population (or a proxy for population, such as households), income or poverty, and land use variables would be the most common (NCHRP, 1998).
Theoretical Relevance of the Variables
In general, the variables that are selected should be empirically stable and theoretically meaningful. That is, they should be stable variables that do not change dramatically from year to year. They should be reliably measured so that an analyst can depend on their values. Finally, they should be meaningful in some way. That is, they should be plausible enough that crime analysts, researchers, and informed outsiders would agree that the relationship is plausible. The variables should either have been demonstrated to be predictors in earlier research or else be so correlated with known factors as to be considered meaningful proxies.
Spurious correlates
On the other hand, if a variable is either a correlate of a known predictor or idiosyncratic, then it is liable not to be believed. For example, the number of taxis usually correlates with the amount of employment since taxis tend to ply commercial areas for their trade. Adding the number of taxis to a predictive model is liable to produce significant statistical effects in predicting crime destinations. However, few persons are going to believe that this is a real factor since it is understood to be a correlate of a more structural variable.
Idiosyncratic variables are those that appear in unique situations. For example, in some cities, adjacency to a freeway is a correlate of crime origins (e.g., in Baltimore County where low income populations live) whereas in other cities, it is a correlate of crime destinations (e.g., in Houston where there are frontage roads with major commercial strips that attract crimes). The variables may be real predictors. However, the analyst or researcher will have difficulty persuading others to believe in the model, at least until the results can be replicated.
In other words, what is required for the model is a set of reasonable correlates of crime trips that would be plausible and stable over time. It is an ecological model, not a behavioral one.
Social Disorganization Variables
There is a very large literature on the predictors of crime, typically following from the social disorganization literature (for example, Park and Burgess, 1924; Thrasher, 1927; Shaw and McKay, 1942; Newman, 1972; Ehrlich, 1975; Cohen and Felson, 1979; Wilson and Kelling, 1982; Stack, 1984; Messner, 1986; Chiricos, 1987; Kohfeld and Sprague, 1988; Bursik and Grasmick, 1993; Hagan and Peterson, 1994; Fowles and Merva, 1996; Bowers and Hirschfield, 1999, among many other studies). Much of this literature identifies correlates that are associated with crime incidents. Among the factors that have been associated with crime and delinquency are poverty, low income households, overcrowding, substandard housing, low education levels, single-parent households, high unemployment, and minority and immigrant populations.
Multicollinearity among the independent variables
There are two statistical problems associated with using these variables as predictors. The first is the high degree of overlap between the variables. Zones that have high poverty levels typically also have low household income levels, higher population densities, substandard housing, a high percentage of renters, and a higher proportion of minority and immigrant populations. In a regression model, this overlap causes a condition known as multicollinearity. Essentially, the independent variables correlate so highly among themselves that they produce ambiguous, and sometimes strange, results in a regression model. For example, if two independent variables are highly correlated, frequently one will have a positive coefficient with the dependent variable while the other will have a negative coefficient; conversely, they sometimes can cancel each other out. Thus, in spite of their correlations with crime levels, it is usually best to eliminate collinear variables from a model. The result is that simple variables usually end up being the most straightforward to use (population, median household income), with many of the subtle, but theoretically relevant, variables typically dropping out of the equation.
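One simple way to spot overlapping predictors before building a model is to screen the pairwise correlations among the candidate variables. The sketch below is an illustration only: the zone values are invented, the helper names are hypothetical, and the 0.8 cutoff is an arbitrary assumption rather than a rule from this chapter.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(variables, threshold=0.8):
    """List variable pairs whose absolute correlation meets the threshold."""
    names = list(variables)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(variables[first], variables[second])
            if abs(r) >= threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Hypothetical zonal data for six zones.
zones = {
    "poverty_rate": [5, 10, 15, 20, 25, 30],
    "median_income": [80, 70, 62, 50, 41, 30],
    "population": [900, 400, 1200, 300, 1000, 600],
}
flagged = collinear_pairs(zones)
print(flagged)
```

In this invented example, poverty rate and median income are flagged as nearly redundant, so only one of them would normally be kept in the model.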
Failure to distinguish origins from destinations
Second, in much of this literature, there is not a clear distinction between origin predictors and destination predictors. That is, in most cases, the correlates of crimes were identified but it is often unclear whether these correlates are associated with the neighborhoods of the offenders (origins) or the locations where the crimes occur (destinations). This can result in a set of vague correlates without clear direction about whether the variables are associated with producing or attracting conditions. In fact, in much of the early literature on social disorganization, it was implicitly assumed that crimes were produced in the neighborhoods where the offenders lived, a linkage that is increasingly becoming disconnected. For modeling crime trips, however, it is essential that the predictors of origins be kept separate from the predictors of destinations.
Accuracy and Reliability
A trip generation model should be accurate and reliable. Accuracy means that the model should replicate as closely as possible the actual number of trips originating or ending in zones and that there should be no bias (which is a systematic under- or over-estimating of trips). Reliability means that the amount of error is minimized.
These criteria have two implications which are somewhat at odds. First, we have to choose models that replicate as closely as possible the number of trips originating or ending in a zone. In general, this would be a model that had the highest overall predictability. But, second, we have to choose models that minimize total prediction errors. This allows a model to replicate the number of trips for as many zones as possible. The two criteria are somewhat contradictory because crime trips are highly skewed. That is, a handful of zones will have a lot of crimes originating or ending in them while many zones will have few or no crimes. The zones with the most crimes will have a disproportionate impact on the final model. Thus, a model that obtains as high a prediction as possible (i.e., highest log-likelihood or R²) may actually only predict accurately for a few zones and may be very wrong for the majority.
The strategy, therefore, is to obtain a model that balances high predictability with keeping the total prediction error low.
Count Model
Another element of the model is that the trip generation model is for counts (or volumes), not for rates. The model predicts the number of crimes originating in each origin zone and the number of crimes occurring in each destination zone. The model could be constructed to predict rates, but normally this is not done. For most travel demand modeling, as mentioned in chapter 11, the model predicts the number of trips originating or ending in a zone. Thus, there is a crime production model that predicts the number of crimes originating in each zone and a crime attraction model that predicts the number of crimes ending in each zone.
Approaches Towards Trip Generation Modeling
Trip Tables
There are two classic approaches to trip generation modeling. The first uses a trip table (sometimes called a cross-classification table or a category analysis). A trip table is a cross-classification matrix. Several predictive variables are divided into categories (e.g., three levels of household income; four levels of vehicle ownership; three levels of population density) and a mean number of trips is estimated, usually from a survey. For example, a survey of household income might show the relationship between household income and the number of trips taken by individuals of the households. Based on a sample, estimates of the average number of trips per person can be obtained for each income level (e.g., 3.4 trips per day for persons from low income households; 4.5 trips per day for persons from median income households; 6.7 trips per day for persons from high income households). These variables are further subdivided into two-way or three-way cross-tabulation tables (e.g., low income and medium vehicle ownership; low income and high vehicle ownership). Table 13.1 illustrates a possible trip table model involving two variables. In practice, three or four variables are used.
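As a rough illustration of how a trip table is applied, the sketch below looks up a mean trip rate for each cell of a two-way table and multiplies it by the number of adults in that cell. The rates are the illustrative values from Table 13.1; the dictionary layout, the function name `expected_trips`, and the zone composition are hypothetical.

```python
# Average trips per adult by (vehicle ownership, household income),
# using the illustrative values from Table 13.1.
TRIP_RATES = {
    ("0-1", "low"): 3.2,
    ("0-1", "medium"): 4.6,
    ("0-1", "high"): 6.7,
    ("2+", "low"): 5.4,
    ("2+", "medium"): 7.8,
    ("2+", "high"): 8.1,
}


def expected_trips(adults_by_cell):
    """Sum (adults in cell) x (mean trips per adult) over all cells."""
    return sum(TRIP_RATES[cell] * count for cell, count in adults_by_cell.items())


# Hypothetical zone: 100 adults in low income households with 0-1 vehicles
# and 50 adults in medium income households with 2+ vehicles.
zone = {("0-1", "low"): 100, ("2+", "medium"): 50}
trips = expected_trips(zone)
print(trips)
```

The table handles non-linearity by construction, because each cell carries its own rate instead of a common slope.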
The main reason that trip tables are used in a trip generation model is the non-linear nature of trips. Predictive variables are usually not linear in their effects on the number of trips. Thus, unless a sophisticated non-linear model is used, sizeable error can be introduced in a prediction. It is usually safer to use a trip table approach (Ortuzar and Willumsen, 2001). There are some major handbooks on the topic (Henscher and Button, 2002; ITE, 2003). In fact, the Institute of Transportation Engineers publishes a large handbook that gives extensive trip production and trip attraction tables by detailed land uses (ITE, 2003). These tables are often used in formal environmental review processes for site analysis and are frequently accepted by courts in litigation. They are not without their problems, however, and there have been numerous critiques of the tables (Shoup, 2002; NCHRP, 1998). They also cannot be used in a travel demand model, where they will produce erroneous results.
The problem for crime analysis, however, is that it is impossible to obtain these data. One cannot ask a sample of offenders how many crimes they undertake each day in order to estimate the mean expectations for a table. Thus, one has to adopt a more indirect approach in modeling crime productions and attractions.
A second problem with the trip table approach is its use with zonal data. While it could be applied to zonal data (e.g., using median household income and average vehicle ownership in table 13.1 instead of individual household income and vehicle ownership), such an approach requires interpretation and some degree of arbitrariness. For example, how does one subdivide median household income? One person might interpret it slightly differently than another; unlike simple numerical counts (e.g., 0 vehicles owned; 1 vehicle owned; 2 vehicles owned), there is too much variability in categorizing variables at the zonal level.¹

Table 13.1
Illustration of Possible Trip Table Approach to Trip Generation
Average Trips Per Adult, Age 16+

                         Household income
Vehicle ownership      Low     Medium     High
0-1                    3.2      4.6       6.7
2+                     5.4      7.8       8.1

OLS Regression Modeling
The second approach is to use a regression framework. In this approach, the number of crimes either originating or ending in each zone is estimated from zone characteristics using a regression model. This can be written as an equation:

\[ Y_i = f(X_1, X_2, X_3, \ldots, X_k) + \varepsilon \tag{13.1} \]

The mean number of crimes, $Y_i$ (either originating or ending in zone $i$), is a function of a number of independent variables, $X_1, X_2, X_3, \ldots, X_k$, for these zones; there are $k$ independent variables, including any constants. There is also an error term, $\varepsilon$, which represents the discrepancy between the actual observation and what the model predicts. This is sometimes called residual error since it is the difference between the observed and predicted values ($O_i - Y_i$). The function is unspecified and can be non-linear.²
The traditional approach to regression modeling assumed that the independent variables are linear in their effect on the dependent variable. Thus,

\[ Y_i = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + \varepsilon \tag{13.2} \]

In this model, there are $k$ independent variables and one constant term ($\alpha$) that needs to be estimated. For each zone, $i$, each of the independent variables has a weight associated with it (the coefficients, $\beta$). The product of the value of the independent variable times its weight represents its effect. The individual effects of each of the $k$ independent variables are summed to produce an overall estimate of the dependent variable, $Y$.
The method for estimating this equation usually minimizes the sum of the squares of the residual errors. Hence, the procedure is called Ordinary Least Squares (or OLS). If the equation is correctly specified (i.e., all relevant variables are included), the error term, $\varepsilon$, will be normally distributed with a mean of 0 and a constant variance, $\sigma^2$.
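To make the least squares idea concrete, the following sketch fits the single-predictor special case of the linear model (one constant and one coefficient) using the closed-form solution. It is a minimal illustration, not the CrimeStat routine: the function name `fit_ols` and the five data points are invented for the example.

```python
def fit_ols(x, y):
    """Least squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx                  # slope (weight of the predictor)
    alpha = mean_y - beta * mean_x    # constant term
    return alpha, beta


# Hypothetical zones: population (thousands) and number of crimes.
population = [1.0, 2.0, 3.0, 4.0, 5.0]
crimes = [2.0, 4.0, 5.0, 4.0, 5.0]

alpha, beta = fit_ols(population, crimes)
residuals = [c - (alpha + beta * p) for p, c in zip(population, crimes)]
sum_squared_error = sum(e ** 2 for e in residuals)
print(alpha, beta, sum_squared_error)
```

Any other choice of constant and slope would give a larger sum of squared residuals for these five points, which is the sense in which the estimates are 'least squares'.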
Problems with OLS Regression Modeling
However, there are a number of major problems associated with OLS regression modeling.
Skewness of crime events
First, crime events are extremely skewed statistically. Some locations have a much higher likelihood of a crime event (either an origin or a destination) than others. Figure 13.1 below shows the number of crimes from 1993 to 1997 in Baltimore County that occurred at each location. That is, the graph shows the number of incidents that occurred at every location, plotted in decreasing order of frequency. Thus, there were 7,965 locations where only one crime occurred between 1993 and 1997. There were 2,878 locations where two crimes occurred in that period. There were 1,138 locations where three crimes occurred in that period. At the other end of the spectrum, there were 332 locations that had 10 or more crimes during the period and there were 97 locations that had 30 or more crimes occur. If we add to this the very large number of locations that had no crimes occur, the unequal likelihood of crime by location is even more dramatic. In other words, the data are highly skewed with respect to the frequency of crimes. Most locations either had no crimes or very few, while a few locations had many crimes.
Aggregating crimes into zones tends to reduce some of the skewness. For example, grouping the crimes by origin traffic analysis zone (TAZ) reduced it a little bit. Nineteen of the 525 origin zones in Baltimore County and Baltimore City did not have any crimes occur in them while 15 zones had only one crime occur. Six zones had two crimes originate from them while 8 zones had three crimes originate from them. At the other end, one zone had 738 crimes originate from it and another zone had 533 originate from it. Of the 525 origin zones, 155 had 100 or more crime events. Similar results are found for the destination zones. Figure 13.2 graphs the distribution of origins and destinations by TAZ in bins of 50 incidents each.
[Figure 13.1: Frequency Distribution of Baltimore Crimes, 1993-97. Horizontal axis: number of incidents (0 to 30+); vertical axis: number of locations (0 to 8000).]
[Figure 13.2: Skewness in Crime Origins and Destinations, Baltimore County, 1993-97. Horizontal axis: number of events per TAZ (0 to 700); vertical axis: number of TAZs (0 to 125); series: origins and destinations.]
Skewness in the dependent variable usually makes the final model biased and unreliable. Particularly if the skewness is positive (i.e., a handful of cases have very large values), the resulting regression coefficients will reflect the cases with the highest values rather than represent all the cases with approximately equal weights. These so-called ‘outliers’ can overwhelm a regression equation. In an extreme case, a very large outlier may totally determine the model. For example, an experiment with 100 cases was created with a progressing dependent variable and a random independent variable (i.e., the independent variable had its value selected randomly). The dependent variable progressed from 1 to 100. For the first 99 cases, the independent variable took values from 0.12 to 9.9, randomly assigned. The correlation between these two variables for the first 99 cases was 0.04. However, for the 100th case, the independent variable was given a value of 100. The correlation between the two variables now shot up to 0.17. Even though the F-test for this was not significant, it represented a sizeable jump. Replacing one other independent value with a 50 caused the correlation to jump to 0.23, which was statistically significant. In other words, two outliers caused a random series to appear significant!
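The spirit of this experiment can be reproduced with a short script. The sketch below is not the original experiment: the random draws, the seed, and the helper name `pearson` are assumptions made for illustration, so the correlations it prints will differ from the values quoted above.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(13)  # arbitrary seed, chosen only for repeatability

# Dependent variable progresses from 1 to 100.
dependent = list(range(1, 101))
# Independent variable: random values between 0.12 and 9.9 for 99 cases.
independent = [rng.uniform(0.12, 9.9) for _ in range(99)]

r_before = pearson(independent, dependent[:99])

# The 100th case is given an extreme independent value of 100.
independent_with_outlier = independent + [100.0]
r_after = pearson(independent_with_outlier, dependent)

print(round(r_before, 3), round(r_after, 3))
```

Because the single extreme case sits far from both means, it contributes most of the covariance, and the correlation for all 100 cases is pulled in a positive direction regardless of how the other 99 values fall.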
Skewness makes prediction difficult. The OLS model assumes that each independent variable contributes to the dependent variable at an arithmetic rate; there is a constant slope such that a one unit change in the independent variable is associated with a constant change in the dependent variable. With skewness, on the other hand, such a relationship will not be found. Large changes in the independent variable will be necessary to produce small changes in the dependent variable, but the effect is not constant. In other words, the OLS model typically cannot explain the non-linear changes in the dependent variable.³
Negative predictions
A second problem with OLS is that it can produce negative predictions. With a count variable, such as the number of crimes originating or ending in a zone, the minimum number is zero. That is, the count variable is never negative, being bounded by 0 on the lower limit and some large number on the upper limit. The OLS model, on the other hand, can produce negative predicted values since it is additive in the independent variables. This clearly is illogical and is a major problem with data that are very skewed. If the most common value is close to zero, it is very possible for an OLS model to predict a negative count.
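A small invented example shows how this happens. In the sketch below, ten hypothetical zones have mostly zero counts and one very large count; a straight line fitted by least squares is pulled up by the large value and, as a result, falls below zero for the zones at the low end. The data and the function name `fit_ols` are assumptions for illustration.

```python
def fit_ols(x, y):
    """Least squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical predictor values and highly skewed crime counts.
predictor = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
counts = [0, 0, 0, 0, 0, 1, 1, 2, 5, 40]

alpha, beta = fit_ols(predictor, counts)
predicted = [alpha + beta * x for x in predictor]
negative = [round(p, 2) for p in predicted if p < 0]
print(negative)
```

None of the observed counts is below zero, yet the fitted line predicts a negative number of crimes for the three zones with the smallest predictor values.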
Non-consistent summation
A third problem with the OLS model is that the sum of the input values does not necessarily equal the sum of the predicted values. Since the estimates of the constant and coefficients are obtained by minimizing the sum of the squared residual errors, there is no balancing mechanism to require that the predicted values add up to the same total as the input values. For a trip generation model in which the number of predicted origins has to equal the number of predicted destinations (after adding in the number of predicted external trips), this can be a big problem. In calibrating the model, adjustments can be made to the constant term to force the sum of the predicted values to be equal to the sum of the input values. But in applying that constant and those coefficients to another data set, there is no guarantee that the consistency of summation will hold. In other words, the OLS method cannot guarantee a consistent set of predicted values.
Non-linear effects
A fourth problem with the OLS model is that it assumes the independent variables are linear in their effect. If the dependent variable was normal or relatively balanced, then a linear model might be appropriate. But, when the dependent variable is highly skewed, as is seen with these data, the additive effects of each component usually cannot account for the non-linearity. Independent variables have to be transformed to account for the non-linearity and the result is often a complex equation with non-intuitive relationships.⁴ It is far better to use a non-linear model for a highly skewed dependent variable.
Greater residual errors
The final problem with an OLS model and a skewed dependent variable is that the model tends to over- or under-predict the correct values, but rarely comes up with the correct estimate. With skewed data, typically an OLS equation produces non-constant residual errors. That is, one of the major assumptions of the OLS model is that all relevant variables have been included. If that is the case, then the errors in prediction (the residual errors, that is, the difference between the observed and predicted values) should be uncorrelated with the predicted value of the dependent variable. Violation of this condition is called heteroscedasticity because it indicates that the residual variance is not constant. The most common type is an increase in the residual errors with higher values of the predicted dependent variable. That is, the residual errors are greater at the higher values of the predicted dependent variable than at lower values (Draper and Smith, 1981, 147).
A highly skewed distribution tends to encourage this. Because the least squares procedure minimizes the sum of the squared residuals, the regression line balances the lower residuals with the higher residuals. The result is a regression line that fits neither the low values nor the high values. For example, motor vehicle crashes tend to concentrate at a few locations (crash hot spots). In estimating the relationship between traffic volume and crashes, the hot spots tend to unduly influence the regression line. The result is a line that fits neither the number of expected crashes at most locations (which is low) nor the number of expected crashes at the hot spot locations (which is high). The line ends up over-estimating the number of crashes for most locations and under-estimating the number of crashes at the hot spot locations.
Poisson Regression Modeling
Poisson regression is a non-linear modeling method that overcomes some of the problems of OLS regression. It is particularly suited to count data (Cameron and Trivedi, 1998). In the model, the number of events is modeled as a Poisson random variable with a probability of occurrence being
$$\Pr(Y_i) = \frac{e^{-\lambda}\,\lambda^{Y_i}}{Y_i!} \qquad (13.3)$$
where $Y_i$ is the count for one group or class, $i$, $\lambda$ is the mean count over all groups, and $e$ is the base of the natural logarithm. The distribution has a single parameter, $\lambda$, which is both the mean and the variance of the function.
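As a small numerical check of equation 13.3, the sketch below evaluates the Poisson probability directly and confirms that the mean and the variance of the distribution both equal the single parameter. The function name `poisson_prob` and the choice of a mean of 2 are illustrative assumptions.

```python
import math


def poisson_prob(y, lam):
    """Probability of observing exactly y events when the mean count is lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


lam = 2.0
# Probabilities for counts 0..59; the tail beyond that is negligible for lam = 2.
probs = [poisson_prob(y, lam) for y in range(60)]

mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(round(mean, 6), round(variance, 6))
```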
The "law of rare events" assumes that the total number of events will approximate a Poisson distribution if an event occurs in any of a large number of trials but the probability of occurrence in any given trial is small (Cameron and Trivedi, 1998). Thus, the Poisson distribution is very appropriate for the analysis of rare events such as crime incidents (or motor vehicle crashes or rare diseases or any other rare event). The Poisson model is not particularly good if the probability of an event is more balanced; for that, the normal distribution is a better model as the sampling distribution will approximate normality with increasing sample size. Figure 13.3 illustrates the Poisson distribution for different expected means.
The mean can, in turn, be modeled as a function of some other variables (the independent variables). Given a set of observations on the independent variables, $X_{ki}$ ($X_1, X_2, X_3, \ldots, X_K$), the conditional mean of $Y_i$ can be specified as an exponential function of the X's:
$$E(Y_i \mid X_{ki}) = \lambda_i = e^{X_{ki}\beta} \qquad (13.4)$$
where $X_{ki}$ is a set of independent variables, $\beta$ is a set of coefficients, and $e$ is the base of the natural logarithm. Now, the conditional mean (the mean controlling for the effects of the independent variables) is non-linear. Equation 13.4 is sometimes written as
$$\ln(\lambda_i) = X_{ki}\beta \qquad (13.5)$$
and is known as the loglinear model. In more familiar notation, this is
$$\ln(\lambda_i) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_K X_{Ki} \qquad (13.6)$$
That is, the natural log of the mean is a linear function of K independent variables.
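The loglinear form can be sketched in a few lines. The coefficient values and the helper `conditional_mean` are hypothetical; the point is only that the expected count is always positive and that a one-unit change in a predictor multiplies the mean by a constant factor rather than adding a constant amount.

```python
import math


def conditional_mean(alpha, betas, xs):
    """Expected count: exp(alpha + sum of beta_k * x_k)."""
    return math.exp(alpha + sum(b * x for b, x in zip(betas, xs)))


# Hypothetical coefficients for two zone-level predictors.
alpha, betas = 0.5, [0.2, -0.1]

base = conditional_mean(alpha, betas, [3.0, 1.0])
plus_one = conditional_mean(alpha, betas, [4.0, 1.0])

# Raising the first predictor by one unit multiplies the mean by exp(0.2).
print(plus_one / base, math.exp(0.2))
```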
Note that in this formulation, there is not a random error term. The data are assumed to reflect the Poisson model. There can be "residual errors", but these are assumed to reflect an incomplete specification (i.e., not including all the relevant variables). Also, since the variance equals the mean, it is expected that the residual errors should increase with the conditional mean. That is, there is inherent heteroscedasticity (Cameron and Trivedi, 1998). This is very different from OLS, where the residual errors are expected to be constant.
The model is estimated using a maximum likelihood procedure, typically the Newton-Raphson method. In Appendix C, Luc Anselin presents a more formal treatment of both the OLS and Poisson regression models, including the methods by which they are estimated.
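To make the estimation step concrete, the following is a minimal sketch of Newton-Raphson applied to a Poisson regression with an intercept and a single predictor. It is not the CrimeStat implementation; the data are hypothetical, the function name `fit_poisson` is illustrative, and the 2 x 2 linear system is solved by hand so that no external library is needed.

```python
import math


def fit_poisson(x, y, max_iter=100, tol=1e-10):
    """Fit ln(lambda_i) = alpha + beta * x_i by Newton-Raphson."""
    alpha = math.log(sum(y) / len(y))   # start at the log of the mean count
    beta = 0.0
    for _ in range(max_iter):
        lam = [math.exp(alpha + beta * xi) for xi in x]
        # Gradient of the log-likelihood with respect to alpha and beta.
        g0 = sum(yi - li for yi, li in zip(y, lam))
        g1 = sum((yi - li) * xi for xi, yi, li in zip(x, y, lam))
        # Entries of the negated Hessian, the 2 x 2 matrix [[a, b], [b, c]].
        a = sum(lam)
        b = sum(li * xi for xi, li in zip(x, lam))
        c = sum(li * xi * xi for xi, li in zip(x, lam))
        det = a * c - b * b
        # Newton step: solve the 2 x 2 system for the change in parameters.
        d_alpha = (c * g0 - b * g1) / det
        d_beta = (a * g1 - b * g0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta


# Hypothetical zone data: one predictor and a skewed count.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 1, 0, 2, 1, 4, 6, 12]

alpha, beta = fit_poisson(x, y)
fitted = [math.exp(alpha + beta * xi) for xi in x]
print(round(sum(fitted), 6), sum(y))
```

At convergence the gradient term for the intercept is zero, so the fitted counts sum to the observed counts, and every fitted value is positive.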
[Figure 13.3: Poisson Distribution for Different Expected Means. Vertical axis: probability of X (0.0 to 0.5); horizontal axis: count; curves shown for E(Y) = 0.5, 1, 2, 3, and 4.]
Advantages of the Poisson Regression Model
The Poisson model overcomes some of the problems of the OLS model. First, the Poisson model has a minimum value of 0. It will not predict negative values. This makes it ideal for a distribution in which the mean or the most typical value is close to 0. Second, the Poisson is a fundamentally skewed model; that is, it is non-linear with a long 'right tail'. Again, this model is appropriate for counts of rare events, such as crime incidents.
Third, because the Poisson model is estimated by a maximum likelihood method, the estimates are adapted to the actual data. In practice, this means that the sum of the predicted values is virtually identical to the sum of the input values, with the exception of very slight rounding-off error. In the subsequent balancing of the predicted origins and the predicted destinations, this leads to a more stable estimate since the only difference between the predicted origins and predicted destinations is the number of trips that come from outside the study area (external trips). Since the external trips are added to the predicted origins, the balancing operation is less prone to adjustment error.
Fourth, compared to the OLS model, the Poisson model generally gives a better estimate of the number of crimes for each zone. The problem of over- or under-estimating the number of incidents for most zones with the OLS model is usually lessened with the Poisson, at least for crime and other rarer events. When the residual errors are calculated, generally the Poisson has a lower total error than the OLS.
In short, the Poisson model has some desirable statistical properties that make it very useful for predicting crime incidents (origins or destinations).
Problems with the Poisson Regression Model
On the other hand, the Poisson model is not perfect. The primary problem is that count data are usually over-dispersed.
Over-dispersion in the residual errors
In the Poisson distribution, the mean equals the variance. In a Poisson regression model, the mathematical function, therefore, equates the conditional mean (the mean controlling for all the predictor variables) with the conditional variance. However, most real data are over-dispersed; the variance is generally greater than the mean. Figure 13.4 shows the distribution of Baltimore County and Baltimore City crime origins and Baltimore County crime destinations by TAZ (repeat of figure 13.2) and also indicates the variance-to-mean ratio of each variable. For the origin distribution, the ratio of the variance to the mean is 14.7; that is, the variance is 14.7 times that of the mean! For the destination distribution, the ratio is 401.5!
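A quick way to screen a count variable for over-dispersion is to compute its variance-to-mean ratio. The sketch below uses a short list of hypothetical zone counts (not the Baltimore data) and the sample variance from the Python standard library; the helper name is illustrative.

```python
from statistics import mean, variance


def variance_to_mean_ratio(counts):
    """Sample variance divided by the mean; about 1 for Poisson-like data."""
    return variance(counts) / mean(counts)


# Hypothetical counts per zone: mostly small, with one very large value.
counts = [0, 0, 1, 1, 2, 20]
ratio = variance_to_mean_ratio(counts)
print(ratio)
```

A ratio well above 1, as here, suggests that the variance exceeds what a Poisson distribution with the same mean would imply.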
In other words, the variance is many times greater than the mean. Most real-world count data are similar to this; the variance will usually be a lot greater than the mean. What this means in practice is that the residual errors, the difference between the observed and predicted values for each zone, will be greater than what is expected. The Poisson model calculates a standard error as if the variance equals the mean. Thus, the standard error will be underestimated using a Poisson model and, therefore, the significance tests (the coefficient divided by the standard error) will be greater than they really should be. This would have the effect of identifying variables as being more statistically significant in a model than what they actually should be. In other words, in a Poisson multiple regression model, we would end up selecting variables that really should not be selected because we think they are statistically significant when, in fact, they are not.
[Figure 13.4: Skewness in Crime Origins and Destinations: Baltimore County, MD 1993-97 (over-dispersion). Histogram of the number of TAZs (vertical axis) by the number of events per TAZ (horizontal axis, 0 to 700), for origins and destinations. Origins: mean = 75.8, variance = 7848.8, ratio of variance to mean = 14.7. Destinations: mean = 129.1, variance = 51,849.1, ratio of variance to mean = 401.5.]
Another problem with the Poisson, which is true for most of the common regression methods, is the lack of a spatial predictor component. As mentioned in chapter 12, in the crime travel demand model, spatial interaction is handled during the second stage of the model, trip distribution. Thus, any errors introduced in the first stage, trip generation, are usually compensated for during the second. Nevertheless, the inclusion of a spatial component in a regression model would generally improve the prediction. For this version of CrimeStat, non-spatial methods are used for the first stage.
Dispersion Correction Parameter
There are a number of methods for correcting the over-dispersion in a count model. Most of them involve modifying the assumption of the conditional variance equal to the conditional mean. For example, the negative binomial model assumes a Poisson mean but a gamma-distributed variance term (Cameron and Trivedi, 1998, 62-63; Venables and Ripley, 1997, 242-245). That is, there is an unobserved variable that affects the distribution of the count. The model is then of a Poisson mean but with a 'longer tail' variance function. As another example, the zero-inflated Poisson model assumes a Poisson function combined with a degenerate function with a probability of 1 for zero counts (Hall, 2000). Such mixed function models are a current topic of research. In general, though, they are complicated and require estimating several parameters.
There is a simple correction for over-dispersion that usually works (Cameron and Trivedi, 1998, 63-65). The correction proceeds in two steps. In the first, the Poisson model is fitted to the data and the degree of over- (or under-) dispersion is estimated. The dispersion parameter is defined as:
$$\Phi = \frac{1}{N-K-1} \sum_{i} \frac{(Y_i - P_i)^2}{P_i} \qquad (13.7)$$
where $N$ is the sample size, $K$ is the number of independent variables, $Y_i$ is the observed number of events that occur in zone $i$, and $P_i$ is the predicted number of events for zone $i$. The test is similar to an average chi-square in that it takes the square of the residuals ($Y_i - P_i$), divides it by the predicted values, and then averages it by the degrees of freedom. The dispersion parameter is a standardized number. A value greater than 1.0 indicates over-dispersion while a value of less than 1.0 indicates under-dispersion (which is rare, though possible). A value of 1.0 indicates equidispersion (the variance equals the mean).
In the second step, the Poisson standard error is multiplied by the square root of the dispersion parameter to produce an adjusted standard error:
$$SE_{adj} = SE \times \sqrt{\Phi} \qquad (13.8)$$
The new standard error is then used in the t-test to produce an adjusted t-value. This adjustment is found in most Poisson regression packages using a Generalized Linear Model (GLM) approach, such as SAS (McCullagh and Nelder, 1989, 200). Cameron and Trivedi (1998) have shown that this adjustment produces results that are virtually identical to those of the negative binomial, but involving fewer assumptions.
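The two-step correction can be sketched as follows. The observed and predicted counts and the unadjusted standard error of 0.1 are hypothetical values picked so the arithmetic is easy to follow; the function names are illustrative and do not correspond to any particular package.

```python
import math


def dispersion_parameter(observed, predicted, k):
    """Sum of squared residuals scaled by the prediction, over N - K - 1."""
    n = len(observed)
    scaled = sum((y - p) ** 2 / p for y, p in zip(observed, predicted))
    return scaled / (n - k - 1)


def adjusted_standard_error(se, dispersion):
    """Poisson standard error inflated by the square root of the dispersion."""
    return se * math.sqrt(dispersion)


# Hypothetical observed and predicted counts from a one-predictor model (K = 1).
observed = [0, 2, 10, 4]
predicted = [1.0, 2.0, 5.0, 8.0]

phi = dispersion_parameter(observed, predicted, k=1)
se_adj = adjusted_standard_error(0.1, phi)
print(phi, se_adj)
```

With these values the dispersion parameter is 4, so the adjusted standard error is twice the Poisson standard error and the corresponding t-value is halved.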
Diagnostic Tests
There are a number of diagnostic tests that are used in a regression framework, whether OLS, Poisson, or other methods.
Skewness Tests
First, there are tests of skewness in the dependent variable. As mentioned above, the OLS model cannot be applied to data that are highly skewed. If they are skewed, a non-linear model, such as the Poisson, must be used. Therefore, it is essential to evaluate the degree of skewness.
A commonly used measure of skewness is the g statistic (Microsoft, 2000):
$$g = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left[ \frac{X_i - \bar{X}}{s} \right]^3 \qquad (13.9)$$
where $n$ is the sample size, $X_i$ is observation $i$, $\bar{X}$ is the mean of X, and $s$ is the sample standard deviation (corrected for degrees of freedom):
$$s = \sqrt{ \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n-1} } \qquad (13.10)$$
The standard error of skewness (SES) can be approximated by (Tabachnick and Fidell, 1996):
$$SES = \sqrt{\frac{6}{n}} \qquad (13.11)$$
An approximate Z-test can be obtained from:
$$Z(g) = \frac{g}{SES} \qquad (13.12)$$
Thus, if Z is greater than +1.96 or smaller than -1.96, then the skewness is significant at the $p \le .05$ level.
As an example, for the data on the origins of crimes by TAZ in Baltimore County:
$$\bar{X} = 75.108, \qquad s = 96.017, \qquad n = 325$$
$$\sum_{i=1}^{n} \left[ \frac{X_i - \bar{X}}{s} \right]^3 = 898.391$$
Therefore,
$$g = \frac{325}{324 \times 323} \times 898.391 = 2.79$$
$$SES = \sqrt{\frac{6}{325}} = 0.136$$
$$Z(g) = \frac{2.79}{0.136} = 20.51$$
The Z of the g value shows the data are highly skewed as we, of course, knew.
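The skewness calculation is easy to reproduce. The sketch below defines the g statistic for raw data and then repeats the chapter's arithmetic from its reported summary values (n = 325 and a sum of cubed standardized deviations of 898.391). The standard error is taken to be the square root of 6/n, which is the approximation that reproduces the reported 0.136; the function names are illustrative.

```python
import math
from statistics import mean, stdev


def skewness_g(values):
    """The g statistic: scaled sum of cubed standardized deviations."""
    n = len(values)
    m, s = mean(values), stdev(values)   # stdev uses the n - 1 correction
    cubes = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubes


def se_skewness(n):
    """Approximate standard error of skewness, assumed here to be sqrt(6 / n)."""
    return math.sqrt(6.0 / n)


# Reproduce the worked example from the reported summary values.
n, sum_cubes = 325, 898.391
g = n / ((n - 1) * (n - 2)) * sum_cubes
ses = se_skewness(n)
z = g / ses
print(round(g, 2), round(ses, 3))
```

Computed without intermediate rounding, Z comes out near 20.5; the 20.51 in the text is obtained by dividing the rounded values 2.79 and 0.136.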
Likelihood Ratio Test
Second, there are tests of the overall model. In a maximum likelihood framework, the first test is of the log-likelihood function. A likelihood function is the joint density of all the observations, given a value for the parameters, $\beta$, and the variance, $\sigma^2$. The log-likelihood is the natural log of this product, or the sum of the logs of the individual densities. For the OLS model, the log-likelihood is:
$$L = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\sigma^2) - \frac{1}{2}\sum_{i}\frac{(Y_i - X_{ki}\beta_k)^2}{\sigma^2} \qquad (13.13)$$
where $N$ is the sample size, $\sigma^2$ is the variance, $Y_i$ is the observed number of events for zone $i$, and $X_{ki}\beta_k$ is a series of K independent predictors multiplied by their coefficients.
In the Poisson model, the log-likelihood is:
$$L = \sum_{i} \left[ -\lambda_i + Y_i X_{ki}\beta_k - \ln Y_i! \right] \qquad (13.14)$$
where $\lambda_i$ is the conditional mean for zone $i$, $Y_i$ is the observed number of events for zone $i$, and $Y_i X_{ki}\beta_k$ is a cross-product of the observed events times the K independent predictors multiplied by their coefficients. As mentioned above, Luc Anselin provides a more detailed discussion of these functions in Appendix C.
The maximum likelihood method finds the model with the highest log-likelihood. For count data, the log-likelihood is a negative number since it is a sum of logs of probabilities. Even though the model with the highest log-likelihood is considered 'best', it is not an intuitive number. Consequently, the Likelihood Ratio compares the log-likelihood of the regression model with the log-likelihood that would be obtained if only the mean number of counts was taken. This latter log-likelihood is:
$$L_R = -N\,\bar{Y} + \ln(\bar{Y}) \sum_{i} Y_i - \sum_{i} \ln Y_i! \qquad (13.15)$$
where $\bar{Y}$ is the mean of Y. The Likelihood Ratio test is:
$$LR = 2(L - L_R) \qquad (13.16)$$
where $L$ is the model log-likelihood and $L_R$ is the log-likelihood of the mean count. The Likelihood Ratio is twice the difference between the log-likelihood values of the regression and mean models respectively. It follows a $\chi^2$ distribution with K degrees of freedom (where K is the number of independent variables).⁵
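The Likelihood Ratio can be computed directly from observed and predicted counts. In the sketch below the Poisson log-likelihood is written in terms of the predicted mean for each zone (the log of the predicted mean is the linear predictor), and `math.lgamma(y + 1)` is used for the log of the factorial. The counts and predictions are hypothetical, and the function names are illustrative.

```python
import math


def poisson_loglik(observed, means):
    """Sum over zones of -mean + y * ln(mean) - ln(y!)."""
    return sum(-m + y * math.log(m) - math.lgamma(y + 1)
               for y, m in zip(observed, means))


def likelihood_ratio(observed, predicted):
    """Twice the gain in log-likelihood over a model with only the mean."""
    mean_y = sum(observed) / len(observed)
    l_model = poisson_loglik(observed, predicted)
    l_mean = poisson_loglik(observed, [mean_y] * len(observed))
    return 2.0 * (l_model - l_mean)


# Hypothetical observed counts and model predictions for four zones.
observed = [0, 1, 2, 9]
predicted = [0.5, 1.0, 2.5, 8.0]

lr = likelihood_ratio(observed, predicted)
lr_null = likelihood_ratio(observed, [3.0, 3.0, 3.0, 3.0])
print(round(lr, 3), round(lr_null, 3))
```

When the predictions are simply the mean count for every zone, the ratio is zero; predictions that track the observed counts more closely give a positive value.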
Adjusted likelihood ratio
The Likelihood Ratio is a more intuitive index since it is a chi-square test. However, it is prone to the problem of over-fitting that affects all regression methods: the more independent variables are added to the model, the higher is the Likelihood Ratio. Consequently, there are several methods that adjust for the number of parameters fit. One is the Akaike Information Criterion (AIC), which is defined as:
$$AIC = -2L + 2(K+1) \qquad (13.17)$$
where $L$ is the log-likelihood and $K$ is the number of independent variables. A second one is the Schwarz Criterion (SC), which is defined as:
$$SC = -2L + (K+1)\ln(N) \qquad (13.18)$$
These two measures adjust the log-likelihood for degrees of freedom, and flip the sign around. Because the sign is flipped, the model with the lowest AIC or SC value is 'best'.
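Both criteria are simple functions of the log-likelihood. The sketch below uses the conventional forms, -2L + 2(K+1) for the AIC and -2L + (K+1)ln(N) for the Schwarz Criterion, with a hypothetical log-likelihood of -100 from a model with two independent variables fitted to 100 zones.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion with K predictors plus an intercept."""
    return -2.0 * loglik + 2.0 * (k + 1)


def schwarz_criterion(loglik, k, n):
    """Schwarz Criterion; the penalty grows with the log of the sample size."""
    return -2.0 * loglik + (k + 1) * math.log(n)


# Hypothetical model: log-likelihood of -100, K = 2 predictors, N = 100 zones.
print(aic(-100.0, 2), schwarz_criterion(-100.0, 2, 100))
```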
R-square Test
The most familiar test of an overall model is the R-square (or $R^2$) test. This is the percent of the total variance of the dependent variable accounted for by the model. More formally, it is defined as:
$$R^2 = 1 - \frac{\sum_i (Y_i - P_i)^2}{\sum_i (Y_i - \bar{Y})^2} \qquad (13.19)$$
where $Y_i$ is the observed number of events for a zone, $i$, $P_i$ is the predicted number of events given a set of K independent variables, and $\bar{Y}$ is the mean number of events across zones. The R-square value is a number from 0 to 1; 0 indicates no predictability while 1 indicates perfect predictability.
For an OLS model, R-square is a very consistent estimate. It increases in a linear manner with predictability and is, therefore, a good indicator of how effective one model is compared to another. As with all diagnostic tests, the value of the R-square increases with more independent variables. Consequently, R-square is usually adjusted for degrees of freedom:
$$R_a^2 = 1 - \frac{\left[\sum_i (Y_i - P_i)^2\right] / (N-K-1)}{\left[\sum_i (Y_i - \bar{Y})^2\right] / (N-1)} \qquad (13.20)$$
where $N$ is the sample size and $K$ is the number of independent variables.
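The two R-square measures can be sketched as follows. The observed and predicted values are hypothetical and chosen so the sums of squares are easy to verify by hand; the adjusted version divides the residual sum of squares by N - K - 1 degrees of freedom.

```python
def r_square(observed, predicted):
    """One minus the residual sum of squares over the total sum of squares."""
    mean_y = sum(observed) / len(observed)
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    sst = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - sse / sst


def adjusted_r_square(observed, predicted, k):
    """R-square with both sums of squares scaled by their degrees of freedom."""
    n = len(observed)
    mean_y = sum(observed) / n
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    sst = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical values: residual sum of squares 1, total sum of squares 5.
observed = [1, 2, 3, 4]
predicted = [1.5, 1.5, 3.5, 3.5]

r2 = r_square(observed, predicted)
r2_adj = adjusted_r_square(observed, predicted, k=1)
print(r2, r2_adj)
```

The adjusted value is lower than the unadjusted one because the residual variance is averaged over fewer degrees of freedom.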
R-square for Poisson model
With the Poisson model, however, the R-square value (whether adjusted or not) is not necessarily a good measure of overall fit. While the Poisson R-square varies from 0 to 1, similar to the OLS, it is not monotonic. That is, the addition of a new variable to an equation often has unpredictable effects; sometimes it will increase substantially and sometimes it will increase only a little, independent of how strong a variable's association is with the dependent variable (Miaou, 1996). This inconsistency comes from the decomposition of the total sum of squares:
$$\sum_i (Y_i - \bar{Y})^2 = \sum_i (Y_i - P_i)^2 + \sum_i (P_i - \bar{Y})^2 + 2\sum_i (Y_i - P_i)(P_i - \bar{Y}) \qquad (13.21)$$
The first term on the right-hand side is the residual sum of squares (or error term) while the second term is the explained sum of squares. In an OLS model, the third term is zero if an intercept is included (Cameron and Trivedi, 1998, 153). Hence, the total sum of squares is broken into two parts: that which is explained and that which is unexplained. However, for the Poisson and other non-linear regression methods, the last term is not zero. Consequently, a test that compares the explained sum of squares to the total sum of squares will not produce consistent results.
Consequently, alternative R-square measures are sometimes used. One of these is Deviance R-square. It is defined as:
$$R_D^2 = 1 - \frac{\sum_i \left[ Y_i \ln(Y_i / P_i) - (Y_i - P_i) \right]}{\sum_i \left[ Y_i \ln(Y_i / \bar{Y}) \right]} \qquad (13.22)$$
where $Y_i$ is the observed number of events for each zone, $i$, $P_i$ is the predicted number of events for each zone based on K independent predictors, and $\bar{Y}$ is the mean number of events across all zones.
The Deviance R-square measures the reduction in the Likelihood Ratio due to the inclusion of predictor variables. It produces a slightly different R-square, one that is typically higher than the traditional R-square. Whereas the traditional one might not show a large increase upon the introduction of an independent variable, the Deviance R-square often does show the increase.
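A deviance-based R-square can be sketched as below. The form used is one minus the ratio of the model deviance to the deviance of a mean-only model, which is an assumption about the intended definition; a term of the form y ln(y / p) is treated as zero when y is zero. The counts and the function name are hypothetical.

```python
import math


def deviance_r_square(observed, predicted):
    """One minus the model deviance over the deviance of the mean-only model."""
    mean_y = sum(observed) / len(observed)

    def y_log_ratio(y, denom):
        # By convention y * ln(y / denom) is 0 when y is 0.
        return y * math.log(y / denom) if y > 0 else 0.0

    residual = sum(y_log_ratio(y, p) - (y - p)
                   for y, p in zip(observed, predicted))
    null = sum(y_log_ratio(y, mean_y) for y in observed)
    return 1.0 - residual / null


# Hypothetical observed counts and model predictions for four zones.
observed = [0, 1, 2, 9]
predicted = [0.5, 1.0, 2.5, 8.0]

rd = deviance_r_square(observed, predicted)
print(round(rd, 3))
```

Under this form, predictions equal to the mean count give a value of 0 and predictions equal to the observed counts give a value of 1.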
Nevertheless, it has problems, too. Miaou (1996) argues that there is not a single R-square index that is perfectly consistent and suggests that users need to use multiple ones. There are other R-square values that have been proposed, but these two are sufficient for now. In short, a user must look at both as an indicator of how good a model is compared to another model.
Dispersion Parameter
Finally, in the Poisson model only, the dispersion parameter indicates the extent to which the variance is different from the mean. This was defined in equation 13.7 above.
Coefficients, Standard Errors, and Significance Tests
The second type of diagnostic tests are those for the individual predictors in the model. In both the OLS and Poisson models, there are three tests:
1. The coefficient. This indicates the change in the dependent variable associated with the change in the independent variable. In the case of the OLS, it is a linear term (i.e., the value of the independent variable is multiplied by the coefficient) while in the Poisson model, it has to be converted by raising the product to an exponential term (i.e., $e^{\beta X}$).
2. The standard error. Each estimated coefficient in a model accounts for some of the variance in the dependent variable. This variance is the contribution of the particular independent variable to the variance of the dependent variable. The square root of that variance is the standard error.
3. The significance level. The ratio of the coefficient to the standard error produces a significance test of the coefficient. In the OLS model, it is a t-test with N-K-1 degrees of freedom whereas in the Poisson model it is an asymptotic t-test, which is effectively a Z-test. The appropriate tables (t-test or standard normal) produce approximate probability levels of a Type I error (the likelihood of falsely rejecting a true null hypothesis of no relationship).
Testing for Multicolinearity
One of the major problems with any regression model, whether OLS or Poisson, is multicolinearity among the independent variables. In theory, each independent variable should be statistically independent of the other independent variables. Thus, the amount of variance for the dependent variable that is accounted for by each independent variable should be a unique contribution. In practice, however, it is rare to obtain completely independent predictive variables. More likely, two or more of the independent variables will be correlated. The effect is that the estimated standard error of a predictor variable is no longer unique since it shares some of the variance with other independent variables. The greater the communality of the variances, the more ambiguous the predicted effects. If two variables are highly correlated, it is not clear what contribution each makes towards predicting the dependent variable. In effect, multicolinearity means that variables are measuring the same effect.
Multicolinearity among the independent variables can produce very strange effects in a regression model. Among these effects are: 1) If two independent variables are highly correlated, but one is more correlated with the dependent variable than the other, the stronger one will usually have a correct sign while the weaker one will sometimes get flipped around (e.g., from positive to negative, or the reverse); 2) Two variables can cancel each other out; each coefficient is significant when it alone is included in a model but neither is significant when they are together; 3) One independent variable can inhibit the effect of another correlated independent variable so that the second variable is not significant when combined with the first one; and 4) If two independent variables are virtually perfectly correlated, many regression routines break down because the matrix cannot be inverted.
All these effects indicate that there is non-independence among the independent variables. Aside from producing confusing coefficients, multicolinearity can overstate the amount of prediction in a model. Since every independent variable accounts for some of the variance of the dependent variable, with multicolinearity, the overall model will appear to improve when it probably hasn't.
Tolerance test
A user has to be aware of the problem of multicolinearity and seek to minimize it. The simplest solution is to drop variables that are co-linear with other independent variables already in the equation. A relatively simple test for assessing this is called tolerance. Tolerance is defined as the lack of predictability of each independent variable by the other independent variables, or:
$$Tol = 1 - R^2_{i \cdot jk \ldots l} \qquad (13.23)$$
where $R^2_{i \cdot jk \ldots l}$ is the R-square of an equation where independent variable $i$ is predicted by the other independent variables, $j$, $k$, $l$, and so forth. That is, each independent variable in turn is regressed against the other independent variables in the equation. The $R^2$ associated with that model is subtracted from 1. The higher the tolerance level, the less a particular independent variable shares its variance with the other independent variables.
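For the simplest case of only two independent variables, the R-square of one regressed on the other is the square of their correlation, so the tolerance can be sketched without any matrix algebra. The general case with several predictors requires a multiple regression for each variable and is not shown. The data and function names below are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def tolerance_two_predictors(x1, x2):
    """Tolerance when the model has exactly two independent variables."""
    return 1.0 - pearson_r(x1, x2) ** 2


x1 = [1, 2, 3, 4]
collinear = [2, 4, 6, 8]       # an exact multiple of x1
unrelated = [1, -1, -1, 1]     # uncorrelated with x1

print(tolerance_two_predictors(x1, collinear),
      tolerance_two_predictors(x1, unrelated))
```

A tolerance near 0 flags a variable that is almost fully predicted by the other, while a tolerance near 1 indicates that it contributes largely independent information.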
Fixed Model vs. Stepwise Variable Selection
There are several strategies designed to reduce multicolinearity in a model. One is to start with a defined model and eliminate those variables that have a low tolerance. The total model is estimated and the coefficients for each of the variables are estimated at the same time. This is sometimes called a fixed model. Then, variables that are co-linear are removed from the equation, and the model is re-run.
An ot h er s t r a t egy is t o es t im a t e t h e coefficien t s a st ep a t a t im e, a p r oced u r e k n own a s

stepwise r egres sion . Ther e a r e severa l st a n da r d st epwise pr ocedu r es. In t h e firs t pr ocedu r e,
va r ia bles a r e a dded one a t a t im e (a forw ard selection m odel). Th e in depen den t va r ia ble
h a vin g t h e st r ongest lin ea r cor r ela t ion wit h t h e depen den t va r ia ble is a dd ed firs t . Next , t h e
independent var iable from t he rem aining list of independent var iables with th e highest
cor r ela t ion wit h t h e depen den t va r ia ble, con trollin g for t h e one va r ia ble a lr ea dy in t h e
equ a t ion , is a dded n ext a n d t h e m odel is re-est ima t ed. In ea ch st ep, th e indep en den t
va r ia ble wit h t h e h igh est cor r ela t ion wit h t h e depen den t va r ia ble con t r ollin g for t h e
va r ia bles a lr ea dy in t h e equ a t ion is a dd ed t o th e m odel, a n d t h e m odel is r e-est im a t ed. Th is
pr oceeds u n t il eith er a ll t h e indep en den t var iables a r e a dded t o t h e equa t ion or else a
st oppin g cr it er ion is m et . Th e u su a l cr it er ion is on ly va r ia bles wit h a cer t a in sign ifica n ce
level ar e a llowed t o en t er (ca lled a p-to-en ter).
A back w ard elim ina tion procedure work s in r everse. All independent var iables are
in it ia lly a dd ed t o th e equ a t ion. Th e va r ia ble w it h t h e we a k es t coefficien t (as define d by t h e
sign ifican ce level) is r em oved, an d t h e m odel is r e-est im a t ed. Next , t h e va r ia ble wit h t h e
wea k est coefficien t in t h e secon d m odel is r em oved, an d t h e m odel is r e-est im a t ed. Th is
pr ocedu r e is rep ea t ed u n t il eith er t h er e a r e n o m or e indep en den t var iables left in t h e m odel
or else a s t oppin g cr iter ion is m et . The u su a l cr iter ion is t h a t a ll r em a inin g va r iables pa ss a
cer t a in s ign ifica n ce level (ca lled a p-to-rem ove).
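The backward elimination loop described above can be sketched as follows. This is a minimal illustration, not CrimeStat's code: `fit_pvalues` is a hypothetical callable standing in for whatever routine re-estimates the model, and `toy_fit` is a made-up substitute that returns fixed p-values so the example runs without a regression library.

```python
def backward_elimination(variables, fit_pvalues, p_to_remove=0.01):
    """Drop the weakest variable one at a time until all pass p_to_remove.

    fit_pvalues is a hypothetical callable: given a list of variable names,
    it re-estimates the model and returns a dict {name: p-value}.
    """
    selected = list(variables)
    while selected:
        pvalues = fit_pvalues(selected)            # re-estimate the model
        weakest = max(selected, key=lambda name: pvalues[name])
        if pvalues[weakest] <= p_to_remove:        # stopping criterion is met
            break
        selected.remove(weakest)                   # remove it and loop again
    return selected


def toy_fit(names):
    """Stand-in for a real regression (assumed values, for illustration only).

    INCOME becomes significant once the co-linear POVERTY variable is dropped.
    """
    p = {"POP": 0.001, "RETAIL": 0.004, "POVERTY": 0.40, "INCOME": 0.20}
    if "POVERTY" not in names:
        p["INCOME"] = 0.005
    return {n: p[n] for n in names}


kept = backward_elimination(["POP", "RETAIL", "POVERTY", "INCOME"], toy_fit)
```

Re-estimating after every removal matters because, as in the toy example, a variable's significance can change once a co-linear variable leaves the equation.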
There are combinations of these procedures, for example adding a variable in a
forward selection manner but then removing any variables that are no longer significant, or
using a backward elimination procedure but allowing new variables to enter the model if
they suddenly become significant.

There are advantages to each approach. A fixed model allows specified variables to be
included. If either theory or previous research has indicated that a particular combination of
variables is important, then the fixed model allows that to be tested. A stepwise procedure
might drop one of those variables. On the other hand, a stepwise procedure usually can
obtain the same or higher predictability than a fixed procedure (whether predictability is
measured by a log-likelihood or an R-square).

Within the stepwise procedures, there are also advantages and disadvantages to each
method, though the differences are generally very small. A forward selection procedure adds
variables one at a time. Thus, the contribution of each new variable can be seen. On the
other hand, a variable that is significant at an early stage could become not significant at a
later stage because of the unique combinations of variables. Similarly, a backward
elimination procedure will ensure that all variables in the equation meet a specified
significance level. But the contribution of each variable is not easily seen other than
through the coefficients. In practice, one usually obtains the same model with either
procedure, so the differences are not that critical.

A stepwise procedure will not guarantee that multicolinearity will be removed
entirely. However, it is a good procedure for narrowing down the variables to those that are
significant. Then, any co-linear variables can be dropped manually and the model
re-estimated. In the CrimeStat trip generation module, both a fixed model and a backward
elimination procedure are allowed.
Alternative Regression Models

There are a number of alternative methods for estimating the likely value of a count
given a set of independent predictors. The negative binomial has already been mentioned.
There are a number of variations of these involving different assumptions about the
dispersion term. There are also a number of different Poisson-type models. Among these are
the zero-inflated Poisson (or ZIP; Hall, 2000), the Weibull function, the Cauchy function, and
the lognormal function (see NIST 2004 for a list of common non-linear functions).

There is also a set of spatial regression type models that correct for spatial
autocorrelation in the dependent variable, such as geographically-weighted regression using
a Poisson function (Fotheringham, Brunsdon, and Charlton, 2002), a hierarchical Bayesian
model (Clayton and Kaldor, 1987), and a Markov Chain Monte Carlo simulation method
(Miouw, Song, and Balilick, 2003).

In future versions of CrimeStat, several of these methods will be introduced. For the
time being, though, the Poisson model is available as it is the most commonly used
functional model for fitting count data.
Adding Special Generators

In a travel demand model, there are special generators. These are unique land uses
or environments that produce an extra large number of trips. For regular travel demand
modeling, stadiums, airports, train stations, large parks, and ‘mega-malls’ generate more
than their share of trips, or at least more than what would be predicted by the amount of
employment at those locations. They are usually attractors, not producers. In a normal
transportation travel demand model, these zones are excluded from the cross-classification
and independent estimates are made of them.

For crime trips, there are also special generators. Typically, these are zones that have
more crimes being attracted to them than are expected on the basis of the population and
employment at those locations.

Since we are using a regression model to estimate the productions and attractions, a
simple way to model a special generator is to create a simple dummy variable. This is a
variable where zones with the special generator get a value ‘1’ and zones without the special
generator get a ‘0’. Essentially, the variable is a cross-classification of the special generator
versus every other zone.
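As a small illustration of the dummy variable just described, the sketch below flags chosen zones with a 1 and all others with a 0. The function name, the field names (`ID`, `POP`, `SPECIAL`) and the zone records are hypothetical and are not taken from a CrimeStat data file.

```python
def add_special_generator_dummy(zones, special_ids, field="SPECIAL"):
    """Return copies of the zone records with a 0/1 special-generator field."""
    return [dict(z, **{field: 1 if z["ID"] in special_ids else 0}) for z in zones]


# Hypothetical zone records; zone Z2 is treated as containing a stadium.
zones = [
    {"ID": "Z1", "POP": 1200},
    {"ID": "Z2", "POP": 300},
    {"ID": "Z3", "POP": 950},
]
flagged = add_special_generator_dummy(zones, special_ids={"Z2"})
```

The new field can then be selected as one more independent variable in the attraction model.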
One has to be cautious in doing this, however. Typically, special generators are
identified by having a greater number of crimes being attracted to a zone than is predicted
by the model. In other words, they have a greater positive residual error (observed -
predicted) and are ‘outliers’ in the residual error distribution. By adding a variable to
explain those cases, the residual error decreases.

But, in doing so, we aren’t really explaining why the zone has more crimes than
expected, but simply have accounted for it by putting in an empirical variable. In re-running
the model, there will usually be new outliers that have a greater positive residual error. If
this logic is to be repeated, then we would create new special generators for those zones and
re-estimate the model. If continued without limits, eventually there would not be a model
anymore but just a collection of dummy variables, one for each zone.

Therefore, a user should be cautious in introducing special generators. It is generally
all right to introduce a few for the truly exceptional zones. These are zones where it is logical
to treat them as special generators and where one would expect continuity over time. In
other words, they should be used if the special generator status is expected to last over time.
For example, a stadium or an airport or a train station is liable to remain at its location for
many years. A particular shopping mall, on the other hand, may attract crimes at one
particular point in time but not necessarily in the future. Unless it is a mall that is so much
larger than any other mall in the region (a ‘mega-mall’), it shouldn’t be given a special
generator status.
Adding External Trips

External trips are, by definition, trips that come from outside the region. They are
part of the origin/production model in that these are trips that are not accounted for by the
model. There are also trips that originate within the study area but end outside the area;
however, those are usually not modeled since the focus will be on the study area itself. In the
usual travel demand framework, external trips are those coming from major corridors into
the region. Estimates of the travel on these corridors are obtained by cordon counts, counts
of vehicles coming into the region and leaving the region (net inflow). Estimates of future
growth of those external trips have to be based on expectations of future population growth
in the metropolitan region and in nearby regions.

For crime trips, external trips are defined as trips that originate outside the study
area. But they must be estimated by the difference between the total number of crimes
occurring in the destination study area and the total originating in the origin zones. That is,
of all the crimes occurring in the study area, the origin zones are modeled. Those trips that
originate from outside the origin zones are external trips. They must be added to the
predicted number of origin trips to produce an adjusted estimate of total origins, or:

    O_j = O_{pi} + O_e                                                    (13.24)

where O_j is the total number of crime origins for crimes committed in study area j, O_{pi} is
the total number of crimes originating in the origin zones, i, and O_e is the total number of
crimes originating outside the region, e.

In other words, for the production (origin) model only, we add an external zone to
account for crime trips that originated outside the modeled region. If we don’t do that, in the
balancing step, we’ll overestimate the number of crimes originating in each zone because the
predicted origins will be multiplied by a factor to ensure that the total number of origins
equals the total number of destinations.

Not including the external trips can lead to bias in the model. If the number of
external trips is a sizeable percentage of all crime origins occurring in the study area, then
the coefficients of the origin model could be misleading. In practice, most travel demand
modelers assume that if the percentage of external trips is not greater than 5%, there
usually is little bias introduced (Ortuzar and Willumsen, 2001). If it is greater than 5%,
then origin zones from adjacent jurisdictions need to be included in the origin model.
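The external trip estimate of equation 13.24 and the 5% rule of thumb can be worked through with a few lines of Python. The function names and the zone totals below are hypothetical. The sketch assumes the external trips are taken as the difference between all crimes occurring in the study area and the origins predicted for the modeled zones, and that the percentage is taken against all crimes in the study area.

```python
def external_trips(total_destinations, predicted_origins):
    """O_e: crimes occurring in the study area minus the modeled origins."""
    return total_destinations - sum(predicted_origins)


def external_share(total_destinations, predicted_origins):
    """External trips as a proportion of all crimes in the study area."""
    return external_trips(total_destinations, predicted_origins) / total_destinations


# Hypothetical numbers: four origin zones and 520 crimes in the study area.
origins = [120.0, 80.0, 250.0, 50.0]
total_crimes = 520.0

o_e = external_trips(total_crimes, origins)        # trips for the external zone
share = external_share(total_crimes, origins)      # compare against 0.05
needs_adjacent_zones = share > 0.05
```

With these numbers the external zone receives 20 trips, under 5% of the total, so by the rule of thumb above the origin model would not need zones from adjacent jurisdictions.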
Balancing Predicted Origins and Predicted Destinations

The trip generation ‘model’ is actually two separate models: 1) a model of trips
produced by every zone and 2) a model of trips attracted to every zone. Since a trip has an
origin and a destination (by definition), the total number of productions must equal the
total number of attractions:

    \sum_{i=1}^{n} O_i = \sum_{j=1}^{n} D_j                               (13.25)

where O is a trip origin, D is a trip destination, and i and j are zone numbers.

To ensure that this equality is true, a balancing operation is conducted. Essentially,
this means multiplying either the number of predicted origins in each origin zone or the
number of predicted destinations in each destination zone by a constant, which is the ratio of
either the total destinations to the total origins (to multiply the number of predicted origins)
or the ratio of the total origins to the total destinations (to multiply the number of predicted
destinations).

With crime analysis, the number of destinations would generally be considered a
more reliable data set than the number of origins. Because crimes are enumerated where
they occur, the number of crimes occurring at any one location is more accurate than the
location of the offenders. Thus, we adjust the predicted origins so that they equal the
predicted destinations (see endnote 6).
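The balancing operation amounts to one multiplication per zone. The sketch below is an illustration with hypothetical names and numbers, not the CrimeStat routine. It assumes the list of predicted origins already includes the external zone.

```python
def balance(origins, destinations, hold="destinations"):
    """Scale one side so that total origins equal total destinations.

    hold="destinations" multiplies each origin by (total D / total O);
    hold="origins" multiplies each destination by (total O / total D).
    """
    total_o, total_d = sum(origins), sum(destinations)
    if hold == "destinations":
        factor = total_d / total_o
        return [o * factor for o in origins], list(destinations)
    factor = total_o / total_d
    return list(origins), [d * factor for d in destinations]


# Hypothetical predicted values for three origin and three destination zones.
pred_origins = [100.0, 200.0, 100.0]        # total 400
pred_destinations = [150.0, 250.0, 100.0]   # total 500

adj_origins, adj_destinations = balance(pred_origins, pred_destinations)
```

Holding destinations constant, each predicted origin is multiplied by 500/400 = 1.25, so the adjusted origins sum to the 500 predicted destinations.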
Summary of the Trip Generation Model

In summary, the trip generation model is estimated in four steps:

1. A model of the predictors of the number of crime origins (a crime production
   model);
2. A model of the predictors of the number of crime destinations (a crime attraction
   model);
3. External trips are estimated and added to the number of predicted origins as
   an external zone; and
4. The total number of predicted crime origins is balanced to be equal to the total
   number of predicted crime destinations.
The CrimeStat Trip Generation Model

In this section, we describe the trip generation model implemented in CrimeStat. As
mentioned above, this step involves calibrating a regression model against the zonal data.
Two separate models are developed, one for trip origins and one for trip destinations. The
dependent variable is the number of crimes originating in a zone (for the trip origin model)
or the number of crimes ending in a zone (for the trip destination model). The independent
variables are zonal variables that may predict the number of origins or destinations.

There are three steps to the model, each corresponding to a separate tab in
CrimeStat:

1. Calibrate the model
2. Make a prediction
3. Balance the predicted origins and the predicted destinations

Figure 13.5 shows an image of the trip generation model page within CrimeStat. The
trip generation model is made up of three separate pages (or tabs):

Figure 13.5: Trip Generation Module

1. A Calibrate model page in which a regression model can be run to estimate
   either an origin (production) model or a destination (attraction) model;
2. A Make prediction page in which the estimated coefficients can be applied to
   the same or a different data set and in which the external trips can be added to
   the predicted origins; and
3. A Balance predicted origins & destinations page in which the total predicted
   origins can be adjusted to equal the total predicted destinations.
Calibrate Model

In the first step, models are calibrated using the input data. There is a model for the
origin zones and another model for the destination zones. The user should indicate what type
of model is being run in order to make the output more clear (it is not essential but can
minimize confusion from mislabeling).

Data File

The data file is input as either the primary or secondary file. Specify whether the
data file is the primary or secondary file.

Type of Model

Specify whether the model is for origins or destinations. This will be printed out on
the output header.

Dependent Variable

Select the dependent variable from the list of variables. There can be only one
dependent variable per model.

Skewness Diagnostics

If checked, the routine will test for the skewness of the dependent variable. The
output includes:

1. The “g” statistic
2. The standard error of the “g” statistic
3. The Z value for the “g” statistic
4. The probability level of a Type I error for the “g” statistic
5. The ratio of the sample variance to the sample mean

Error messages indicate whether there is probable skewness in the dependent
variable. If there is skewness, use a Poisson regression model.
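The five diagnostics listed above can be approximated with the standard library. The formulas here are assumptions, since this section does not define them: “g” is taken as the moment coefficient of skewness (third central moment divided by the 1.5 power of the second), its standard error as the large-sample approximation sqrt(6/N), and the probability as a two-tailed normal probability. CrimeStat's exact formulas may differ, and the crime counts are made up.

```python
import math


def skewness_diagnostics(values):
    """Skewness 'g', its approximate standard error, Z, p and variance/mean ratio."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    g = m3 / m2 ** 1.5                              # assumed definition of "g"
    se = math.sqrt(6.0 / n)                         # assumed large-sample SE
    z = g / se
    p = math.erfc(abs(z) / math.sqrt(2.0))          # two-tailed normal probability
    sample_variance = m2 * n / (n - 1)
    return {"g": g, "se": se, "z": z, "p": p, "var_to_mean": sample_variance / mean}


# Hypothetical crime counts per zone: many small values and one large one.
counts = [0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 5, 7, 12, 30]
diagnostics = skewness_diagnostics(counts)
```

For count data such as these, a positive “g” together with a variance well above the mean is the pattern that points toward a Poisson-type model rather than ordinary least squares.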
Independent variables

Select independent variables from the list of variables in the data file. Up to 15
variables can be selected.

Missing values

Specify any missing value codes for the variables. Blank records will automatically be
considered as missing. If any of the selected dependent or independent variables have
missing values, those records will be excluded from the analysis.

Type of Regression Model

Specify the type of regression model to be used. The default is a Poisson regression
with over-dispersion correction. Other alternatives are a Poisson regression and an
Ordinary Least Squares regression.

Type of Regression Procedure

Specify whether a fixed model (all selected independent variables are used in the
regression) or a backward elimination stepwise model is used. The default is a fixed model.
If a backward elimination stepwise model is selected, choose the P-to-remove value (default
is .01). The backward elimination starts with all selected variables in the model (the fixed
procedure). However, it proceeds to drop variables that fail the P-to-remove test, one at a
time. Any variable that has a significance level in excess of the P-to-remove value is dropped
from the equation.

Save Estimated Coefficients/Parameters

The estimated coefficients of the final model can be saved as a ‘dbf’ file. Specify a
file name. This would be useful in order to repeat the regression while adding in external
trips to the predicted origins (see Make trip generation prediction below) or to apply the
coefficients to another data set (e.g., future values of the independent variable).

Save Output

The output is saved as a ‘dbf’ file under a different file name. The output includes all
the variables in the input data set plus two new ones: 1) the predicted values of the
dependent variable for each observation (with the name PREDICTED); and 2) the residual
error values, representing the difference between the actual/observed values for each
observation and the predicted values (with the name RESIDUAL).
Poisson output

The output of the Poisson regression routines includes 13 fields for the entire model:

1. The dependent variable
2. The type of model
3. The sample size (N)
4. The degrees of freedom (N - # independent variables - 1)
5. The type of regression model (Poisson, Poisson with over-dispersion
   correction)
6. The log-likelihood value
7. The Likelihood Ratio
8. The probability value of the Likelihood Ratio
9. The Akaike Information Criterion (AIC)
10. The Schwartz Criterion (SC)
11. The Dispersion Multiplier
12. The approximate R-square value
13. The deviance R-square value

and 5 fields for each estimated coefficient:

14. The estimated coefficient
15. The standard error of the coefficient
16. The pseudo-tolerance value of the coefficient (see below)
17. The Z-value of the coefficient
18. The p-value of the coefficient.

OLS output

The output of the Ordinary Least Squares (OLS) routine includes 9 fields for the entire
model:

1. The dependent variable
2. The type of model
3. The sample size (N)
4. The degrees of freedom (N - # independent variables - 1)
5. The type of regression model (Normal/Ordinary Least Squares)
6. Squared multiple R
7. Adjusted squared multiple R
8. F test of the model
9. p-value of the model

and 5 fields for each estimated coefficient:

10. The estimated coefficient
11. The standard error of the coefficient
12. The tolerance value of the coefficient (see below)
13. The t-value of the coefficient
14. The p-value of the coefficient.
Multicolinearity Among the Independent Variables

To test multicolinearity, a tolerance test is run (see equation 13.23 above). There is
not a simple test of whether a particular tolerance is meaningful or not. In CrimeStat,
several qualitative categories are used and error messages are output:

1. If the tolerance value is 0.80 or greater, then there is little multicolinearity (No
   apparent multicolinearity);
2. If the tolerance is between 0.50-0.79, there is some multicolinearity (Possible
   multicolinearity);
3. If the tolerance is between 0.25-0.49, there is probable multicolinearity
   (Probable multicolinearity. Eliminate variable with lowest tolerance and
   re-run); and
4. If tolerance is less than 0.25, there is definite multicolinearity (Definite
   multicolinearity. Results are not reliable. Eliminate variable with lowest
   tolerance and re-run).
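The four tolerance categories listed above map directly onto a small lookup function. The function name is hypothetical. The sketch also assumes that values falling between the printed ranges (for example 0.795) belong to the lower category, which the list above does not specify.

```python
def tolerance_message(tolerance):
    """Return the qualitative multicolinearity category for a tolerance value."""
    if tolerance >= 0.80:
        return "No apparent multicolinearity"
    if tolerance >= 0.50:
        return "Possible multicolinearity"
    if tolerance >= 0.25:
        return ("Probable multicolinearity. "
                "Eliminate variable with lowest tolerance and re-run")
    return ("Definite multicolinearity. Results are not reliable. "
            "Eliminate variable with lowest tolerance and re-run")


# Hypothetical tolerance values for three independent variables.
messages = {name: tolerance_message(t)
            for name, t in {"POP96": 0.91, "RETEMP96": 0.47, "NONRET96": 0.12}.items()}
```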
Graph

While the output page is open, clicking on the graph button will display a graph of the
residual errors (on the Y axis) against the predicted values (on the X axis).

Make Trip Generation Prediction

This routine applies an already-calibrated regression model to a data set. This would
be useful for several reasons: 1) if external trips are to be added to the model (which is
normally preferred); 2) if the model is applied to another data set; and 3) if variations on the
coefficients are being tested with the same data set. The model will need to be calibrated
first (see Calibrate trip generation model) and the coefficients saved as a parameters file.
The coefficient parameter file is then re-loaded and applied to the data.

Data File

The data file is input as either the primary or secondary file. Specify whether the
data file is the primary or secondary file.

Type of Model

Specify whether the model is for origins or destinations. This will be printed out on
the output header.

Trip Generation Coefficients/Parameters File

This is the saved coefficient parameter file. It is an ASCII file and can be edited if
alternative coefficients are being tested (be careful about editing this without making a
backup). Load the file by clicking on the Browse button and finding the file. Once loaded,
the variable names of the saved coefficients are displayed in the “Matching parameters” box.

Independent Variables

Select independent variables from the list of variables in the data file. Up to 15
variables can be selected.

Matching Parameters

The selected independent variables need to be matched to the saved variables in the
trip generation parameters file in the same order. Add the appropriate variables one by one
in the order in which they are listed in the matching parameters box. It is essential that the
order be the same; otherwise the coefficients will be applied to the wrong variables.

    Hint: With your cursor placed in the list of independent variables, typing the
    first letter of the matching variable name will take you to the first variable
    that starts with that letter. Repeating the letter will move down the list to the
    second, third, and so forth until the desired variable is reached.
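To see why the matching order matters, the sketch below applies saved coefficients to one zone record through an explicit list of (data field, saved name) pairs. Everything here is an assumption made for illustration: the function name, the `CONSTANT` key, the coefficient values and the record fields are hypothetical, the parameter file is represented as a plain dictionary rather than CrimeStat's file format, and the Poisson prediction is assumed to use a log link (the exponential of the linear predictor).

```python
import math


def predict(record, coefficients, matching, model="poisson"):
    """Apply saved coefficients to one zone record.

    coefficients: {"CONSTANT": b0, saved_name: b, ...}
    matching: list of (data_field, saved_name) pairs, one per coefficient
    """
    linear = coefficients["CONSTANT"]
    for data_field, saved_name in matching:
        linear += coefficients[saved_name] * record[data_field]
    # Assumed log link for the Poisson model; OLS uses the linear value itself.
    return math.exp(linear) if model == "poisson" else linear


# Hypothetical saved coefficients and a record from a different data set
# whose field names differ from the saved variable names.
saved = {"CONSTANT": 1.0, "POP96": 0.0005, "RETEMP96": 0.002}
zone = {"POPULATION": 2000, "RETAIL": 100}
matching = [("POPULATION", "POP96"), ("RETAIL", "RETEMP96")]

predicted_poisson = predict(zone, saved, matching)
predicted_ols = predict(zone, saved, matching, model="ols")
```

Swapping the two pairs in `matching` would multiply the population by the retail coefficient and vice versa, which is the error the warning above is meant to prevent.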
Missing Values

Specify any missing value codes for the variables. Blank records will automatically be
considered as missing. If any of the selected dependent or independent variables have
missing values, those records will be excluded from the analysis.

Add External Trips

External trips are trips that start outside the modeled study area. Because they are
crimes that originate outside the study area, they were not included in the zones used for the
origin model. Therefore, they have to be independently estimated and added to the origin
zone total to make the number of origins equal to the number of destinations. Click on the
“Add external trips” button to enable this feature.

Number of external trips

Add the number of external trips to the box. This number will be added as an extra
origin zone (the External zone).

Origin ID

Specify the origin ID variable in the data file. The external trips will be added as an
extra origin zone, called the “External” zone. Note: the IDs used for the destination file zones
should be the same as in the origin file. This will be necessary in subsequent modeling
stages.

Type of Regression Model

Specify the type of regression model to be used. The default is a Poisson regression
and the other alternative is a Normally-distributed/Ordinary Least Squares regression.

Save Predicted Values

The output is saved as a ‘dbf’ file under a different file name. The output includes all
the variables in the input data set plus the predicted values of the dependent variable for
each observation (with the name PREDICTED). In addition, if external trips are added, then
there should be a new record with the name EXTERNAL listed in the Origin ID column.
This record lists the added trips in the PREDICTED column and zeros (0) for all other
numeric fields.

Output

The tabular output includes summary information about the file and lists the predicted
values for each input zone.
Balance Predicted Origins & Destinations

Since, by definition, a ‘trip’ has an origin and a destination, the number of predicted
origins must equal the number of predicted destinations. Because of slight differences in the
data sets of the origin model and the destination model, it is possible that the total number
of predicted origins (including any external trips; see Make trip generation prediction
above) may not equal the total number of predicted destinations. This step, therefore, is
essential to guarantee that this condition will be true. The routine adjusts either the
number of predicted origins or the number of predicted destinations so that the condition
holds. The trip distribution routines will not work unless the number of predicted origins
equals the number of predicted destinations (within a very small rounding-off error).

Predicted Origin File

Specify the name of the predicted origin file by clicking on the Browse button and
locating the file.

Origin variable

Specify the name of the variable for the predicted origins (e.g., PREDICTED).

Predicted Destination File

Specify the name of the predicted destination file by clicking on the Browse button
and locating the file.

Destination variable

Specify the name of the variable for the predicted destinations (e.g., PREDICTED).

Balancing method

Specify whether origins or destinations are to be held constant. The default is ‘Hold
destinations constant’.

Save Predicted Origin/Destination File

The output is saved as a ‘dbf’ file under a different file name. The output includes all
the variables in the input data set plus the adjusted values of the predicted values of the
dependent variable for each observation. If destinations are held constant, the adjusted
variable name for the predicted trips is ADJORIGIN. If origins are held constant, the
adjusted variable name for the predicted trips is ADJDEST.

Output

The tabular output includes file summary information plus information about the
number of origins and destinations before and after balancing. In addition, the predicted
values of the dependent variable are displayed.
Example Trip Generation Model
To illustrate this model, let’s run through these procedures using an example from Baltimore County. In the case of Baltimore County, traffic analysis zones (TAZ) were used for the zonal geography. Two data sets are produced, one for the crime origins and one for the crime destinations. For Baltimore County, the origin data set has 532 zones covering both Baltimore County and the City of Baltimore with the total number of crime origins for each zone (sub-divided into different crime types - robberies, burglaries, vehicle theft) and a number of possible predictor variables (population, retail and non-retail employment, median household income, poverty levels, and vehicle ownership). Similarly, the destination data set has 325 zones with the number of crime destinations for each zone (again, sub-divided into different crime types) and a number of possible predictor variables (population, retail and non-retail employment, median household income, and several land use categories - acreage allocated for retail, residential, office space, and conservation uses). Sample data sets are provided on the CrimeStat download page.
Setting Up the Origin Model
In the first step, an origin model is created. Figure 13.6 shows the selection of the dependent variable and some possible independent variables. The type of model is an ordinary Poisson regression. The dependent variable is the number of crimes occurring between 1993 and 1997 in each origin zone (BCORIG). Eight possible independent variables have been selected: the 1996 population of each zone (POP96), the median household income of the zone relative to the zone with the highest median household income (INCEQUAL), the number of 1996 non-retail employees in each zone (NONRET96), the number of 1996 retail employees in each zone (RETEMP96), the total linear miles of arterial roads in each zone (ARTERIAL), a dummy variable for whether the Baltimore Beltway (I-695) passed through the zone or not (BELTWAY), the linear distance of the zone from Baltimore harbor in the CBD (DISTANCE), and the number of households without automobiles (ZEROAUTO - this cannot be seen in the image).
The model is set up to run a Poisson regression without an over-dispersion correction. It is a fixed model in which all independent variables are included. The coefficients are saved under the “Save estimated coefficients” dialogue box and the output (the predicted values) is saved under the “Save output” dialogue box. Both boxes ask for a file name.
Table 13.2 shows the results. Key statistics are highlighted. The overall model is highly significant. The Likelihood Ratio is highly significant and the R-squares are reasonably high (0.50 for the R-square and 0.42 for the deviance R-square). The coefficients for each of the variables are significant.
However, there are two major problems. First, the dispersion multiplier (parameter) is very large (37.087), indicating that the conditional variance is more than 37 times greater than the conditional mean. Second, while all of the coefficients are significant, several show sizeable multicollinearity as evidenced by the pseudo-tolerance value (POP96, DISTANCE, ZEROAUTO). This indicates that these variables are essentially measuring the same thing.
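To make the idea of a dispersion multiplier concrete, the sketch below computes one common estimate of over-dispersion: the Pearson chi-square divided by the residual degrees of freedom. This is an assumption for illustration only; the text does not state how CrimeStat computes its multiplier, and the function name and data are hypothetical.

```python
def pearson_dispersion(observed, predicted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Under a true Poisson model the conditional variance equals the
    conditional mean, so this ratio should be close to 1. Values far
    above 1 indicate over-dispersion.
    """
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, predicted))
    return chi_square / (len(observed) - n_params)


# Hypothetical observed counts and Poisson-predicted means for six zones.
observed = [0, 3, 12, 45, 7, 160]
predicted = [1.5, 4.0, 9.0, 30.0, 10.0, 120.0]

phi = pearson_dispersion(observed, predicted, n_params=2)
print(round(phi, 3))
```

A ratio well above 1, as in the first origin model, is the signal that uncorrected Poisson standard errors are too small.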
Restructuring the Origin Model
Consequently, the model is restructured in three ways (figure 13.7). First, the over-dispersion correction is applied. Second, the collinear variables DISTANCE and ZEROAUTO are dropped from the model. Third, a stepwise backward elimination procedure is used with the probability for keeping a variable in the equation (p-to-remove) being 0.01; that is, unless the probability that a coefficient could be obtained by chance is less than 1 in 100, the variable is dropped.
The result is now a model with the Likelihood Ratio and R-squares being almost as high as in the first model and in which all the coefficients are significant, but there is very little multicollinearity (Table 13.3). The dispersion multiplier is now 1.0 since the coefficient standard errors have been corrected for the original over-dispersion (see equation 13.8 above).
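The p-to-remove rule can be sketched as a single decision step. The example below uses the z-values reported for the restructured origin model (Table 13.3) and assumes a two-sided normal test; a full backward elimination would refit the model after each removal, which is not shown here, and CrimeStat's exact test may differ.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


P_TO_REMOVE = 0.01

# z-values from the second origin model run.
z_values = {
    "POP96": 22.473451,
    "INCEQUAL": -18.048743,
    "NONRET96": -6.139941,
    "RETEMP96": -2.820286,
    "ARTERIAL": -3.150019,
    "BELTWAY": 3.251004,
}

p_values = {name: two_sided_p(z) for name, z in z_values.items()}

# The weakest variable is the candidate for removal at this step.
weakest = max(p_values, key=p_values.get)
to_drop = weakest if p_values[weakest] >= P_TO_REMOVE else None

print(weakest, p_values[weakest], to_drop)
```

Because even the weakest variable falls below the 0.01 threshold, no further variable would be removed, which agrees with all six remaining in the final model.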
Figure 13.6: Origin Poisson Model Setup
Figure 13.7: Origin Poisson Model with Over-dispersion Correction
Table 13.2
Results of First Origin Model Run

Model result:
   Data file:                   BaltOrigins.dbf
   Type of model:               Origin
   DepVar:                      BCORIG
   N:                           532
   Df:                          523
   Type of regression model:    Poisson with over-dispersion correction
   Log Likelihood:              -10678.051687
   Likelihood ratio(LR):        25609.182621
   P-value of LR:               0.0001
   AIC:                         21374.103373
   SC:                          21412.593165
   Dispersion multiplier:       37.086973
   R-square:                    0.499539
   Deviance r-square:           0.420031
-----------------------------------------------------------------------------
                                                Pseudo-
Predictor      DF   Coefficient   Stand Error   Tolerance      z-value   p-value
CONSTANT        1      0.887266      0.037707           .    23.530608     0.001
POP96           1      0.000337      0.000016    0.463218    21.665568     0.001
INCEQUAL        1     -0.033017      0.001226    0.608346   -26.926013     0.001
NONRET96        1     -0.000173      0.000028    0.842042    -6.082943     0.001
RETEMP96        1     -0.000364      0.000117    0.960564    -3.107357     0.010
ARTERIAL        1     -0.108257      0.025888    0.771634    -4.181834     0.001
BELTWAY         1      0.150967      0.036047    0.958973     4.188082     0.001
DISTANCE        1      0.034289      0.007842    0.491906     4.372170     0.001
ZEROAUTO        1     -0.000462      0.000141    0.355510    -3.283930     0.010
-----------------------------------------------------------------------------
Looking at the model, we see six variables that significantly predict the number of crime origins. Population is the strongest, as indicated by its Z-test. Relative income equality is the next strongest, but this is a negative coefficient; that is, zones with high relative income equality produce fewer crime origins whereas zones with low relative income equality produce more crime origins. The third and fourth strongest variables are non-retail and retail employment respectively, but, again, the coefficients are negative; zones with less employment have more crimes originate from them. Finally, the two roadway variables show significant effects. Zones through which the Baltimore Beltway passes have a higher number of crimes originating (as might be expected) and also zones with fewer miles of arterial roads have more crimes originating; with the latter variable, it’s possible that we are picking up the lack of commercial employment opportunities since retail firms tend to locate on arterial roads rather than local streets.
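Because a Poisson model predicts the log of the expected count, each coefficient can be read as a multiplicative effect by exponentiating it. The sketch below does this for three coefficients from the second origin model; the helper name `rate_ratio` is hypothetical and the interpretation assumes the usual log-linear form of the Poisson model.

```python
import math

# Coefficients from the second origin model run (Table 13.3).
coefficients = {
    "POP96": 0.000284,
    "ARTERIAL": -0.085070,
    "BELTWAY": 0.123109,
}


def rate_ratio(coefficient, change=1.0):
    """Multiplicative change in predicted crime origins when the
    variable increases by `change`, holding the others constant."""
    return math.exp(coefficient * change)


beltway_effect = rate_ratio(coefficients["BELTWAY"])            # Beltway zone vs. not
pop_effect = rate_ratio(coefficients["POP96"], change=1000)     # 1,000 more residents
arterial_effect = rate_ratio(coefficients["ARTERIAL"])          # one more arterial mile

print(beltway_effect, pop_effect, arterial_effect)
```

A ratio above 1 means more predicted origins and a ratio below 1 means fewer, which matches the signs discussed in the text.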
Residual Analysis of Origin Model
The CrimeStat output includes a graph of the residual errors (actual values minus the predicted values) on the Y-axis by the predicted values on the X-axis. It is important to examine the residual errors as these can indicate outliers, problems in the data, and violation of assumptions. Figure 13.8 shows an image of the residual graph screen. As seen, the errors increase with the value of the predicted dependent variable. With the Poisson model, this is expected and does not indicate the violation of the independent errors assumption, as it does with the OLS. The errors are reasonably symmetrical and do not indicate differences in over- and under-estimation across the band of the predicted values.
Figure 13.8: Plot of Residual Errors and Predicted Values
There are some outliers; there are two zones for which the model predicted substantially more crimes than actually originated in those zones and there is one zone that had more crimes originate from it than was predicted by the model. But, in general, the model appears to be reasonably balanced.
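The residual check described above can be reproduced outside the program from the saved output file. The sketch below computes raw residuals (observed minus predicted) and, because Poisson errors grow with the predicted value, also scales them by the square root of the prediction before flagging outliers. The zone IDs, counts, and the cutoff of 3 are all hypothetical choices for illustration.

```python
import math


def residual_report(zone_ids, observed, predicted, threshold=3.0):
    """Return (zone, raw residual, scaled residual, flagged) per zone.

    The scaled (Pearson) residual divides by sqrt(predicted) so that
    zones with large predicted counts are not flagged merely because
    Poisson errors widen as the prediction grows.
    """
    rows = []
    for zone, y, mu in zip(zone_ids, observed, predicted):
        raw = y - mu
        scaled = raw / math.sqrt(mu)
        rows.append((zone, raw, scaled, abs(scaled) > threshold))
    return rows


# Hypothetical zones with observed and predicted crime origins.
zone_ids = ["A", "B", "C", "D"]
observed = [10, 52, 300, 40]
predicted = [12.0, 50.0, 180.0, 90.0]

report = residual_report(zone_ids, observed, predicted)
flagged = [zone for zone, _, _, is_outlier in report if is_outlier]
print(flagged)
```

Positive residuals mark zones with more crimes than predicted and negative residuals mark zones with fewer.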
Table 13.3
Results of Second Origin Model

Model result:
   Data file:                   BaltOrigins.dbf
   Type of model:               Origin
   DepVar:                      BCORIG
   N:                           532
   Df:                          525
   AIC:                         22538.584312
   SC:                          22568.520816
   R-square:                    0.455630
-----------------------------------------------------------------------------
                                                Pseudo-
Predictor      DF   Coefficient   Stand Error   Tolerance      z-value   p-value
CONSTANT        1      2.286699      0.039339           .    58.127787     0.001
POP96           1      0.000284      0.000013    0.943426    22.473451     0.001
INCEQUAL        1     -0.018525      0.001026    0.849679   -18.048743     0.001
NONRET96        1     -0.000186      0.000030    0.866522    -6.139941     0.001
RETEMP96        1     -0.000353      0.000125    0.960769    -2.820286     0.010
ARTERIAL        1     -0.085070      0.027006    0.938167    -3.150019     0.010
BELTWAY         1      0.123109      0.037868    0.970051     3.251004     0.010
-----------------------------------------------------------------------------
Setting Up the Destination Model
The same logic is applied for the destination model. In this case, the destination file has data on 325 zones within Baltimore County only. Similar possible predictor variables are included in the file. Aside from population, retail and non-retail employment, and the roadway variables, more detailed land use variables were included (acreage of commercial, residential, office space, recreational, and conservation lands). The model that was run was a Poisson with an over-dispersion correction. Again, a backward elimination procedure was adopted. Once a final model was selected, it was re-run as a fixed model to ensure that the coefficients were consistently estimated. Table 13.4 presents the results.
Five variables ended up in the final model. Again, population was significantly related to the number of crimes attracted to a zone, but was not the strongest predictor as indicated by the Z-test. The strongest relationship was for the number of retail employees. This suggests that retail/commercial areas attract many crimes. This is supported by one of the land use variables - the acreage associated with very large malls; in other words, there are additional crimes attracted to very large malls above-and-beyond the number of retail employees in those zones. Two other variables are in the equation. Relative income equality was, again, negatively related to crime destinations/attractions; zones with low income tend to attract more crimes. Also, there was a negative association with distance from the CBD. The farther away from the CBD, the lower the number of crimes. Overall, the model suggests that zones with commercial activities, particularly with large malls, but which are closer to the city center and which have households with relatively lower incomes are those that attract the most crimes.
The overall model was highly significant, as indicated by the Likelihood Ratio and the R-square. There was a discrepancy between the R-square statistic and the Deviance R-square, making it unclear how strong the model is (the R-square would suggest that it’s strong whereas the Deviance R-square would not). Nevertheless, the overall predictability is reasonable. The amount of multicollinearity is tolerable.
Table 13.4
Results of First Destination Model

Model result:
   Data file:                   BCDestinations.dbf
   Type of model:               Destination
   DepVar:                      BCDEST
   N:                           325
   Df:                          319
   AIC:                         20707.744988
   SC:                          20730.447939
   R-square:                    0.596921
-----------------------------------------------------------------------------
                                                Pseudo-
Predictor      DF   Coefficient   Stand Error   Tolerance      z-value   p-value
CONSTANT        1      5.485851      0.218977           .    25.052182     0.001
POP96           1      0.000190      0.000027    0.928694     6.935850     0.001
INCEQUAL        1     -0.017176      0.005464    0.903130    -3.143462     0.010
RETEMP96        1      0.001018      0.000062    0.717076    16.297855     0.001
VERYLRGMLACR    1      0.006446      0.000974    0.740927     6.616423     0.001
DISTANCE        1     -0.115709      0.017069    0.876461    -6.778875     0.001
-----------------------------------------------------------------------------
Residual Analysis of Destination Model
As with the origin model, an analysis was conducted of the residual errors. This time, the output ‘dbf’ file was brought into Excel and a nicer graph created (figure 13.9). Unlike the best origin model, the dispersion of the residuals is not symmetrical. There are several major outliers, both on the negative end of the residuals (over-estimation of crime attractions) and on the positive end (under-estimation of crime attractions). In particular, there are two zones that seem to stand out. Both of them have shopping malls (Golden Ring Mall and Eastpoint Mall). But the amount of crime in those zones is much greater than the model predicts. This is seen as high positive residuals (i.e., there were more actual crimes than predicted). They both are older malls, but are located in relatively high crime areas. Golden Ring Mall was demolished several years ago, but after the data that are being analyzed in this example were collected.
Adding in Special Generators
Since the number of crime incidents (attractions) in those two zones was much higher than was expected, they were treated as ‘special generators’. Keeping in mind the caution that one doesn’t want to over-use this category, we can still demonstrate how it works. Two new variables were created for the data set. One was for the Golden Ring Mall and one was for the Eastpoint Mall. For the Golden Ring Mall, the zone that included it received a ‘1’ for this variable while all other zones received a ‘0’. Similarly, for the Eastpoint Mall variable, the zone in which it occurred received a ‘1’ while all other zones received a ‘0’. These dummy variables were then included in the model (Table 13.5).
Adding the two special generators increases the predictability substantially. The Likelihood Ratio jumps as does the R-square; the Deviance R-square statistic, however, actually drops, suggesting that it is not a reliable indicator with these data. The coefficients for the two zones, treated as special generators, are both highly significant and, in fact, are the strongest variables in the equation. All other variables have the same relationships as in the first run. There does not appear to be substantial multicollinearity.
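A special generator is simply a 0/1 dummy variable for one zone. The sketch below shows how such variables could be built and, assuming the usual log-linear reading of a Poisson coefficient, what the fitted coefficients from the second destination model imply. The zone IDs and the record layout are hypothetical; only the two coefficients come from Table 13.5.

```python
import math

# Hypothetical zone records; the IDs of the two mall zones are made up.
zones = [
    {"zone_id": 101, "RETEMP96": 250},
    {"zone_id": 102, "RETEMP96": 4000},
    {"zone_id": 103, "RETEMP96": 3500},
]
GOLDENRING_ZONE = 102  # hypothetical zone containing Golden Ring Mall
EASTPOINT_ZONE = 103   # hypothetical zone containing Eastpoint Mall

# Each special generator gets 1 in its own zone and 0 everywhere else.
for zone in zones:
    zone["GOLDENRING"] = 1 if zone["zone_id"] == GOLDENRING_ZONE else 0
    zone["EASTPOINT"] = 1 if zone["zone_id"] == EASTPOINT_ZONE else 0

# Coefficients reported for the two dummies in the second destination model.
goldenring_multiplier = math.exp(1.933321)
eastpoint_multiplier = math.exp(1.602000)

print(goldenring_multiplier, eastpoint_multiplier)
```

Read this way, the dummies raise the predicted attractions in the two mall zones several-fold relative to what the other variables alone would predict, which is why they dominate the equation.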
This brings up an issue over the status of a special generator. In this example, the two zones were treated as special generators in the model. While the predictability increased substantially, one has to wonder whether this was a meaningful operation. That is, if this model were applied to data for a later time period (e.g., 2002-2004 crime data), would the relationships still hold? In the case of the Golden Ring Mall, they wouldn’t since that mall has since been demolished.
The value of a special generator is that it identifies a land use that would be expected to be relatively permanent (e.g., a stadium or a train station or an airport). In the case of a shopping mall, it may or may not be. If it’s a high visibility ‘regional’ mall, then treating it as a special generator is probably a good idea. If it’s a smaller, older mall, on the other hand, the analysis is guessing that the mall will maintain its status as a high crime attraction location. Clearly, judgment and knowledge of the particular mall are essential.
Figure 13.9: Residual Errors for Crime Destinations (Y-axis: residual error, observed minus predicted; X-axis: predicted values)
Table 13.5
Results of Second Destination Model

Model result:
   Data file:                   BcDestinations.dbf
   Type of model:               Destination
   DepVar:                      BCDEST
   N:                           325
   Df:                          317
   AIC:                         15720.476911
   SC:                          15750.747513
   R-square:                    0.784194
-----------------------------------------------------------------------------
                                                Pseudo-
Predictor      DF   Coefficient   Stand Error   Tolerance      z-value   p-value
CONSTANT        1      5.182117      0.067867           .    76.356923     0.001
INCEQUAL        1                    0.003942    0.902950    -5.276135     0.001
RETEMP96        1      0.000995      0.000051    0.700294    19.338957     0.001
VERYLRGMLACR    1      0.006590      0.000869    0.716299     7.582758     0.001
POP96           1      0.000238      0.000020    0.921456    12.164552     0.001
DISTANCE        1     -0.087826      0.012462    0.872535    -7.047735     0.001
GOLDENRING      1      1.933321      0.069636    0.969044    27.763123     0.001
EASTPOINT       1      1.602000      0.067934    0.943548    23.581751     0.001
-----------------------------------------------------------------------------
Comparing Different Crime Types
With or without special generators, a trip generation model is an ecological model that predicts crime origins and crime destinations. A point was made in chapter 11 that these models are not behavioral, but are correlates of crimes. That is, the variables that end up predicting the number of crimes are not reasons (or explanations) for the crimes. Population almost always enters the equation because, all other things being equal, zones with larger numbers of persons will have more crimes, both originating and ending in them. Similarly, low income status is frequently associated with high crime areas. It doesn’t follow that low income persons will be more prone to commit crimes; it may be true but these models don’t test that proposition. These are only correlates with crime in those environments. As was mentioned earlier, these variables are often correlated with many specific conditions that may be predictors of individual crime - poverty, drug use, substandard housing, and lack of job opportunities.
To see this, three separate models of specific crime types were run for robbery, burglary, and vehicle theft. For each crime type, the general model was tested for both the origin and the destination models. If a variable was not significant, it was dropped and the model was re-run. The results of the origin model for the three crime types are seen in table 13.6 while the results of the destination model are seen in table 13.7.
Table 13.6
Models for Specific Crime Types:
Origin Model

                                                                 Vehicle
                          All Crimes     Robbery    Burglary       Theft
CONSTANT                    2.286699   -0.652291    1.621546   -0.800759
INCOME EQUALITY            -0.018525   -0.023964           -   -0.019620
NON-RETAIL EMPLOYMENT      -0.000186   -0.000237   -0.000239   -0.000188
RETAIL EMPLOYMENT          -0.000353           -           -           -
POPULATION                  0.000284    0.000297    0.000242    0.000342
BELTWAY                     0.123109           -           -           -
MILES OF ARTERIAL          -0.085070           -           -   -0.180966
Table 13.7
Models for Specific Crime Types:
Destination Model

                                                                 Vehicle
                          All Crimes     Robbery    Burglary       Theft
CONSTANT                    5.485851    3.284488    3.246183    2.610299
INCOME EQUALITY            -0.017176   -0.027946   -0.034598   -0.012910
RETAIL EMPLOYMENT           0.001018    0.000844           -    0.000507
VERY LARGE MALL ACREAGE     0.006446    0.004332           -           -
POPULATION                  0.000190    0.000223    0.000309    0.000247
DISTANCE FROM CBD          -0.115709   -0.096330   -0.038715   -0.096088
The population variable appears in every single model. As mentioned, all other things being equal, the larger the number of persons in a zone, the more crime events will occur whether those events are crime productions (origins) or crime attractions (destinations). Similarly, relative income equality appears in five of the six models with the coefficient always being negative. In general, zones with relatively lower incomes will have more robberies, burglaries, and vehicle thefts. The only model for which income equality did not appear was as an origin variable for burglaries; apparently, burglars come from zones with various income levels, at least in Baltimore.
The other general variables have more limited applicability. Retail employment predicts both total crime origins and total crime destinations, but only predicts specifically robbery destinations and vehicle theft destinations; the latter tend to occur more in commercial areas than not. On the other hand, non-retail employment appears to be important only as a crime origin variable; zones with less non-retail employment tend to produce more offender trips. Distance from the CBD only appears as a destination variable; the closer a zone is to the metropolitan center, the higher the number of crimes being attracted to that zone; this variable was not important in the origin model.
In other words, these models are measuring general conditions associated with crime, not causes per se. They capture the general contextual relationships associated with crime productions and attractions. But, they don’t necessarily predict individual behavior. Nevertheless, the models can be used for prediction since the conditions appear to be quite general.
Adding External Trips
After an origin and destination model has been developed, the next step is to add any crime trips that came from outside the modeling area (external trips). In this case, these would be trips that came from areas that were not in either Baltimore County or the City of Baltimore (the modeling area).
A simple estimate of external trips is obtained by taking the difference between the total number of crimes occurring in the study area (Baltimore County destinations) and the total number of crimes originating in the modeling area (table 13.8).
The difference between the number of crimes enumerated within Baltimore County and that originating from both Baltimore County and the City of Baltimore is 1,627. This is 3.9% of the total Baltimore County crimes. In general, it is important that the external trips be as small as possible. Ortuzar and Willumsen (2001) suggest that this percentage be no greater than 5% in order to minimize potential bias from not including those cases in the origin model. It’s not an absolute percentage, but more like a rule of thumb; in theory, any external trips could bias the origin model. But, in practice, the error will be small if external crime trips are a small percentage of the total number enumerated in the destination county.
In this case, the condition holds. For the three types of crime modeled, the percentage of external trips was also less than 5%: robbery (4.0%), burglary (4.5%), and vehicle theft (1.4%). On the other hand, if the percentage of external trips is greater than approximately 5%, a user would be advised to widen the origin study area to include more zones in the model.
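The external trip estimate and the 5% rule of thumb amount to a subtraction and a ratio. The short sketch below reproduces the Baltimore County figures reported in this section; the variable names are hypothetical.

```python
# Totals reported for the Baltimore County example.
crimes_ending_in_study_area = 41969         # 325 Baltimore County zones
crimes_originating_in_model_area = 40342    # 532 Baltimore County/City zones

external_trips = crimes_ending_in_study_area - crimes_originating_in_model_area
external_share = external_trips / crimes_ending_in_study_area

# Rule of thumb: external trips should be no more than about 5% of the total.
MAX_EXTERNAL_SHARE = 0.05
within_rule_of_thumb = external_share <= MAX_EXTERNAL_SHARE

print(external_trips, round(external_share * 100, 1), within_rule_of_thumb)
```

If the share were above the threshold, the remedy suggested in the text is to widen the origin study area rather than to adjust the arithmetic.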
Table 13.8
Estimating External Crime Trips in Baltimore County

Number of crimes ending in 325 Baltimore County zones:               41,969
Number of crimes originating in 532 Baltimore County/City zones:     40,342
Crimes from outside the modeling area:                                1,627

Note: external trips are only added to the origin model since they are crime trips that originate outside the modeling area. They are not relevant for the destination model.
Predicting External Trips
If a model is being applied to a data set other than the one from which it was initially estimated, a problem emerges about how to estimate the number of external trips. It is one thing to apply simple arithmetic in order to determine how many trips originated outside the modeling area (as in table 13.8). It is another to know how to calculate external trips when the model is being applied to other data. For the modeled zones, the coefficients are applied to the variables of the model (see “Make Prediction” below). But, the external trips have to be estimated independently.
There is not a simple way to estimate external crime trips. Unlike regular trips that can be estimated through cordon counts, crime trips are not detectable while they are occurring (i.e., one cannot stand by a road and count offenders traveling by). Thus, they have to be estimated.
A simple method is to calculate the number of external trips for two time periods. For example, external trips could be calculated from a 2000 data set by subtracting the total number of crimes originating in the modeling region from the total number of crimes occurring in the study area (e.g., as in table 13.8 above). If a similar calculation was made for, say, 2002, then the difference (the ‘trend’) could be extrapolated. To take our example, between 1993 and 1996, there were 1,627 external trips. If the number of external trips turned out to be 1,850 for 1997-2000, then the difference (1,850 - 1,627 = 223) could be applied for future years. Essentially, a slope is being calculated and applied as a linear equation:
Y_i = 1850 + 223 * X_i
where Y_i is the number of external crime origins during a four year period, i, and X_i is an integer for a four year period starting with the next period (i.e., the base period, 1997-2000, has an integer value of 0). In other words, a linear trend is being extrapolated.
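The extrapolation can be written as a one-line function. The sketch below uses the illustrative figures from the text (a base of 1,850 external trips for 1997-2000 and a slope of 223 per four year period); the function name is hypothetical.

```python
def extrapolate_external_trips(periods_ahead, base=1850, slope=223):
    """Linear trend: external trips expected `periods_ahead` four year
    periods after the base period (periods_ahead = 0 is the base)."""
    return base + slope * periods_ahead


# Base period plus the next two four year periods.
forecasts = [extrapolate_external_trips(x) for x in range(3)]
print(forecasts)
```

Because the slope is constant, the forecast rises without limit as more periods are added, which is the weakness discussed next.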
How realistic is this? For short time periods, linear extrapolation is probably as good a method as any. But for longer time periods, it can lead to spurious conclusions (e.g., crime trips from outside the region will always increase). Short of developing a sophisticated model that relates crime trips to the growth of the metropolitan area and to other metropolitan areas within, say, 500 miles, a linear extrapolation is one of the few methods that one can apply.[7]
Make Prediction
In CrimeStat, external trips are added on the second page of trip generation, “Make prediction”. This is a page where the modeled coefficients and any external trips are applied to a data set. There are two reasons why this is a separate page from the “Calibrate model” page where the model was calibrated. First, the coefficients might be applied to a data set other than the one from which they were calibrated. For example, one might calibrate the model with a data set from 1998-2000 and then apply it to a data set covering 2001-2003. Similarly, one might take future year forecasts (e.g., 2025) and apply the model. In effect, the model would be predicting the number of future crimes if the same conditions hold over the time frame.
A second reason for separating the calibration and application pages is to add external trips to the origin zones. As mentioned above, external trips are, by definition, those that were not modeled in the calibration. They have to be calculated independently of the model and then added in.
Thus, the “Make prediction” page allows these operations to occur. Figure 13.10 shows the page. There are several steps that have to be implemented for this page to be operative.
1. The data file has to be input as either the primary or secondary file (not shown in the image). In this example, the same data set is being used as was used for the calibration. But, if it’s a different data set, that will need to be input in the Data Setup section. Whether the input data set is a primary file (the usual occurrence) or a secondary file needs to be specified. Also, indicate whether the applied model is to be an origin or destination model. In figure 13.10, it is specified as an origin file.
Figure 13.10: "Make Prediction" Setup Page
2. A trip generation coefficients file needs to be input. These were the estimated coefficients from the calibration stage. Inputting this file brings in the coefficients in the order in which they were saved. They are listed in the “Matching parameters” dialogue box on the right side of the page.
3. On the left side of the page are listed all the variables in the input data set (primary or secondary file). In the middle box, the variables are added in the same order as in the matching parameters box. That is, each independent variable needs to be matched to the variable from the coefficients file, one for one. This is very important. The names do not have to be the same (e.g., if the model was calibrated with one data set and applied to another, the variable names may not be identical). But the content and order of the variables need to be the same. In the example, the first variable in the coefficients file is INCEQUAL. The selected variable in the middle box has to be the income equality variable (whatever its name). In the example, the same data set is being used so the names are identical. This is repeated for each of the independent variables in the coefficients file.
4. Next, any missing value codes are specified in the missing values box. Any
records with a missing value for any of the selected independent variables will
be dropped from the calculation. In the example, there are no missing value
codes applied other than the default blank field.
5. If external trips are to be added, the external trips box must be checked.
External trips could be applied in an origin model, but not in a destination
model. If they are to be added, the number of trips should be specified in the
“Number of external trips” box and the zone ID field for the file indicated; in
the example, 1627 is added as external trips and the TAZ field is specified as
the ID variable (TZ98).
6. The type of model to be applied is indicated in the “Type of regression model”
box. There are only two choices: Poisson (the default) and Normal (OLS).
Since the coefficients are being applied to the data, no over-dispersion
correction is necessary (since it was probably used in calibrating the model).
7. Finally, the output file name is defined in the “Save predicted values” box.
For each zone, the routine will then take the appropriate variable from the input data
set and apply the matching coefficient from the trip generation coefficients file to produce a
predicted estimate of the number of trips. To calculate this value, for the OLS model, the
routine will use equation 13.2 above while for the Poisson model, the routine will use
equation 13.6 above; for the latter, it will then raise e to the power of the predicted log value
to produce a prediction for the expected number of crime trips:
$$\lambda_i = e^{[\mathrm{Ln}(\lambda_i)]} \qquad (13.26)$$
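The prediction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CrimeStat routine itself: the function name `predict_trips`, the coefficient values, the intercept, and the POP field are hypothetical, while INCEQUAL and TZ98 are taken from the example in the text. Each coefficient is paired, in saved order, with a field of the input data set; a record with a missing value is dropped (returned as `None`); the OLS model returns the linear sum (equation 13.2) and the Poisson model exponentiates it (equations 13.6 and 13.26).

```python
import math

def predict_trips(record, matches, intercept, model="poisson", missing=(None, "")):
    """Predict the number of trips for one zone.

    matches: list of (field name in the input data, coefficient) pairs,
             in the order in which the coefficients were saved.
    Returns None when any matched field holds a missing value code.
    """
    values = []
    for field, _coef in matches:
        value = record.get(field)
        if value in missing:          # record is dropped from the calculation
            return None
        values.append(float(value))

    # Linear predictor: constant + sum of coefficient * variable
    linear = intercept + sum(coef * x for (_f, coef), x in zip(matches, values))

    if model == "poisson":
        return math.exp(linear)       # e raised to the predicted log value
    return linear                     # Normal (OLS) model


# Hypothetical coefficients and one hypothetical zone record
matches = [("INCEQUAL", 0.5), ("POP", 1.0)]
zone = {"TZ98": "101", "INCEQUAL": 2.0, "POP": 1.0}
print(predict_trips(zone, matches, 1.0, model="ols"))
print(predict_trips(zone, matches, 1.0, model="poisson"))
```

Because the pairing is by position rather than by name, the field names in `matches` may differ from the names used at calibration, which mirrors the matching requirement in step 3.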
If external trips are added, a new zone is created called EXTERNAL in the ID field
that was indicated on the page. Then, the specified number of external trips is simply placed
in that field with zeros being placed for the values of all the remaining variables in the file.
By default, the output name for the predicted number of crimes will be called PREDORIG for
an origin model and PREDDEST for a destination model. An example data set is available
on the CrimeStat download page.
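As a rough sketch of how the external zone might be represented, the following Python function appends one record whose ID is EXTERNAL. It assumes that the external trips are stored in the predicted-value field (PREDORIG here) and that every other field is set to zero; the function name `add_external_zone` and the POP field are hypothetical, and the actual file layout written by CrimeStat may differ.

```python
def add_external_zone(rows, id_field, pred_field, external_trips):
    """Return a new list of zone records with an EXTERNAL zone appended.

    The external record carries the trips in pred_field and zeros in
    every other field (an assumption about the layout).
    """
    fields = list(rows[0].keys()) if rows else [id_field, pred_field]
    external = {field: 0 for field in fields}
    external[id_field] = "EXTERNAL"
    external[pred_field] = external_trips
    return rows + [external]          # the input list is left unchanged


# Values from the example in the text: 1627 external trips, ID field TZ98
zones = [{"TZ98": "101", "POP": 500, "PREDORIG": 12.5}]
with_external = add_external_zone(zones, "TZ98", "PREDORIG", 1627)
print(with_external[-1])
```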
Note: for a destination model, this “Make prediction” operation is not necessarily
needed if the same data set is used for calibration and prediction. This step is
primarily for the origin file.
Balancing Predicted Origins and Destinations
After the origin model and destination model are calibrated and applied to a data set,
the final step in trip generation is to ensure that the number of predicted origins equals the
number of predicted destinations. This is necessary for the next stage of crime travel
demand modeling, trip distribution. Since a trip has both an origin and a destination, the
total number of origins has to equal the total number of destinations. This is an absolute
condition for the trip distribution model to work; the routine will return an error message if
the number of origins does not equal the number of destinations.
If the Poisson model is used for calibration, the routine ensures that the number of
predicted trips equals the number of input trips. Further, if the calculation of external trips
has been obtained by subtracting the total number of predicted origins from the total number
of predicted destinations, and if the external trips are then added to the predicted origins,
then most likely the total number of origins will equal the total number of destinations.
However, because of rounding-off errors and inconsistent external trip estimates, it is
possible that the sums are not equal.
Consequently, it is important to balance the predicted origins and destinations to
ensure that no problems will occur in the trip distribution model. There are two ways to do
this in CrimeStat. First, the number of predicted destinations is held constant and the
number of predicted origins is adjusted to match this number. This is the default choice.
Second, the number of predicted origins is held constant and the number of predicted
destinations is adjusted to match this number.
The calculation is essentially a multiplier that is applied to each zone. If destinations
are to be held constant, the multiplier is defined as:
$$M_j = \frac{\sum (\text{Crimes by destinations, } j)}{\sum (\text{Crimes by origins, } i)} \qquad (13.27)$$
and the predicted number of origins is multiplied by $M_j$. If, on the other hand, the origins
are to be held constant, the multiplier is defined as:
$$M_i = \frac{\sum (\text{Crimes by origins, } i)}{\sum (\text{Crimes by destinations, } j)} \qquad (13.28)$$
and the predicted number of destinations is multiplied by $M_i$. The multiplication simply
ensures that the sums of the predicted origins and predicted destinations are equal.
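The two multipliers in equations 13.27 and 13.28 can be illustrated with a short Python sketch. The function name `balance` and the zone values are hypothetical; the arithmetic simply follows the two equations, scaling whichever set of predictions is not held constant.

```python
def balance(origins, destinations, hold="destinations"):
    """Balance predicted origins and destinations so their sums match.

    hold="destinations": origins are scaled by sum(dest) / sum(orig)   (13.27)
    hold="origins":      destinations are scaled by sum(orig) / sum(dest) (13.28)
    Returns (adjusted_origins, adjusted_destinations).
    """
    total_origins = sum(origins)
    total_destinations = sum(destinations)

    if hold == "destinations":
        multiplier = total_destinations / total_origins
        return [o * multiplier for o in origins], list(destinations)

    multiplier = total_origins / total_destinations
    return list(origins), [d * multiplier for d in destinations]


# Hypothetical predicted values for three origin zones and two destination zones
pred_orig = [10.0, 20.0, 30.0]
pred_dest = [40.0, 50.0]
adj_orig, adj_dest = balance(pred_orig, pred_dest)   # default: hold destinations
print(sum(adj_orig), sum(adj_dest))
```

Only one of the two lists changes in each case, which corresponds to the routine saving only the adjusted file.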
The third page in the trip generation model is the “Balance predicted origins &
destinations” page. Figure 13.11 shows the setup for this page. The steps are as follows:
1. The box is checked indicating that it is a balancing operation.
2. The predicted origin file is input and the predicted origin variable is identified.
In the example, the predicted origin file is called “PredictedOrigins.dbf” and
the field with the predicted numbers was called PREDORIG.
3. The predicted destination file is input and the predicted destination variable is
identified. In the example, the predicted destination file is called
“PredictedDestinations.dbf” and the field with the predicted numbers was
called PREDDEST.
Note that these files are input on this page and not on the primary or secondary file
pages.
4. Next, the type of balancing is specified: holding destinations constant (the
default) or holding origins constant. In the example, the destinations are to be
held constant.
5. Finally, the output file is specified. If the origins are to be adjusted, then only
the origin file is saved. If the destinations are to be adjusted, then only the
destination file is saved. In other words, the adjustment is applied to only one
of the two predicted crime files. In the example, the file was named
“AdjustedPredictedOrigins.dbf” (not shown) since the origin file was adjusted.
The output produces a new column with the adjusted values. Table 13.9 shows the
first 11 records of the origin output for the Baltimore data. Once the balancing has been
completed, the trip generation model is finished and the user can go on to the trip
distribution model. In other words, the output file ensures that both the predicted origin file
(crime productions) and predicted destination file (crime attractions) are balanced.
Strengths and Weaknesses of Regression Modeling of Trips
As mentioned earlier, the use of regression for producing the trip generation model
has its strengths and weaknesses. The advantages are that, first, the approach is applicable
to crime incidents. Unlike regular travel behavior, crime trips have to be inferred from
police reports; one cannot conduct a household survey of offenders asking them about their
crime travel. Thus, starting with counts of the number of crimes occurring in each zone and
the number of crimes that originate from each zone, a model can be constructed.
Figure 13.11:
Balance Predicted Origins and Destinations Setup
Table 13.9
Adjusted Data Should Have These Fields
Second, the use of a non-linear model, such as the Poisson, allows more complex
fitting of crime counts. In the early 1970s when trip generation models were starting to be
implemented in Metropolitan Planning Organizations around the U.S., the major type of
regression modeling available was OLS. At that time, researchers could not demonstrate
that this method was reliable in terms of predicting travel; we've discussed those reasons
earlier in this chapter. However, with the availability of software for conducting Poisson and
other non-linear models, that criticism is no longer applicable. The Poisson model is very
‘well behaved’ with respect to count data. It does not produce negative estimates. It
requires high levels of an independent variable to produce even a slight effect in the
dependent variable, but the size of that effect increases as the values of the independent
variable increase. It maintains constancy between the sum of the input counts and the sum
of the predicted counts. Non-linear models are much more realistic for modeling trips than
OLS.
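The constancy between the sum of the input counts and the sum of the predicted counts can be seen from the maximum likelihood conditions. The short derivation below is a sketch that assumes the log-linear form of equation 13.6 with a constant term; it uses $Y_i$ for the observed count in zone $i$ and $\hat{\lambda}_i$ for the fitted mean.

```latex
% Poisson log-likelihood with the log-linear mean of equation 13.6
\ln L = \sum_{i} \left[ Y_i \ln(\lambda_i) - \lambda_i - \ln(Y_i!) \right],
\qquad
\ln(\lambda_i) = \alpha + \beta_1 X_{1i} + \dots + \beta_k X_{ki}

% Setting the derivative with respect to the constant term to zero
\frac{\partial \ln L}{\partial \alpha}
  = \sum_{i} \left( Y_i - \lambda_i \right) = 0
\quad \Longrightarrow \quad
\sum_{i} Y_i = \sum_{i} \hat{\lambda}_i
```

The equality holds at the maximum likelihood estimate; when the fitted coefficients are applied to a different data set, the sums need not match.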
Third, the use of a multiple regression model allows multiple independent variables
to be included. In our example, there were six and five variables respectively in the general
origin and destination models. Trip tables, on the other hand, typically only have three or
four independent predictors; it becomes too complicated to keep track of multiple conditions
of predictor variables. Thus, a more complex and sophisticated model can be produced with
a regression framework.
Fourth, and finally, a regression framework allows for complex interactions to be
estimated. For example, the log of an independent variable can be defined. An interaction
between two of the independent variables can be examined (e.g., median household income
for those zones having a sizeable amount of retail employment). In the trip table approach,
these interactions are implicit in the cell means. Thus, overall, the regression framework
allows for a more complex model than is available with a trip table approach.
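As an illustration of how such terms might be prepared before calibration, the sketch below builds a log term and an interaction term for one zone. The field names MEDINCOME and RETAILEMP, the function name `derived_predictors`, and the threshold of 500 retail jobs used to define a "sizeable" amount are all hypothetical choices made for the example.

```python
import math

def derived_predictors(zone, retail_threshold=500):
    """Return a log term and an interaction term for one zone record.

    The interaction equals median household income for zones at or above
    the retail employment threshold and zero for all other zones.
    """
    income = float(zone["MEDINCOME"])
    has_retail = 1.0 if zone["RETAILEMP"] >= retail_threshold else 0.0
    return {
        "LOGINCOME": math.log(income),          # requires income > 0
        "INCOME_X_RETAIL": income * has_retail,
    }


# Two hypothetical zones, one with and one without sizeable retail employment
print(derived_predictors({"MEDINCOME": 40000, "RETAILEMP": 800}))
print(derived_predictors({"MEDINCOME": 40000, "RETAILEMP": 10}))
```

The derived columns would then be added to the data set and entered as ordinary independent variables.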
On the other hand, there are potential problems associated with a regression
framework. First, the regression coefficients can be influenced by zone size. Since the model
is estimating differences between zones (i.e., differences in the number of crimes as a
function of differences in the values of the independent variables), zone size affects the level
of those differences. With small zone sizes, there will be substantial differences between
zones in both the independent and dependent variables. Conversely, large zone sizes will
minimize within-zone differences, but will usually increase the estimate of the between-zone
differences. The result could be an exaggeration of the effect of a variable that would not be
seen with small zone geography. As we argued in chapter 12, one should choose the smallest
zone geography that is practical in order to minimize this problem.
Second, a point that has been repeated again and again, these models are not
behavioral explanations. They represent ecological correlations with crime trips. It's
important to not try to convert these models into explanations of offender behavior. Too
often, researchers have jumped to conclusions about individuals based on the relationships
with environments and neighborhoods. It is important to not do this. This criticism,
incidentally, applies to both the trip table and the regression approaches to trip
generation modeling.
The new generation of travel demand models is specifically behavioral and involves
modeling the behavior of specific individuals. Probabilities are calculated based on
individual choice and a micro-simulation routine can apply these probabilities to a large
metropolitan area (RDC, 1995; Pas, 1996; Recker, 2000; Shifton et al, 2003). While this
approach offers some definite theoretical advantages and is the subject of much current
research, to date there has not been a demonstration that this approach is more accurate at
predicting trips than the traditional trip-based travel demand model.
Summary
In summary, the trip generation model is a valuable tool for predicting the number of
crimes that originate in each zone and the number of crimes that end in each zone. Even if
the model is not behavioral, the model can be stable and useful for many years in the future.
It is best thought of as a proxy model in which the variables in the models are proxies for
conditions that are generating crimes, either in terms of environments that produce
offenders or in terms of locations that attract them.
In the next chapter, we will examine the second stage in the travel demand model,
trip distribution. In that stage, the predicted crime origins and the predicted crime
destinations are linked to produce crime trips.
Endnotes for Chapter 13
1. There is also subjectivity in subdividing variables at an individual level. For
example, household income levels can be subdivided in different ways. However,
with aggregate data, all variables have to be subdivided arbitrarily whereas with
individual level data, typically only income is done this way.
2. Statisticians often refer to the number of parameters that have to be
estimated in an equation, not just the number of independent variables. In an OLS
model, for example, there are K+1 parameters that are estimated: coefficients for
the K independent variables and a constant term. In this text, K refers to the
number of independent variables, not estimated parameters.
3. It is possible to transform the independent variable into a non-linear predictor, for
example by taking the log of the independent variable or raising it to some power
(e.g., $X^2$). However, this won't solve the other problems associated with OLS,
namely negative and non-summative predictions.
4. For example, to account for the skewed dependent variable, one or more of the
independent variables have to be transformed with a non-linear operator (e.g., log or
exponential term). When more than one independent variable is non-linear in an
equation, the model is no longer easily understood. It may end up making
reasonable predictions for the dependent variable, but it is not intuitive and not
easily explained to non-specialists.
5. Note, Luc Anselin uses K for the number of parameters (coefficients + intercept) in
Appendix C whereas we use it for the number of independent variables. Readers
should be aware of this difference.
6. In the usual travel demand modeling, on the other hand, modelers usually adjust
the predicted destinations since the origin data is more reliable. These numbers are
obtained from the census or from the sample of households who are interviewed to
produce a sample from which data on destinations are obtained.
7. An alternative might be to use cordon counts from major highways coming into the
region and assume that crime trips represent a constant proportion of those trips.
Thus, if the total number of estimated external highway trips increases by 5%, one
could assume that the external crime trips also increase by 5%. While this is plausible,
it is not necessarily an accurate estimate. Talk to your Metropolitan Planning
Organization or the State Department of Transportation if you are interested in
developing this type of model as you will need their estimates of external trips.