
Chapter 13

Trip Generation

Background

In this chapter, the theory and mechanics of the trip generation stage will be explained.
Trip generation is a model of the number of trips that originate and end in each zone for a
given jurisdiction. Given a set of N destination zones and M origin zones (which include all
the destination zones and, possibly, zones from adjacent jurisdictions), separate models are
produced of the number of crimes originating and ending in each of these zones. That is, a
separate model is produced of the number of crimes originating in each of the M origin zones,
and another model is produced of the number of crimes ending in each of the N destination
zones. The first is a crime production model while the second is a crime attraction model.

Two points should be emphasized. First, the models are predictive. That is, the results of
the models are predictions of both the number of crime trips originating in each zone and the
number of crime trips ending in each zone (i.e., crimes occurring in a zone). Because the
models are predictions, there is always error between the actual number and that predicted.
As long as the error is not too large, the model can be a useful tool for analyzing the
correlates of crime as well as for forecasting or for simulating policy interventions.

Second, because the number of crimes attracted to the study jurisdiction will usually be
greater than the number of crimes predicted for the origin zones, due primarily to crime
trips coming from outside the origin areas, it is necessary to balance the productions and
attractions. This is done in two steps. One, an estimate of trips coming from outside the
study area (external trips) is added to the predicted origins as an 'external zone'. Two, a
statistical adjustment is done in order to ensure that the total number of origins equals the
total number of destinations. This is called balancing and is essential as an input into the
second stage of crime travel demand modeling - trip distribution.
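
The two balancing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the zone labels, and the trip counts are hypothetical, and the choice to hold the destinations constant while scaling the origins is an assumption made for the example, since the text above does not say which side is adjusted.

```python
def balance_origins(predicted_origins, predicted_destinations, external_trips):
    """Add an external zone to the origins, then scale origins to match destinations.

    Assumption for this sketch: destinations are held constant and every
    origin (including the external zone) is multiplied by the same factor.
    """
    origins = dict(predicted_origins)
    origins["external"] = external_trips          # step one: add the external zone
    total_origins = sum(origins.values())
    total_destinations = sum(predicted_destinations.values())
    factor = total_destinations / total_origins   # step two: proportional adjustment
    return {zone: trips * factor for zone, trips in origins.items()}


# Hypothetical predicted values for two zones.
origins = {"A": 40.0, "B": 60.0}
destinations = {"A": 90.0, "B": 60.0}
balanced = balance_origins(origins, destinations, external_trips=20.0)
```

After the adjustment, the balanced origins sum to the same total as the destinations, which is the condition the trip distribution stage requires.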

In the following discussion, first, the logic behind trip generation modeling is presented,
including the calibration of a model, the addition of external trips in making a model, and
the balancing of predicted origins and predicted destinations. Second, the mechanics of
conducting the trip generation model with CrimeStat are discussed and illustrated with data
from Baltimore County.

Modeling Trip Generation

The process of modeling trip generation is fairly well developed, at least with respect to
ordinary trips. It proceeds through a series of logical steps that make up the aggregate trip
generation model.

Trip Purpose

Trip generation modeling starts with the reasons behind travel. At an individual level,
people make trips for a reason - to go to work, to go shopping, to go to a medical
appointment, to go for recreation, or, in the case of offenders, to commit a crime. These are
called trip purposes. Since there are a very large number of trip purposes, usually these are
categorized into a few major groupings. In the case of the usual travel demand forecasting,
the distinctions are home-to/from-work (or home-based work trips), home-to/from-non-work (or
home-based non-work trips, e.g., shopping), and a non-home trip where neither the origin nor
the destination is at the traveler's residence location (non-home-based trips).

Since the model has aggregated trips to a zone, the trip purposes are collections of trips
from each origin zone to each destination zone. Thus, each zone produces a certain number of
home-work trips, home-non-work trips, and non-home trips and each zone attracts a certain
number of home-work trips, home-non-work trips, and non-home trips. This is the usual
distinction that most transportation modeling organizations make. The trip purposes are
documented during a large travel survey that asks individuals to fill out travel diaries for
one or two days of travel. In the travel diaries, detailed information about each trip is
documented - time of day, destination of trip, purpose of trip, travel modes used in making
the trips, accompanying passengers, route taken, and time to complete the trip.

Crime Trip Groupings

For crime trips, however, these distinctions are not very meaningful. There is very little
information on how offenders make trips. One cannot just take a sample of offenders and ask
them to complete a travel diary about how, when, and where the trip took place. With arrested
offenders, it might be possible to produce such a diary, but both memory problems and legal
concerns quickly make this an unreliable source of information. Therefore, as indicated in
chapter 11, a decision has been made to reference all trips with respect to the residential
home location. All crime trips are analyzed as home-crime trips.

However, other distinctions can be made. The most obvious is by type of crime. There are
robbery trips, burglary trips, vehicle theft trips, and so forth. Similarly, distinctions can
be made by travel time such as afternoon trips or evening trips. As mentioned in chapter 12,
though, the sample size will decrease with greater distinctions. Logically, one can divide a
sample into a very large number of important distinctions (e.g., afternoon burglary trips
involving two or more offenders). However, this reduces the sample size and increases the
error in estimation, particularly at the trip distribution and subsequent stages.

An important point that distinguishes the aggregate type of travel demand model, as is being
implemented here, from the newer generation of activity-based models is that there are no
linked trips with the aggregate approach (FHWA, 2001a). If an offender first steals a car,
then uses the car to rob a grocery store followed by a burglary, the aggregate approach
models this as three separate trips, rather than as a series of three linked crime trips
(which the activity-based models do). This is a deficiency with the aggregate travel demand
model. In order to make the aggregate models work, each trip is considered independent of any
other trip. This is not realistic behaviorally, since we know that many crimes are committed
in sequence as part of a single journey (or tour), and so the zonal approach does limit the
underlying logic of crime trips. Nevertheless, the aggregate approach can be very useful as
long as it is implemented consistently. With the current state of activity-based modeling,
there is not yet any evidence that such models produce more accurate predictions than the
cruder, aggregate approach (FHWA, 2001a).

Correlates of Crime

Any trip has contextual correlates associated with it. It is well documented that the
likelihood of making a trip (crime or otherwise) is not equal across areas of a metropolitan
region. There are age correlates of travel, socioeconomic correlates of travel, and land use
correlates of travel; the latter are usually associated with trip purposes (e.g., retail
areas attract shopping trips).

The trip generation model being implemented in this version of CrimeStat is an aggregate
model. Thus, the predictors are aggregate, rather than behavioral, in nature, as discussed in
chapter 11. They are correlates of trips, not necessarily the reasons for the trips. For
example, typically population is the best predictor of trips. Zones with many persons will
produce, on average, more crime trips than zones with fewer persons. The observation is not a
reason, but is simply a by-product of the size of the zone. Similarly, low-income zones will
tend to produce, on average, more crime trips than wealthier zones; again, this is not a
reason, but a correlate of the characteristics that might contribute to individual
likelihoods for committing crimes.

As mentioned in chapter 12, there are a number of different variables that could be used for
prediction, although population (or a proxy for population, such as households), income or
poverty, and land use variables would be the most common (NCHRP, 1998).

Theoretical Relevance of the Variables

In general, the variables that are selected should be empirically stable and theoretically
meaningful. That is, they should be stable variables that do not change dramatically from
year to year. They should be reliably measured so that an analyst can depend on their values.
Finally, they should be meaningful in some way. That is, they should be plausible enough
that crime analysts, researchers, and informed outsiders would agree that the relationship is
plausible. The variables either should have been demonstrated to be predictors in earlier
research or else should be so correlated with known factors as to be considered meaningful
proxies.

Spurious correlates

On the other hand, if a variable is either a correlate of a known predictor or idiosyncratic,
then it is liable not to be believed. For example, the number of taxis usually correlates
with the amount of employment since taxis tend to ply commercial areas for their trade.
Adding the number of taxis in a predictive model is liable to produce significant statistical
effects in predicting crime destinations. However, few persons are going to believe that this
is a real factor since it is understood to be a correlate of a more structural variable.

Idiosyncratic variables are those that appear in unique situations. For example, in some
cities, adjacency to a freeway is a correlate of crime origins (e.g., in Baltimore County
where low income populations live) whereas in other cities, it is a correlate of crime
destinations (e.g., in Houston where there are frontage roads with major commercial strips
that attract crimes). The variables may be real predictors. However, the analyst or
researcher will have difficulty persuading others to believe in the model, at least until the
results can be replicated.

In other words, what is required for the model is a set of reasonable correlates of crime
trips that would be plausible and stable over time. It is an ecological model, not a
behavioral one.

Social Disorganization Variables

There is a very large literature on the predictors of crime, typically following from the
social disorganization literature (for example, Park and Burgess, 1924; Thrasher, 1927; Shaw
and McKay, 1942; Newman, 1972; Ehrlich, 1975; Cohen and Felson, 1979; Wilson and Kelling,
1982; Stack, 1984; Messner, 1986; Chiricos, 1987; Kohfeld and Sprague, 1988; Bursik and
Grasmick, 1993; Hagan and Peterson, 1994; Fowles and Merva, 1996; Bowers and Hirschfield,
1999, among many other studies). Much of this literature identifies correlates that are
associated with crime incidents. Among the factors that have been associated with crime and
delinquency are poverty, low income households, overcrowding, substandard housing, low
education levels, single-parent households, high unemployment, and minority and immigrant
populations.

Multicollinearity among the independent variables

There are two statistical problems associated with using these variables as predictors. The
first is the high degree of overlap between the variables. Zones that have high poverty
levels typically also have low household income levels, higher population densities,
substandard housing, a high percentage of renters, and a higher proportion of minority and
immigrant populations. In a regression model, this overlap causes a condition known as
multicollinearity. Essentially, the independent variables correlate so highly among
themselves that they produce ambiguous, and sometimes strange, results in a regression model.
For example, if two independent variables are highly correlated, frequently one will have a
positive coefficient with the dependent variable while the other will have a negative
coefficient; conversely, they sometimes can cancel each other out. Thus, in spite of their
correlations with crime levels, in a model it is usually best to eliminate collinear
variables. The result is that simple variables usually end up being the most straightforward
to use (population, median household income) with many of the subtle, but theoretically
relevant, variables typically dropping out of the equation.
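
One simple way to look for this kind of overlap before fitting a model is to examine the pairwise correlations among the candidate predictors. The sketch below is illustrative only: the variable names, the zone values, and the 0.8 cut-off are all hypothetical choices, not rules taken from this chapter.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.8):
    """Return pairs of predictor names whose absolute correlation meets the threshold.

    The threshold is an arbitrary value chosen for this sketch.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        if abs(pearson(a, b)) >= threshold:
            flagged.append((name_a, name_b))
    return flagged


# Hypothetical zone-level values for four zones.
predictors = {
    "poverty": [10.0, 20.0, 30.0, 40.0],
    "income": [80.0, 60.0, 40.0, 20.0],
    "population": [5.0, 9.0, 4.0, 8.0],
}
flagged = collinear_pairs(predictors)
```

In this made-up example, poverty and income move in exact opposition and are flagged, so an analyst would typically keep only one of them.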

Failure to distinguish origins from destinations

Second, in much of this literature, there is not a clear distinction between origin
predictors and destination predictors. That is, in most cases, the correlates of crimes were
identified but it is often unclear whether these correlates are associated with the
neighborhoods of the offenders (origins) or the locations where the crimes occur
(destinations). This can result in a set of vague correlates without clear direction about
whether the variables are associated with producing or attracting conditions. In fact, in
much of the early literature on social disorganization, it was implicitly assumed that crimes
were produced in the neighborhoods where the offenders lived, a linkage that is increasingly
becoming disconnected. For modeling crime trips, however, it is essential that the predictors
of origins be kept separate from the predictors of destinations.

Accuracy and Reliability

A trip generation model should be accurate and reliable. Accuracy means that the model should
replicate as closely as possible the actual number of trips originating or ending in zones
and that there should be no bias (which is a systematic under- or over-estimating of trips).
Reliability means that the amount of error is minimized.

These criteria have two implications which are somewhat at odds. First, we have to choose
models that replicate as closely as possible the number of trips originating or ending in a
zone. In general, this would be a model that had the highest overall predictability. But,
second, we have to choose models that minimize total prediction errors. This allows a model
to replicate the number of trips for as many zones as possible. The two criteria are somewhat
contradictory because crime trips are highly skewed. That is, a handful of zones will have a
lot of crimes originating or ending in them while many zones will have few or no crimes. The
zones with the most crimes will have a disproportionate impact on the final model. Thus, a
model that obtains as high a prediction as possible (i.e., highest log-likelihood or R²) may
actually only predict accurately for a few zones and may be very wrong for the majority.

The strategy, therefore, is to obtain a model that achieves high predictability while keeping
the total prediction error low.
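
The difference between the two criteria can be made concrete with a small calculation. The sketch below uses the mean signed error as a stand-in for bias and the mean absolute error as a stand-in for the amount of error; both measures, the function names, and the zone counts are assumptions chosen for illustration rather than the statistics used elsewhere in this chapter.

```python
def bias(observed, predicted):
    """Mean signed error: positive values indicate systematic over-prediction."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_error(observed, predicted):
    """Average size of the prediction error, ignoring its direction."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical skewed counts: four low-crime zones and one high-crime zone.
observed = [0, 1, 2, 3, 200]
predicted = [4, 5, 6, 7, 184]

model_bias = bias(observed, predicted)
model_error = mean_absolute_error(observed, predicted)
```

Here the over-predictions in the four small zones exactly offset the under-prediction in the large zone, so the bias is zero even though every individual zone is predicted with error. A model can therefore satisfy one criterion without satisfying the other.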

Count Model

Another element of the model is that the trip generation model is for counts (or volumes),
not for rates. The model predicts the number of crimes originating in each origin zone and
the number of crimes occurring in each destination zone. The model could be constructed to
predict rates, but normally this is not done. For most travel demand modeling, as mentioned
in chapter 11, the model predicts the number of trips originating or ending in a zone. Thus,
there is a crime production model that predicts the number of crimes originating in each zone
and a crime attraction model that predicts the number of crimes ending in each zone.

Approaches Towards Trip Generation Modeling

Trip Tables

There are two classic approaches to trip generation modeling. The first uses a trip table
(sometimes called a cross-classification table or a category analysis). A trip table is a
cross-classification matrix. Several predictive variables are divided into categories (e.g.,
three levels of household income; four levels of vehicle ownership; three levels of
population density) and a mean number of trips is estimated, usually from a survey. For
example, a survey of household income might show the relationship between household income
and the number of trips taken by individuals of the households. Based on a sample, estimates
of the average number of trips per person can be obtained for each income level (e.g., 3.4
trips per day for persons from low income households; 4.5 trips per day for persons from
medium income households; 6.7 trips per day for persons from high income households). These
variables are further subdivided into two-way or three-way cross-tabulation tables (e.g., low
income and medium vehicle ownership; low income and high vehicle ownership). Table 13.1
illustrates a possible trip table model involving two variables. In practice, three or four
variables are used.
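
The mechanics of applying a trip table can be sketched as a simple lookup. The rates below are the illustrative values from Table 13.1; the function name and the counts of adults in each category are hypothetical and are included only to show how a table would be applied.

```python
# Mean daily trips per adult, keyed by (vehicle ownership, household income).
# Values are the illustrative rates shown in Table 13.1.
TRIP_RATES = {
    ("0-1", "low"): 3.2, ("0-1", "medium"): 4.6, ("0-1", "high"): 6.7,
    ("2+", "low"): 5.4, ("2+", "medium"): 7.8, ("2+", "high"): 8.1,
}


def expected_trips(adults_by_category):
    """Multiply the number of adults in each cell by that cell's mean trip rate."""
    return sum(TRIP_RATES[category] * count
               for category, count in adults_by_category.items())


# Hypothetical zone: 100 adults in low income households with 0-1 vehicles
# and 50 adults in high income households with 2+ vehicles.
zone_adults = {("0-1", "low"): 100, ("2+", "high"): 50}
zone_trips = expected_trips(zone_adults)
```

For this made-up zone the estimate is 100 x 3.2 plus 50 x 8.1, or about 725 trips per day.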

The main reason that trip tables are used in a trip generation model is the non-linear nature
of trips. Predictive variables are usually not linear in their effects on the number of
trips. Thus, unless a sophisticated non-linear model is used, sizeable error can be
introduced in a prediction. It is usually safer to use a trip table approach (Ortuzar and
Willumsen, 2001). There are some major handbooks on the topic (Henscher and Button, 2002;
ITE, 2003). In fact, the Institute of Transportation Engineers publishes a large handbook
that gives extensive trip production and trip attraction tables by detailed land uses (ITE,
2003). These tables are often used in formal environmental review processes for site analysis
and are frequently accepted by courts in litigation. They are not without their problems,
however, and there have been numerous critiques of the tables (Shoup, 2002; NCHRP, 1998).
They also cannot be used in a travel demand model; doing so will produce erroneous results.

The problem for crime analysis, however, is that it is impossible to obtain these data. One
cannot ask a sample of offenders how many crimes they undertake each day in order to estimate
the mean expectations for a table. Thus, one has to adopt a more indirect approach in
modeling crime productions and attractions.

A second problem with the trip table approach is its use with zonal data. While it could be
applied to zonal data (e.g., using median household income and average vehicle ownership in
table 13.1 instead of individual household income and vehicle ownership), such an approach
requires interpretation and some degree of arbitrariness. For example, how does one subdivide
median household income? One person might interpret it slightly differently than another;
unlike simple numerical counts (e.g., 0 vehicle ownership; 1 vehicle ownership; 2 vehicle
ownership), there is too much variability in categorizing variables at the zonal level.¹

Table 13.1

Illustration of Possible Trip Table Approach to Trip Generation

Average Trips Per Adult, Age 16+

                           Household income
Vehicle ownership      Low      Medium     High

0-1                    3.2      4.6        6.7
2+                     5.4      7.8        8.1

OLS Regression Modeling

The second approach is to use a regression framework. In this approach, the number of crimes
either originating or ending in each zone is estimated from zone characteristics using a
regression model. This can be written in an equation:

$$Y_i = f(X_1, X_2, X_3, \ldots, X_k) + \epsilon \qquad (13.1)$$

The mean number of crimes, $Y_i$ (either originating or ending in zone $i$), is a function of
a number of independent variables, $X_1, X_2, X_3, \ldots, X_k$, for these zones; there are
$k$ independent variables, including any constants. There is also an error term, $\epsilon$,
which represents the discrepancy between the actual observation and what the model predicts.
This is sometimes called residual error since it is the difference between the observed and
predicted values ($O_i - Y_i$). The function is unspecified and can be non-linear.²

The traditional approach to regression modeling assumes that the independent variables are
linear in their effect on the dependent variable. Thus,

$$Y_i = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + \epsilon \qquad (13.2)$$

In this model, there are $k$ independent variables and one constant term ($\alpha$) that
needs to be estimated. For each zone, $i$, each of the independent variables has a weight
associated with it (the coefficients, $\beta$). The product of the value of the independent
variable times its weight represents its effect. The individual effects of each of the $k$
independent variables are summed to produce an overall estimate of the dependent variable,
$Y$.

The method for estimating this equation usually minimizes the sum of the squares of the
residual errors. Hence, the procedure is called Ordinary Least Squares (or OLS). If the
equation is correctly specified (i.e., all relevant variables are included), the error term,
$\epsilon$, will be normally distributed with a mean of 0 and a constant variance,
$\sigma^2$.
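
The linear predictor and the least squares criterion can be written out directly. The following is a minimal sketch, assuming two independent variables per zone; the function names, the coefficient values, and the zone data are hypothetical and serve only to show how the weighted sum and the sum of squared residual errors are computed.

```python
def predict(alpha, betas, xs):
    """Linear predictor: the constant plus the sum of each coefficient times its variable."""
    return alpha + sum(b * x for b, x in zip(betas, xs))


def sum_squared_residuals(alpha, betas, zones, observed):
    """The quantity that OLS minimizes: the sum of squared (observed - predicted)."""
    return sum((o - predict(alpha, betas, xs)) ** 2
               for xs, o in zip(zones, observed))


# Hypothetical data: two zones, each with two independent variables.
zones = [[10.0, 4.0], [4.0, 2.0]]
observed = [16.0, 7.0]

# Hypothetical candidate coefficients (not estimated from any real data).
alpha, betas = 2.0, [1.5, -0.5]
first_zone_prediction = predict(alpha, betas, zones[0])
sse = sum_squared_residuals(alpha, betas, zones, observed)
```

With these values the first zone is predicted at 15 and the second at 7, giving residuals of 1 and 0 and a sum of squared residuals of 1. An OLS routine searches for the constant and coefficients that make this sum as small as possible.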

Problems with OLS Regression Modeling

However, there are a number of major problems associated with OLS regression modeling.

Skewness of crime events

First, crime events are extremely skewed statistically. Some locations have a much higher
likelihood of a crime event (either an origin or a destination) than others. Figure 13.1
below shows the number of crimes from 1993 to 1997 in Baltimore County that occurred at each
location. That is, the graph shows the number of incidents that occurred at every location,
plotted in decreasing order of frequency. Thus, there were 7,965 locations where only one
crime occurred between 1993 and 1997. There were 2,878 locations where two crimes occurred in
that period. There were 1,138 locations where three crimes occurred in that period. At the
other end of the spectrum, there were 332 locations that had 10 or more crimes during the
period and there were 97 locations that had 30 or more crimes occur. If we add to this the
very large number of locations that had no crimes occur, the unequal likelihood of crime by
location is even more dramatic. In other words, the data are highly skewed with respect to
the frequency of crimes. Most locations either had no crimes occur or very few, while a few
locations had many crimes occur.

Aggregating crimes into zones tends to reduce some of the skewness. For example, grouping the
crimes by origin traffic analysis zone (TAZ) reduced it a little bit. Nineteen of the 525
origin zones in Baltimore County and Baltimore City did not have any crimes occur in them
while 15 zones had only one crime occur. Six zones had two crimes originate from them while 8
zones had three crimes originate from them. At the other end, one zone had 738 crimes
originate from it and another zone had 533 originate from it. Of the 525 origin zones, 155
had 100 or more crime events. Similar results are found for the destination zones. Figure
13.2 graphs the distribution of origins and destinations by TAZ in bins of 50 incidents each.

Figure 13.1: Frequency Distribution of Baltimore Crimes: 1993-97
(Plot of the number of locations against the number of incidents per location.)

Figure 13.2: Skewness in Crime Origins and Destinations: Baltimore County: 1993-97
(Plot of the number of TAZs against the number of events per TAZ, with separate series for
origins and destinations.)

Skewness in the dependent variable usually makes the final model biased and unreliable.
Particularly if the skewness is positive (i.e., a handful of cases have very large values),
the resulting regression coefficients will reflect the cases with the highest values rather
than represent all the cases with approximately equal weights. These so-called 'outliers' can
overwhelm a regression equation. In an extreme case, a very large outlier may totally
determine the model. For example, an experiment with 100 cases was created with a progressing
dependent variable and a random independent variable (i.e., the independent variable had its
value selected randomly). The dependent variable progressed from 1 to 100. For the first 99
cases, the independent variable took values from 0.12 to 9.9, randomly assigned. The
correlation between these two variables for the first 49 cases was 0.04. However, for the
100th case, the independent variable was given a value of 100. The correlation between the
two variables now shot up to 0.17. Even though the F-test for this was not significant, it
represented a sizeable jump. Replacing one other independent value with a 50 caused the
correlation to jump to 0.23, which was statistically significant. In other words, two
outliers caused a random series to appear significant!
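
An experiment of this kind is easy to re-create. The sketch below is not the original experiment: the random values, the seed, and the number of repetitions are assumptions, and a uniform distribution between 0.12 and 9.9 is assumed for the independent variable. It repeats the single-outlier step many times and averages the change in the correlation.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_jump(rng):
    """Change in correlation when the 100th case is given an outlying value of 100."""
    y = list(range(1, 101))                           # dependent variable: 1 to 100
    x = [rng.uniform(0.12, 9.9) for _ in range(99)]   # random independent variable
    r_before = pearson(x, y[:99])                     # first 99 cases only
    r_after = pearson(x + [100.0], y)                 # all 100 cases, with the outlier
    return r_after - r_before


rng = random.Random(0)                                # fixed seed, chosen arbitrarily
jumps = [correlation_jump(rng) for _ in range(500)]
mean_jump = sum(jumps) / len(jumps)
```

Across repetitions the average change is clearly positive, which is consistent with the point made above: a single extreme case can create an apparent relationship in otherwise random data.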

Skewness makes prediction difficult. The OLS model assumes that each independent variable
contributes to the dependent variable at an arithmetic rate; there is a constant slope such
that a one unit change in the independent variable is associated with a constant change in
the dependent variable. With skewness, on the other hand, such a relationship will not be
found. Large changes in the independent variable will be necessary to produce small changes
in the dependent variable, but the effect is not constant. In other words, the OLS model
typically cannot explain the non-linear changes in the dependent variable.³

Negative predictions

A second problem with OLS is that it can produce negative predictions. With a count variable,
such as the number of crimes originating or ending in a zone, the minimum number is zero.
That is, the count variable is never negative, being bounded by 0 on the lower limit and some
large number on the upper limit. The OLS model, on the other hand, can produce negative
predicted values since it is additive in the independent variables. This clearly is illogical
and is a major problem with data that are very skewed. If the most common value is close to
zero, it is very possible for an OLS model to predict a negative count.
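
A very small made-up example shows how this happens. The sketch below fits a one-variable OLS line by the usual closed-form formulas; the data (four zones with no crimes and one zone with ten) and the function name are hypothetical.

```python
def fit_ols(xs, ys):
    """Closed-form OLS for one independent variable; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Hypothetical skewed counts: most zones have zero crimes, one zone has ten.
x = [0, 1, 2, 3, 4]
crimes = [0, 0, 0, 0, 10]

intercept, slope = fit_ols(x, crimes)
predictions = [intercept + slope * xi for xi in x]
```

The fitted line has an intercept of -2 and a slope of 2, so the predicted count for the first zone is -2 crimes, a value that cannot occur.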

Non-consistent summation

A third problem with the OLS model is that the sum of the input values does not necessarily
equal the sum of the predicted values. Since the estimate of the constant and coefficients is
obtained by minimizing the sum of the squared residual errors, there is no balancing
mechanism to require that they add up to the same as the input values. For a trip generation
model in which the number of predicted origins has to equal the number of predicted
destinations (after adding in the number of predicted external trips), this can be a big
problem. In calibrating the model, adjustments can be made to the constant term to force the
sum of the predicted values to be equal to the sum of the input values. But in applying that
constant and those coefficients to another data set, there is no guarantee that the
consistency of summation will hold. In other words, the OLS method cannot guarantee a
consistent set of predicted values.

Non-linear effects

A fourth problem with the OLS model is that it assumes the independent variables are linear in
their effect. If the dependent variable was normal or relatively balanced, then a linear model
might be appropriate. But, when the dependent variable is highly skewed, as is seen with these
data, the additive effects of each component typically cannot account for the non-linearity.
Independent variables have to be transformed to account for the non-linearity and the result is
often a complex equation with non-intuitive relationships.⁴ It is far better to use a
non-linear model for a highly skewed dependent variable.

Greater residual errors

The final problem with an OLS model and a skewed dependent variable is that the model tends to
over- or under-predict the correct values, but rarely comes up with the correct estimate. With
skewed data, typically an OLS equation produces non-constant residual errors. That is, one of
the major assumptions of the OLS model is that all relevant variables have been included. If
that is the case, then the errors in prediction (the residual errors - the difference between
the observed and predicted values) should be uncorrelated with the predicted value of the
dependent variable. Violation of this condition is called heteroscedasticity because it
indicates that the residual variance is not constant. The most common type is an increase in
the residual errors with higher values of the predicted dependent variable. That is, the
residual errors are greater at the higher values of the predicted dependent variable than at
lower values (Draper and Smith, 1981, 147).

A highly skewed distribution tends to encourage this. Because the least squares procedure
minimizes the sum of the squared residuals, the regression line balances the lower residuals
with the higher residuals. The result is a regression line that fits neither the low values nor
the high values. For example, motor vehicle crashes tend to concentrate at a few locations
(crash hot spots). In estimating the relationship between traffic volume and crashes, the hot
spots tend to unduly influence the regression line. The result is a line that fits neither the
number of expected crashes at most locations (which is low) nor the number of expected crashes
at the hot spot locations (which is high). The line ends up over-estimating the number of
crashes for most locations and under-estimating the number of crashes at the hot spot
locations.

Poisson Regression Modeling

Poisson regression is a non-linear modeling method that overcomes some of the problems of OLS
regression. It is particularly suited to count data (Cameron and Trivedi, 1998). In the model,
the number of events is modeled as a Poisson random variable with a probability of occurrence
being

$$ \mathrm{Prob}(Y_i) = \frac{e^{-\lambda} \lambda^{Y_i}}{Y_i!} \tag{13.3} $$

where $Y_i$ is the count for one group or class, $i$, $\lambda$ is the mean count over all
groups, and $e$ is the base of the natural logarithm. The distribution has a single parameter,
$\lambda$, which is both the mean and the variance of the function.
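
As a quick numerical check of equation 13.3, the sketch below evaluates the Poisson probabilities for a mean of 2 and confirms that the mean and the variance of the resulting distribution are both equal to that single parameter. The function name `poisson_prob` and the truncation at 60 counts are assumptions made for illustration.

```python
import math

def poisson_prob(y, lam):
    """Probability of observing count y when the mean count is lam (equation 13.3)."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

lam = 2.0
# Counts above 59 have negligible probability for a mean of 2, so the sum is truncated.
probs = [poisson_prob(y, lam) for y in range(60)]

mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(round(mean, 6), round(variance, 6))
```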

The “law of rare events” assumes that the total number of events will approximate a Poisson
distribution if an event occurs in any of a large number of trials but the probability of
occurrence in any given trial is small (Cameron and Trivedi, 1998). Thus, the Poisson
distribution is very appropriate for the analysis of rare events such as crime incidents (or
motor vehicle crashes or rare diseases or any other rare event). The Poisson model is not
particularly good if the probability of an event is more balanced; for that, the normal
distribution is a better model as the sampling distribution will approximate normality with
increasing sample size. Figure 13.3 illustrates the Poisson distribution for different
expected means.
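
The approximation described by the law of rare events can be checked numerically. The sketch below compares binomial probabilities for many trials with a small per-trial probability against the Poisson probabilities with the same mean. The number of trials (1,000) and the per-trial probability (0.002) are hypothetical values chosen for illustration.

```python
import math

def binomial_prob(k, n, p):
    """Probability of k events in n independent trials, each with probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_prob(k, lam):
    """Poisson probability of k events when the mean count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

n_trials, p_event = 1000, 0.002      # many trials, small probability per trial
lam = n_trials * p_event             # expected number of events

# Largest absolute difference between the two distributions for counts 0 to 10.
max_gap = max(
    abs(binomial_prob(k, n_trials, p_event) - poisson_prob(k, lam))
    for k in range(11)
)
print(max_gap)
```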

The mean can, in turn, be modeled as a function of some other variables (the independent
variables). Given a set of observations on independent variables, $X_{ki}$
($X_1, X_2, X_3, \ldots, X_K$), the conditional mean of $Y_i$ can be specified as an
exponential function of the X's:

$$ E(Y_i \mid X_{ki}) = \lambda_i = e^{X_{ki}\beta} \tag{13.4} $$

where $X_{ki}$ is a set of independent variables, $\beta$ is a set of coefficients, and $e$ is
the base of the natural logarithm. Now, the conditional mean (the mean controlling for the
effects of the independent variables) is non-linear. Equation 13.4 is sometimes written as

$$ \ln(\lambda_i) = X_{ki}\beta \tag{13.5} $$

and is known as the loglinear model. In more familiar notation, this is

$$ \ln(\lambda_i) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_K X_{Ki} \tag{13.6} $$

That is, the natural log of the mean is a function of K random variables.
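
A small sketch may help show what the exponential form implies: a one unit change in an independent variable multiplies the conditional mean by the exponential of its coefficient rather than adding a constant amount. The intercept, coefficients, and variable values below are hypothetical.

```python
import math

def conditional_mean(alpha, betas, xs):
    """Expected count for one zone under the loglinear model (equations 13.4 to 13.6)."""
    return math.exp(alpha + sum(b * x for b, x in zip(betas, xs)))

alpha = 1.0            # hypothetical intercept
betas = [0.5, -0.2]    # hypothetical coefficients for two independent variables

base = conditional_mean(alpha, betas, [2.0, 1.0])
bumped = conditional_mean(alpha, betas, [3.0, 1.0])   # first variable raised by one unit

# The ratio of the two means equals exp(0.5), whatever the starting values are.
ratio = bumped / base
print(ratio, math.exp(betas[0]))
```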

Note that in this formulation there is not a random error term. The data are assumed to reflect
the Poisson model. There can be “residual errors”, but these are assumed to reflect an
incomplete specification (i.e., not including all the relevant variables). Also, since the
variance equals the mean, it is expected that the residual errors should increase with the
conditional mean. That is, there is inherent heteroscedasticity (Cameron and Trivedi, 1998).
This is very different from an OLS model, where the residual errors are expected to be
constant.

The model is estimated using a maximum likelihood procedure, typically the Newton-Raphson
method. In Appendix C, Luc Anselin presents a more formal treatment of both the OLS and Poisson
regression models, including the methods by which they are estimated.
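
To make the estimation step concrete, the following is a minimal sketch of Newton-Raphson iterations for a Poisson model with an intercept and a single independent variable. It is not the routine used by CrimeStat or any particular package; the data, the starting values, the fixed number of iterations, and the function name `fit_poisson` are all assumptions made for illustration.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Newton-Raphson estimates (a, b) for the model ln(mean) = a + b * x."""
    a, b = math.log(sum(y) / len(y)), 0.0    # start from the mean-only model
    for _ in range(iterations):
        lam = [math.exp(a + b * xi) for xi in x]
        # Gradient of the Poisson log-likelihood with respect to a and b.
        g_a = sum(yi - li for yi, li in zip(y, lam))
        g_b = sum((yi - li) * xi for xi, yi, li in zip(x, y, lam))
        # Information matrix (the negative of the matrix of second derivatives).
        h_aa = sum(lam)
        h_ab = sum(li * xi for xi, li in zip(x, lam))
        h_bb = sum(li * xi * xi for xi, li in zip(x, lam))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2 x 2 linear system directly.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Hypothetical counts that rise roughly exponentially with x.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 0, 2, 3, 4, 7, 11, 18]

a, b = fit_poisson(x, y)
predicted = [math.exp(a + b * xi) for xi in x]
print(round(sum(predicted), 6), sum(y))
```

At the maximum likelihood solution the derivative with respect to the intercept is zero, which is the same as saying that the predicted counts sum to the observed counts.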

Figure 13.3: Poisson Distribution For Different Expected Means
[Probability of X (vertical axis) by Count (horizontal axis), with curves for E(Y) = 0.5, 1, 2, 3, and 4]

Advantages of the Poisson Regression Model

The Poisson model overcomes some of the problems of the OLS model. First, the Poisson model has
a minimum value of 0. It will not predict negative values. This makes it ideal for a
distribution in which the mean or the most typical value is close to 0. Second, the Poisson is a
fundamentally skewed model; that is, it is non-linear with a long ‘right tail’. Again, this
model is appropriate for counts of rare events, such as crime incidents.

Third, because the Poisson model is estimated by a maximum likelihood method, the estimates are
adapted to the actual data. In practice, this means that the sum of the predicted values is
virtually identical to the sum of the input values, with the exception of very slight rounding
off error. In the subsequent balancing of the predicted origins and the predicted destinations,
this leads to a more stable estimate since the only difference between the predicted origins
and predicted destinations is the number of trips that come from outside the study area
(external trips). Since the external trips are added to the predicted origins, the balancing
operation is less prone to adjustment error.

Fourth, compared to the OLS model, the Poisson model generally gives a better estimate of the
number of crimes for each zone. The problem of over- or under-estimating the number of
incidents for most zones with the OLS model is usually lessened with the Poisson, at least for
crime and other rarer events. When the residual errors are calculated, generally the Poisson
has a lower total error than the OLS.

In short, the Poisson model has some desirable statistical properties that make it very useful
for predicting crime incidents (origins or destinations).

Problems with the Poisson Regression Model

On the other hand, the Poisson model is not perfect. The primary problem is that count data are
usually over-dispersed.

Over-dispersion in the residual errors

In the Poisson distribution, the mean equals the variance. In a Poisson regression model, the
mathematical function, therefore, equates the conditional mean (the mean controlling for all
the predictor variables) with the conditional variance. However, most real data are
over-dispersed; the variance is generally greater than the mean. Figure 13.4 shows the
distribution of Baltimore County and Baltimore City crime origins and Baltimore County crime
destinations by TAZ (repeat of figure 13.2) and also indicates the variance-to-mean ratio of
each variable. For the origin distribution, the ratio of the variance to the mean is 14.7; that
is, the variance is 14.7 times that of the mean! For the destination distribution, the ratio is
401.5!

In other words, the variance is many times greater than the mean. Most real-world count data
are similar to this; the variance will usually be a lot greater than the mean. What this means
in practice is that the residual errors - the difference between the observed and predicted
values for each zone - will be greater than what is expected. The Poisson model calculates a
standard error as if the variance equals the mean. Thus, the standard error will be
underestimated using a Poisson model and, therefore, the significance tests (the coefficient
divided by the standard error) will be greater than they really should be. This would have the
effect of identifying variables as being more statistically significant in a model than what
they actually should be. In other words, in a Poisson multiple regression model, we would end up
selecting variables that really should not be selected because we think they are statistically
significant when, in fact, they are not.

Figure 13.4: Skewness in Crime Origins and Destinations: Baltimore County, MD 1993-97
Over-dispersion
[Number of TAZs (vertical axis) by Number of events per TAZ (horizontal axis), for Origins and Destinations]

Origins:
Mean = 75.8
Variance = 7848.8
Ratio of variance to mean = 14.7

Destinations:
Mean = 129.1
Variance = 51,849.1
Ratio of variance to mean = 401.5

Another problem with the Poisson, which is true for most of the common regression methods, is
the lack of a spatial predictor component. As mentioned in chapter 12, in the crime travel
demand model, spatial interaction is handled during the second stage of the model - trip
distribution. Thus, any errors introduced in the first stage - trip generation - are usually
compensated for during the second. Nevertheless, the inclusion of a spatial component in a
regression model would generally improve the prediction. For this version of CrimeStat,
non-spatial methods are used for the first stage.

Dispersion Correction Parameter

There are a number of methods for correcting the over-dispersion in a count model. Most of them
involve modifying the assumption of the conditional variance equal to the conditional mean. For
example, the negative binomial model assumes a Poisson mean but a gamma-distributed variance
term (Cameron and Trivedi, 1998, 62-63; Venables and Ripley, 1997, 242-245). That is, there is
an unobserved variable that affects the distribution of the count. The model is then of a
Poisson mean but with a ‘longer tail’ variance function. As another example, the zero-inflated
Poisson model assumes a Poisson function combined with a degenerate function with a probability
of 1 for zero counts (Hall, 2000). Such mixed function models are a current topic of research.
In general, though, they are complicated and require estimating several parameters.

There is a simple correction for over-dispersion that usually works (Cameron and Trivedi, 1998,
63-65). The model proceeds in two steps. In the first, the Poisson model is fitted to the data
and the degree of over- (or under-) dispersion is estimated. The dispersion parameter is
defined as:

$$ \phi = \frac{1}{N - K - 1} \sum \left\{ \frac{(Y_i - P_i)^2}{P_i} \right\} \tag{13.7} $$

where N is the sample size, K is the number of independent variables, $Y_i$ is the observed
number of events that occur in zone i, and $P_i$ is the predicted number of events for zone i.
The test is similar to an average chi-square in that it takes the square of the residuals
($Y_i - P_i$), divides it by the predicted values, and then averages it by the degrees of
freedom. The dispersion parameter is a standardized number. A value greater than 1.0 indicates
over-dispersion while a value of less than 1.0 indicates under-dispersion (which is rare,
though possible). A value of 1.0 indicates equidispersion (or the variance equals the mean).

In the second step, the Poisson standard error is multiplied by the square root of the
dispersion parameter to produce an adjusted standard error:

$$ SE_{adj} = SE \times \sqrt{\phi} \tag{13.8} $$

The new standard error is then used in the t-test to produce an adjusted t-value. This
adjustment is found in most Poisson regression packages using a Generalized Linear Model (GLM)
approach, such as SAS (McCullagh and Nelder, 1989, 200). Cameron and Trivedi (1998) have shown
that this adjustment produces results that are virtually identical to those of the negative
binomial, but involving fewer assumptions.
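
The two steps can be sketched in a few lines. The function names, the observed and predicted counts, and the unadjusted standard error below are hypothetical; only the two formulas (the dispersion parameter of equation 13.7 and the adjustment of equation 13.8) come from the text.

```python
import math

def dispersion(observed, predicted, n_predictors):
    """Equation 13.7: sum of (Y - P)^2 / P, divided by N - K - 1."""
    n = len(observed)
    chi_square = sum((y - p) ** 2 / p for y, p in zip(observed, predicted))
    return chi_square / (n - n_predictors - 1)

def adjusted_se(standard_error, phi):
    """Equation 13.8: scale the Poisson standard error by the square root of phi."""
    return standard_error * math.sqrt(phi)

observed = [0, 2, 1, 9, 3, 15]                    # hypothetical observed counts
predicted = [1.0, 1.5, 2.5, 4.0, 6.0, 15.0]       # hypothetical Poisson predictions

phi = dispersion(observed, predicted, n_predictors=1)
se_adj = adjusted_se(0.10, phi)                   # 0.10 is a made-up unadjusted standard error
print(phi > 1.0, se_adj > 0.10)
```

Because the dispersion parameter exceeds 1.0 in this made-up example, the adjusted standard error is larger than the unadjusted one, and the resulting t-value is smaller.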

Diagnostic Tests

There are a number of diagnostic tests that are used in a regression framework, whether OLS,
Poisson, or other methods.

Skewness Tests

First, there are tests of skewness in the dependent variable. As mentioned above, the OLS model
cannot be applied to data that are highly skewed. If they are skewed, a non-linear model, such
as the Poisson, must be used. Therefore, it is essential to evaluate the degree of skewness.

A commonly used measure of skewness is the g statistic (Microsoft, 2000):

$$ \mathrm{Skewness}\ (g) = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left[ \frac{X_i - \bar{X}}{s} \right]^3 \tag{13.9} $$

where n is the sample size, $X_i$ is observation i, $\bar{X}$ is the mean of X, and s is the
sample standard deviation (corrected for degrees of freedom):

$$ s = \sqrt{ \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n-1} } \tag{13.10} $$

The standard error of skewness (SES) can be approximated by (Tabachnick and Fidell, 1996):

$$ SES = \sqrt{\frac{6}{n}} \tag{13.11} $$

An approximate Z-test can be obtained from:

$$ Z(g) = \frac{g}{SES} \tag{13.12} $$

Thus, if Z is greater than +1.96 or smaller than -1.96, then the skewness is significant at the
p ≤ .05 level.

As an example, for the data on the origins of crimes by TAZ in Baltimore County:

$$ \bar{X} = 75.108, \quad s = 96.017, \quad n = 325, \quad \sum_{i=1}^{n} \left[ \frac{X_i - \bar{X}}{s} \right]^3 = 898.391 $$

Therefore,

$$ g = \frac{325}{324 \times 323} \times 898.391 = 2.79 $$

$$ SES = \sqrt{\frac{6}{325}} = 0.136 $$

$$ Z(g) = 20.51 $$

The Z of the g value shows the data are highly skewed as we, of course, knew.
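
The skewness calculation can be sketched as follows. The function `skewness_g` works from raw values, while the lines after it reproduce the worked example from its published summary statistics (n = 325 and a sum of cubed standardized deviations of 898.391), taking the standard error of skewness as the square root of 6/n. The function name is an assumption; carrying full precision gives a Z of about 20.5, and the figure in the text reflects g and SES rounded before dividing.

```python
import math

def skewness_g(values):
    """Skewness g: n / ((n-1)(n-2)) times the sum of cubed standardized deviations."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    cubed = sum(((v - mean) / s) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed

# Worked example, starting from the summary statistics given in the text.
n = 325
sum_cubed = 898.391
g = n / ((n - 1) * (n - 2)) * sum_cubed
ses = math.sqrt(6.0 / n)
z = g / ses
print(round(g, 2), round(ses, 3), round(z, 1))
```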

Likelihood Ratio Test

Second, there are tests of the overall model. In a maximum likelihood framework, the first test
is of the log-likelihood function. A likelihood function is the joint density of all the
observations, given a value for the parameters, $\beta$, and the variance, $\sigma^2$. The
log-likelihood is the natural log of this joint density, or the sum of the logs of the
individual densities. For the OLS model, the log-likelihood is:

$$ L = -\frac{N}{2} \ln(2\pi) - \frac{N}{2} \ln(\sigma^2) - \frac{1}{2} \sum \left[ \frac{(Y_i - X_{ki}\beta_k)^2}{\sigma^2} \right] \tag{13.13} $$

where N is the sample size, $\sigma^2$ is the variance, $Y_i$ is the observed number of events
for zone i, and $X_{ki}\beta_k$ is a series of K independent predictors multiplied by their
coefficients.

In the Poisson model, the log-likelihood is:

$$ L = \sum \left[ -\lambda_i + Y_i X_{ki}\beta_k - \ln Y_i! \right] \tag{13.14} $$

where $\lambda_i$ is the conditional mean for zone i, $Y_i$ is the observed number of events for
zone i, and $Y_i X_{ki}\beta_k$ is a cross-product of the observed events times the K
independent predictors multiplied by their coefficients. As mentioned above, Luc Anselin
provides a more detailed discussion of these functions in Appendix C.

The maximum likelihood method selects the model with the highest log-likelihood, which is a
negative number. Even though the model with the highest log-likelihood is considered ‘best’, it
is not an intuitive number. Consequently, the Likelihood Ratio compares the log-likelihood of
the regression model with the log-likelihood that would be obtained if only the mean number of
counts was taken. This latter log-likelihood is:

$$ L_R = -N \bar{Y} + \ln(\bar{Y}) \left( \sum Y_i \right) - \sum \ln Y_i! \tag{13.15} $$

The Likelihood Ratio test is:

$$ LR = 2 (L - L_R) \tag{13.16} $$

where L is the model log-likelihood, $L_R$ is the log-likelihood of the mean count, and
$\bar{Y}$ is the mean count. The Likelihood Ratio is twice the difference between the
log-likelihood values of the regression and mean models respectively. It follows a $\chi^2$
distribution with K degrees of freedom (where K is the number of independent variables).⁵
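
A minimal sketch of the Likelihood Ratio calculation for a Poisson model follows. It evaluates the Poisson log-likelihood from the predicted means (using the fact that the linear predictor is the log of the conditional mean), then compares it with the log-likelihood of a model that predicts the mean count for every zone. The function names and the observed and predicted counts are hypothetical.

```python
import math

def poisson_loglik(observed, predicted):
    """Poisson log-likelihood; ln(Y!) is computed as lgamma(Y + 1)."""
    return sum(
        -p + y * math.log(p) - math.lgamma(y + 1)
        for y, p in zip(observed, predicted)
    )

def likelihood_ratio(observed, predicted):
    """Twice the gap between the model log-likelihood and the mean-only log-likelihood."""
    mean_y = sum(observed) / len(observed)
    l_model = poisson_loglik(observed, predicted)
    l_mean = poisson_loglik(observed, [mean_y] * len(observed))
    return 2.0 * (l_model - l_mean)

observed = [0, 1, 2, 5, 12]                 # hypothetical counts for five zones
predicted = [0.5, 1.0, 2.5, 5.0, 11.0]      # hypothetical model predictions

lr = likelihood_ratio(observed, predicted)
print(round(lr, 2))
```

The resulting value would then be compared against a chi-square table with K degrees of freedom.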

Adjusted likelihood ratio

The Likelihood Ratio is a more intuitive index since it is a chi-square test. However, it is
prone to the problem of all regression methods of over-fitting - the more independent variables
are added to the model, the higher the Likelihood Ratio. Consequently, there are several
methods that adjust for the number of parameters fit. One is the Akaike Information Criterion
(AIC) which is defined as:

$$ AIC = -2L + 2(K+1) \tag{13.17} $$

where L is the log-likelihood and K is the number of independent variables. A second one is the
Schwarz Criterion (SC), which is defined as:

$$ SC = -2L + (K+1) \ln(N) \tag{13.18} $$

These two measures adjust the log-likelihood for degrees of freedom, and flip the sign around.
Because the sign is flipped, the model with the lowest AIC or SC value is ‘best’.
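
The two criteria can be sketched as below, assuming the usual forms in which both start from minus twice the log-likelihood and add a penalty for the number of parameters, so that lower values indicate a better model. The function names and the log-likelihood values are hypothetical.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion for a model with k independent variables."""
    return -2.0 * loglik + 2.0 * (k + 1)

def schwarz_criterion(loglik, k, n):
    """Schwarz Criterion for k independent variables and n observations."""
    return -2.0 * loglik + (k + 1) * math.log(n)

# Hypothetical comparison: a third variable raises the log-likelihood only slightly.
small_model = aic(-100.0, 2)
large_model = aic(-99.5, 3)
print(small_model, large_model)
```

In this made-up comparison the extra variable improves the log-likelihood by only 0.5, which is not enough to offset the penalty, so the smaller model has the lower AIC.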

R-square Test

The most familiar test of an overall model is the R-square (or $R^2$) test. This is the percent
of the total variance of the dependent variable accounted for by the model. More formally, it
is defined as:

$$ R^2 = 1 - \frac{\sum (Y_i - P_i)^2}{\sum (Y_i - \bar{Y})^2} \tag{13.19} $$

where $Y_i$ is the observed number of events for a zone, i, $P_i$ is the predicted number of
events given a set of K independent variables, and $\bar{Y}$ is the mean number of events
across zones. The R-square value is a number from 0 to 1; 0 indicates no predictability while 1
indicates perfect predictability.

For an OLS model, R-square is a very consistent estimate. It increases in a linear manner with
predictability and is, therefore, a good indicator of how effective one model is compared to
another. As with all diagnostic tests, the value of the R-square increases with more
independent variables. Consequently, R-square is usually adjusted for degrees of freedom:

$$ R_a^2 = 1 - \frac{\left[ \sum (Y_i - P_i)^2 \right] / (N - K - 1)}{\left[ \sum (Y_i - \bar{Y})^2 \right] / (N - 1)} \tag{13.20} $$

where N is the sample size and K is the number of independent variables.
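
Both versions of R-square can be computed directly from observed and predicted values, as in the sketch below. The adjusted version assumes the usual N - K - 1 residual degrees of freedom; the function names and the data are hypothetical.

```python
def sums_of_squares(observed, predicted):
    """Return (residual sum of squares, total sum of squares)."""
    mean_y = sum(observed) / len(observed)
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    sst = sum((y - mean_y) ** 2 for y in observed)
    return sse, sst

def r_square(observed, predicted):
    """Share of the total sum of squares accounted for by the predictions."""
    sse, sst = sums_of_squares(observed, predicted)
    return 1.0 - sse / sst

def adjusted_r_square(observed, predicted, k):
    """R-square adjusted for degrees of freedom, with k independent variables."""
    n = len(observed)
    sse, sst = sums_of_squares(observed, predicted)
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))

observed = [1, 2, 3, 4, 5]                   # hypothetical values
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]        # hypothetical predictions from one variable

r2 = r_square(observed, predicted)
r2_adj = adjusted_r_square(observed, predicted, k=1)
print(round(r2, 4), round(r2_adj, 4))
```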

R-square for Poisson model

With the Poisson model, however, the R-square value (whether adjusted or not) is not
necessarily a good measure of overall fit. While the Poisson R-square varies from 0 to 1,
similar to the OLS, it is not monotonic. That is, the addition of a new variable to an equation
often has unpredictable effects; sometimes the R-square will increase substantially and
sometimes it will increase only a little, independent of how strong the variable's association
with the dependent variable is (Miaou, 1996). This inconsistency comes from the decomposition
of the total sum of squares:

$$ \sum (Y_i - \bar{Y})^2 = \sum (Y_i - P_i)^2 + \sum (P_i - \bar{Y})^2 + 2 \sum (Y_i - P_i)(P_i - \bar{Y}) \tag{13.21} $$

The first term on the right-hand side of the equation is the residual sum of squares (or error
term) while the second term is the explained sum of squares. In an OLS model, the third term is
zero if an intercept is included (Cameron and Trivedi, 1998, 153). Hence, the total sum of
squares is broken into two parts - that which is explained and that which is unexplained.
However, for the Poisson and other non-linear regression methods, the last term is not zero.
Consequently, a test that compares the explained sum of squares to the total sum of squares
will not produce consistent results.

Consequently, alternative R-square measures are sometimes used. One of these is Deviance
R-square. It is defined as:

$$ R_D^2 = 1 - \frac{\sum \left[ Y_i \ln(Y_i / P_i) - (Y_i - P_i) \right]}{\sum \left[ Y_i \ln(Y_i / \bar{Y}) \right]} \tag{13.22} $$

where $Y_i$ is the observed number of events for each zone, i, $P_i$ is the predicted number of
events for each zone based on K independent predictors, and $\bar{Y}$ is the mean number of
events across all zones.

The Deviance R-square measures the reduction in the Likelihood Ratio due to the inclusion of
predictor variables. It produces a slightly different R-square, one that is typically higher
than the traditional R-square. Whereas the traditional one might not show a large increase upon
the introduction of an independent variable, the Deviance R-square often does show the
increase.
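
A minimal sketch of a Deviance R-square calculation is given below. It assumes the common form in which the residual deviance of the fitted model is divided by the deviance of the mean-only model and the result is subtracted from 1, and it treats a term with an observed count of zero as contributing zero to the logarithmic part. The function name and the data are hypothetical.

```python
import math

def deviance_r_square(observed, predicted):
    """1 minus the ratio of the model deviance to the mean-only deviance."""
    mean_y = sum(observed) / len(observed)

    def y_log_ratio(y, denominator):
        # By convention, 0 * ln(0) is taken as 0.
        return y * math.log(y / denominator) if y > 0 else 0.0

    model_dev = sum(y_log_ratio(y, p) - (y - p) for y, p in zip(observed, predicted))
    mean_dev = sum(y_log_ratio(y, mean_y) for y in observed)
    return 1.0 - model_dev / mean_dev

observed = [1, 2, 3, 6]                                # hypothetical counts
perfect = deviance_r_square(observed, [1, 2, 3, 6])    # predictions equal observations
no_model = deviance_r_square(observed, [3.0] * 4)      # every zone predicted at the mean
print(perfect, round(no_model, 6))
```

Predictions that match the observations give a value of 1, and predicting the mean for every zone gives a value of 0, which matches the range described for the traditional R-square.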

Nevertheless, it has problems, too. Miaou (1996) argues that there is not a single R-square
index that is perfectly consistent and suggests that users need to use multiple ones. There are
other R-square values that have been proposed, but these two are sufficient for now. In short,
a user must look at both as an indicator of how good a model is compared to another model.

Dispersion Parameter

Finally, in the Poisson model only, the dispersion parameter indicates the extent to which the
variance is different from the mean. This was defined in equation 13.7 above.

Coefficients, Standard Errors, and Significance Tests

The second type of diagnostic tests are those for the individual predictors in the model. In
both the OLS and Poisson models, there are three tests:

1. The coefficient. This indicates the change in the dependent variable associated with the
   change in the independent variable. In the case of the OLS, it is a linear term (i.e., the
   value of the independent variable is multiplied by the coefficient) while in the Poisson
   model, it has to be converted by raising the product to an exponential term
   (i.e., $e^{\beta X}$).

2. The standard error. Each estimated coefficient in a model accounts for some of the variance
   in the dependent variable. This variance is the contribution of the particular independent
   variable to the variance of the dependent variable. The square root of that variance is the
   standard error.

3. The significance level. The ratio of the coefficient to the standard error produces a
   significance test of the coefficient. In the OLS model, it is a t-test with N-K-1 degrees of
   freedom whereas in the Poisson model it is an asymptotic t-test, which is effectively a
   Z-test. The appropriate tables (t-test or standard normal) produce approximate probability
   levels of a Type I error (the likelihood of falsely rejecting a true null hypothesis of no
   relationship).
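
For the Poisson case, where the ratio of the coefficient to its standard error is treated as a Z-test, the table lookup can be replaced by a short calculation. The sketch below returns the ratio and the two-tailed probability from the standard normal distribution; the function name and the coefficient and standard error values are hypothetical.

```python
import math

def z_test(coefficient, standard_error):
    """Return (z, two-tailed p) for a coefficient under the standard normal distribution."""
    z = coefficient / standard_error
    # erfc(|z| / sqrt(2)) is the probability of a standard normal value beyond +/- |z|.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

z, p = z_test(0.196, 0.1)     # hypothetical coefficient and standard error
print(round(z, 2), round(p, 3))
```

A ratio of 1.96 corresponds to a two-tailed probability of about .05, which is the familiar cut-off for the standard normal table.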

Testing for Multicolinearity

One of the major problems with any regression model, whether OLS or Poisson, is
multicolinearity among the independent variables. In theory, each independent variable should
be statistically independent of the other independent variables. Thus, the amount of variance
for the dependent variable that is accounted for by each independent variable should be a
unique contribution. In practice, however, it is rare to obtain completely independent
predictive variables. More likely, two or more of the independent variables will be correlated.
The effect is that the estimated standard error of a predictor variable is no longer unique
since it shares some of the variance with other independent variables. The greater the
communality of the variances, the more ambiguous the predicted effects. If two variables are
highly correlated, it is not clear what contribution each makes towards predicting the
dependent variable. In effect, multicolinearity means that variables are measuring the same
effect.

Multicolinearity among the independent variables can produce very strange effects in a
regression model. Among these effects are: 1) If two independent variables are highly
correlated, but one is more correlated with the dependent variable than the other, the stronger
one will usually have a correct sign while the weaker one will sometimes get flipped around
(e.g., from positive to negative, or the reverse); 2) Two variables can cancel each other out;
each coefficient is significant when it alone is included in a model but neither is significant
when they are together; 3) One independent variable can inhibit the effect of another
correlated independent variable so that the second variable is not significant when combined
with the first one; and 4) If two independent variables are virtually perfectly correlated,
many regression routines break down because the matrix cannot be inverted.

All these effects indicate that there is non-independence among the independent variables.
Aside from producing confusing coefficients, multicolinearity can overstate the amount of
prediction in a model. Since every independent variable accounts for some of the variance of
the dependent variable, with multicolinearity, the overall model will appear to improve when it
probably hasn't.

Tolerance test

A user has to be aware of the problem of multicolinearity and seek to minimize it. The simplest
solution is to drop variables that are co-linear with other independent variables already in
the equation. A relatively simple test for assessing this is called tolerance. Tolerance is
defined as lack of predictability of each independent variable by the other independent
variables, or:

$$ Tol = 1 - R_{i \cdot jk \ldots l}^2 \tag{13.23} $$

where $R_{i \cdot jk \ldots l}^2$ is the R-square of an equation where independent variable i is
predicted by the other independent variables, j, k, l, and so forth. That is, each independent
variable in turn is regressed against the other independent variables in the equation. The
$R^2$ associated with that model is subtracted from 1. The higher the tolerance level, the less
a particular independent variable shares its variance with the other independent variables.
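
For the special case of only two independent variables, the R-square of one regressed on the other is simply their squared correlation, so tolerance can be sketched without a full regression routine. With more than two independent variables, a multiple regression of each variable on all the others would be needed instead. The function names and the data below are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def tolerance_two_predictors(x1, x2):
    """Tolerance when the model has exactly two independent variables."""
    r = pearson_r(x1, x2)
    return 1.0 - r * r

x1 = [1, 2, 3, 4, 5]
x_collinear = [2, 4, 6, 8, 10]       # an exact multiple of x1
x_unrelated = [1, -1, 0, -1, 1]      # uncorrelated with x1

low = tolerance_two_predictors(x1, x_collinear)
high = tolerance_two_predictors(x1, x_unrelated)
print(round(low, 6), round(high, 6))
```

A tolerance near 0 signals that the variable is almost entirely predictable from the other one, while a tolerance near 1 signals that it contributes independent information.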

Fixed Model vs. Stepwise Variable Selection

There are several strategies designed to reduce multicolinearity in a model. One is to start with a defined model and eliminate those variables that have a low tolerance. The total model is estimated and the coefficients for each of the variables are estimated at the same time. This is sometimes called a fixed model. Then, variables that are co-linear are removed from the equation, and the model is re-run.

Another strategy is to estimate the coefficients a step at a time, a procedure known as stepwise regression. There are several standard stepwise procedures. In the first procedure, variables are added one at a time (a forward selection model). The independent variable having the strongest linear correlation with the dependent variable is added first. Then, from the remaining list of independent variables, the one with the highest correlation with the dependent variable, controlling for the one variable already in the equation, is added and the model is re-estimated. In each step, the independent variable with the highest correlation with the dependent variable, controlling for the variables already in the equation, is added to the model, and the model is re-estimated. This proceeds until either all the independent variables are added to the equation or else a stopping criterion is met. The usual criterion is that only variables meeting a certain significance level are allowed to enter (called a p-to-enter).

A backward elimination procedure works in reverse. All independent variables are initially added to the equation. The variable with the weakest coefficient (as defined by the significance level) is removed, and the model is re-estimated. Next, the variable with the weakest coefficient in the second model is removed, and the model is re-estimated. This procedure is repeated until either there are no more independent variables left in the model or else a stopping criterion is met. The usual criterion is that all remaining variables pass a certain significance level (called a p-to-remove).
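A backward elimination loop can be sketched as follows. This is a simplified illustration, not the CrimeStat implementation: it assumes an ordinary least squares model, approximates each coefficient's two-sided p-value with the normal distribution, and uses hypothetical names (`backward_elimination`, `x1`, `x2`) and made-up data.

```python
import math
import numpy as np

def backward_elimination(X, y, names, p_to_remove=0.01):
    """Drop the weakest variable one at a time until all pass p_to_remove."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = list(range(X.shape[1]))                 # start with all variables in the model
    while keep:
        A = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        dof = len(y) - A.shape[1]
        cov = (resid @ resid / dof) * np.linalg.inv(A.T @ A)
        se = np.sqrt(np.diag(cov))
        z = np.abs(beta / se)[1:]                  # skip the intercept
        # Two-sided p-value under a normal approximation (an assumption of this sketch).
        p = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
        worst = int(np.argmax(p))                  # weakest coefficient
        if p[worst] <= p_to_remove:
            break                                  # every remaining variable passes
        del keep[worst]                            # remove it and re-estimate
    return [names[i] for i in keep]

# Hypothetical data: y depends strongly on x1 and not at all on x2.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
y = 3.0 * x1 + rng.normal(size=300)
selected = backward_elimination(np.column_stack([x1, x2]), y, ["x1", "x2"])
```

The strongly related variable `x1` survives the loop; an unrelated variable such as `x2` would normally be dropped on the first pass, though with a different sample it could pass the test by chance.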

There are combinations of these procedures, for example adding a variable in a forward selection manner but then removing any variables that are no longer significant, or using a backward elimination procedure but allowing new variables to enter the model if they suddenly become significant.

There are advantages to each approach. A fixed model allows specified variables to be included. If either theory or previous research has indicated that a particular combination of variables is important, then the fixed model allows that to be tested. A stepwise procedure might drop one of those variables. On the other hand, a stepwise procedure usually can obtain the same or higher predictability than a fixed procedure (whether predictability is measured by a log-likelihood or an R-square).

Within the stepwise procedures, there are also advantages and disadvantages to each method, though the differences are generally very small. A forward selection procedure adds variables one at a time. Thus, the contribution of each new variable can be seen. On the other hand, a variable that is significant at an early stage could become not significant at a later stage because of the unique combinations of variables. Similarly, a backward elimination procedure will ensure that all variables in the equation meet a specified significance level. But, the contribution of each variable is not easily seen other than through the coefficients. In practice, one usually obtains the same model with either procedure, so the differences are not that critical.

A stepwise procedure will not guarantee that multicolinearity will be removed entirely. However, it is a good procedure for narrowing down the variables to those that are significant. Then, any co-linear variables can be dropped manually and the model re-estimated. In the CrimeStat trip generation module, both a fixed model and a backward elimination procedure are allowed.

Alternative Regression Models

There are a number of alternative methods for estimating the likely value of a count given a set of independent predictors. The negative binomial has already been mentioned. There are a number of variations of these involving different assumptions about the dispersion term. There are also a number of different Poisson-type models. Among these are the zero-inflated Poisson (or ZIP; Hall, 2000), the Weibull function, the Cauchy function, and the lognormal function (see NIST 2004 for a list of common non-linear functions).

There is also a set of spatial regression type models that correct for spatial autocorrelation in the dependent variable, such as geographically-weighted regression using a Poisson function (Fotheringham, Brunsdon, and Charlton, 2002), a hierarchical Bayesian model (Clayton and Kaldor, 1987), and a Markov Chain Monte Carlo simulation method (Miouw, Song, and Balilick, 2003).

In future versions of CrimeStat, several of these methods will be introduced. For the time being, though, the Poisson model is available as it is the most commonly used functional model for fitting count data.

Adding Special Generators

In a travel demand model, there are special generators. These are unique land uses or environments that produce an extra large number of trips. For regular travel demand modeling, stadiums, airports, train stations, large parks, and ‘mega-malls’ generate more than their share of trips, or at least more than what would be predicted by the amount of employment at those locations. They are usually attractors, not producers. In a normal transportation travel demand model, these zones are excluded from the cross-classification and independent estimates are made of them.

For crime trips, there are also special generators. Typically, these are zones that have more crimes being attracted to them than are expected on the basis of the population and employment at those locations.

Since we are using a regression model to estimate the productions and attractions, a simple way to model a special generator is to create a simple dummy variable. This is a variable where zones with the special generator get a value ‘1’ and zones without the special generator get a ‘0’. Essentially, the variable is a cross-classification of the special generator versus every other zone.
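The dummy variable approach can be illustrated with a small sketch. The zone IDs, populations, and crime counts below are made up, the helper `ols_residuals` is hypothetical, and an ordinary least squares fit is assumed purely for simplicity.

```python
import numpy as np

# Hypothetical zonal data; zone 103 attracts far more crimes than its population suggests.
zone_ids = np.array([101, 102, 103, 104, 105, 106])
population = np.array([1200.0, 800.0, 1500.0, 950.0, 1100.0, 700.0])
crimes = np.array([14.0, 9.0, 60.0, 11.0, 12.0, 8.0])

# Dummy variable: 1 for zones treated as special generators, 0 for every other zone.
special_zones = {103}
special_generator = np.isin(zone_ids, list(special_zones)).astype(float)

def ols_residuals(y, *columns):
    """Residual errors (observed - predicted) from a least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

resid_without = ols_residuals(crimes, population)
resid_with = ols_residuals(crimes, population, special_generator)
```

Because the dummy variable flags a single zone, the regression can fit that zone exactly, so its residual drops to essentially zero once the variable is added. The fit looks better, but nothing about the zone has actually been explained.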

One has to be cautious in doing this, however. Typically, special generators are identified by having a greater number of crimes being attracted to a zone than is predicted by the model. In other words, they have a greater positive residual error (observed - predicted) and are ‘outliers’ in the residual error distribution. By adding a variable to explain those cases, the residual error decreases.

But, in doing so, we aren’t really explaining why the zone has more crimes than expected, but simply have accounted for it by putting in an empirical variable. In re-running the model, there will usually be new outliers that have a greater positive residual error. If this logic is to be repeated, then we would create new special generators for those zones and re-estimate the model. If continued without limits, eventually there would not be a model anymore but just a collection of dummy variables, one for each zone.

Therefore, a user should be cautious in introducing special generators. It is generally all right to introduce a few for the truly exceptional zones. These are zones where it is logical to treat them as special generators and where one would expect continuity over time. In other words, they should be used if the special generator status is expected to last over time. For example, a stadium or an airport or a train station is liable to remain at its location for many years. A particular shopping mall, on the other hand, may attract crimes at one particular point in time but not necessarily in the future. Unless it is a mall that is so much larger than any other mall in the region (a ‘mega-mall’), it shouldn’t be given a special generator status.

Adding External Trips

External trips are, by definition, trips that come from outside the region. They are part of the origin/production model in that these are trips that are not accounted for by the model. There are also trips that originate within the study area, but end outside the area; however, those are usually not modeled since the focus will be on the study area itself. In the usual travel demand framework, external trips are those coming from major corridors into the region. Estimates of the travel on these corridors are obtained by cordon counts, counts of vehicles coming into the region and leaving the region (net inflow). Estimates of future growth of those external trips have to be based on expectations of future population growth in the metropolitan region and in nearby regions.

For crime trips, external trips are defined as trips that originate outside the study area. But they must be estimated by the difference between the total number of crimes occurring in the destination study area and the total originating in the origin zones. That is, of all the crimes occurring in the study area, only those originating in the origin zones are modeled. Those trips that originate from outside the origin zones are external trips. They must be added to the predicted number of origin trips to produce an adjusted estimate of total origins, or:

$$O_j = O_{pi} + O_e \qquad (13.24)$$

where $O_j$ is the total number of crime origins for crimes committed in study area $j$, $O_{pi}$ is the total number of crimes originating in the origin zones $i$, and $O_e$ is the total number of crimes originating outside the region, $e$.

In other words, for the production (origin) model only, we add an external zone to account for crime trips that originated outside the modeled region. If we don’t do that, in the balancing step, we’ll overestimate the number of crimes originating in each zone because the predicted origins will be multiplied by a factor to ensure that the total number of origins equals the total number of destinations.
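A short worked example may make the effect of the external zone concrete. All of the numbers and variable names below are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
# Hypothetical counts, chosen only for illustration.
crimes_in_study_area = 300.0        # all crimes occurring in the study area
crimes_from_origin_zones = 255.0    # of those, crimes that originated in a modeled origin zone
predicted_origins = [120.0, 80.0, 50.0]   # O_pi: predicted origins, one value per origin zone

# External trips are estimated as a difference (O_e).
external_trips = crimes_in_study_area - crimes_from_origin_zones

# Equation 13.24: adjusted total origins, O_j = O_pi + O_e.
adjusted_origins = sum(predicted_origins) + external_trips

# Balancing factor applied to the predicted origins, without and with the external zone.
factor_without_external = crimes_in_study_area / sum(predicted_origins)
factor_with_external = crimes_in_study_area / adjusted_origins
```

Here the external trips come to 45 and the adjusted origins to 295. Without the external zone, every origin zone would be inflated by a factor of 1.2 in the balancing step; with it, the factor falls to about 1.017.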

Not including the external trips can lead to bias in the model. If the number of external trips is a sizeable percentage of all crime origins occurring in the study area, then the coefficients of the origin model could be misleading. In practice, most travel demand modelers assume that if the percentage of external trips is not greater than 5%, there usually is little bias introduced (Ortuzar and Willumsen, 2001). If it is greater than 5%, then origin zones from adjacent jurisdictions need to be included in the origin model.

Balancing Predicted Origins and Predicted Destinations

The trip generation ‘model’ is actually two separate models: 1) a model of trips produced by every zone and 2) a model of trips attracted to every zone. Since a trip has an origin and a destination (by definition), then the total number of productions must equal the total number of attractions:

$$\sum_{i=1}^{n} O_i = \sum_{j=1}^{n} D_j \qquad (13.25)$$

where $O$ is a trip origin, $D$ is a trip destination, and $i$ and $j$ are zone numbers.

To ensure that this equality is true, a balancing operation is conducted. Essentially, this means multiplying either the number of predicted origins in each origin zone or the number of predicted destinations in each destination zone by a constant which is the ratio of either the total destinations to the total origins (to multiply the number of predicted origins) or the ratio of the total origins to the total destinations (to multiply the number of predicted destinations).

With crime analysis, the number of destinations would generally be considered a more reliable data set than the number of origins. Because crimes are enumerated where they occur, the number of crimes occurring at any one location is more accurate than the location of the offenders. Thus, we adjust the predicted origins so that they equal the predicted destinations.⁶
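The balancing operation described above can be sketched as a single function. This is an illustrative sketch rather than the CrimeStat code; the function name `balance`, its `hold` argument, and the example numbers are assumptions made for the illustration.

```python
def balance(origins, destinations, hold="destinations"):
    """Scale one set of predictions so that total origins equal total destinations."""
    total_origins = sum(origins)
    total_destinations = sum(destinations)
    if hold == "destinations":
        # Multiply each predicted origin by (total destinations / total origins).
        factor = total_destinations / total_origins
        return [o * factor for o in origins], list(destinations)
    if hold == "origins":
        # Multiply each predicted destination by (total origins / total destinations).
        factor = total_origins / total_destinations
        return list(origins), [d * factor for d in destinations]
    raise ValueError("hold must be 'destinations' or 'origins'")

# Hypothetical predictions; the last origin value stands for the external zone.
predicted_origins = [100.0, 60.0, 40.0]
predicted_destinations = [150.0, 90.0]
adjusted_origins, kept_destinations = balance(predicted_origins, predicted_destinations)
```

With destinations held constant, each predicted origin in this example is multiplied by 240 / 200 = 1.2, so the adjusted origins sum to the 240 predicted destinations.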

Summary of the Trip Generation Model

In summary, the trip generation model is estimated in four steps:

1. A model of the predictors of the number of crime origins (a crime production model);
2. A model of the predictors of the number of crime destinations (a crime attraction model);
3. External trips are estimated and added to the number of predicted origins as an external zone; and
4. The total number of predicted crime origins is balanced to be equal to the total number of predicted crime destinations.

The CrimeStat Trip Generation Model

In this section, we describe the trip generation model implemented in CrimeStat. As mentioned above, this step involves calibrating a regression model against the zonal data. Two separate models are developed, one for trip origins and one for trip destinations. The dependent variable is the number of crimes originating in a zone (for the trip origin model) or the number of crimes ending in a zone (for the trip destination model). The independent variables are zonal variables that may predict the number of origins or destinations.

There are three steps to the model, each corresponding to a separate tab in CrimeStat:

1. Calibrate the model
2. Make a prediction
3. Balance the predicted origins and the predicted destinations

Figure 13.5 shows an image of the trip generation model page within CrimeStat. The trip generation model is made up of three separate pages (or tabs):

Figure 13.5: Trip Generation Module
1. A Calibrate model page in which a regression model can be run to estimate either an origin (production) model or a destination (attraction) model;

2. A Make prediction page in which the estimated coefficients can be applied to the same or a different data set and in which the external trips can be added to the predicted origins; and

3. A Balance predicted origins & destinations page in which the total predicted origins can be adjusted to equal the total predicted destinations.

Calibrate Model

In the first step, models are calibrated using the input data. There is a model for the origin zones and another model for the destination zones. The user should indicate what type of model is being run in order to make the output more clear (it is not essential but can minimize confusion from mislabeling).

Data File

The data file is input as either the primary or secondary file. Specify whether the data file is the primary or secondary file.

Type of Model

Specify whether the model is for origins or destinations. This will be printed out on the output header.

Dependent Variable

Select the dependent variable from the list of variables. There can be only one dependent variable per model.

Skewness Diagnostics

If checked, the routine will test for the skewness of the dependent variable. The output includes:

1. The “g” statistic
2. The standard error of the “g” statistic
3. The Z value for the “g” statistic
4. The probability level of a Type I error for the “g” statistic
5. The ratio of the sample variance to the sample mean

Error messages indicate whether there is probable skewness in the dependent variable. If there is skewness, use a Poisson regression model.

Independent variables

Select independent variables from the list of variables in the data file. Up to 15 variables can be selected.

Missing values

Specify any missing value codes for the variables. Blank records will automatically be considered as missing. If any of the selected dependent or independent variables have missing values, those records will be excluded from the analysis.

Type of Regression Model

Specify the type of regression model to be used. The default is a Poisson regression with over-dispersion correction. Other alternatives are a Poisson regression and an Ordinary Least Squares regression.

Type of Regression Procedure

Specify whether a fixed model (all selected independent variables are used in the regression) or a backward elimination stepwise model is used. The default is a fixed model. If a backward elimination stepwise model is selected, choose the P-to-remove value (default is .01). The backward elimination starts with all selected variables in the model (the fixed procedure). However, it proceeds to drop variables that fail the P-to-remove test, one at a time. Any variable that has a significance level in excess of the P-to-remove value is dropped from the equation.

Save Estimated Coefficients/Parameters

The estimated coefficients of the final model can be saved as a ‘dbf’ file. Specify a file name. This would be useful in order to repeat the regression while adding in external trips to the predicted origins (see Make trip generation prediction below) or to apply the coefficients to another data set (e.g., future values of the independent variable).

Save Output

The output is saved as a ‘dbf’ file under a different file name. The output includes all the variables in the input data set plus two new ones: 1) the predicted values of the dependent variable for each observation (with the name PREDICTED); and 2) the residual error values, representing the difference between the actual/observed values for each observation and the predicted values (with the name RESIDUAL).

Poisson output

The output of the Poisson regression routines includes 13 fields for the entire model:

1. The dependent variable
2. The type of model
3. The sample size (N)
4. The degrees of freedom (N - # independent variables - 1)
5. The type of regression model (Poisson, Poisson with over-dispersion correction)
6. The log-likelihood value
7. The Likelihood Ratio
8. The probability value of the Likelihood Ratio
9. The Akaike Information Criterion (AIC)
10. The Schwarz Criterion (SC)
11. The Dispersion Multiplier
12. The approximate R-square value
13. The deviance R-square value

and 5 fields for each estimated coefficient:

14. The estimated coefficient
15. The standard error of the coefficient
16. The pseudo-tolerance value of the coefficient (see below)
17. The Z-value of the coefficient
18. The p-value of the coefficient.

OLS output

The output of the Ordinary Least Squares (OLS) routine includes 9 fields for the entire model:

1. The dependent variable
2. The type of model
3. The sample size (N)
4. The degrees of freedom (N - # independent variables - 1)
5. The type of regression model (Normal/Ordinary Least Squares)
6. Squared multiple R
7. Adjusted squared multiple R
8. F test of the model
9. p-value of the model

and 5 fields for each estimated coefficient:

10. The estimated coefficient
11. The standard error of the coefficient
12. The tolerance value of the coefficient (see below)
13. The t-value of the coefficient
14. The p-value of the coefficient.

Multicolinearity Among the Independent Variables

To test multicolinearity, a tolerance test is run (see equation 13.23 above). There is not a simple test of whether a particular tolerance is meaningful or not. In CrimeStat, several qualitative categories are used and error messages are output:

1. If the tolerance value is 0.80 or greater, then there is little multicolinearity (No apparent multicolinearity);

2. If the tolerance is between 0.50 and 0.79, there is some multicolinearity (possible multicolinearity);

3. If the tolerance is between 0.25 and 0.49, there is probable multicolinearity (probable multicolinearity. Eliminate variable with lowest tolerance and re-run); and

4. If the tolerance is less than 0.25, there is definite multicolinearity (Definite multicolinearity. Results are not reliable. Eliminate variable with lowest tolerance and re-run).
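The four categories can be expressed as a simple lookup. This is a sketch under one assumption: the printed ranges leave small gaps (for example, between 0.79 and 0.80), so the thresholds below are treated as contiguous. The function name `classify_tolerance` is hypothetical and the returned labels paraphrase the messages listed above.

```python
def classify_tolerance(tol):
    """Return the qualitative multicolinearity category for a tolerance value."""
    if tol >= 0.80:
        return "No apparent multicolinearity"
    if tol >= 0.50:
        return "Possible multicolinearity"
    if tol >= 0.25:
        return "Probable multicolinearity"
    return "Definite multicolinearity"

# Hypothetical tolerance values for four independent variables.
labels = [classify_tolerance(t) for t in (0.92, 0.61, 0.30, 0.05)]
```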

Graph

While the output page is open, clicking on the graph button will display a graph of the residual errors (on the Y axis) against the predicted values (on the X axis).

Make Trip Generation Prediction

This routine applies an already-calibrated regression model to a data set. This would be useful for several reasons: 1) if external trips are to be added to the model (which is normally preferred); 2) if the model is applied to another data set; and 3) if variations on the coefficients are being tested with the same data set. The model will need to be calibrated first (see Calibrate trip generation model) and the coefficients saved as a parameters file. The coefficient parameter file is then re-loaded and applied to the data.
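Applying saved coefficients to a data record can be sketched as below. This is a hedged illustration, not the CrimeStat routine: it assumes the standard log-linear form of a Poisson regression (the prediction is the exponential of the linear combination) and the usual linear form for OLS. The function `predict`, the variable names, and the coefficient values are all hypothetical.

```python
import math

def predict(intercept, coefficients, record, model="poisson"):
    """Apply saved coefficients to one zone's record of independent variables."""
    # Coefficients are matched to variables by name here, which avoids applying
    # a coefficient to the wrong variable when the column order differs.
    linear = intercept + sum(coefficients[name] * record[name] for name in coefficients)
    if model == "poisson":
        return math.exp(linear)      # log-linear form of a Poisson regression
    if model == "ols":
        return linear
    raise ValueError("model must be 'poisson' or 'ols'")

# Hypothetical saved coefficients and one hypothetical zone record.
saved = {"POPULATION": 0.0005, "RETAIL_EMP": 0.002}
zone = {"POPULATION": 2000.0, "RETAIL_EMP": 150.0}
predicted = predict(0.5, saved, zone)
```

For this made-up zone the linear combination is 0.5 + 1.0 + 0.3 = 1.8, so the Poisson prediction is exp(1.8), or roughly 6 crimes.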

Data File

The data file is input as either the primary or secondary file. Specify whether the data file is the primary or secondary file.

Type of Model

Specify whether the model is for origins or destinations. This will be printed out on the output header.

Trip Generation Coefficients/Parameters File

This is the saved coefficient parameter file. It is an ASCII file and can be edited if alternative coefficients are being tested (be careful about editing this without making a backup). Load the file by clicking on the Browse button and finding the file. Once loaded, the variable names of the saved coefficients are displayed in the “Matching parameters” box.

Independent Variables

Select independent variables from the list of variables in the data file. Up to 15 variables can be selected.

Matching Parameters

The selected independent variables need to be matched to the saved variables in the trip generation parameters file in the same order. Add the appropriate variables one by one in the order in which they are listed in the matching parameters box. It is essential that the order be the same; otherwise the coefficients will be applied to the wrong variables.

Hint: With your cursor placed in the list of independent variables, typing the first letter of the matching variable name will take you to the first variable that starts with that letter. Repeating the letter will move down the list to the second, third, and so forth until the desired variable is reached.

Missing Values

Specify any missing value codes for the variables. Blank records will automatically be considered as missing. If any of the selected dependent or independent variables have missing values, those records will be excluded from the analysis.

Add External Trips

External trips are trips that start outside the modeled study area. Because they are crimes that originate outside the study area, they were not included in the zones used for the origin model. Therefore, they have to be independently estimated and added to the origin zone total to make the number of origins equal to the number of destinations. Click on the “Add external trips” button to enable this feature.

Number of external trips

Add the number of external trips to the box. This number will be added as an extra origin zone (the External zone).

Origin ID

Specify the origin ID variable in the data file. The external trips will be added as an extra origin zone, called the “External” zone. Note: the IDs used for the destination file zones should be the same as in the origin file. This will be necessary in subsequent modeling stages.

Type of Regression Model

Specify the type of regression model to be used. The default is a Poisson regression and the other alternative is a Normally-distributed/Ordinary Least Squares regression.

Save Predicted Values

The output is saved as a ‘dbf’ file under a different file name. The output includes all the variables in the input data set plus the predicted values of the dependent variable for each observation (with the name PREDICTED). In addition, if external trips are added, then there should be a new record with the name EXTERNAL listed in the Origin ID column. This record lists the added trips in the PREDICTED column and zeros (0) for all other numeric fields.

Output

The tabular output includes summary information about the file and lists the predicted values for each input zone.

Balance Predicted Origins & Destinations

Since, by definition, a ‘trip’ has an origin and a destination, the number of predicted origins must equal the number of predicted destinations. Because of slight differences in the data sets of the origin model and the destination model, it is possible that the total number of predicted origins (including any external trips; see Make trip generation prediction above) may not equal the total number of predicted destinations. This step, therefore, is essential to guarantee that this condition will be true. The routine adjusts either the number of predicted origins or the number of predicted destinations so that the condition holds. The trip distribution routines will not work unless the number of predicted origins equals the number of predicted destinations (within a very small rounding-off error).

Predicted Origin File

Specify the name of the predicted origin file by clicking on the Browse button and locating the file.

Origin variable

Specify the name of the variable for the predicted origins (e.g., PREDICTED).

Predicted Destination File

Specify the name of the predicted destination file by clicking on the Browse button and locating the file.

Destination variable

Specify the name of the variable for the predicted destinations (e.g., PREDICTED).

Balancing method

Specify whether origins or destinations are to be held constant. The default is ‘Hold destinations constant’.

Save Predicted Origin/Destination File

The output is saved as a ‘dbf’ file under a different file name. The output includes all the variables in the input data set plus the adjusted predicted values of the dependent variable for each observation. If destinations are held constant, the adjusted variable name for the predicted trips is ADJORIGIN. If origins are held constant, the adjusted variable name for the predicted trips is ADJDEST.

Output

The tabular output includes file summary information plus information about the
number of origins and destinations before and after balancing. In addition, the predicted
values of the dependent variable are displayed.

Example Trip Generation Model

To illustrate this model, let’s run through these procedures using an example from
Baltimore County. In the case of Baltimore County, traffic analysis zones (TAZ) were used
for the zonal geography. Two data sets are produced, one for the crime origins and one for
the crime destinations. For Baltimore County, the origin data set has 532 zones covering
both Baltimore County and the City of Baltimore with the total number of crime origins for
each zone (sub-divided into different crime types - robberies, burglaries, vehicle theft) and a
number of possible predictor variables (population, retail and non-retail employment,
median household income, poverty levels, and vehicle ownership). Similarly, the destination
data set has 325 zones with the number of crime destinations for each zone (again, sub-divided
into different crime types) and a number of possible predictor variables (population,
retail and non-retail employment, median household income, and several land use categories
- acreage allocated for retail, residential, office space, and conservation uses). Sample data
sets are provided on the CrimeStat download page.

Setting Up the Origin Model

In the first step, an origin model is created. Figure 13.6 shows the selection of the
dependent variable and some possible independent variables. The type of model is an
ordinary Poisson regression. The dependent variable is the number of crimes occurring
between 1993 and 1997 in each origin zone (BCORIG). Eight possible independent variables
have been selected: the 1996 population of each zone (POP96), the median household income
of the zone relative to the zone with the highest median household income (INCEQUAL), the
number of 1996 non-retail employees in each zone (NONRET96), the number of 1996 retail
employees in each zone (RETEMP96), the total linear miles of arterial roads in each zone
(ARTERIAL), a dummy variable for whether the Baltimore Beltway (I-695) passed through
the zone or not (BELTWAY), the linear distance of the zone from Baltimore harbor in the
CBD (DISTANCE), and the number of households without automobiles (ZEROAUTO - this
cannot be seen in the image).

The model is set up to run a Poisson regression without an over-dispersion correction.
It is a fixed model in which all independent variables are included. The coefficients are saved
under the “Save estimated coefficients” dialogue box and the output (the predicted values) is
saved under the “Save output” dialogue box. Both boxes ask for a file name.

Table 13.2 shows the results. Key statistics are highlighted. The overall model is
highly significant. The Likelihood Ratio is highly significant and the R-squares are
reasonably high (0.50 for the R-square and 0.42 for the deviance R-square). The coefficients
for each of the variables are significant.

However, there are two major problems. First, the dispersion multiplier (parameter)
is very large (37.087), indicating that the conditional variance is more than 37 times greater
than the conditional mean. Second, while all of the coefficients are significant, several show
sizeable multicollinearity as evidenced by the pseudo-tolerance value (POP96, DISTANCE,
ZEROAUTO). This indicates that these variables are essentially measuring the same thing.

Restructuring the Origin Model

Consequently, the model is restructured in three ways (figure 13.7). First, the over-dispersion
correction is applied. Second, the co-linear variables DISTANCE and
ZEROAUTO are dropped from the model. Third, a stepwise backward elimination procedure
is used with the probability for keeping a variable in the equation (p-to-remove) being 0.01;
that is, unless the probability that a coefficient could be obtained by chance is less than 1 in
100, the variable is dropped.
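
The p-to-remove rule can be illustrated with a short Python sketch. It shows a single
elimination decision only: it assumes a two-sided test against the standard normal
distribution, the function names and z-values are hypothetical, and in a real backward
elimination the model would be re-estimated after each variable is dropped.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def variable_to_drop(z_values, p_to_remove=0.01):
    """Return the least significant variable if it fails the threshold.

    A variable is kept only when its p-value is below p_to_remove; otherwise
    the weakest one is returned as the candidate to drop. Returns None when
    every variable passes.
    """
    weakest = max(z_values, key=lambda name: two_sided_p(z_values[name]))
    if two_sided_p(z_values[weakest]) >= p_to_remove:
        return weakest
    return None


# Hypothetical z-values from one estimation pass
z_values = {"VAR_A": 5.2, "VAR_B": -1.9, "VAR_C": 2.4}
drop = variable_to_drop(z_values)
print(drop)
```

The loop would then refit the model without the returned variable and repeat until the
function returns None.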

The result is now a model with the Likelihood Ratio and R-squares being almost as
high as in the first model and in which all the coefficients are significant, but there is very
little multicollinearity (Table 13.3). The dispersion multiplier is now 1.0 since the
coefficient standard errors have been corrected for the original over-dispersion (see equation
13.8 above).

Figure 13.6: Origin Poisson Model Setup

Figure 13.7: Origin Poisson Model with Over-dispersion Correction

Table 13.2

Results of First Origin Model Run

Model result:
Data file: BaltOrigins.dbf
Type of model: Origin
DepVar: BCORIG
N: 532
Df: 523
Type of regression model: Poisson with over-dispersion correction
Log Likelihood: -10678.051687
Likelihood ratio(LR): 25609.182621
P-value of LR: 0.0001
AIC: 21374.103373
SC: 21412.593165
Dispersion multiplier: 37.086973
R-square: 0.499539
Deviance r-square: 0.420031

-----------------------------------------------------------------------------
                                            Pseudo-
Predictor     DF  Coefficient  Stand Error  Tolerance      z-value  p-value
CONSTANT       1     0.887266     0.037707          .    23.530608    0.001
POP96          1     0.000337     0.000016   0.463218    21.665568    0.001
INCEQUAL       1    -0.033017     0.001226   0.608346   -26.926013    0.001
NONRET96       1    -0.000173     0.000028   0.842042    -6.082943    0.001
RETEMP96       1    -0.000364     0.000117   0.960564    -3.107357    0.010
ARTERIAL       1    -0.108257     0.025888   0.771634    -4.181834    0.001
BELTWAY        1     0.150967     0.036047   0.958973     4.188082    0.001
DISTANCE       1     0.034289     0.007842   0.491906     4.372170    0.001
ZEROAUTO       1    -0.000462     0.000141   0.355510    -3.283930    0.010
-----------------------------------------------------------------------------
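
As a cross-check on the summary statistics, the AIC and SC values in the table can be
reproduced from the log likelihood. The sketch below uses the textbook definitions
(AIC = -2LL + 2k, SC = -2LL + k ln N); that CrimeStat computes them exactly this way is an
inference from the agreement with the printed values, not something stated in the text. The
function names are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2*LL + 2*k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def schwarz_criterion(log_likelihood, n_params, n_obs):
    """Schwarz Criterion (BIC): -2*LL + k*ln(N)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Values from the first origin model run
log_likelihood = -10678.051687
n_obs = 532
n_params = 9  # constant plus eight predictors (N - Df = 532 - 523)

model_aic = aic(log_likelihood, n_params)
model_sc = schwarz_criterion(log_likelihood, n_params, n_obs)
print(round(model_aic, 3), round(model_sc, 3))
```

Both results agree with the AIC and SC lines of the table to about three decimal places.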

Looking at the model, we see six variables that significantly predict the number of
crime origins. Population is the strongest, as indicated by its Z-test. Relative income
equality is the next strongest, but this is a negative coefficient; that is, zones with high
relative income equality produce fewer crime origins whereas zones with low relative income
equality produce more crime origins. The third and fourth strongest variables are non-retail
and retail employment respectively, but, again, the coefficients are negative; zones with less
employment have more crimes originate from them. Finally, the two roadway variables show
significant effects. Zones through which the Baltimore Beltway passes have a
higher number of crimes originating (as might be expected) and zones with fewer miles
of arterial roads also have more crimes originating; with the latter variable, it’s possible that we are
picking up the lack of commercial employment opportunities since retail firms tend to locate
on arterial roads rather than local streets.
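
To get a feel for the size of these effects, a coefficient can be converted into a
multiplicative change in the expected number of crime origins. The sketch below assumes the
usual log link of Poisson regression (expected count = exp of the linear predictor); the
coefficients are those of the second origin model (Table 13.3), while the function name and
the chosen changes in each variable are hypothetical.

```python
import math


def rate_ratio(coefficient, change):
    """Multiplicative change in the expected count when a predictor
    changes by `change` units, all other predictors held fixed."""
    return math.exp(coefficient * change)


# Coefficients from the second origin model (Table 13.3)
pop_ratio = rate_ratio(0.000284, 1000)    # 1,000 more residents (POP96)
beltway_ratio = rate_ratio(0.123109, 1)   # Beltway passes through zone
income_ratio = rate_ratio(-0.018525, 10)  # 10 more units of INCEQUAL

print(round(pop_ratio, 3), round(beltway_ratio, 3), round(income_ratio, 3))
```

A ratio above 1 means more expected crime origins and a ratio below 1 means fewer; these
remain ecological associations, not causal effects.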

Residual Analysis of Origin Model

The CrimeStat output includes a graph of the residual errors (actual values minus the
predicted values) on the Y-axis by the predicted values on the X-axis. It is important to
examine the residual errors as these can indicate outliers, problems in the data, and
violation of assumptions. Figure 13.8 shows an image of the residual graph screen. As seen,
the errors increase with the value of the predicted dependent variable. With the Poisson
model, this is expected and does not indicate the violation of the independent errors
assumption, as it does with the OLS. The errors are reasonably symmetrical and do not
indicate differences in over- and under-estimation across the band of the predicted values.

Figure 13.8: Plot of Residual Errors and Predicted Values

There are some outliers; there are two zones for which the model predicted substantially
more crimes than actually originated in those zones and there is one zone that had more
crimes originate from it than was predicted by the model. But, in general, the model appears
to be reasonably balanced.
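
The same check can be made outside the program from the saved output. The sketch below
computes residuals as actual minus predicted and flags zones whose absolute residual exceeds
a cutoff. The data, the cutoff of 100, and the function names are hypothetical; the text
does not prescribe a numerical rule for what counts as an outlier.

```python
def residuals(actual, predicted):
    """Residual error per zone: actual minus predicted."""
    return [a - p for a, p in zip(actual, predicted)]


def flag_outliers(actual, predicted, cutoff):
    """Return (zone index, residual) pairs whose |residual| exceeds cutoff."""
    return [
        (i, r)
        for i, r in enumerate(residuals(actual, predicted))
        if abs(r) > cutoff
    ]


# Hypothetical actual and predicted crime origins for five zones
actual = [12, 40, 310, 95, 20]
predicted = [15.0, 38.5, 120.0, 260.0, 22.0]

outliers = flag_outliers(actual, predicted, cutoff=100)
print(outliers)
```

A positive residual means the model under-estimated the zone; a negative residual means it
over-estimated.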

Table 13.3

Results of Second Origin Model

Model result:
Data file: BaltOrigins.dbf
Type of model: Origin
DepVar: BCORIG
N: 532
Df: 525
Type of regression model: Poisson with over-dispersion correction
Log Likelihood: -11262.292156
Likelihood ratio(LR): 24440.701682
P-value of LR: 0.0001
AIC: 22538.584312
SC: 22568.520816
Dispersion multiplier: 1.000000
R-square: 0.455630
Deviance r-square: 0.446502

-----------------------------------------------------------------------------
                                            Pseudo-
Predictor     DF  Coefficient  Stand Error  Tolerance      z-value  p-value
CONSTANT       1     2.286699     0.039339          .    58.127787    0.001
POP96          1     0.000284     0.000013   0.943426    22.473451    0.001
INCEQUAL       1    -0.018525     0.001026   0.849679   -18.048743    0.001
NONRET96       1    -0.000186     0.000030   0.866522    -6.139941    0.001
RETEMP96       1    -0.000353     0.000125   0.960769    -2.820286    0.010
ARTERIAL       1    -0.085070     0.027006   0.938167    -3.150019    0.010
BELTWAY        1     0.123109     0.037868   0.970051     3.251004    0.010
-----------------------------------------------------------------------------

Setting Up the Destination Model

The same logic is applied for the destination model. In this case, the destination file
has data on 325 zones within Baltimore County only. Similar possible predictor variables
are included in the file. Aside from population, retail and non-retail employment, and the
roadway variables, more detailed land use variables were included (acreage of
commercial, residential, office space, recreational, and conservation lands). The model that
was run was a Poisson with an over-dispersion correction. Again, a backward elimination
procedure was adopted. Once a final model was selected, it was re-run as a fixed model to
ensure that the coefficients were consistently estimated. Table 13.4 presents the results.

Five variables ended up in the final model. Again, population was significantly
related to the number of crimes attracted to a zone, but was not the strongest predictor as
indicated by the Z-test. The strongest relationship was for the number of retail employees.
This suggests that retail/commercial areas attract many crimes. This is supported by one of
the land use variables - the acreage associated with very large malls; in other words, there
are additional crimes attracted to very large malls above-and-beyond the number of retail
employees in those zones. Two other variables are in the equation. Relative income equality
was, again, negatively related to crime destinations/attractions; zones with low income tend
to attract more crimes. Also, there was a negative association with distance from the CBD.
The farther away from the CBD, the lower the number of crimes. Overall, the model
suggests that zones with commercial activities, particularly with large malls, but which are
closer to the city center and which have households with relatively lower incomes are those
that attract the most crimes.

The overall model was highly significant, as indicated by the Likelihood Ratio and the
R-square. There was a discrepancy between the R-square statistic and the Deviance R-square,
making it unclear how strong the model is (the R-square would suggest that
it’s strong whereas the Deviance R-square would not). Nevertheless, the overall
predictability is reasonable. The amount of multicollinearity is tolerable.

Table 13.4

Results of First Destination Model

Model result:
Data file: BCDestinations.dbf
Type of model: Destination
DepVar: BCDEST
N: 325
Df: 319
Type of regression model: Poisson with over-dispersion correction
Log Likelihood: -10347.872494
Likelihood ratio(LR): 41708.925054
P-value of LR: 0.0001
AIC: 20707.744988
SC: 20730.447939
Dispersion multiplier: 1.000000
R-square: 0.596921
Deviance r-square: 0.310251

-----------------------------------------------------------------------------
                                            Pseudo-
Predictor     DF  Coefficient  Stand Error  Tolerance      z-value  p-value
CONSTANT       1     5.485851     0.218977          .    25.052182    0.001
POP96          1     0.000190     0.000027   0.928694     6.935850    0.001
INCEQUAL       1    -0.017176     0.005464   0.903130    -3.143462    0.010
RETEMP96       1     0.001018     0.000062   0.717076    16.297855    0.001
VERYLRGMLACR   1     0.006446     0.000974   0.740927     6.616423    0.001
DISTANCE       1    -0.115709     0.017069   0.876461    -6.778875    0.001
-----------------------------------------------------------------------------

Residual Analysis of Destination Model

As with the origin model, an analysis was conducted of the residual errors. This time,
the output ‘dbf’ file was brought into Excel and a nicer graph created (figure 13.9). Unlike
the best origin model, the dispersion of the residuals is not symmetrical. There are several
major outliers, both on the negative end of the residuals (over-estimation of crime
attractions) and on the positive end (under-estimation of crime attractions). In particular,
there are two zones that seem to stand out. Both of them have shopping malls (Golden Ring
Mall and Eastpoint Mall), but the amount of crime in those zones is much greater than the
model predicts. This is seen as high positive residuals (i.e., there were more actual crimes
than predicted). They both are older malls, but are located in relatively high crime areas.
Golden Ring Mall was demolished several years ago, but after the data that are being
analyzed in this example were collected.

Adding in Special Generators

Since the number of crime incidents (attractions) in those two zones was much
higher than was expected, they were treated as ‘special generators’. Keeping in mind the
caution that one doesn’t want to over-use this category, we can still demonstrate how it
works. Two new variables were created for the data set. One was for the Golden Ring Mall
and one was for the Eastpoint Mall. For the Golden Ring Mall, the zone that included it
received a ‘1’ for this variable while all other zones received a ‘0’. Similarly, for the Eastpoint
Mall variable, the zone in which it occurred received a ‘1’ while all other zones received a ‘0’.
These dummy variables were then included in the model (Table 13.5).
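
Creating such dummy variables before the data are loaded could look like the following
Python sketch. The zone identifiers, the RETEMP96 values, and the function name are
hypothetical; only the variable names GOLDENRING and EASTPOINT and the 1/0 coding come from
the example.

```python
# Hypothetical zone records; zone_id and RETEMP96 values are made up
zones = [
    {"zone_id": "Z101", "RETEMP96": 120},
    {"zone_id": "Z102", "RETEMP96": 2400},
    {"zone_id": "Z103", "RETEMP96": 1900},
    {"zone_id": "Z104", "RETEMP96": 60},
]

# Special generator variable -> the single zone that contains it (assumed ids)
special_generators = {"GOLDENRING": "Z102", "EASTPOINT": "Z103"}


def add_special_generators(zones, special_generators):
    """Add one 1/0 dummy variable per special generator to every zone."""
    for zone in zones:
        for variable, zone_id in special_generators.items():
            zone[variable] = 1 if zone["zone_id"] == zone_id else 0
    return zones


add_special_generators(zones, special_generators)
for zone in zones:
    print(zone["zone_id"], zone["GOLDENRING"], zone["EASTPOINT"])
```

Each dummy variable equals 1 in exactly one zone, so its coefficient absorbs that zone's
excess over what the other predictors explain.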

Adding the two special generators increases the predictability substantially. The
Likelihood Ratio jumps as does the R-square; the Deviance R-square statistic, however,
actually drops, suggesting that it is not a reliable indicator with these data. The coefficients
for the two zones, treated as special generators, are both highly significant and, in fact, are
the strongest variables in the equation. All other variables have the same relationships as
in the first run. There does not appear to be substantial multicollinearity.

This brings up an issue over the status of a special generator. In this example, the
two zones were treated as special generators in the model. While the predictability
increased substantially, one has to wonder whether this was a meaningful operation. That
is, if this model were applied to data for a later time period (e.g., 2002-2004 crime data),
would the relationships still hold? In the case of the Golden Ring Mall, they wouldn’t since that
mall has since been demolished.

The value of a special generator is that it identifies a land use that would be expected
to be relatively permanent (e.g., a stadium or a train station or an airport). In the case of a
shopping mall, it may or may not be. If it’s a high visibility ‘regional’ mall, then treating it as a
special generator is probably a good idea. If it’s a smaller, older mall, on the other hand, the
analysis is guessing that the mall will maintain its status as a high crime attraction location.
Clearly, judgment and knowledge of the particular mall are essential.

Figure 13.9: Residual Errors for Crime Destinations
(vertical axis: residual error, observed minus predicted; horizontal axis: predicted values)

Table 13.5

Results of Second Destination Model

Model result:
Data file: BcDestinations.dbf
Type of model: Destination
DepVar: BCDEST
N: 325
Df: 317
Type of regression model: Poisson with over-dispersion correction
Log Likelihood: -7852.238456
Likelihood ratio(LR): 46700.193131
P-value of LR: 0.0001
AIC: 15720.476911
SC: 15750.747513
Dispersion multiplier: 1.000000
R-square: 0.784194
Deviance r-square: 0.227710

-----------------------------------------------------------------------------
                                            Pseudo-
Predictor     DF  Coefficient  Stand Error  Tolerance      z-value  p-value
CONSTANT       1     5.182117     0.067867          .    76.356923    0.001
INCEQUAL       1    -0.020797     0.003942   0.902950    -5.276135    0.001
RETEMP96       1     0.000995     0.000051   0.700294    19.338957    0.001
VERYLRGMLACR   1     0.006590     0.000869   0.716299     7.582758    0.001
POP96          1     0.000238     0.000020   0.921456    12.164552    0.001
DISTANCE       1    -0.087826     0.012462   0.872535    -7.047735    0.001
GOLDENRING     1     1.933321     0.069636   0.969044    27.763123    0.001
EASTPOINT      1     1.602000     0.067934   0.943548    23.581751    0.001
-----------------------------------------------------------------------------
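
The improvement from the two special generators can be put in rough numbers by comparing
the log likelihoods of the first and second destination models. The sketch below applies the
standard likelihood ratio test for nested models, which is an assumption here since the text
only notes that the Likelihood Ratio jumps. With two added parameters the chi-square
survival function has the closed form exp(-x/2). Because the data are over-dispersed, the
statistic should be read as indicative only. Function names are hypothetical.

```python
import math


def lr_statistic(ll_reduced, ll_full):
    """Likelihood ratio statistic for nested models: 2*(LL_full - LL_reduced)."""
    return 2.0 * (ll_full - ll_reduced)


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


ll_first = -10347.872494   # first destination model (no special generators)
ll_second = -7852.238456   # second destination model (two special generators)

lr = lr_statistic(ll_first, ll_second)
p_value = chi2_sf_df2(lr)  # two extra parameters: GOLDENRING, EASTPOINT
print(round(lr, 3), p_value)
```

The statistic equals the difference between the two reported Likelihood Ratio values, as
expected when both are measured against the same null model.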

Comparing Different Crime Types

With or without special generators, a trip generation model is an ecological model
that predicts crime origins and crime destinations. A point was made in chapter 11 that
these models are not behavioral, but are correlates of crimes. That is, the variables that end
up predicting the number of crimes are not reasons (or explanations) for the crimes.
Population almost always enters the equation because, all other things being equal, zones
with larger numbers of persons will have more crimes, both originating and ending in them.
Similarly, low income status is frequently associated with high crime areas. It doesn’t follow
that low income persons will be more prone to commit crimes; it may be true but these
models don’t test that proposition. These are only correlates with crime in those
environments. As was mentioned earlier, these variables are often correlated with many
specific conditions that may be predictors of individual crime - poverty, drug use,
substandard housing, and lack of job opportunities.

To see this, three separate models of specific crime types were run for robbery,
burglary, and vehicle theft. For each crime type, the general model was tested for both the
origin and the destination models. If a variable was not significant, it was dropped and the
model was re-run. The results of the origin model for the three crime types are seen in table
13.6 while the results of the destination model are seen in table 13.7.

Table 13.6

Models for Specific Crime Types:
Origin Model

                                                               Vehicle
                        All Crimes     Robbery    Burglary       Theft
CONSTANT                  2.286699   -0.652291    1.621546   -0.800759
INCOME EQUALITY          -0.018525   -0.023964        -      -0.019620
NON-RETAIL EMPLOYMENT    -0.000186   -0.000237   -0.000239   -0.000188
RETAIL EMPLOYMENT        -0.000353       -           -           -
POPULATION                0.000284    0.000297    0.000242    0.000342
BELTWAY                   0.123109       -           -           -
MILES OF ARTERIAL        -0.085070       -           -       -0.180966


Table 13.7

Models for Specific Crime Types:
Destination Model

                                                               Vehicle
                        All Crimes     Robbery    Burglary       Theft
CONSTANT                  5.485851    3.284488    3.246183    2.610299
INCOME EQUALITY          -0.017176   -0.027946   -0.034598   -0.012910
RETAIL EMPLOYMENT         0.001018    0.000844        -       0.000507
VERY LARGE MALL ACREAGE   0.006446    0.004332        -           -
POPULATION                0.000190    0.000223    0.000309    0.000247
DISTANCE FROM CBD        -0.115709   -0.096330   -0.038715   -0.096088


The population variable appears in every single model. As mentioned, all other
things being equal, the larger the number of persons in a zone, the more crime events will
occur whether those events are crime productions (origins) or crime attractions
(destinations). Similarly, relative income equality appears in five of the six models with the
coefficient always being negative. In general, zones with relatively lower incomes will have
more robberies, burglaries, and vehicle thefts. The only model for which income equality did
not appear was as an origin variable for burglaries; apparently, burglars come from zones
with various income levels, at least in Baltimore.

The other general variables have more limited applicability. Retail employment
predicts both total crime origins and total crime destinations, but only predicts specifically
robbery destinations and vehicle theft destinations; the latter tend to occur more in
commercial areas than not. On the other hand, non-retail employment appears to be
important only as a crime origin variable; zones with less non-retail employment tend to
produce more offender trips. Distance from the CBD only appears as a destination variable;
the closer a zone is to the metropolitan center, the higher the number of crimes being
attracted to that zone; this variable was not important in the origin model.

In other words, these models are measuring general conditions associated with crime,
not causes per se. They capture the general contextual relationships associated with crime
productions and attractions. But they don’t necessarily predict individual behavior.
Nevertheless, the models can be used for prediction since the conditions appear to be quite
general.

Adding External Trips

After an origin and a destination model have been developed, the next step is to add any
crime trips that came from outside the modeling area (external trips). In this case, these
would be trips that came from areas that were not in either Baltimore County or the City of
Baltimore (the modeling area).

A simple estimate of external trips is obtained by taking the difference between the
total number of crimes occurring in the study area (Baltimore County destinations) and the
total number of crimes originating in the modeling area (table 13.8).

The difference between the number of crimes enumerated within Baltimore County
and that originating from both Baltimore County and the City of Baltimore is 1,627. This is
3.9% of the total Baltimore County crimes. In general, it is important that the external trips
be as small as possible. Ortuzar and Willumsen (2001) suggest that this percentage be no
greater than 5% in order to minimize potential bias from not including those cases in the
origin model. It’s not an absolute percentage, but more like a rule of thumb; in theory, any
external trips could bias the origin model. But, in practice, the error will be small if external
crime trips are a small percentage of the total number enumerated in the destination county.

In this case, the condition holds. For the three types of crime modeled, the percentage
of external trips was also less than 5%: robbery (4.0%), burglary (4.5%), and vehicle theft
(1.4%). On the other hand, if the percentage of external trips is greater than approximately
5%, a user would be advised to widen the origin study area to include more zones in the
model.

Table 13.8

Estimating External Crime Trips in Baltimore County

Number of crimes ending in 325 Baltimore County zones:              41,969
Number of crimes originating in 532 Baltimore County/City zones:    40,342
Crimes from outside the modeling area:                               1,627

Note: external trips are only added to the origin model since they are crime
trips that originate outside the modeling area. They are not relevant for the
destination model.
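
The arithmetic behind the table, together with the 5% rule of thumb, can be written out as a
short Python sketch. The totals are those of Table 13.8; the variable names and the way the
threshold check is expressed are illustrative choices.

```python
# Totals from Table 13.8
crimes_ending_in_county = 41969        # destinations in 325 Baltimore County zones
crimes_originating_in_region = 40342   # origins in 532 County/City zones

# External trips: crimes committed in the county whose origin lies
# outside the modeling area
external_trips = crimes_ending_in_county - crimes_originating_in_region

# Share of all destination-county crimes, compared with the 5% rule of thumb
external_share = external_trips / crimes_ending_in_county
within_rule_of_thumb = external_share <= 0.05

print(external_trips, round(100 * external_share, 1), within_rule_of_thumb)
```

If the share came out above roughly 5%, the advice in the text is to widen the origin study
area rather than to adjust the estimate.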

Predicting External Trips

If a model is being applied to a data set other than the one from which it was initially
estimated, a problem emerges about how to estimate the number of external trips. It is one
thing to apply simple arithmetic in order to determine how many trips originated outside the
modeling area (as in table 13.8). It is another to know how to calculate external trips when
the model is being applied to other data. For the modeled zones, the coefficients are applied
to the variables of the model (see “Make Prediction” below). But the external trips have to be
estimated independently.

There is not a simple way to estimate external crime trips. Unlike regular trips that
can be estimated through cordon counts, crime trips are not detectable while they are
occurring (i.e., one cannot stand by a road and count offenders traveling by). Thus, they
have to be estimated.

A simple method is to calculate the number of external trips for two time periods. For
example, external trips could be calculated from a 2000 data set by subtracting the total
number of crimes originating in the modeling region from the total number of crimes occurring
in the study area (e.g., as in table 13.8 above). If a similar calculation was made for, say,
2002, then the difference (the ‘trend’) could be extrapolated. To take our example, between
1993 and 1996, there were 1,627 external trips. If the number of external trips turned out to
be 1,850 for 1997-2000, then the difference (1,850 - 1,627 = 223) could be applied for future
years. Essentially, a slope is being calculated and applied as a linear equation:

     Y_i = 1850 + 223 * X_i

where Y_i is the number of external crime trips (origins) during a four year period, i, and X_i
is an integer for a four year period starting with the next period (i.e., the base period,
1997-2000, has an integer value of 0). In other words, a linear trend is being extrapolated.
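
The extrapolation is easy to tabulate. The sketch below simply evaluates the linear equation
for the base period and the two periods after it; the function name and the period labels
beyond 1997-2000 are illustrative assumptions.

```python
def extrapolate_external_trips(period_index, base=1850, slope=223):
    """Linear trend: external trips for the four-year period with the given
    integer index (0 = base period 1997-2000, 1 = next period, ...)."""
    return base + slope * period_index


# Period labels after the base period are assumed four-year blocks
periods = ["1997-2000", "2001-2004", "2005-2008"]
forecast = {
    label: extrapolate_external_trips(i) for i, label in enumerate(periods)
}
for label, trips in forecast.items():
    print(label, trips)
```

Because the trend is linear, each additional period adds the same 223 trips, which is the
reason the method becomes questionable over long horizons.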

How realistic is this? For short time periods, linear extrapolation is probably as good
a method as any. But for longer time periods, it can lead to spurious conclusions (e.g., crime
trips from outside the region will always increase). Short of developing a sophisticated
model that relates crime trips to the growth of the metropolitan area and to other
metropolitan areas within, say, 500 miles, a linear extrapolation is one of the few methods
that one can apply.[7]

Make Prediction

In CrimeStat, external trips are added on the second page of the trip generation routine,
“Make prediction”. This is a page where the modeled coefficients and any external trips are
applied to a data set. There are two reasons why this is a separate page from the “Calibrate
model” page where the model was calibrated. First, the coefficients might be applied to
a data set other than the one from which they were calibrated. For example, one might calibrate the
model with a data set from 1998-2000 and then apply it to a data set covering 2001-2003.
Similarly, one might take future year forecasts (e.g., 2025) and apply the model. In effect,
the model would be predicting the number of future crimes if the same conditions hold over
the time frame.
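
Applying calibrated coefficients to a different data set can be sketched as follows. The
sketch assumes the usual Poisson log link (predicted count = exp of the linear predictor)
and uses the coefficients of the second origin model (Table 13.3). The variable names of the
new data set, the mapping between names, the zone values, and the function name are all
hypothetical.

```python
import math

# Coefficients from the second origin model (Table 13.3), in saved order
coefficients = [
    ("CONSTANT", 2.286699),
    ("POP96", 0.000284),
    ("INCEQUAL", -0.018525),
    ("NONRET96", -0.000186),
    ("RETEMP96", -0.000353),
    ("ARTERIAL", -0.085070),
    ("BELTWAY", 0.123109),
]

# Hypothetical names in the new data set; content must match, names need not
variable_map = {
    "POP96": "POP2001", "INCEQUAL": "INCEQ01", "NONRET96": "NONRET01",
    "RETEMP96": "RETEMP01", "ARTERIAL": "ARTERIAL", "BELTWAY": "BELTWAY",
}


def predict_origins(zone, coefficients, variable_map):
    """Predicted crime origins for one zone under a log link."""
    linear = 0.0
    for name, beta in coefficients:
        linear += beta if name == "CONSTANT" else beta * zone[variable_map[name]]
    return math.exp(linear)


# Hypothetical zone from the new data set
zone = {"POP2001": 2000, "INCEQ01": 50, "NONRET01": 500,
        "RETEMP01": 100, "ARTERIAL": 1.5, "BELTWAY": 0}
predicted = predict_origins(zone, coefficients, variable_map)
print(round(predicted, 2))
```

The explicit mapping makes the one-to-one matching of coefficients to variables visible, so
a mismatch raises a KeyError instead of silently applying a coefficient to the wrong column.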

A second reason for separating the calibration and application pages is to add
external trips to the origin zones. As mentioned above, external trips are, by definition, those
that were not modeled in the calibration. They have to be calculated independently of the
model and then added in.

Thus, the “Make prediction” page allows these operations to occur. Figure 13.10
shows the page. There are several steps that have to be implemented for this page to be
operative.

1.   The data file has to be input as either the primary or secondary file (not shown
     in the image). In this example, the same data set is being used as was used for
     the calibration. But, if it’s a different data set, that will need to be input in the
     Data Setup section. Whether the input data set is a primary file (the usual
     occurrence) or a secondary file needs to be specified. Also, indicate whether
     the applied model is to be an origin or destination model. In figure 13.10, it is
     specified as an origin file.

Figure 13.10: "Make Prediction" Setup Page

2. A trip generation coefficients file needs to be input. These are the estimated
coefficients from the calibration stage. Inputting this file brings in the
coefficients in the order in which they were saved. They are listed in the
“Matching parameters” dialogue box on the right side of the page.

3. On the left side of the page are listed all the variables in the input data set
(primary or secondary file). In the middle box, the variables are added in the
same order as in the matching parameters box. That is, each independent
variable needs to be matched to the variable from the coefficients file, one for
one. This is very important. The names do not have to be the same (e.g., if
the model was calibrated with one data set and applied to another, the variable
names may not be identical). But the content and order of the variables need
to be the same. In the example, the first variable in the coefficients file is
INCEQUAL. The selected variable in the middle box has to be the income
equality variable (whatever its name). In the example, the same data set is
being used so the names are identical. This is repeated for each of the
independent variables in the coefficients file.

4. Next, any missing value codes are specified in the missing values box. Any
records with a missing value for any of the selected independent variables will
be dropped from the calculation. In the example, there are no missing value
codes applied other than the default blank field.

5. If external trips are to be added, the external trips box must be checked.
External trips can be applied in an origin model, but not in a destination
model. If they are to be added, the number of trips should be specified in the
“Number of external trips” box and the zone ID field for the file indicated; in
the example, 1627 external trips are added and the TAZ field is specified as
the ID variable (TZ98).

6. The type of model to be applied is indicated in the “Type of regression model”
box. There are only two choices: Poisson (the default) and Normal (OLS).
Since the coefficients are being applied to the data, no over-dispersion
correction is necessary (since it was probably used in calibrating the model).

7. Finally, the output file name is defined in the “Save predicted values” box.
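
The matching and missing-value steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `match_variables` and `drop_missing`, the second coefficient name (POPULATION), and all column names and values in the sample records are hypothetical, and the code does not reproduce CrimeStat's internal routine.

```python
def match_variables(coef_names, selected_columns):
    """Pair each saved coefficient with the data column chosen in the same position."""
    if len(coef_names) != len(selected_columns):
        raise ValueError("one data column is needed for each coefficient")
    return list(zip(coef_names, selected_columns))


def drop_missing(records, columns, missing_codes=("", None)):
    """Keep only records with no missing-value code in any selected column."""
    return [
        record
        for record in records
        if all(record.get(column) not in missing_codes for column in columns)
    ]


# Coefficient names in the order saved at calibration (POPULATION is hypothetical).
coef_names = ["INCEQUAL", "POPULATION"]
# Columns of a different data set, selected in the same order (hypothetical names).
selected = ["INC_EQ_2001", "POP_2001"]

records = [
    {"ZONE": "A", "INC_EQ_2001": 0.42, "POP_2001": 1200},
    {"ZONE": "B", "INC_EQ_2001": "", "POP_2001": 800},  # blank field = missing
    {"ZONE": "C", "INC_EQ_2001": 0.35, "POP_2001": 950},
]

pairs = match_variables(coef_names, selected)
usable = drop_missing(records, [column for _, column in pairs])
print(pairs)
print([record["ZONE"] for record in usable])
```

Matching by position rather than by name mirrors the point made in step 3: the names may differ between data sets, so the order is what carries the meaning.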

For each zone, the routine will then take the appropriate variable from the input data
set and apply the matching coefficient from the trip generation coefficients file to produce a
predicted estimate of the number of trips. To calculate this value, for the OLS model, the
routine will use equation 13.2 above while for the Poisson model, the routine will use
equation 13.6 above; for the latter, it will then raise e to the power of the predicted log value
to produce a prediction for the expected number of crime trips:

$$\lambda_i = e^{[\ln(\lambda_i)]} \qquad (13.26)$$
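
A minimal sketch of this calculation for a single zone follows. It assumes that equation 13.2 is the usual linear form (intercept plus the sum of coefficient times variable) and that equation 13.6 is the same linear form for the log of the expected count; the function name `predict_trips` and all coefficient and variable values are hypothetical.

```python
import math


def predict_trips(values, coefficients, intercept, model="poisson"):
    """Apply saved coefficients to one zone's variables.

    values and coefficients must be in the same order (see step 3 above).
    """
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    if model == "poisson":
        # The linear predictor is the log of the expected count, so exponentiate.
        return math.exp(linear)
    if model == "normal":
        # Normal (OLS): the linear predictor is the prediction itself.
        return linear
    raise ValueError("model must be 'poisson' or 'normal'")


# Hypothetical coefficients and zone values, for illustration only.
coefficients = [0.5, 0.001]
intercept = 1.0
zone_values = [2.0, 1000.0]

poisson_estimate = predict_trips(zone_values, coefficients, intercept, "poisson")
ols_estimate = predict_trips(zone_values, coefficients, intercept, "normal")
print(poisson_estimate, ols_estimate)
```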

If external trips are added, a new zone called EXTERNAL is created in the ID field
that was indicated on the page. Then, the specified number of external trips is simply
assigned to that zone, with zeros being placed for the values of all the remaining variables
in the file. By default, the output name for the predicted number of crimes will be PREDORIG
for an origin model and PREDDEST for a destination model. An example data set is available
on the CrimeStat download page.
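
The addition of the external zone can be pictured as appending one extra record to the predicted origin file. The sketch below is an assumption about the shape of the result, not CrimeStat's code: the function name `add_external_zone` and the sample zone records are hypothetical, while TZ98, PREDORIG, EXTERNAL and the value 1627 come from the example above.

```python
def add_external_zone(records, id_field, predicted_field, external_trips):
    """Return a new list with an EXTERNAL zone appended.

    All other fields of the new record are set to zero.
    """
    if not records:
        raise ValueError("at least one zone record is required")
    external = {field: 0 for field in records[0]}
    external[id_field] = "EXTERNAL"
    external[predicted_field] = external_trips
    return records + [external]


# Hypothetical predicted-origin records for two zones.
zones = [
    {"TZ98": "0001", "INCEQUAL": 0.42, "PREDORIG": 35.2},
    {"TZ98": "0002", "INCEQUAL": 0.35, "PREDORIG": 12.7},
]

with_external = add_external_zone(zones, "TZ98", "PREDORIG", 1627)
print(with_external[-1])
```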

Note: for a destination model, this “Make prediction” operation is not necessarily
needed if the same data set is used for calibration and prediction. This step is
primarily for the origin file.

Balancing Predicted Origins and Destinations

After the origin model and destination model are calibrated and applied to a data set,
the final step in trip generation is to ensure that the number of predicted origins equals the
number of predicted destinations. This is necessary for the next stage of crime travel
demand modeling - trip distribution. Since a trip has both an origin and a destination, the
total number of origins has to equal the total number of destinations. This is an absolute
condition for the trip distribution model to work; the routine will return an error message if
the number of origins does not equal the number of destinations.

If the Poisson model is used for calibration, the routine ensures that the number of
predicted trips equals the number of input trips. Further, if the calculation of external trips
has been obtained by subtracting the total number of predicted origins from the total number
of predicted destinations, and if the external trips are then added to the predicted origins,
then most likely the total number of origins will equal the total number of destinations.
However, because of rounding-off errors and inconsistent external trip estimates, it is
possible that the sums are not equal.
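
The arithmetic in this paragraph can be illustrated with a small sketch. It assumes that the external-trip estimate is entered as a whole number, which is one way a rounding-off mismatch could arise; the predicted values and the variable names are hypothetical.

```python
# Hypothetical predicted trips by zone.
predicted_origins = [10.4, 20.3, 30.2]
predicted_destinations = [40.6, 41.0]

total_origins = sum(predicted_origins)
total_destinations = sum(predicted_destinations)

# External trips estimated as the shortfall of origins, rounded to a whole number.
external_trips = round(total_destinations - total_origins)

# After adding the external trips, the totals are close but not identical.
origins_with_external = total_origins + external_trips
mismatch = origins_with_external - total_destinations

print(external_trips, mismatch)
```

A small residual difference of this kind is what the balancing step removes.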

Consequently, it is important to balance the predicted origins and destinations to
ensure that no problems will occur in the trip distribution model. There are two ways to do
this in CrimeStat. First, the number of predicted destinations is held constant and the
number of predicted origins is adjusted to match this number. This is the default choice.
Second, the number of predicted origins is held constant and the number of predicted
destinations is adjusted to match this number.

The calculation is essentially a multiplier that is applied to each zone. If destinations
are to be held constant, the multiplier is defined as:

$$M_j = \frac{\sum_j (\text{Crimes by destinations})_j}{\sum_i (\text{Crimes by origins})_i} \qquad (13.27)$$

and the predicted number of origins is multiplied by $M_j$. If, on the other hand, the origins
are to be held constant, the multiplier is defined as:

$$M_i = \frac{\sum_i (\text{Crimes by origins})_i}{\sum_j (\text{Crimes by destinations})_j} \qquad (13.28)$$

and the predicted number of destinations is multiplied by $M_i$. The multiplication simply
ensures that the sums of the predicted origins and predicted destinations are equal.
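
The two multipliers can be sketched as follows. This is a minimal illustration of equations 13.27 and 13.28, not CrimeStat's own routine; the function name `balance`, its `hold` argument, and the zone values are hypothetical.

```python
def balance(origins, destinations, hold="destinations"):
    """Scale one set of predictions so that both totals are equal.

    hold="destinations": origins are multiplied by M_j (equation 13.27).
    hold="origins": destinations are multiplied by M_i (equation 13.28).
    Returns (adjusted_origins, adjusted_destinations).
    """
    total_origins = sum(origins)
    total_destinations = sum(destinations)
    if hold == "destinations":
        multiplier = total_destinations / total_origins
        return [o * multiplier for o in origins], list(destinations)
    if hold == "origins":
        multiplier = total_origins / total_destinations
        return list(origins), [d * multiplier for d in destinations]
    raise ValueError("hold must be 'destinations' or 'origins'")


# Hypothetical predicted values for three zones.
origins = [10.0, 20.0, 30.0]        # total 60
destinations = [25.0, 25.0, 30.0]   # total 80

adj_origins, same_destinations = balance(origins, destinations, hold="destinations")
same_origins, adj_destinations = balance(origins, destinations, hold="origins")
print(sum(adj_origins), sum(adj_destinations))
```

Because every zone is scaled by the same factor, each zone's share of the total is unchanged; only the overall sum moves.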

The third page in the trip generation model is the “Balance predicted origins &
destinations” page. Figure 13.11 shows the setup for this page. The steps are as follows:

1. The box is checked indicating that it is a balancing operation.

2. The predicted origin file is input and the predicted origin variable is identified.
In the example, the predicted origin file is called “PredictedOrigins.dbf” and
the field with the predicted numbers was called PREDORIG.

3. The predicted destination file is input and the predicted destination variable is
identified. In the example, the predicted destination file is called
“PredictedDestinations.dbf” and the field with the predicted numbers was
called PREDDEST.

Note that these files are input on this page and not on the primary or secondary file
pages.

4. Next, the type of balancing is specified - holding destinations constant (the
default) or holding origins constant. In the example, the destinations are to be
held constant.

5. Finally, the output file is specified. If the origins are to be adjusted, then only
the origin file is saved. If the destinations are to be adjusted, then only the
destination file is saved. In other words, the adjustment is applied to only one
of the two predicted crime files. In the example, the file was named
“AdjustedPredictedOrigins.dbf” (not shown) since the origin file was adjusted.

The output produces a new column with the adjusted values. Table 13.9 shows the
origin output for the first 11 records of the Baltimore data. Once the balancing has been
completed, the trip generation model is finished and the user can go on to the trip
distribution model. In other words, the output file ensures that the predicted origin file
(crime productions) and the predicted destination file (crime attractions) are balanced.

Strengths and Weaknesses of Regression Modeling of Trips

As mentioned earlier, the use of regression for producing the trip generation model
has its strengths and weaknesses. The advantages are that, first, the approach is applicable
to crime incidents. Unlike regular travel behavior, crime trips have to be inferred from
police reports; one cannot conduct a household survey of offenders asking them about their
crime travel. Thus, starting with counts of the number of crimes occurring in each zone and
the number of crimes that originate from each zone, a model can be constructed.

Figure 13.11: Balance Predicted Origins and Destinations Setup

Table 13.9: Adjusted Data Should Have These Fields

Second, the use of a non-linear model, such as the Poisson, allows more complex
fitting of crime counts. In the early 1970s when trip generation models were starting to be
implemented in Metropolitan Planning Organizations around the U.S., the major type of
regression modeling available was OLS. At that time, researchers could not demonstrate
that this method was reliable in terms of predicting travel; we've discussed those reasons
earlier in this chapter. However, with the availability of software for conducting Poisson and
other non-linear models, that criticism is no longer applicable. The Poisson model is very
‘well behaved’ with respect to count data. It does not produce negative estimates. It
requires high levels of an independent variable to produce a slight effect in the dependent
variable, but the size of that effect increases as the values of the independent variable
increase. It maintains constancy between the sum of the input counts and the sum of the
predicted counts. Non-linear models are much more realistic for modeling trips than OLS.
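
Two of these properties, non-negative estimates and an effect that grows with the level of the independent variable, follow directly from the exponential form and can be seen in a small sketch. The intercept and slope below are hypothetical values chosen only to contrast the two functional forms; they are not fitted coefficients, and a real OLS fit and Poisson fit of the same data would not share coefficients.

```python
import math

# Hypothetical intercept and slope, used only to show the two functional forms.
INTERCEPT = -2.0
SLOPE = 0.5


def ols_prediction(x):
    """Linear form: the prediction can fall below zero."""
    return INTERCEPT + SLOPE * x


def poisson_prediction(x):
    """Exponential form: the prediction is always positive."""
    return math.exp(INTERCEPT + SLOPE * x)


xs = [0, 2, 4, 6, 8]
ols = [ols_prediction(x) for x in xs]
poisson = [poisson_prediction(x) for x in xs]

# Change in the Poisson prediction for each equal step in x.
poisson_steps = [b - a for a, b in zip(poisson, poisson[1:])]

for x, o, p in zip(xs, ols, poisson):
    print(x, round(o, 3), round(p, 3))
```

The linear form gives a negative count at the lowest value of x, while each equal step in x adds more to the exponential prediction than the step before it.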

Third, the use of a multiple regression model allows multiple independent variables
to be included. In our example, there were six and five variables respectively in the general
origin and destination models. Trip tables, on the other hand, typically only have three or
four independent predictors; it becomes too complicated to keep track of multiple conditions
of predictor variables. Thus, a more complex and sophisticated model can be produced with
a regression framework.

Fourth, and finally, a regression framework allows for complex interactions to be
estimated. For example, the log of an independent variable can be defined. An interaction
between two of the independent variables can be examined (e.g., median household income
for those zones having a sizeable amount of retail employment). In the trip table approach,
these interactions are implicit in the cell means. Thus, overall, the regression framework
allows for a more complex model than is available with a trip table approach.

On the other hand, there are potential problems associated with a regression
framework. First, the regression coefficients can be influenced by zone size. Since the model
is estimating differences between zones (i.e., differences in the number of crimes as a
function of differences in the values of the independent variables), zone size affects the level
of those differences. With small zone sizes, there will be substantial differences between
zones in both the independent and dependent variables. Conversely, large zone sizes will
minimize within-zone differences, but will usually increase the estimate of the between-zone
differences. The result could be an exaggeration of the effect of a variable that would not be
seen with small zone geography. As we argued in chapter 12, one should choose the smallest
zone geography that is practical in order to minimize this problem.

Second, a point that has been repeated again and again, these models are not
behavioral explanations. They represent ecological correlations with crime trips. It is
important not to try to convert these models into explanations of offender behavior. Too
often, researchers have jumped to conclusions about individuals based on the relationships
with environments and neighborhoods. It is important not to do this. This criticism,
incidentally, applies to both the trip table and the regression approach to trip
generation modeling.

The new generation of travel demand models is specifically behavioral and involves
modeling the behavior of specific individuals. Probabilities are calculated based on
individual choice and a micro-simulation routine can apply these probabilities to a large
metropolitan area (RDC, 1995; Pas, 1996; Recker, 2000; Shifton et al, 2003). While this
approach offers some definite theoretical advantages and is the subject of much current
research, to date there has not been a demonstration that this approach is more accurate at
predicting trips than the traditional trip-based travel demand model.

Summary

In summary, the trip generation model is a valuable tool for predicting the number of
crimes that originate in each zone and the number of crimes that end in each zone. Even if
the model is not behavioral, the model can be stable and useful for many years in the future.
It is best thought of as a proxy model in which the variables in the models are proxies for
conditions that are generating crimes, either in terms of environments that produce
offenders or in terms of locations that attract them.

In the next chapter, we will examine the second stage in the travel demand model -
trip distribution. In that stage, the predicted crime origins and the predicted crime
destinations are linked to produce crime trips.

Endnotes for Chapter 13

1. There is also subjectivity in subdividing variables at an individual level. For
example, household income levels can be subdivided in different ways. However,
with aggregate data, all variables have to be subdivided arbitrarily whereas with
individual level data, typically only income is done this way.

2. Some statisticians often refer to the number of parameters that have to be
estimated in an equation, not just the number of independent variables. In an OLS
model, for example, there are K+1 parameters that are estimated - coefficients for
the K independent variables and a constant term. In this text, K refers to the
number of independent variables, not estimated parameters.

3. It is possible to transform the independent variable into a non-linear predictor, for
example by taking the log of the independent variable or raising it to some power
(e.g., $X^2$). However, this won't solve the other problems associated with OLS,
namely negative and non-summative predictions.

4. For example, to account for the skewed dependent variable, one or more of the
independent variables have to be transformed with a non-linear operator (e.g., log or
exponential term). When more than one independent variable is non-linear in an
equation, the model is no longer easily understood. It may end up making
reasonable predictions for the dependent variable, but it is not intuitive and not
easily explained to non-specialists.

5. Note, Luc Anselin uses K for the number of parameters (coefficients + intercept) in
Appendix C whereas we use it for the number of independent variables. Readers
should be aware of this difference.

6. In the usual travel demand modeling, on the other hand, modelers usually adjust
the predicted destinations since the origin data is more reliable. These numbers are
obtained from the census or from the sample of households who are interviewed to
produce a sample from which data on destinations are obtained.

7. An alternative might be to use cordon counts from major highways coming into the
region and assume that crime trips represent a constant proportion of those trips.
Thus, if the total number of estimated external highway trips increases by 5%, one
could assume that the external trips also increase by 5%. While this is plausible, it
is not necessarily an accurate estimate. Talk to your Metropolitan Planning
Organization or the State Department of Transportation if you are interested in
developing this type of model as you will need their estimates of external trips.
