CrimeStatChapter 13 PDF
Trip Generation
Background
Modeling Trip Generation
Trip Purpose
Trip generation modeling starts with the reasons behind travel. At an individual level, people make trips for a reason: to go to work, to go shopping, to go to a medical appointment, to go for recreation, or, in the case of offenders, to commit a crime. These are called trip purposes. Since there are a very large number of trip purposes, these are usually categorized into a few major groupings. In the case of the usual travel demand forecasting, the distinctions are home-to/from-work (or home-based work trips), home-to/from-non-work (or home-based non-work trips, e.g., shopping), and a non-home trip where neither the origin nor the destination is at the traveler's residence location (non-home-based trips).
Crime Trip Groupings
For crime trips, however, these distinctions are not very meaningful. There is very little information on how offenders make trips. One cannot just take a sample of offenders and ask them to complete a travel diary about how, when, and where the trip took place. With arrested offenders, it might be possible to produce such a diary, but both memory problems as well as legal concerns quickly make this an unreliable source of information. Therefore, as indicated in chapter 11, a decision has been made to reference all trips with respect to the residential home location. All crime trips are analyzed as home-crime trips.
aggregate approach models this as three separate trips, rather than as a series of three linked crime trips (which the activity-based models do). This is a deficiency with the aggregate travel demand model. In order to make the aggregate models work, each trip is considered independent of any other trip. While this is not realistic behaviorally, since we know that many crimes are committed in sequence as part of a single journey (or tour), the zonal approach does limit the underlying logic of crime trips. Nevertheless, the aggregate approach can be very useful as long as it is implemented consistently. With the current state of activity-based modeling, there is not yet any evidence that they produce more accurate predictions than the cruder, aggregate approach (FHWA, 2001a).
Correlates of Crime
As mentioned in chapter 12, there are a number of different variables that could be used for prediction, although population (or a proxy for population, such as households), income or poverty, and land use variables would be the most common (NCHRP, 1998).
Spurious correlates
In other words, what is required for the model is a set of reasonable correlates of crime trips that would be plausible and stable over time. It is an ecological model, not a behavioral one.
Multicolinearity among the independent variables
There are two statistical problems associated with using these variables as predictors. The first is the high degree of overlap between the variables. Zones that have high poverty levels typically also have low household income levels, higher population densities, substandard housing, a high percentage of renters, and a higher proportion of minority and immigrant populations. In a regression model, this overlap causes a condition known as multicolinearity. Essentially, the independent variables correlate so highly among themselves that they produce ambiguous, and sometimes strange, results in a regression model. For example, if two independent variables are highly correlated, frequently one will have a positive coefficient with the dependent variable while the other will have a negative coefficient; conversely, they sometimes can cancel each other out. Thus, in spite of the correlates with crime levels, in a model it is usually best to eliminate co-linear variables. The result is that simple variables usually end up being the most straightforward to use (population, median household income) with many of the subtle, but theoretically relevant, variables typically dropping out of the equation.
Failure to distinguish origins from destinations
The strategy, therefore, is to obtain a model that balances high predictability with a low total prediction error.
Count Model
constructed to predict rates, but normally it is not done. For most travel demand modeling, as mentioned in chapter 11, the model predicts the number of trips originating or ending in a zone. Thus, there is a crime production model that predicts the number of crimes originating in each zone and a crime attraction model that predicts the number of crimes
Trip Tables
There are two classic approaches to trip generation modeling. The first uses a trip table (sometimes called a cross-classification table or a category analysis). A trip table is a cross-classification matrix. Several predictive variables are divided into categories (e.g., three levels of household income; four levels of vehicle ownership; three levels of population density) and a mean number of trips is estimated, usually from a survey. For example, a survey of household income might show the relationship between household income and the number of trips taken by individuals of the households. Based on a sample, estimates of the average number of trips per person can be obtained for each income level (e.g., 3.4 trips per day for persons from low income households; 4.5 trips per day for persons from median income households; 6.7 trips per day for persons from high income households). These variables are further subdivided into two-way or three-way cross-tabulation tables (e.g., low income and medium vehicle ownership; low income and high vehicle ownership). Table 13.1 illustrates a possible trip table model involving two variables. In practice, three or four variables are used.
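A trip table of this kind can be sketched as a simple lookup keyed by category. The rates and household counts below are invented for illustration and are not taken from Table 13.1; `TRIP_RATES` and `zone_trips` are hypothetical names.

```python
# Hypothetical two-variable trip table: mean trips per household per day,
# keyed by (income level, vehicle ownership level). All values are made up.
TRIP_RATES = {
    ("low", "low"): 2.1, ("low", "medium"): 3.0, ("low", "high"): 3.6,
    ("medium", "low"): 3.2, ("medium", "medium"): 4.4, ("medium", "high"): 5.1,
    ("high", "low"): 4.0, ("high", "medium"): 5.8, ("high", "high"): 6.9,
}


def zone_trips(households_by_cell):
    """Expected daily trips for a zone: households in each cell times the cell's mean rate."""
    return sum(count * TRIP_RATES[cell] for cell, count in households_by_cell.items())


# A zone with 100 low-income/low-vehicle households and 50 medium/medium households.
zone = {("low", "low"): 100, ("medium", "medium"): 50}
print(zone_trips(zone))
```

The prediction for a zone depends entirely on how households are assigned to cells, which is where the category boundaries matter.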
Table 13.1
such an approach requires interpretation and some degree of arbitrariness. For example, how does one subdivide median household income? One person might interpret it slightly differently than another; unlike simple numerical counts (e.g., 0 vehicle ownership; 1 vehicle ownership; 2 vehicle ownership), there is too much variability in categorizing variables at the zonal level (see note 1).
OLS Regression Modeling
it (the coefficients, β). The product of the value of the independent variable times its weight represents its effect. The individual effects of each of the K independent variables are summed to produce an overall estimate of the dependent variable, Y.
Skewness of crime events
Aggregating crimes into zones tends to reduce some of the skewness. For example, grouping the crimes by origin traffic analysis zone (TAZ) reduced it a little bit. Nineteen of the 525 origin zones in Baltimore County and Baltimore City did not have any crimes occur in them while 15 zones had only one crime occur. Six zones had two crimes originate from them while 8 zones had three crimes originate from them. At the other end, 1 zone had 738 crimes originate from it and another zone had 533 originate from it. Of the 525 origin zones, 155 had 100 or more crime events. Similar results are found for the destination zones. Figure 13.2 graphs the distribution of origins and destinations by TAZs in bins of 50 incidents each.
Figure 13.1: (histogram; x-axis: number of incidents; y-axis: number of locations)
Figure 13.2: (histogram of origins and destinations; x-axis: incidents per TAZ; y-axis: number of TAZs)
Skewness in the dependent variable usually makes the final model biased and unreliable. Particularly if the skewness is positive (i.e., a handful of cases have very large values), the resulting regression coefficients will reflect the cases with the highest values rather than represent all the cases with approximately equal weights. These so-called 'outliers' can overwhelm a regression equation. In an extreme case, a very large outlier may totally determine the model. For example, an experiment with 100 cases was created with a progressing dependent variable and a random independent variable (i.e., the independent variable had its value selected randomly). The dependent variable progressed from 1 to 100. For the first 99 cases, the independent variable took values from 0.12 to 9.9, randomly assigned. The correlation between these two variables for the first 99 cases was 0.04. However, for the 100th case, the independent variable was given a value of 100. The correlation between the two variables now shot up to 0.17. Even though the F-test for this was not significant, it represented a sizeable jump. Replacing one other independent value with a 50 caused the correlation to jump to 0.23, which was statistically significant. In other words, two outliers caused a random series to appear significant!
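The outlier experiment can be approximated with a few lines of code. This is a sketch under stated assumptions, not a reproduction of the original run: the random values, the seed, and the names `pearson_r` and `outlier_effect` are all illustrative, so the correlations it prints will differ somewhat from the 0.04 and 0.17 reported above.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def outlier_effect(x_first_99):
    """Correlate X with Y = 1..99, then again after adding the case (X = 100, Y = 100)."""
    y = list(range(1, 100))
    r_before = pearson_r(x_first_99, y)
    r_after = pearson_r(list(x_first_99) + [100.0], y + [100])
    return r_before, r_after


random.seed(0)  # arbitrary seed; the exact correlations depend on it
x = [random.uniform(0.12, 9.9) for _ in range(99)]
r_before, r_after = outlier_effect(x)
print(f"r without outlier: {r_before:.2f}; r with outlier: {r_after:.2f}")
```

A single extreme case at the top of both variables is enough to pull an essentially zero correlation well above zero.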
Negative predictions
A second problem with OLS is that it can have negative predictions. With a count variable, such as the number of crimes originating or ending in a zone, the minimum number is zero. That is, the count variable is never negative, being bounded by 0 on the lower limit and some large number on the upper limit. The OLS model, on the other hand, can produce negative predicted values since it is additive in the independent variables. This clearly is illogical and is a major problem with data that are very skewed. If the most common value is close to zero, it is very possible for an OLS model to predict a negative count.
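A minimal sketch of the problem, using a made-up skewed count series and a one-variable least squares fit. The data and the helper name `fit_ols` are hypothetical; the point is only that a straight line fitted to counts clustered near zero can dip below zero.

```python
def fit_ols(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


# Hypothetical zones: most have zero crimes, a few have many (positive skew).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 0, 0, 1, 2, 10, 40]

intercept, slope = fit_ols(x, y)
predicted = [intercept + slope * xi for xi in x]
negative = [round(p, 2) for p in predicted if p < 0]
print("negative predicted counts:", negative)
```

Every observed count is zero or more, yet the fitted line predicts negative counts for the low end of the predictor.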
Non-linear effects
Greater residual errors
The final problem with an OLS model and a skewed dependent variable is that the model tends to over- or under-predict the correct values, but rarely comes up with the correct estimate. With skewed data, typically an OLS equation produces non-constant residual errors. That is, one of the major assumptions of the OLS model is that all relevant variables have been included. If that is the case, then the errors in prediction (the residual errors, the difference between the observed and predicted values) should be uncorrelated with the predicted value of the dependent variable. Violation of this condition is called heteroscedasticity because it indicates that the residual variance is not constant. The most common type is an increase in the residual errors with higher values of the predicted dependent variable. That is, the residual errors are greater at the higher values of the predicted dependent variable than at lower values (Draper and Smith, 1981, 147).
A highly skewed distribution tends to encourage this. Because the least squares procedure minimizes the sum of the squared residuals, the regression line balances the lower residuals with the higher residuals. The result is a regression line that fits neither the low values nor the high values. For example, motor vehicle crashes tend to concentrate at a few locations (crash hot spots). In estimating the relationship between traffic volume and crashes, the hot spots tend to unduly influence the regression line. The result is a line that fits neither the number of expected crashes at most locations (which is low) nor the number of expected crashes at the hot spot locations (which is high). The line ends up over-estimating the number of crashes for most locations and under-estimating the number of crashes at the hot spot locations.
Poisson Regression Modeling

$$ \mathrm{Prob}(Y_i) = \frac{e^{-\lambda} \lambda^{Y_i}}{Y_i!} \tag{13.3} $$

where Y_i is the count for one group or class, i, λ is the mean count over all groups, and e is the base of the natural logarithm. The distribution has a single parameter, λ, which is both the mean and the variance of the function.
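The Poisson probability function can be checked numerically. The sketch below evaluates it for a given mean and confirms that the mean and variance of the resulting distribution coincide; the function names and the truncation point `max_count` are illustrative assumptions.

```python
import math


def poisson_prob(y, lam):
    """Probability of observing a count y when the mean count is lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def mean_and_variance(lam, max_count=100):
    """Mean and variance of the distribution, summed over counts 0..max_count."""
    probs = [poisson_prob(y, lam) for y in range(max_count + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, variance


for lam in (1.0, 2.0, 3.0, 4.0):
    mean, variance = mean_and_variance(lam)
    print(f"lambda = {lam}: mean = {mean:.4f}, variance = {variance:.4f}")
```

Truncating the sum at 100 is harmless for small means because the probabilities beyond that point are negligible.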
$$ E(Y_i \mid X_{ki}) = \lambda_i = e^{X_{ki}\beta} \tag{13.4} $$

$$ \ln(\lambda_i) = X_{ki}\beta \tag{13.5} $$

Note that in this formulation, there is not a random error term. The data are assumed to reflect the Poisson model. There can be "residual errors", but these are assumed to reflect an incomplete specification (i.e., not including all the relevant variables). Also, since the variance equals the mean, it is expected that the residual errors should increase with the conditional mean. That is, there is inherent heteroscedasticity (Cameron and Trivedi, 1998). This is very different from an OLS model, where the residual errors are expected to be constant.
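Because the conditional mean is the exponential of the linear predictor, it can never be negative. A small sketch, assuming hypothetical coefficients and treating the intercept as a separate argument (the equations above fold it into the X and β terms):

```python
import math


def poisson_mean(x_values, coefficients, intercept):
    """Conditional mean: the exponential of the linear predictor."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, x_values))
    return math.exp(linear)


# Hypothetical model with two predictors; the coefficients are invented.
coefficients = [0.5, -0.2]
intercept = -3.0

for zone_x in ([0.0, 0.0], [2.0, 1.0], [10.0, 0.0]):
    print(zone_x, round(poisson_mean(zone_x, coefficients, intercept), 4))
```

Even a strongly negative linear predictor yields a small positive expected count, in contrast to the additive OLS model.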
Figure 13.3: Poisson Distribution For Different Expected Means (x-axis: Count; curves for E(Y) = 1, 2, 3, and 4)
Advantages of the Poisson Regression Model
In short, the Poisson model has some desirable statistical properties that make it very useful for predicting crime incidents (origins or destinations).
Problems with the Poisson Regression Model
On the other hand, the Poisson model is not perfect. The primary problem is that count data are usually over-dispersed.
Over-dispersion in the residual errors
In the Poisson distribution, the mean equals the variance. In a Poisson regression model, the mathematical function, therefore, equates the conditional mean (the mean controlling for all the predictor variables) with the conditional variance. However, most real data are over-dispersed; the variance is generally greater than the mean. Figure 13.4 shows the distribution of Baltimore County and Baltimore City crime origins and Baltimore County crime destinations by TAZ (repeat of figure 13.2) and also indicates the variance-to-mean ratio of each variable. For the origin distribution, the ratio of the variance to the mean is 14.7; that is, the variance is 14.7 times that of the mean! For the destination distribution, the ratio is 401.5!
Figure 13.4: Skewness in Crime Origins and Destinations: Baltimore County, MD 1993-97. Over-dispersion. (Histogram of origins and destinations; y-axis: number of TAZs)
Origins: Mean = 75.8; Variance = 7848.8; Ratio of variance to mean = 14.7
Destinations: Mean = 129.1; Variance = 51,849.1; Ratio of variance to mean = 401.5
and predicted values for each zone, will be greater than what is expected. The Poisson model calculates a standard error as if the variance equals the mean. Thus, the standard error will be underestimated using a Poisson model and, therefore, the significance tests (the coefficient divided by the standard error) will be greater than they really should be. This would have the effect of identifying variables as being more statistically significant in a model than what they actually should be. In other words, in a Poisson multiple regression model, we would end up selecting variables that really should not be selected because we think they are statistically significant when, in fact, they are not.
Another problem with the Poisson, which is true for most of the common regression methods, is the lack of a spatial predictor component. As mentioned in chapter 12, in the crime travel demand model, spatial interaction is handled during the second stage of the model, trip distribution. Thus, any errors introduced in the first stage, trip generation, are usually compensated for during the second. Nevertheless, the inclusion of a spatial component in a regression model would generally improve the prediction. For this version of CrimeStat, non-spatial methods are used for the first stage.
Dispersion Correction Parameter
There are a number of methods for correcting the over-dispersion in a count model. Most of them involve modifying the assumption of the conditional variance equal to the conditional mean. For example, the negative binomial model assumes a Poisson mean but a gamma-distributed variance term (Cameron and Trivedi, 1998, 62-63; Venables and Ripley, 1997, 242-245). That is, there is an unobserved variable that affects the distribution of the count. The model is then of a Poisson mean but with a 'longer tail' variance function. As another example, the zero-inflated Poisson model assumes a Poisson function combined with a degenerate function with a probability of 1 for zero counts (Hall, 2000). Such mixed function models are a current topic of research. In general, though, they are complicated and require estimating several parameters.
There is a simple correction for over-dispersion that usually works (Cameron and Trivedi, 1998, 63-65). The model proceeds in two steps. In the first, the Poisson model is fitted to the data and the degree of over- (or under-) dispersion is estimated. The dispersion parameter is defined as:

$$ \phi = \frac{1}{N - K - 1} \sum \left\{ \frac{(Y_i - P_i)^2}{P_i} \right\} \tag{13.7} $$

In the second step, the Poisson standard error is multiplied by the square root of the dispersion parameter to produce an adjusted standard error:

$$ SE_{adj} = SE \times \sqrt{\phi} \tag{13.8} $$
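The two-step correction can be sketched directly from equations 13.7 and 13.8. The observed and predicted counts below are invented, and the function names are illustrative; the predicted values would normally come from a fitted Poisson model with K independent variables.

```python
import math


def dispersion_parameter(observed, predicted, k):
    """Equation 13.7: sum of (Y - P)^2 / P, divided by N - K - 1."""
    n = len(observed)
    pearson_sum = sum((y - p) ** 2 / p for y, p in zip(observed, predicted))
    return pearson_sum / (n - k - 1)


def adjusted_standard_error(standard_error, dispersion):
    """Equation 13.8: Poisson standard error times the square root of the dispersion."""
    return standard_error * math.sqrt(dispersion)


# Hypothetical observed counts and Poisson-predicted means for four zones, K = 1.
observed = [0, 2, 10, 4]
predicted = [1.0, 2.0, 5.0, 8.0]

dispersion = dispersion_parameter(observed, predicted, k=1)
print("dispersion:", dispersion)
print("adjusted SE:", adjusted_standard_error(0.1, dispersion))
```

A dispersion value above 1 widens the standard errors, which lowers the significance of each coefficient relative to the uncorrected Poisson model.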
Diagnostic Tests
Skewness Tests
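$$ g = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left[ \frac{X_i - \bar{X}}{s} \right]^3 $$

where s is the sample standard deviation of X.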
An approximate Z-test can be obtained from:

$$ Z(g) = \frac{g}{SE_S} \tag{13.12} $$

$$ \sum_{i=1}^{n} \left[ \frac{X_i - \bar{X}}{s} \right]^3 = 898.391 $$

Therefore,

$$ g = \frac{325}{324 \times 323} \times 898.391 = 2.79 $$

$$ Z(g) = 20.51 $$
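The arithmetic of this worked example can be retraced in code. The standard error formula is not present in the surviving text, so the sketch assumes the common approximation SE = sqrt(6 / n); with that assumption the Z value comes out close to the 20.51 reported above. The function name `skewness_g` is illustrative.

```python
import math


def skewness_g(n, sum_cubed_z):
    """Skewness statistic from the sum of cubed standardized deviations."""
    return n / ((n - 1) * (n - 2)) * sum_cubed_z


n = 325
g = skewness_g(n, 898.391)

# Assumption: standard error of skewness approximated by sqrt(6 / n).
se_s = math.sqrt(6 / n)
z = g / se_s

print(f"g = {g:.2f}, SE = {se_s:.3f}, Z = {z:.1f}")
```

Any Z value of this size indicates skewness far beyond what sampling variation would produce.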
Likelihood Ratio Test
$$ L = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\sigma^2) - \frac{1}{2}\sum_{i=1}^{N}\frac{(Y_i - X_{ki}\beta_k)^2}{\sigma^2} \tag{13.13} $$

where N is the sample size, σ² is the variance, Y_i is the observed number of events for zone i, and X_{ki}β_k is a series of K independent predictors multiplied by their coefficients.
where λ_i is the conditional mean for zone i, Y_i is the observed number of events for zone i, and Y_i X_{ki}β_k is a cross-product of the observed events times the K independent predictors multiplied by their coefficients. As mentioned above, Luc Anselin provides a more detailed discussion of these functions in Appendix C.

$$ LR = 2(L - L_R) \tag{13.16} $$
Adjusted likelihood ratio
These two measures adjust the log-likelihood for degrees of freedom, and flip the sign around. Because the sign is flipped, the model with the lowest AIC or SC value is 'best'.
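A small sketch of how these diagnostics relate. The chapter's own equations for the Poisson log-likelihood, AIC, and SC are not present in the text above, so the code assumes the standard definitions: AIC = -2L + 2(K + 1) and SC = -2L + (K + 1) ln(N). The counts and fitted means are invented.

```python
import math


def poisson_log_likelihood(observed, means):
    """Sum over zones of -lambda + Y * ln(lambda) - ln(Y!)."""
    return sum(-m + y * math.log(m) - math.lgamma(y + 1)
               for y, m in zip(observed, means))


def likelihood_ratio(ll_full, ll_restricted):
    """Equation 13.16: twice the gain in log-likelihood over the restricted model."""
    return 2.0 * (ll_full - ll_restricted)


def aic(log_likelihood, k):
    """Assumed standard form: -2L plus a penalty of 2 per estimated parameter."""
    return -2.0 * log_likelihood + 2.0 * (k + 1)


def schwarz_criterion(log_likelihood, k, n):
    """Assumed standard form: -2L plus ln(N) per estimated parameter."""
    return -2.0 * log_likelihood + (k + 1) * math.log(n)


observed = [0, 1, 3, 8]
restricted_means = [3.0] * 4          # intercept-only model: the overall mean
full_means = [0.5, 1.5, 3.0, 7.0]     # hypothetical fit with one predictor

ll_r = poisson_log_likelihood(observed, restricted_means)
ll_f = poisson_log_likelihood(observed, full_means)
print("LR:", round(likelihood_ratio(ll_f, ll_r), 2))
print("AIC full vs restricted:", round(aic(ll_f, 1), 2), round(aic(ll_r, 0), 2))
```

With the sign flipped in this way, the better-fitting model shows the smaller AIC and SC.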
R-square Test

$$ R^2 = 1 - \frac{\sum (Y_i - P_i)^2}{\sum (Y_i - \mathrm{Mean}\,Y)^2} \tag{13.19} $$

For an OLS model, R-square is a very consistent estimate. It increases in a linear manner with predictability and is, therefore, a good indicator of how effective one model is compared to another. As with all diagnostic tests, the value of the R-square increases with more independent variables. Consequently, R-square is usually adjusted for degrees of freedom:

$$ R_a^2 = 1 - \frac{\left[\sum (Y_i - P_i)^2\right] / (N - K - 1)}{\sum (Y_i - \mathrm{Mean}\,Y)^2 / (N - 1)} \tag{13.20} $$

where N is the sample size and K is the number of independent variables.
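Both statistics are straightforward to compute from observed and predicted values. The sketch below uses N - K - 1 residual degrees of freedom for the adjusted version, which is the usual convention; the data and function names are illustrative.

```python
def r_square(observed, predicted):
    """1 minus the ratio of the residual sum of squares to the total sum of squares."""
    mean_y = sum(observed) / len(observed)
    residual_ss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    total_ss = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - residual_ss / total_ss


def adjusted_r_square(observed, predicted, k):
    """R-square with both sums of squares divided by their degrees of freedom."""
    n = len(observed)
    mean_y = sum(observed) / n
    residual_ss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    total_ss = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - (residual_ss / (n - k - 1)) / (total_ss / (n - 1))


# Hypothetical observed and predicted values for five zones, one predictor.
observed = [1, 2, 3, 4, 5]
predicted = [1, 2, 3, 4, 4]

print(round(r_square(observed, predicted), 4))
print(round(adjusted_r_square(observed, predicted, k=1), 4))
```

The adjusted value is always at or below the unadjusted one, and the gap widens as more independent variables are added for a fixed sample size.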
$$ \sum (Y_i - \mathrm{Mean}\,Y)^2 = \sum (Y_i - P_i)^2 + \sum (P_i - \mathrm{Mean}\,Y)^2 + 2\sum (Y_i - P_i)(P_i - \mathrm{Mean}\,Y) \tag{13.21} $$

and other non-linear regression methods, the last term is not zero. Consequently, a test that compares the explained sum of squares to the total sum of squares will not produce consistent results.
Dispersion Parameter
Finally, in the Poisson model only, the dispersion parameter indicates the extent to which the variance is different from the mean. This was defined in equation 13.7 above.
the particular independent variable to the variance of the dependent variable. The square root of that variance is the standard error.
already in the equation. A relatively simple test for assessing this is called tolerance. Tolerance is defined as lack of predictability of each independent variable by the other independent variables, or:
Fixed Model vs. Stepwise Variable Selection
A backward elimination procedure works in reverse. All independent variables are initially added to the equation. The variable with the weakest coefficient (as defined by the significance level) is removed, and the model is re-estimated. Next, the variable with the weakest coefficient in the second model is removed, and the model is re-estimated. This procedure is repeated until either there are no more independent variables left in the model or else a stopping criterion is met. The usual criterion is that all remaining variables pass a certain significance level (called a p-to-remove).
There are combinations of these procedures, for example adding a variable in a forward selection manner but then removing any variables that are no longer significant or using a backward elimination procedure but allowing new variables to enter the model if they suddenly become significant.
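The backward elimination loop described above can be sketched independently of any particular regression routine. Here `fit_p_values` is a hypothetical callback that re-estimates the model on the remaining variables and returns a significance level for each; the toy version below returns fixed values, whereas a real model's p-values would change after each re-estimation.

```python
def backward_eliminate(variables, fit_p_values, p_to_remove=0.05):
    """Repeatedly drop the weakest variable until all pass the p-to-remove level."""
    remaining = list(variables)
    while remaining:
        p_values = fit_p_values(remaining)          # re-estimate the model
        weakest = max(remaining, key=lambda v: p_values[v])
        if p_values[weakest] <= p_to_remove:
            break                                   # stopping criterion met
        remaining.remove(weakest)
    return remaining


def toy_p_values(variables):
    """Stand-in for a real model fit; the significance levels are invented."""
    table = {"population": 0.001, "income": 0.03, "renters": 0.40, "density": 0.20}
    return {v: table[v] for v in variables}


kept = backward_eliminate(["population", "income", "renters", "density"], toy_p_values)
print(kept)
```

A forward selection or mixed procedure would follow the same pattern with the add and remove steps rearranged.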
There are advantages to each approach. A fixed model allows specified variables to be included. If either theory or previous research has indicated that a particular combination of variables is important, then the fixed model allows that to be tested. A stepwise procedure might drop one of those variables. On the other hand, a stepwise procedure usually can obtain the same or higher predictability than a fixed procedure (whether predictability is measured by a log-likelihood or an R-square).
In future versions of CrimeStat, several of these methods will be introduced. For the time being, though, the Poisson model is available as it is the most commonly used functional model for fitting count data.
Adding Special Generators
modeling, stadiums, airports, train stations, large parks, and 'mega-malls' generate more than their share of trips, or at least than what would be predicted by the amount of employment at those locations. They are usually attractors, not producers. In a normal transportation travel demand model, these zones are excluded from the cross-classification and independent estimates are made of them.
Therefore, a user should be cautious in introducing special generators. It is generally alright to introduce a few for the truly exceptional zones. These are zones where it is logical to treat them as special generators and where one would expect continuity over time. In other words, they should be used if the special generator status is expected to last over time. For example, a stadium or an airport or a train station is liable to remain at its location for many years. A particular shopping mall, on the other hand, may attract crimes at one particular point in time but not necessarily in the future. Unless it is a mall that is so much larger than any other mall in the region (a 'mega-mall'), it shouldn't be given a special generator status.
Adding External Trips
the region. Estimates of the travel on these corridors are obtained by cordon counts, counts of vehicles coming into the region and leaving the region (net inflow). Estimates of future growth of those external trips have to be based on expectations of future population growth in the metropolitan region and in nearby regions.
For crime trips, external trips are defined as trips that originate outside the study area. But they must be estimated by the difference between the total number of crimes occurring in the destination study area and the total originating in the origin zones. That is, of all the crimes occurring in the study area, the origin zones are modeled. Those trips that originate from outside the origin zones are external trips. They must be added to the predicted number of origin trips to produce an adjusted estimate of total origins, or:

$$ O_j = O_{pi} + O_e \tag{13.24} $$

In other words, for the production (origin) model only, we add an external zone to account for crime trips that originated outside the modeled region. If we don't do that, in the balancing step, we'll overestimate the number of crimes originating in each zone because the predicted origins will be multiplied by a factor to ensure that the total number of origins equals the total number of destinations.
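The external-trip adjustment amounts to simple bookkeeping. A minimal sketch with invented totals; the function names are illustrative and the numbers are not from the Baltimore data.

```python
def external_trips(total_destinations, total_modeled_origins):
    """Crimes occurring in the study area that did not originate in the origin zones."""
    return total_destinations - total_modeled_origins


def adjusted_total_origins(predicted_origins, n_external):
    """Predicted origins over all modeled zones plus the 'External' zone."""
    return sum(predicted_origins) + n_external


# Hypothetical totals: 1,000 crimes occurred in the study area,
# 850 of which were traced to offenders living in the modeled origin zones.
n_external = external_trips(1000, 850)
predicted_origins = [200.0, 300.0, 350.0]

print(n_external)
print(adjusted_total_origins(predicted_origins, n_external))
```

Treating the external trips as one extra origin zone keeps them out of the scaling applied to the modeled zones during balancing.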
Balancing Predicted Origins and Predicted Destinations
The trip generation 'model' is actually two separate models: 1) a model of trips produced by every zone and 2) a model of trips attracted to every zone. Since a trip has an origin and a destination (by definition), then the total number of productions must equal the total number of attractions:

$$ \sum_{i=1}^{n} O_i = \sum_{j=1}^{n} D_j \tag{13.25} $$

number of predicted destinations in each destination zone by a constant which is the ratio of either the total destinations to the total origins (to multiply the number of predicted origins) or the ratio of the total origins to the total destinations (to multiply the number of predicted destinations).
With crime analysis, the number of destinations would generally be considered a more reliable data set than the number of origins. Because crimes are enumerated where they occur, the number of crimes occurring at any one location is more accurate than the location of the offenders. Thus, we adjust the predicted origins so that they equal the predicted destinations (see note 6).
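Holding the destinations constant, the balancing step reduces to multiplying every predicted origin by one ratio. A minimal sketch with invented zone totals; `balance_origins` is an illustrative name, not a CrimeStat routine.

```python
def balance_origins(predicted_origins, predicted_destinations):
    """Scale predicted origins so their total equals the total predicted destinations."""
    factor = sum(predicted_destinations) / sum(predicted_origins)
    return [origin * factor for origin in predicted_origins]


# Hypothetical predictions: origins total 100, destinations total 200.
predicted_origins = [10.0, 30.0, 60.0]
predicted_destinations = [50.0, 150.0]

balanced = balance_origins(predicted_origins, predicted_destinations)
print(balanced)
```

The relative shares of the origin zones are unchanged; only the overall total is adjusted to match the destinations.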
Summary of the Trip Generation Model
The CrimeStat Trip Generation Model
1. Calibrate the model
2. Make a prediction
3. Balance the predicted origins and the predicted destinations
Figure 13.5: Trip Generation Module
Figure 13.5 shows an image of the trip generation model page within CrimeStat. The trip generation model is made up of three separate pages (or tabs):
1. A Calibrate model page in which a regression model can be run to estimate either an origin (production) model or a destination (attraction) model;
Calibrate Model
Data File
Type of Model
Dependent Variable
If checked, the routine will test for the skewness of the dependent variable. The output includes:
1. The "g" statistic
2. The standard error of the "g" statistic
3. The Z value for the "g" statistic
4. The probability level of a Type I error for the "g" statistic
5. The ratio of the sample variance to the sample mean
Independent variables
Select independent variables from the list of variables in the data file. Up to 15 variables can be selected.
Missing values
Specify any missing value codes for the variables. Blank records will automatically be considered as missing. If any of the selected dependent or independent variables have missing values, those records will be excluded from the analysis.
Type of Regression Model
Specify the type of regression model to be used. The default is a Poisson regression with over-dispersion correction. Other alternatives are a Poisson regression and an Ordinary Least Squares regression.
Save Output
Poisson output
1. The dependent variable
2. The type of model
3. The sample size (N)
4. The degrees of freedom (N - # independent variables - 1)
5. The type of regression model (Poisson, Poisson with over-dispersion correction)
6. The log-likelihood value
7. The Likelihood Ratio
8. The probability value of the Likelihood Ratio
9. The Akaike Information Criterion (AIC)
10. The Schwartz Criterion (SC)
11. The Dispersion Multiplier
12. The approximate R-square value
13. The deviance R-square value
O LS ou t p u t
Multicollinearity Among the Independent Variables
Graph
Make Trip Generation Prediction
Data File
Type of Model
Trip Generation Coefficients/Parameters File
Independent Variables
Select independent variables from the list of variables in the data file. Up to 15
variables can be selected.
Matching Parameters
Hint: With your cursor placed in the list of independent variables, typing the
first letter of the matching variable name will take you to the first variable
that starts with that letter. Repeating the letter will move down the list to the
second, third, and so forth until the desired variable is reached.
Missing Values
Specify any missing value codes for the variables. Blank records will automatically be
considered as missing. If any of the selected dependent or independent variables have
missing values, those records will be excluded from the analysis.
Add External Trips
Number of external trips
Origin ID
Specify the origin ID variable in the data file. The external trips will be added as an
extra origin zone, called the “External” zone. Note: the IDs used for the destination file zones
should be the same as in the origin file. This will be necessary in subsequent modeling
stages.
Type of Regression Model
Specify the type of regression model to be used. The default is a Poisson regression
and the other alternative is a Normally-distributed/Ordinary Least Squares regression.
Save Predicted Values
Output
Since, by definition, a ‘trip’ has an origin and a destination, the number of predicted
origins must equal the number of predicted destinations. Because of slight differences in the
data sets of the origin model and the destination model, it is possible that the total number
of predicted origins (including any external trips – see Make trip generation prediction
above) may not equal the total number of predicted destinations. This step, therefore, is
essential to guarantee that this condition will be true. The routine adjusts either the
number of predicted origins or the number of predicted destinations so that the condition
holds. The trip distribution routines will not work unless the number of predicted origins
equals the number of predicted destinations (within a very small rounding-off error).
Predicted Origin File
Predicted Destination File
Destination variable
Balancing method
Save Predicted Origin/Destination File
Output
Example Trip Generation Model
Setting Up the Origin Model
In the first step, an origin model is created. Figure 13.6 shows the selection of the
dependent variable and some possible independent variables. The type of model is an
ordinary Poisson regression. The dependent variable is the number of crimes occurring
between 1993 and 1997 in each origin zone (BCORIG). Eight possible independent variables
have been selected: the 1996 population of each zone (POP96), the median household income
of the zone relative to the zone with the highest median household income (INCEQUAL), the
number of 1996 non-retail employees in each zone (NONRET96), the number of 1996 retail
employees in each zone (RETEMP96), the total linear miles of arterial roads in each zone
(ARTERIAL), a dummy variable for whether the Baltimore Beltway (I-695) passed through
the zone or not (BELTWAY), the linear distance of the zone from Baltimore harbor in the
CBD (DISTANCE), and the number of households without automobiles (ZEROAUTO - this
cannot be seen in the image).
The model is set up to run a Poisson regression without an over-dispersion correction.
It is a fixed model in which all independent variables are included. The coefficients are saved
under the “Save estimated coefficients” dialogue box and the output (the predicted values) is
saved under the “Save output” dialogue box. Both boxes ask for a file name.
Table 13.2 shows the results. Key statistics are highlighted. The overall model is
highly significant: the Likelihood Ratio is highly significant and the R-squares are
reasonably high (0.50 for the R-square and 0.42 for the deviance R-square). The coefficients
for each of the variables are significant.
Restructuring the Origin Model
Figure 13.6:
Origin Poisson Model Setup
Figure 13.7:
Origin Poisson Model with Over-dispersion Correction
Table 13.2
Model result:
Data file: BaltOrigins.dbf
Type of model: Origin
DepVar: BCORIG
N: 532
Df: 523
Type of regression model: Poisson with over-dispersion correction
Log Likelihood: -10678.051687
Likelihood ratio(LR): 25609.182621
P-value of LR: 0.0001
AIC: 21374.103373
SC: 21412.593165
Dispersion multiplier: 37.086973
R-square: 0.499539
Deviance r-square: 0.420031
-----------------------------------------------------------------------------
                                              Pseudo-
Predictor     DF  Coefficient  Stand Error  Tolerance     z-value  p-value
CONSTANT       1     0.887266     0.037707          .   23.530608    0.001
POP96          1     0.000337     0.000016   0.463218   21.665568    0.001
INCEQUAL       1    -0.033017     0.001226   0.608346  -26.926013    0.001
NONRET96       1    -0.000173     0.000028   0.842042   -6.082943    0.001
RETEMP96       1    -0.000364     0.000117   0.960564   -3.107357    0.010
ARTERIAL       1    -0.108257     0.025888   0.771634   -4.181834    0.001
BELTWAY        1     0.150967     0.036047   0.958973    4.188082    0.001
DISTANCE       1     0.034289     0.007842   0.491906    4.372170    0.001
ZEROAUTO       1    -0.000462     0.000141   0.355510   -3.283930    0.010
-----------------------------------------------------------------------------
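The degrees of freedom, AIC, and SC reported in Table 13.2 can be checked from the log likelihood. The sketch below assumes the standard definitions, AIC = -2LL + 2k and SC = -2LL + k*ln(N), where k counts the constant plus the independent variables; the function name `information_criteria` is hypothetical and this is not CrimeStat's own code.

```python
import math

def information_criteria(log_likelihood, n_obs, n_predictors):
    """Return (AIC, SC, degrees of freedom) under the standard definitions.

    k is the number of estimated coefficients: the predictors plus the constant.
    """
    k = n_predictors + 1
    aic = -2.0 * log_likelihood + 2.0 * k
    sc = -2.0 * log_likelihood + k * math.log(n_obs)
    df = n_obs - n_predictors - 1
    return aic, sc, df

# Values taken from Table 13.2: log likelihood, N = 532 zones, 8 predictors.
aic, sc, df = information_criteria(-10678.051687, 532, 8)
print(round(aic, 6), round(sc, 6), df)
```

Under these assumed definitions the results agree with the AIC, SC, and Df values printed in Table 13.2 to within rounding, which suggests the table uses the same formulas.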
Residual Analysis of Origin Model
Figure 13.8:
Plot of Residual Errors and Predicted Values
the errors increase with the value of the predicted dependent variable. With the Poisson
model, this is expected and does not indicate the violation of the independent errors
assumption, as it does with the OLS. The errors are reasonably symmetrical and do not
indicate differences in over- and under-estimation across the band of the predicted values.
Table 13.3
Results of Second Origin Model
Model result:
Data file: BaltOrigins.dbf
Type of model: Origin
DepVar: BCORIG
N: 532
Df: 525
Type of regression model: Poisson with over-dispersion correction
Log Likelihood: -11262.292156
Likelihood ratio(LR): 24440.701682
P-value of LR: 0.0001
AIC: 22538.584312
SC: 22568.520816
Dispersion multiplier: 1.000000
R-square: 0.455630
Deviance r-square: 0.446502
-----------------------------------------------------------------------------
                                              Pseudo-
Predictor     DF  Coefficient  Stand Error  Tolerance     z-value  p-value
CONSTANT       1     2.286699     0.039339          .   58.127787    0.001
POP96          1     0.000284     0.000013   0.943426   22.473451    0.001
INCEQUAL       1    -0.018525     0.001026   0.849679  -18.048743    0.001
NONRET96       1    -0.000186     0.000030   0.866522   -6.139941    0.001
RETEMP96       1    -0.000353     0.000125   0.960769   -2.820286    0.010
ARTERIAL       1    -0.085070     0.027006   0.938167   -3.150019    0.010
BELTWAY        1     0.123109     0.037868   0.970051    3.251004    0.010
-----------------------------------------------------------------------------
Setting Up the Destination Model
The same logic is applied for the destination model. In this case, the destination file
has data on 325 zones within Baltimore County only. Similar possible predictor variables
are included in the file. Aside from population, retail and non-retail employment, and the
roadway variables, more detailed land use variables were included (acreage of
commercial, residential, office space, recreational, and conservation lands). The model that
was run was a Poisson with an over-dispersion correction. Again, a backward elimination
procedure was adopted. Once a final model was selected, it was re-run as a fixed model to
ensure that the coefficients were consistently estimated. Table 13.4 presents the results.
Five variables ended up in the final model. Again, population was significantly
related to the number of crimes attracted to a zone, but was not the strongest predictor as
indicated by the Z-test. The strongest relationship was for the number of retail employees.
This suggests that retail/commercial areas attract many crimes. This is supported by one of
the land use variables - the acreage associated with very large malls; in other words, there
are additional crimes attracted to very large malls above-and-beyond the number of retail
employees in those zones. Two other variables are in the equation. Relative income equality
was, again, negatively related to crime destinations/attractions; zones with low income tend
to attract more crimes. Also, there was a negative association with distance from the CBD.
The farther away from the CBD, the lower the number of crimes. Overall, the model
suggests that zones with commercial activities, particularly with large malls, but which are
closer to the city center and which have households with relatively lower incomes are those
that attract the most crimes.
Table 13.4
Results of First Destination Model
Model result:
Data file: BCDestinations.dbf
Type of model: Destination
DepVar: BCDEST
N: 325
Df: 319
Type of regression model: Poisson with over-dispersion correction
Log Likelihood: -10347.872494
Likelihood ratio(LR): 41708.925054
P-value of LR: 0.0001
AIC: 20707.744988
SC: 20730.447939
Dispersion multiplier: 1.000000
R-square: 0.596921
Deviance r-square: 0.310251
-----------------------------------------------------------------------------
                                              Pseudo-
Predictor     DF  Coefficient  Stand Error  Tolerance     z-value  p-value
CONSTANT       1     5.485851     0.218977          .   25.052182    0.001
POP96          1     0.000190     0.000027   0.928694    6.935850    0.001
INCEQUAL       1    -0.017176     0.005464   0.903130   -3.143462    0.010
RETEMP96       1     0.001018     0.000062   0.717076   16.297855    0.001
VERYLRGMLACR   1     0.006446     0.000974   0.740927    6.616423    0.001
DISTANCE       1    -0.115709     0.017069   0.876461   -6.778875    0.001
-----------------------------------------------------------------------------
Residual Analysis of Destination Model
Figure 13.9:
[Plot of residual error (Observed-Predicted) against predicted values]
Table 13.5
Results of Second Destination Model
Model result:
Data file: BcDestinations.dbf
Type of model: Destination
DepVar: BCDEST
N: 325
Df: 317
Type of regression model: Poisson with over-dispersion correction
Log Likelihood: -7852.238456
Likelihood ratio(LR): 46700.193131
P-value of LR: 0.0001
AIC: 15720.476911
SC: 15750.747513
Dispersion multiplier: 1.000000
R-square: 0.784194
Deviance r-square: 0.227710
-----------------------------------------------------------------------------
                                              Pseudo-
Predictor     DF  Coefficient  Stand Error  Tolerance     z-value  p-value
CONSTANT       1     5.182117     0.067867          .   76.356923    0.001
INCEQUAL       1    -0.020797     0.003942   0.902950   -5.276135    0.001
RETEMP96       1     0.000995     0.000051   0.700294   19.338957    0.001
VERYLRGMLACR   1     0.006590     0.000869   0.716299    7.582758    0.001
POP96          1     0.000238     0.000020   0.921456   12.164552    0.001
DISTANCE       1    -0.087826     0.012462   0.872535   -7.047735    0.001
GOLDENRING     1     1.933321     0.069636   0.969044   27.763123    0.001
EASTPOINT      1     1.602000     0.067934   0.943548   23.581751    0.001
-----------------------------------------------------------------------------
Comparing Different Crime Types
Table 13.6
Models for Specific Crime Types:
Origin Model
                                                               Vehicle
                        All Crimes     Robbery    Burglary       Theft
CONSTANT                  2.286699   -0.652291    1.621546   -0.800759
INCOME EQUALITY          -0.018525   -0.023964           -   -0.019620
NON-RETAIL EMPLOYMENT    -0.000186   -0.000237   -0.000239   -0.000188
RETAIL EMPLOYMENT        -0.000353           -           -
BELTWAY                   0.123109           -           -           -
MILES OF ARTERIAL        -0.085070           -           -   -0.180966
Table 13.7
Models for Specific Crime Types:
Destination Model
                                                               Vehicle
                        All Crimes     Robbery    Burglary       Theft
CONSTANT                  5.485851    3.284488    3.246183    2.610299
INCOME EQUALITY          -0.017176   -0.027946   -0.034598   -0.012910
RETAIL EMPLOYMENT         0.001018    0.000844           -    0.000507
VERY LARGE MALL ACREAGE   0.006446    0.004332           -           -
DISTANCE FROM CBD        -0.115709   -0.096330   -0.038715   -0.096088
The population variable appears in every single model. As mentioned, all other
things being equal, the larger the number of persons in a zone, the more crime events will
occur whether those events are crime productions (origins) or crime attractions
(destinations). Similarly, relative income equality appears in five of the six models with the
coefficient always being negative. In general, zones with relatively lower incomes will have
more robberies, burglaries, and vehicle thefts. The only model for which income equality did
not appear was as an origin variable for burglaries; apparently, burglars come from zones
with various income levels, at least in Baltimore.
In other words, these models are measuring general conditions associated with crime,
not causes per se. They capture the general contextual relationships associated with crime
productions and attractions. But, they don’t necessarily predict individual behavior.
Nevertheless, the models can be used for prediction since the conditions appear to be quite
general.
Adding External Trips
A simple estimate of external trips is obtained by taking the difference between the
total number of crimes occurring in the study area (Baltimore County destinations) and the
total number of crimes originating in the modeling area (table 13.8).
Table 13.8
Number of crimes ending in 325 Baltimore County zones:              41,969
Number of crimes originating in 532 Baltimore County/City zones:    40,342
Crimes from outside the modeling area:                               1,627
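The arithmetic behind Table 13.8 is a simple difference of totals. The sketch below restates it with the table's figures; the function name `external_trips` is hypothetical.

```python
def external_trips(crimes_ending_in_study_area, crimes_originating_in_model_area):
    """Crimes committed in the study area by offenders who live outside the modeled zones."""
    return crimes_ending_in_study_area - crimes_originating_in_model_area

# Totals from Table 13.8.
n_external = external_trips(41_969, 40_342)
print(n_external)  # 1627
```

The result is the number of trips that would be assigned to the extra “External” origin zone.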
Predicting External Trips
If a model is being applied to a data set other than the one from which it was initially
estimated, a problem emerges about how to estimate the number of external trips. It is one
thing to apply simple arithmetic in order to determine how many trips originated outside the
modeling area (as in table 13.8). It is another to know how to calculate external trips when
the model is being applied to other data. For the modeled zones, the coefficients are applied
to the variables of the model (see “Make Prediction” below). But, the external trips have to be
estimated independently.
There is not a simple way to estimate external crime trips. Unlike regular trips that
can be estimated through cordon counts, crime trips are not detectable while they are
occurring (i.e., one cannot stand by a road and count offenders traveling by). Thus, they
have to be estimated.
years. Essentially, a slope is being calculated and applied as a linear equation:
Yi = 1850 + 223*Xi
How realistic is this? For short time periods, linear extrapolation is probably as good
a method as any. But for longer time periods, it can lead to spurious conclusions (e.g., crime
trips from outside the region will always increase). Short of developing a sophisticated
model that relates crime trips to the growth of the metropolitan area and to other
metropolitan areas within, say, 500 miles, a linear extrapolation is one of the few methods
that one can apply.⁷
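The linear equation above can be written as a small function. This is a sketch only: the interpretation of Xi as a count of years from the base period is an assumption, since the definition falls outside this passage, and the function name `extrapolate_external_trips` is hypothetical.

```python
def extrapolate_external_trips(x, intercept=1850.0, slope=223.0):
    """Apply Yi = 1850 + 223*Xi, with x assumed to be years from the base period."""
    return intercept + slope * x

# Hypothetical horizons; a longer horizon always gives a larger estimate,
# which is the limitation discussed in the text.
for years in (0, 2, 10):
    print(years, extrapolate_external_trips(years))
```

Because the slope is constant, the estimate grows without limit as the horizon lengthens, which is why the method is best restricted to short time periods.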
Make Prediction
A second reason for separating the calibration and application pages is to add
external trips to the origin zones. As mentioned above, external trips are, by definition, those
that were not modeled in the calibration. They have to be calculated independently of the
model and then added in.
Thus, the “Make prediction” page allows these operations to occur. Figure 13.10
shows the page. There are several steps that have to be implemented for this page to be
operative.
Figure 13.10:
"Make Prediction" Setup Page
2. A trip generation coefficients file needs to be input. These were the estimated
coefficients from the calibration stage. Inputting this file brings in the
coefficients in the order in which they were saved. They are listed in the
“Matching parameters” dialogue box on the right side of the page.
For each zone, the routine will then take the appropriate variable from the input data
set and apply the matching coefficient from the trip generation coefficients file to produce a
predicted estimate of the number of trips. To calculate this value, for the OLS model, the
routine will use equation 13.2 above while for the Poisson model, the routine will use
equation 13.6 above; for the latter, it will then raise e to the power of the predicted log value
to produce a prediction for the expected number of crime trips:
$$\lambda_i = e^{[\mathrm{Ln}(\lambda_i)]} \tag{13.26}$$
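The prediction step for the Poisson case can be sketched as below. The coefficients are those reported in Table 13.3; the zone values are hypothetical, the function name `predict_poisson` is an illustration rather than a CrimeStat routine, and the sketch assumes that equation 13.6 has the usual log-linear form (the constant plus the sum of each coefficient times its variable).

```python
import math

# Coefficients of the second origin model (Table 13.3).
COEFFICIENTS = {
    "CONSTANT": 2.286699,
    "POP96": 0.000284,
    "INCEQUAL": -0.018525,
    "NONRET96": -0.000186,
    "RETEMP96": -0.000353,
    "ARTERIAL": -0.085070,
    "BELTWAY": 0.123109,
}

def predict_poisson(zone, coefficients):
    """Return the expected number of trips for one zone.

    The linear predictor is the predicted log value; exponentiating it
    gives the expected count, as in equation 13.26.
    """
    log_lambda = coefficients["CONSTANT"]
    for name, b in coefficients.items():
        if name != "CONSTANT":
            log_lambda += b * zone[name]
    return math.exp(log_lambda)

# Hypothetical zone: values chosen only to illustrate the calculation.
zone = {"POP96": 3000, "INCEQUAL": 40.0, "NONRET96": 500,
        "RETEMP96": 100, "ARTERIAL": 1.5, "BELTWAY": 0}
print(predict_poisson(zone, COEFFICIENTS))
```

Matching coefficients to variables by name, as done here, plays the role of the “Matching parameters” dialogue box, where each saved coefficient is paired with a variable in the input file.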
If external trips are added, a new zone is created called EXTERNAL in the ID field
that was indicated on the page. Then, the specified number of external trips is simply placed
in that field with zeros being placed for the values of all the remaining variables in the file.
By default, the output name for the predicted number of crimes will be called PREDORIG for
an origin model and PREDDEST for a destination model. An example data set is available
on the CrimeStat download page.
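The handling of the external zone can be sketched as follows. The records are hypothetical dictionaries standing in for rows of the output file, and `add_external_zone` is an illustrative name, not a CrimeStat function.

```python
def add_external_zone(records, id_field, predicted_field, external_trips):
    """Append an EXTERNAL zone holding the external trips.

    Every other variable in the new record is set to zero, as described
    in the text. The input list is left unchanged.
    """
    if not records:
        raise ValueError("at least one modeled zone is required")
    external = {name: 0 for name in records[0]}
    external[id_field] = "EXTERNAL"
    external[predicted_field] = external_trips
    return list(records) + [external]

# Hypothetical predicted origins for two zones, plus the 1,627 external trips
# from Table 13.8.
origins = [
    {"ZONE": "0001", "POP96": 3400, "PREDORIG": 85.2},
    {"ZONE": "0002", "POP96": 2100, "PREDORIG": 41.7},
]
with_external = add_external_zone(origins, "ZONE", "PREDORIG", 1627)
print(with_external[-1])
```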
Note: for a destination model, this “Make prediction” operation is not necessarily
needed if the same data set is used for calibration and prediction. This step is
primarily for the origin file.
Balancing Predicted Origins and Destinations
and the predicted number of destinations is multiplied by M_i. The multiplication simply
ensures that the sums of the predicted origins and predicted destinations are equal.
The output produces a new column with the adjusted values. Table 13.9 shows the
origin output for the Baltimore data of the first 11 records. Once the balancing has been
completed, the trip generation model is finished and the user can go on to the trip
distribution model. In other words, the output file ensures that both the predicted origin file
(crime productions) and predicted destination file (crime attractions) are balanced.
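The balancing step can be sketched as a single rescaling. The sketch assumes that the multiplier is the ratio of the two totals, which is the simplest choice that makes the sums equal; the function name `balance`, the `hold` argument, and the sample values are hypothetical.

```python
def balance(origins, destinations, hold="origins"):
    """Rescale one set of predictions so both totals match.

    hold="origins" keeps the origins fixed and multiplies each destination
    by (sum of origins / sum of destinations); hold="destinations" does the reverse.
    """
    total_o = sum(origins)
    total_d = sum(destinations)
    if hold == "origins":
        m = total_o / total_d
        return list(origins), [d * m for d in destinations]
    if hold == "destinations":
        m = total_d / total_o
        return [o * m for o in origins], list(destinations)
    raise ValueError("hold must be 'origins' or 'destinations'")

# Hypothetical predictions: origins sum to 400, destinations to 380.
pred_origins = [100.0, 250.0, 50.0]
pred_destinations = [120.0, 200.0, 60.0]
bal_o, bal_d = balance(pred_origins, pred_destinations, hold="origins")
print(sum(bal_o), sum(bal_d))
```

Each zone keeps its share of the total; only the overall level of the adjusted set changes.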
Strengths and Weaknesses of Regression Modeling of Trips
As mentioned earlier, the use of regression for producing the trip generation model
has its strengths and weaknesses. The advantages are that, first, the approach is applicable
to crime incidents. Unlike regular travel behavior, crime trips have to be inferred from
police reports; one cannot conduct a household survey of offenders asking them about their
crime travel. Thus, starting with counts of the number of crimes occurring in each zone and
the number of crimes that originate from each zone, a model can be constructed.
Figure 13.11:
Balance Predicted Origins and Destinations Setup
Table 13.9
Adjusted Data Should Have These Fields
Third, the use of a multiple regression model allows multiple independent variables
to be included. In our example, there were six and five variables respectively in the general
origin and destination models. Trip tables, on the other hand, typically only have three or
four independent predictors; it becomes too complicated to keep track of multiple conditions
of predictor variables. Thus, a more complex and sophisticated model can be produced with
a regression framework.
Fourth, and finally, a regression framework allows for complex interactions to be
estimated. For example, the log of an independent variable can be defined. An interaction
between two of the independent variables can be examined (e.g., median household income
for those zones having a sizeable amount of retail employment). In the trip table approach,
these interactions are implicit in the cell means. Thus, overall, the regression framework
allows for a more complex model than is available with a trip table approach.
On the other hand, there are potential problems associated with a regression
framework. First, the regression coefficients can be influenced by zone size. Since the model
is estimating differences between zones (i.e., differences in the number of crimes as a
function of differences in the values of the independent variables), zone size affects the level
of those differences. With small zone sizes, there will be substantial differences between
zones in both the independent and dependent variables. Conversely, large zone sizes will
minimize within-zone differences, but will usually increase the estimate of the between-zone
differences. The result could be an exaggeration of the effect of a variable that would not be
seen with small zone geography. As we argued in chapter 12, one should choose the smallest
zone geography that is practical in order to minimize this problem.
The new generation of travel demand models is specifically behavioral and involves
modeling the behavior of specific individuals. Probabilities are calculated based on
individual choice and a micro-simulation routine can apply these probabilities to a large
metropolitan area (RDC, 1995; Pas, 1996; Recker, 2000; Shifton et al, 2003). While this
approach offers some definite theoretical advantages and is the subject of much current
research, to date there has not been a demonstration that this approach is more accurate at
predicting trips than the traditional trip-based travel demand model.
Summary
Endnotes for Chapter 13