
Chapter 5

Distance Analysis I and II


In this chapter, tools that identify characteristics of the distances between points will be described. The previous chapter provided tools for describing the general spatial distribution of crime incidents, or first-order properties of the incident distribution (Bailey and Gatrell, 1995). First-order properties are global because they represent the dominant pattern of distribution: where it is centered, how far it spreads out, and whether there is any orientation or direction to its dispersion. Second-order (or local) properties, on the other hand, refer to sub-regional or ‘neighborhood’ patterns within the overall distribution. If there are distinct ‘hot spots’ where many crime incidents cluster together, their distribution is spatially related not so much to the overall global pattern as to something unique in the sub-region or neighborhood. Thus, second-order characteristics tell something about particular environments that may concentrate crime incidents.

There are two distance analysis pages. In Distance analysis I, various second-order statistics are provided, including:

1. Nearest neighbor analysis (Nna)
2. Linear nearest neighbor analysis (Lnna)
3. Ripley's K statistic
4. Assign primary points to secondary points

In Distance analysis II, there are four routines for calculating and outputting distance matrices. This chapter will discuss both sets of routines.

Figure 5.1 shows the Distance analysis I screen and the distance statistics on that page that are calculated by CrimeStat.

Nearest Neighbor Index (Nna)

One of the oldest distance statistics is the nearest neighbor index. It is particularly useful because it is a simple tool to understand and to calculate. It was developed by two botanists in the 1950s (Clark and Evans, 1954), primarily for field work, but it has been used in many different fields for a wide variety of problems (Cressie, 1991). It has also become the basis of many other types of distance statistics, some of which are implemented in CrimeStat.

The nearest neighbor index compares the distances between nearest points and distances that would be expected on the basis of chance. It is an index that is the ratio of two summary measures. First, there is the nearest neighbor distance. For each point (or incident location) in turn, the distance to the closest other point (nearest neighbor) is calculated and averaged over all points.

Figure 5.1: Distance Analysis I Screen
    Nearest Neighbor Distance = d(NN) = \sum_{i=1}^{N} \left[ \frac{\min(d_{ij})}{N} \right]        (5.1)

where Min(d_ij) is the distance between each point and its nearest neighbor and N is the number of points in the distribution. Thus, in CrimeStat, the distance from a single point to every other point is calculated and the smallest distance (the minimum) is selected. Then, the next point is taken and the distance to all other points (including the first point measured) is calculated, with the nearest being selected and added to the first minimum distance. This process is repeated until all points have had their nearest neighbor selected. The total sum of the minimum distances is then divided by N, the sample size, to produce an average minimum distance.
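
The procedure just described can be sketched in Python as a brute-force loop. This is only an illustration of equation 5.1, not CrimeStat's implementation; the function name and the representation of points as (x, y) tuples are assumptions made for the example.

```python
import math

def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point (equation 5.1).

    `points` is a list of (x, y) tuples in a projected coordinate system.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are needed")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Smallest direct (straight-line) distance from point i to any other point
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    # Sum of the minimum distances divided by the sample size
    return total / n

# Three points on a line: nearest neighbor distances are 1, 1 and 3
print(mean_nearest_neighbor_distance([(0, 0), (1, 0), (4, 0)]))
```

The loop compares every pair of points, so the running time grows with the square of N; a spatial index would be the usual substitute for large files.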

The second summary measure is the expected nearest neighbor distance if the distribution of points is completely spatially random. This is the mean random distance (or the mean random nearest neighbor distance). It is defined as

    Mean Random Distance = d(ran) = 0.5 \sqrt{\frac{A}{N}}        (5.2)

where A is the area of the region and N is the number of incidents. Since A is defined by the square of the unit of measurement (e.g., square miles, square meters, etc.), it yields a random distance measure in the same units (i.e., miles, meters, etc.).1 If defined on the measurement parameters page by the user, CrimeStat will use the specified area in calculating the mean random distance. If no area measurement is provided, CrimeStat will take the rectangle defined by the minimum and maximum X and Y points.
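
A minimal sketch of equation 5.2, including the fallback to the bounding rectangle when no area is supplied, might look as follows. The function name and the optional `area` argument are hypothetical stand-ins for the setting on the measurement parameters page.

```python
import math

def mean_random_distance(points, area=None):
    """Expected nearest neighbor distance under randomness (equation 5.2).

    If `area` is None, the area of the rectangle defined by the minimum and
    maximum X and Y values is used instead. The area must be expressed in
    the square of the coordinate units.
    """
    n = len(points)
    if area is None:
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return 0.5 * math.sqrt(area / n)

corners = [(0, 0), (4, 0), (0, 4), (4, 4)]
print(mean_random_distance(corners))            # bounding rectangle, area 16
print(mean_random_distance(corners, area=100))  # user-supplied area
```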

The nearest neighbor index is the ratio of the observed nearest neighbor distance to the mean random distance

    Nearest Neighbor Index = NNI = \frac{d(NN)}{d(ran)}        (5.3)

Thus, the index compares the average distance from the closest neighbor to each point with a distance that would be expected on the basis of chance. If the observed average distance is about the same as the mean random distance, then the ratio will be about 1.0. On the other hand, if the observed average distance is smaller than the mean random distance, that is, points are actually closer together than would be expected on the basis of chance, then the nearest neighbor index will be less than 1.0. This is evidence for clustering. Conversely, if the observed average distance is greater than the mean random distance, then the index will be greater than 1.0. This would be evidence for dispersion, that points are more widely dispersed than would be expected on the basis of chance.

Testing the Significance of the Nearest Neighbor Index

Some differences from 1.0 in the nearest neighbor index would be expected by chance. Clark and Evans (1954) proposed a Z-test to indicate whether the observed average nearest neighbor distance was significantly different from the mean random distance (Hammond and McCullagh, 1978; Ripley, 1981). The test is between the observed nearest neighbor distance and that expected from a random distribution and is given by

    Z = \frac{d(NN) - d(ran)}{SE_{d(ran)}}        (5.4)

where the standard error of the mean random distance is approximately given by:

    SE_{d(ran)} \approx \sqrt{\frac{(4 - \pi) A}{4 \pi N^2}} \approx \frac{0.26136}{\sqrt{N^2 / A}}        (5.5)

with A being the area of the region and N the number of points. There have been other suggested tests for the nearest neighbor distance as well as corrections for edge effects (see below). However, equations 5.4 and 5.5 are used most frequently to test the average nearest neighbor distance. See Cressie (1991) for details of other tests.
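
The constant 0.26136 in equation 5.5 comes from factoring the terms that do not depend on the data out of the square root. The short derivation below is only a worked restatement of equation 5.5, with \pi denoting the usual circle constant:

```latex
\sqrt{\frac{(4-\pi)\,A}{4\pi N^{2}}}
  = \sqrt{\frac{4-\pi}{4\pi}} \cdot \frac{\sqrt{A}}{N}
  \approx \frac{0.26136}{\sqrt{N^{2}/A}},
\qquad \text{since } \sqrt{\frac{4-\pi}{4\pi}} \approx 0.26136 .
```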

Calculating the statistics

Once nearest neighbor analysis has been selected, the user clicks on Compute to run the routine. The program outputs 11 statistics:

1. The sample size
2. The mean nearest neighbor distance
3. The standard deviation of the nearest neighbor distance
4. The minimum distance
5. The maximum distance
6. The mean random distance for both the bounding rectangle and the user input area, if provided
7. The mean dispersed distance for both the bounding rectangle and the user input area, if provided
8. The nearest neighbor index for both the bounding rectangle and the user input area, if provided
9. The standard error of the nearest neighbor index for both the maximum bounding rectangle and the user input area, if provided
10. A significance test of the nearest neighbor index (Z-test)
11. The p-values associated with a one tail and two tail significance test.

In addition, the output can be saved to a ‘.dbf’ file, which can then be imported into spreadsheet or graphics programs.

Example 1: The nearest neighbor index for street robberies

In 1996, there were 1181 street robberies in Baltimore County. The area of the County is about 607 square miles and is specified on the measurement parameters page. CrimeStat returns the statistics shown in Table 5.1 with the Nna routine. The mean nearest neighbor distance was 0.116 miles while the mean nearest neighbor distance under randomness was 0.358 miles. The nearest neighbor index (the ratio of the actual to the random nearest neighbor distance) is 0.3236. The Z-value of -44.4672 is highly significant. In other words, the average nearest neighbor distance of street robberies in Baltimore County is significantly smaller than what would be expected under randomness.

Table 5.1
Nearest Neighbor Statistics for
1996 Street Robberies in Baltimore County
(N = 1181)

Mean nearest neighbor distance:                   0.11598 mi
Mean random distance based on user input area:    0.35837 mi
Nearest neighbor index:                           0.3236
Standard error:                                   0.00545 mi
Test statistic (Z):                               -44.4672
p-value (one tail):                               ≤ .0001
p-value (two tail):                               ≤ .0001
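
The arithmetic behind Table 5.1 can be sketched in a few lines of Python, using equations 5.2 through 5.5. The function name `nn_significance` and its dictionary keys are illustrative, not part of CrimeStat, and the normal-approximation p-values are an assumption about how the tail probabilities are obtained. Because the county area is given only as "about 607 square miles", the results should agree with the table only to roughly three significant figures.

```python
import math

def nn_significance(d_nn, n, area):
    """Nearest neighbor index and Z-test from equations 5.2 to 5.5 (hypothetical helper)."""
    d_ran = 0.5 * math.sqrt(area / n)                               # equation 5.2
    se = math.sqrt((4 - math.pi) * area / (4 * math.pi * n ** 2))   # equation 5.5
    z = (d_nn - d_ran) / se                                         # equation 5.4
    # Tail probabilities from the standard normal distribution (assumption)
    p_one = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return {
        "d_ran": d_ran,
        "nni": d_nn / d_ran,                                        # equation 5.3
        "se": se,
        "z": z,
        "p_one_tail": p_one,
        "p_two_tail": min(1.0, 2 * p_one),
    }

# Inputs taken from Example 1: observed mean distance, sample size, area in square miles
result = nn_significance(d_nn=0.11598, n=1181, area=607.0)
for name, value in result.items():
    print(f"{name}: {value:.5g}")
```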

It should be noted that the significance test for the nearest neighbor index is not a test for complete spatial randomness, for which it is sometimes mistaken. It is only a test of whether the average nearest neighbor distance is significantly different from what would be expected on the basis of chance. In other words, it is a test of first-order nearest neighbor randomness.2 There are also second-order, third-order, and so forth distributions that may or may not be significantly different from their corresponding orders under complete spatial randomness. A complete test would have to test for all those effects, what are called K-order effects.

Example 2: The nearest neighbor index for residential burglaries

The nearest neighbor index and test can be very useful for understanding the degree of clustering of crime incidents in spite of its limitations. For example, in Baltimore County, the distribution of 6051 residential burglaries in 1996 yields the following nearest neighbor statistics (Table 5.2):

SARS and the Distribution of Passengers on an Airplane

Marta A. Guerra
Senior Staff Epidemiologist,
Centers for Disease Control and Prevention
Atlanta, GA

Illness in passengers on board airplanes occurs rather frequently, and
investigations are performed to assess whether transmission to other passengers has
occurred. During 2002, several passengers with Severe Acute Respiratory Syndrome
(SARS) traveled to the United States by airplane while they were infectious. Since
transmission of SARS can be airborne, there is concern that it could spread during
an airline flight. A survey was undertaken on a flight where a confirmed SARS case
was on board. Serum samples of passengers were taken to evaluate if transmission
of SARS had occurred during the flight, and whether transmission is related to
sitting near the SARS case.

The nearest neighbor index was used to compare the distances between the
seats of passengers on this flight to distances expected on the basis of chance. A grid
(7 m x 32 m) was superimposed on the airline seat configuration, and each seat was
assigned an X, Y coordinate based on the width (x) and the length (y) of the airplane.
In the diagram below, the seat location of the SARS index case is indicated by an X,
and the passengers’ seat locations are shaded in black.

Nearest Neighbor Statistics for Airline Flight with SARS Case

The nearest neighbor index of passengers’ seats was 0.931 indicating that the
distribution was random, not clustered. This preliminary analysis was important in
order to establish that the seating arrangement of the passengers was random and
independent, and that the passengers’ seats were not clustered around the SARS
case. Therefore, if any passengers have positive serum samples for SARS, we would
be able to evaluate their locations in relation to the SARS case and assess patterns
of transmission. In this survey, however, there was no evidence of transmission
since none of the passengers had positive serum samples for SARS.
Table 5.2
Nearest Neighbor Statistics for
1996 Residential Burglaries in Baltimore County
(N = 6051)

Mean nearest neighbor distance:                   0.07134 mi
Mean random distance based on user input area:    0.16761 mi
Nearest neighbor index:                           0.4256
Standard error:                                   0.00113 mi
Test statistic (Z):                               -85.4750
p-value (one tail):                               ≤ .0001
p-value (two tail):                               ≤ .0001

The clustering of residential burglaries is also highly significant. Now, suppose we want to compare the distribution of street robberies (Table 5.1) with that of residential burglaries (Table 5.2). The significance test is not very useful for the comparison because the sample sizes are so large (1181 v. 6051); the much higher Z-value for residential burglaries indicates primarily that there was a larger sample size to test it. However, comparing the relative nearest neighbor indices can be meaningful.

    Relative Nearest Neighbor Comparison = \frac{NNI(A)}{NNI(B)}        (5.6)

where NNI(A) is the nearest neighbor index for one group (A) and NNI(B) is the nearest neighbor index for another group (B). Thus, comparing street robberies with residential burglaries, we have

    \frac{NNI(A)}{NNI(B)} = \frac{NNI(\text{robberies})}{NNI(\text{burglaries})} = \frac{0.3236}{0.4256} = 0.7603

In other words, the distribution of street robberies relative to an expected random distribution appears to be more concentrated than that of burglaries relative to an expected random distribution. There is not a simple significance test of this comparison since the standard error of the joint distributions is not known.3 But the relative index suggests that robberies are more concentrated than burglaries and, hence, are more likely to have ‘hot spots’ or ‘hot zones’ where they are particularly concentrated. This index, of course, does not prove that there are ‘hot spots’, but only points us towards the higher concentration of robberies relative to burglaries. In the previous chapter, it was shown that robberies had a smaller dispersion than burglaries. Here, however, the analysis is taken a step further to suggest that robberies are more concentrated than burglaries.

Use of Network Distance

In calculating the nearest neighbor index, network distance can be used to calculate the distance between points (see chapter 3). However, unless the data set is very small or you have a lot of patience, I highly recommend that you don't do this. Network calculations are very slow and will take a long time to complete for a large file.

K-Order Nearest Neighbors

As mentioned above, the nearest neighbor index is only an indicator of first-order spatial randomness. It compares the average distance for the nearest neighbor to an expected random distance. But what about the second nearest neighbor? Or the third nearest neighbor? Or the Kth nearest neighbor? CrimeStat constructs K-order nearest neighbor indices. On the distance analysis page, the user can specify the number of nearest neighbor indices to be calculated.

The K-order nearest neighbor routine returns four columns:

1. The order, starting from 1
2. The mean nearest neighbor distance for each order (in meters)
3. The expected nearest neighbor distance for each order (in meters)
4. The nearest neighbor index for each order

For each order, CrimeStat calculates the Kth nearest neighbor distance for each observation and then takes the average. The expected nearest neighbor distance for each order is calculated by:

    Mean Random Distance to Kth nearest neighbor = d(K_{ran}) = \frac{K (2K)!}{(2^K K!)^2 \sqrt{N/A}}        (5.7)

where K is the order and ! is the factorial operation (e.g., 4! = 4 x 3 x 2 x 1; Thompson, 1956). The Kth nearest neighbor index is the ratio of the observed Kth nearest neighbor distance to the Kth mean random distance. There is not a good significance test for the Kth nearest neighbor index due to the non-independence of the different orders, though there have been attempts (see examples in Getis and Boots, 1978; Aplin, 1983). Consequently, CrimeStat does not provide a test of significance.
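
As a rough illustration of equation 5.7 and the K-order index, the following Python sketch computes the observed and expected Kth nearest neighbor distances by brute force. The function names and the tuple layout of the result are assumptions made for this example; it is not CrimeStat's own code. For K = 1 the expected distance reduces to equation 5.2.

```python
import math

def kth_mean_random_distance(k, n, area):
    """Expected Kth nearest neighbor distance under randomness (equation 5.7)."""
    # The integer ratio K(2K)! / (2^K K!)^2 is evaluated first so that the
    # very large factorials never have to be converted to floats on their own.
    ratio = k * math.factorial(2 * k) / (2 ** k * math.factorial(k)) ** 2
    return ratio / math.sqrt(n / area)

def k_order_indices(points, area, k_max):
    """Return rows of (order, observed mean distance, expected distance, index)."""
    n = len(points)
    if not 1 <= k_max <= n - 1:
        raise ValueError("k_max must be between 1 and N - 1")
    # For every point, the sorted distances to all other points
    sorted_dists = [
        sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        for i, (xi, yi) in enumerate(points)
    ]
    rows = []
    for k in range(1, k_max + 1):
        observed = sum(d[k - 1] for d in sorted_dists) / n
        expected = kth_mean_random_distance(k, n, area)
        rows.append((k, observed, expected, observed / expected))
    return rows

for row in k_order_indices([(0, 0), (1, 0), (4, 0), (2, 3)], area=12.0, k_max=3):
    print(row)
```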

There are no restrictions on the number of nearest neighbors that can be calculated. However, since the average distance increases with higher-order nearest neighbors, the potential for bias from edge effects will also increase. It is suggested that not more than 100 nearest neighbors be calculated.4

Nevertheless, the K-order nearest neighbor distance and index can be useful for understanding the overall spatial distributions. Figure 5.2 compares the K-order nearest neighbor index for street robberies with that of residential burglaries. The output was

Figure 5.2: K-Order Nearest Neighbor Indices, 1996 Street Robberies and Residential Burglaries
(Vertical axis: nearest neighbor index, 0.0 to 2.0. Horizontal axis: order of nearest neighbor index, 1 to 49. Series: street robberies; residential burglaries; K-order spatial randomness at 1.0.)


Nearest Neighbor Analysis
Man With A Gun Calls
Charlotte, N.C.: 1989

James L. LeBeau
Administration of Justice
Southern Illinois University-Carbondale

A comparison was made of Man with a Gun calls for the weekend in which
Hurricane Hugo hit the North Carolina coast (September 22-24) with the
following New Year’s Eve weekend (December 29-31, 1989). There were 146 Man
with a Gun calls during the Hurricane Hugo weekend compared to 137 calls for New
Year’s Eve.

Nearest Neighbor Index of Man With A Gun Calls
(Vertical axis: nearest neighbor index, 0.60 to 0.85, from clustered to dispersed. Horizontal axis: order, 0 to 25. Series: New Year’s Eve Weekend; Hurricane Hugo Weekend.)

The Nearest Neighbor Index in CrimeStat was used to compare the
distributions. From the onset, the Hurricane Hugo Man With a Gun locations are
more dispersed than New Year’s Eve. After the 5th nearest neighbor (Order 5) the
differences become more pronounced.
saved as a ‘.dbf’ and was then imported into a graphics program. The graph shows the nearest neighbor indices for both robberies and burglaries up to the 50th order (i.e., the 50th nearest neighbor). The nearest neighbor index is scaled from 0 (extreme clustering) upward, with values above 1 indicating dispersion. Since a nearest neighbor index of 1 is expected under randomness, the thin straight line at 1.0 indicates the expected K-order index. As can be seen, both street robberies and residential burglaries are much more concentrated than K-order spatial randomness. Further, robberies are more concentrated than even burglaries for each of the 50 nearest neighbors. Thus, the graph reinforces the analysis above that robberies are more concentrated than burglaries, and both are more concentrated than a random distribution.

In other words, even though there is not a good significance test for the K-order nearest neighbor index, a graph of the K-order indices (or the K-order distances) can give a picture of how clustered the distribution is as well as allow comparisons in clustering between the different types of crimes (or the same crime at two different time periods).

Graphing the K-order nearest neighbor

On the output page, there is a quick graph function that displays a curve similar to figure 5.2. This is useful for quickly examining the trends. However, a better graph is made by importing the ‘.dbf’ file output into a spreadsheet or graphics program.

Edge Effects

It should be noted that there are potential edge effects that can bias the nearest neighbor index. An incident occurring near the border of the study area may actually have its nearest neighbor on the other side of the border. However, since there are usually no data on the distribution of incidents outside the study area, the program selects another point within the study area as the nearest neighbor of the border point. Thus, there is the potential for exaggerating the nearest neighbor distance, that is, the observed nearest neighbor distance is probably greater than what it should be and, therefore, there is an overestimation of the nearest neighbor distance. In other words, the incidents are probably more clustered than what has been measured (see Cressie, 1991 for details).

Nearest Neighbor Edge Corrections

The default condition is no edge correction. However, one way that the measured distance to the nearest neighbor can be corrected for possible edge effects is to assume for each observed point that there is another point just outside the border at the closest distance. If the distance from a point to the border is shorter than to its measured nearest neighbor, then the nearer theoretical point is taken as a proxy for the nearest neighbor. This correction has the effect of reducing the average neighbor distance. Since it assumes that there is always another point at the border, it probably underestimates the true nearest neighbor distance. The true value is probably somewhere in between the measured and the assumed nearest neighbor distance.

CrimeStat has two different edge corrections. Because CrimeStat is not a GIS package, it cannot locate the actual border of a study area. One would need a topological GIS package in which the distance from each point to the nearest boundary is calculated. Instead, there are two different geometric models that can be applied. The first assumes that the study area is a rectangle while the second assumes that the study area is a circle. Depending on the shape of the actual study area, one or the other of these models may be appropriate.

Rectangular study area

In the rectangular adjustment, the area of the study area, A, is first calculated, either from the user input on the measurement parameters tab or from the maximum bounding rectangle defined by the minimum and maximum X/Y values (see chapter 3). If the user provides an estimate of the area, the rectangle is proportionately re-scaled so that the area of the rectangle equals A. Second, for each point, the distance to the nearest other point is calculated. This is the observed nearest neighbor distance for point i.

Third, the minimum distance to the nearest edge of the rectangle is calculated and is compared to the observed nearest neighbor distance for point i. If the observed nearest neighbor distance for point i is equal to or less than the distance to the nearest border, it is retained. On the other hand, if the observed nearest neighbor distance for point i is greater than the distance to the nearest border, the distance to the border is used as a proxy for the nearest neighbor distance of point i.
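
The rectangular adjustment can be sketched as follows, assuming for simplicity that the rectangle is passed in directly by its minimum and maximum X/Y values; the proportional re-scaling to a user-supplied area is not shown. The function and argument names are illustrative only.

```python
import math

def rect_corrected_nn_distances(points, xmin, ymin, xmax, ymax):
    """Nearest neighbor distances with the rectangular edge correction applied.

    For each point, the smaller of (a) the observed nearest neighbor distance
    and (b) the distance to the nearest edge of the rectangle is kept.
    """
    corrected = []
    for i, (x, y) in enumerate(points):
        # Observed nearest neighbor distance for point i
        d_nn = min(
            math.hypot(x - xj, y - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        # Distance from point i to the nearest of the four edges
        d_edge = min(x - xmin, xmax - x, y - ymin, ymax - y)
        corrected.append(min(d_nn, d_edge))
    return corrected

pts = [(1, 5), (5, 5), (9, 5)]
print(rect_corrected_nn_distances(pts, 0, 0, 10, 10))
```

In this small example the two outer points lie 1 unit from an edge but 4 units from their nearest neighbor, so their distances are replaced by the distance to the edge, while the middle point keeps its observed distance.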

Circular study area

In the circular adjustment, first, the area of the study area is calculated, either from the user input on the measurement parameters tab (see chapter 3) or from the maximum bounding rectangle defined by the minimum and maximum X/Y values. If the user has specified a study area on the measurement parameters page, then that value is taken for A and the radius of the circle is calculated by

    R = \sqrt{A / \pi}        (5.8)

If the user has not specified a study area on the measurement parameters page, then A is calculated from the minimum and maximum X and Y coordinates (the bounding rectangle) and the radius of the circle is calculated with equation 5.8.

Second, for each point, the distance to the nearest other point is calculated. This is the observed nearest neighbor distance for point i. Third, for each point, i, the distance from that point to the mean center is calculated, R_i. Fourth, the minimum distance to the nearest edge of the circle is calculated using

    R_{iC} = R - R_i        (5.9)

Fifth, for each point, i, the observed minimum distance is compared to the distance to the nearest edge of the circle, R_iC. If the observed nearest neighbor distance for point i is equal to or less than the distance to the nearest edge, it is retained. On the other hand, if the observed nearest neighbor distance for point i is greater than the distance to the nearest edge, the distance to the border is used as a proxy for the true nearest neighbor distance of point i.
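
A comparable sketch of the circular adjustment is given below. It assumes the circle is centered on the mean center of the points with the radius from equation 5.8. Clamping the edge distance at zero for points that fall outside the circle is an assumption of this example, since the text does not say how such points are treated; the function name is likewise hypothetical.

```python
import math

def circ_corrected_nn_distances(points, area):
    """Nearest neighbor distances with the circular edge correction applied."""
    n = len(points)
    # Mean center of the distribution
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    radius = math.sqrt(area / math.pi)              # equation 5.8
    corrected = []
    for i, (x, y) in enumerate(points):
        # Observed nearest neighbor distance for point i
        d_nn = min(
            math.hypot(x - xj, y - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        r_i = math.hypot(x - cx, y - cy)            # distance to the mean center
        # Equation 5.9, clamped at zero (assumption for points outside the circle)
        d_edge = max(radius - r_i, 0.0)
        corrected.append(min(d_nn, d_edge))
    return corrected

pts = [(-3, 0), (0, 0), (3, 0)]
print(circ_corrected_nn_distances(pts, area=math.pi * 16))  # circle of radius 4
```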

For either correction

The average nearest neighbor distance is calculated and compared to the theoretical average nearest neighbor distance under random conditions. The indices and tests are as before (see chapter 4). Figure 5.3 below shows a graph of the K-order nearest neighbor index for the 50 nearest neighbors for 1996 motor vehicle thefts in police Precinct 11 of Baltimore County. The uncorrected nearest neighbor indices are compared with those corrected by a rectangle and a circle. As can be seen, both corrections are very similar to the uncorrected. However, they both show greater concentrations than the uncorrected index. The rectangular correction shows greater concentration than the circular because it is less compact (i.e., the average distance from the center of the geometric object to the border is slightly larger). In general, the rectangle will lead to more correction than the circle since it substitutes a greater nearest neighbor distance, on average, for a point nearer the border than to its measured nearest neighbor.

The user has to decide whether either of these corrections is meaningful or not. Depending on the shape of the study area, either correction may or may not be appropriate. If the study area is relatively rectangular, then the rectangular model may provide a good approximation. Similarly, if the study area is compact (circular), then the circular model may provide a good approximation. On the other hand, if the study area is of irregular shape, then either of these corrections may produce more distortion than the raw nearest neighbor index. One has to use these corrections with judgment. Also, in some cases, it may not make any sense to correct the measured nearest neighbor distances. In Honolulu, for example, one would not correct the measured nearest neighbor distances because there are no incidents outside the island’s boundary.

Linear Nearest Neighbor Index (Lnna)

The linear nearest neighbor index is a variation on the nearest neighbor routine, but one applied to a street network. All travel along this network is assumed to follow a grid; hence, indirect distances are used. Whereas the nearest neighbor routine calculates the distance between each point and its nearest neighbor using direct distances, the linear nearest neighbor routine uses indirect (‘Manhattan’) distances (see chapter 3). Similarly, whereas the nearest neighbor routine calculates the expected distance between neighbors in a random distribution of N points using the geographical area of the study region, the linear nearest neighbor routine uses the total length of the street network.

Figure 5.3: Correction of Nearest Neighbor Indices, Motor Vehicle Thefts in Precinct 11
(Vertical axis: nearest neighbor index, 0.7 to 1, with 1 marking randomness, values above 1 dispersed and values below 1 concentrated. Horizontal axis: order, 5 to 45. Series: no correction; circular correction; rectangular correction.)

The theory of linear nearest neighbors comes from Hammond and McCullagh (1978). The observed linear nearest neighbor distance, Ld(NN), is calculated by CrimeStat as the average of indirect distances between each point and its nearest neighbor. The expected linear nearest neighbor distance is given by

    Ld(ran) = 0.5 \left[ \frac{L}{N - 1} \right]        (5.10)

where L is the total length of street network and N is the sample size (Hammond and McCullagh, 1978, 279). Consequently, the linear nearest neighbor index is defined as

    Linear Nearest Neighbor Index = LNNI = \frac{Ld(NN)}{Ld(ran)}        (5.11)

Testing the Significance of the Linear Nearest Neighbor Index

Since the theoretical standard error for the random linear nearest neighbor distance
is not known, the author has constructed an approximate standard deviation for the
observed linear nearest neighbor distance:

$$S_{\mathrm{Ld(NN)}} \approx \sqrt{\frac{\sum_i \left( \mathrm{Min}(d_{ij}) - \mathrm{Ld(NN)} \right)^2}{N-1}} \tag{5.12}$$

where Min(d_ij) is the nearest neighbor distance for point i and Ld(NN) is the average linear
nearest neighbor distance. This is the standard deviation of the linear nearest neighbor
distances. The standard error is calculated by

$$SE_{\mathrm{Ld(NN)}} = \frac{S_{\mathrm{Ld(NN)}}}{\sqrt{N}} \tag{5.13}$$

An approximate significance test can be obtained by

$$t = \frac{\mathrm{Ld(NN)} - \mathrm{Ld(ran)}}{SE_{\mathrm{Ld(NN)}}} \tag{5.14}$$

where Ld(NN) is the average linear nearest neighbor distance, Ld(ran) is the expected
linear nearest neighbor distance (equation 5.10), and SE_Ld(NN) is the approximate standard
error of the linear nearest neighbor distance (equation 5.13). Since the empirical standard
deviation of the linear nearest neighbor is being used instead of a theoretical value, the
test is a t-test rather than a Z-test.
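
Equations 5.10 through 5.14 can be traced step by step in a few lines of code. The following Python sketch is illustrative only and is not CrimeStat's implementation; the function names `manhattan` and `linear_nn_test`, the dictionary keys, and the assumption that points are (x, y) tuples in the same units as the network length are all hypothetical.

```python
import math

def manhattan(p, q):
    """Indirect ('Manhattan') distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def linear_nn_test(points, network_length):
    """Linear nearest neighbor statistics following equations 5.10 to 5.14."""
    n = len(points)
    # Min(d_ij): indirect distance from each point i to its nearest neighbor
    nearest = [
        min(manhattan(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    ld_nn = sum(nearest) / n                        # observed mean distance
    ld_ran = 0.5 * network_length / (n - 1)         # equation 5.10
    lnni = ld_nn / ld_ran                           # equation 5.11
    sd = math.sqrt(sum((d - ld_nn) ** 2 for d in nearest) / (n - 1))  # 5.12
    se = sd / math.sqrt(n)                          # equation 5.13
    # Equation 5.14; undefined when every nearest neighbor distance is equal
    t = (ld_nn - ld_ran) / se if se > 0 else float("nan")
    return {"ld_nn": ld_nn, "ld_ran": ld_ran, "lnni": lnni,
            "sd": sd, "se": se, "t": t}

# Four incidents along a hypothetical 12-unit street network
result = linear_nn_test([(0, 0), (1, 0), (3, 0), (6, 0)], network_length=12)
print(result)
```

An index below 1 together with a negative t-value would point toward points that are closer together along the network than expected by chance.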

Calculating the statistics

On the measurement parameters page, there are two parameters that are input,
the geographical area of the study region and the length of street network. At the bottom
of the page, the user must select which type of distance measurement to use, direct or
indirect. If the measurement type is direct, then the nearest neighbor routine returns the
standard nearest neighbor analysis (sometimes called areal nearest neighbor). On the
other hand, if the measurement type is indirect, then the routine returns the linear nearest
neighbor analysis. To calculate the linear nearest neighbor index, therefore, distance
measurement must be specified as indirect and the length of the street network must be
defined.

Once nearest neighbor analysis has been selected, the user clicks on Compute to run
the routine. The Lnna routine outputs 10 statistics:

1. The sample size
2. The mean linear nearest neighbor distance
3. The minimum linear distance between nearest neighbors
4. The maximum linear distance between nearest neighbors
5. The mean linear random distance
6. The linear nearest neighbor index
7. The standard deviation of the linear nearest neighbor distance
8. The standard error of the linear nearest neighbor distance
9. A significance test of the nearest neighbor index (t-test)
10. The p-values associated with a one-tail and two-tail significance test.

Example 3: Auto thefts along two highways

The linear nearest neighbor index is useful for analyzing the distribution of crime
incidents along particular streets. For example, in Baltimore County, state highway 26 in
the western part and state highway 150 in the eastern part have high concentrations of
motor vehicle thefts (figure 5.4). In 1996, there were 87 vehicle thefts on highway 26 and
47 on highway 150. A GIS can be used with the linear nearest neighbor index to indicate
whether these incidents are greater than what would be expected on the basis of chance.

Table 5.3 presents the data. Using the GIS, we estimate that there are 3,333.54
miles of roadway segments; this number was estimated by adding up the total length of the
street network in the GIS. Of all the road segments in Baltimore County, there are 241.04
miles of major arterial roads, of which state highway 26 has a total length of 10.42 miles
and state highway 150 has a total road length of 7.79 miles.

In 1996, there were 3,774 motor vehicle thefts in the county. If these thefts were
distributed randomly, then the random expected distance between incidents would be 0.44
miles (equation 5.10). Using this estimate, table 5.3 shows the number of incidents that
would be expected on each of the two state highways if the distribution were random and
the ratio of the actual number of motor vehicle thefts to the expected number. As can be

Figure 5.4: 1996 Auto Thefts in Baltimore County
Incident Distribution on State Highways 26 and 150
[Map: incident locations along State Highway 26 and State Highway 150; scale bar in miles.]

Table 5.3

Comparison of 1996 Baltimore County Auto Thefts
for Different Types of Roads
(N = 3774 Incidents)

Length of Road Segments:

    Highway 26                10.42 mi
    Highway 150                7.79 mi
    All Major Arterials      241.04 mi
    All Roads               3333.54 mi

Random Expected Distance Between Incidents = 0.44 miles

                       Proportional to Network             Proportional to Same Road
                       (“Relative to Random”)              (“Relative to Itself”)

                                                           Average     Average Random
Where                  Number      Expected                Linear      Linear          Linear
Incidents              of          Number      Ratio of    Nearest     Nearest         Nearest
Occurred               Incidents   If Random   Frequency   Neighbor    Neighbor        Neighbor
                                                           Distance    Distance        Index

Highway 26                  87        11.8        7.4      0.05 mi     0.06 mi         0.96

Highway 150                 47         8.8        5.3      0.08 mi     0.08 mi         0.94

All Major Arterials        607       272.8        2.2      0.13 mi     0.20 mi         0.64 (p ≤ .001)

All Roads                 3774      3774.0        1.0      0.09 mi     0.44 mi         0.21 (p ≤ .001)


seen, the distribution of motor vehicle thefts is not random. On all major arterial roads, there
are 2.2 times as many thefts as would be expected by a random spatial distribution. In fact,
in 1996, of 28,551 road segments in Baltimore County, only 7,791 (27%) had one or more motor
vehicle thefts occur on them; most of these are major roads. Further, on highway 26 there
were 7.4 times as many and on highway 150 there were 5.3 times as many as would be
expected if the distribution were random. Clearly, these two highways had more than their
share of auto thefts in 1996.
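
The expected counts and ratios discussed here follow from simple proportions of road length, and the 0.44-mile figure follows from equation 5.10. The short Python sketch below redoes that arithmetic with the lengths and counts quoted in the text; the variable and function names are hypothetical, and small differences from the printed table can arise from rounding.

```python
# Road lengths (miles) and 1996 theft counts quoted in the text
total_length = 3333.54     # all roads in Baltimore County
n_thefts = 3774            # all motor vehicle thefts in the county

roads = {
    "Highway 26": (10.42, 87),
    "Highway 150": (7.79, 47),
    "All major arterials": (241.04, 607),
}

# Equation 5.10: expected linear nearest neighbor distance under randomness
expected_distance = 0.5 * total_length / (n_thefts - 1)

def expected_count(length):
    """Thefts expected on a road if thefts were spread in proportion to length."""
    return n_thefts * length / total_length

print(f"Random expected distance: {expected_distance:.2f} miles")
for name, (length, observed) in roads.items():
    expected = expected_count(length)
    print(f"{name}: expected {expected:.1f}, ratio {observed / expected:.1f}")
```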

But what about the distribution of the incidents along each of these highways? If
there were any pattern, for example, most of the incidents clustering on the western edge or
in the center, then police could use that information to more efficiently deploy vehicles to
respond quickly to events. On the other hand, if the distribution along these highways were
no different than a random distribution, then police vehicles must be positioned in the middle,
since that would minimize the distance to all occurring incidents.

Unfortunately, the results appear to be close to a random distribution. CrimeStat
calculates that for highway 26, the average linear nearest neighbor distance is 0.05 miles,
which is close to the average random linear nearest neighbor distance (0.06 miles). The ratio,
the linear nearest neighbor index, is 0.96 with a t-value of -0.16, which is not significantly
different from chance. Similarly, for highway 150, the average linear nearest neighbor
distance is 0.079 miles which, again, is almost identical to the average random linear nearest
neighbor distance (0.084 miles); the nearest neighbor index is 0.94 and the t-value is -0.41
(not significant). In short, even though there was a higher concentration of vehicle thefts on
these two state highways than would be expected on the basis of chance, the distribution
along each highway is not very different from what would be expected on the basis of chance.⁵

K-Order Linear Nearest Neighbors

There is also a K-order linear nearest neighbor analysis, as with the areal nearest
neighbors. The user can specify how many additional nearest neighbors are to be calculated.
The linear K-order nearest neighbor routine returns four columns:

1. The order, starting from 1
2. The mean linear nearest neighbor distance for each order (in meters)
3. The expected linear nearest neighbor distance for each order (in meters)
4. The linear nearest neighbor index for each order

Since the expected linear nearest neighbor distance has not been worked out for orders
higher than one, the calculation produced here is a rough approximation. It applies equation
5.10, only adjusting for the decreasing sample size, N_k, which occurs as degrees of freedom are
lost for each successive order. In this sense, the index is really the k-order linear nearest
neighbor distance relative to the expected linear neighbor distance for the first order. It is not
a strict nearest neighbor index for orders above one.
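
The four output columns can be sketched in Python as follows. This is a rough illustration, not CrimeStat's code: the text does not state exactly how N_k shrinks with each order, so the rule N_k = N - (k - 1) used below is an assumption, as are the function names and the tuple layout of the returned rows.

```python
def manhattan(p, q):
    """Indirect ('Manhattan') distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def k_order_linear_nn(points, network_length, max_order):
    """Return (order, mean distance, expected distance, index) for each order."""
    n = len(points)
    if not 1 <= max_order <= n - 1:
        raise ValueError("max_order must lie between 1 and N - 1")
    # Sorted indirect distances from each point to every other point
    sorted_dists = [
        sorted(manhattan(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    rows = []
    for k in range(1, max_order + 1):
        mean_k = sum(d[k - 1] for d in sorted_dists) / n
        n_k = n - (k - 1)            # ASSUMPTION: one case lost per order
        expected_k = 0.5 * network_length / (n_k - 1)   # equation 5.10 with N_k
        rows.append((k, mean_k, expected_k, mean_k / expected_k))
    return rows

for row in k_order_linear_nn([(0, 0), (1, 0), (3, 0), (6, 0)], 12, max_order=2):
    print(row)
```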

Nevertheless, like the areal k-order nearest neighbor index, the k-order linear nearest
neighbor index can provide insights into the distribution of the points, even if the first-order
is random. Figure 5.5 shows a graph of 50 linear nearest neighbors for 1996 residential
burglaries and street robberies for Baltimore County. As with the areal k-order nearest
neighbors (see figure 5.3), both burglaries and robberies show evidence of clustering. For both,
the first nearest neighbors are closer together than a random distribution. Similarly, over the
50 orders, street robberies are more clustered than burglaries. However, measuring distance
on a grid shows that for burglaries, there is only a small amount of clustering. After the
fourth order neighbor, the distribution for burglaries is more dispersed than a random
distribution. An interpretation of this is that there is a small number of burglaries which are
clustered, but the clusters are relatively dispersed. Street robberies, on the other hand, are
highly clustered, up to over 30 nearest neighbors.

The linear k-order nearest neighbor distribution gives a slightly different perspective
on the distribution than the areal. For one thing, the index is slightly biased as the
denominator, the K-order expected linear neighbor distance, is only approximated. For
another thing, the index measures distance as if the streets follow a true grid, oriented in an
east-west and north-south direction. In this sense, it may be unrealistic for many places,
especially if streets traverse in diagonal patterns; in these cases, the use of indirect distance
measurement will produce greater distances than what actually occur on the network. Still,
the linear nearest neighbor index is an attempt to approximate travel along the street
network. To the extent that a particular jurisdiction’s street pattern falls in this manner, it
can provide useful information.

Graphing the linear K-order nearest neighbor

On the output page, there is a quick graph function that displays a curve similar to
figure 5.5 below. This is useful for quickly examining the trends.

Ripley’s K Statistic

Ripley’s K statistic is an index of non-randomness for different scale values (Ripley,
1976; Ripley, 1981; Bailey and Gattrell, 1995; Venables and Ripley, 1997). In this sense, it is
a ‘super-order’ nearest neighbor statistic, providing a test of randomness for every distance
from the smallest up to some specified limit. It is sometimes called the reduced second
moment measure, implying that it is designed to measure second-order trends (i.e., local
clustering as opposed to a general pattern over the region). However, it is also subject to
first-order effects so that it is not strictly a second-order measure.

Consider a spatially random distribution of N points. If circles of radius, t_s, are drawn
around each point, where s is the order of radii from the smallest to the largest, and the
number of other points that are found within the circle are counted and then summed over all
points (allowing for duplication), then the expected number of points within that radius is

Figure 5.5: K-Order Linear Nearest Neighbor Indices
1996 Street Robberies and Residential Burglaries
[Graph: linear nearest neighbor index (vertical axis) plotted against order of linear nearest neighbor index (horizontal axis) for residential burglaries and street robberies, with K-order spatial randomness shown at an index of 1.]



$$E(\text{\# of points within distance } d_i) = \frac{N}{A} K(t_s) \tag{5.15}$$

where N is the sample size, A is the total study area, and K(t_s) is the area of a circle defined
by radius, t_s. For example, if the area defined by a particular radius is one-fourth the total
study area and if there is a spatially random distribution, on average approximately one-
fourth of the cases will fall within any one circle (plus or minus a sampling error). More
formally, with complete spatial randomness (csr), the expected number of points within
distance, t_s, is

$$E(\text{\# under csr}) = \frac{N}{A} \pi t_s^2 \tag{5.16}$$

On the other hand, if the average number of points found within a circle for a
particular radius placed over each point, in turn, is greater than that found in equation 5.16,
this points to clustering; that is, points are, on average, closer than would be expected on the
basis of chance for that radius. Conversely, if the average number of points found within a
circle for a particular radius placed over each point, in turn, is less than that found in
equation 5.16, this points to dispersion; that is, points are, on average, farther apart than
would be expected on the basis of chance for that radius. By counting the total number of
points within a particular radius and comparing it to the number expected on the basis of
complete spatial randomness, the statistic is an indicator of non-randomness.

In this sense, the K statistic is similar to the nearest neighbor distance in that it
provides information about the average distance between points. However, it is more
comprehensive than the nearest neighbor statistic for two reasons. First, it applies to all
orders cumulatively, not just a single order. Second, it applies to all distances up to the limit
of the study area because the count is conducted over successively increasing radii.

Under unconstrained conditions, K is defined as

$$K(t_s) = \frac{A}{N^2} \sum_i \sum_{j \neq i} I(t_{ij}) \tag{5.17}$$

where I(t_ij) is the number of other points, j, found within distance, t_s, summed over all points,
i. That is, a circle of radius, t_s, is placed over each point, i. Then, the number of other points,
j, within the circle is counted. The circle is moved to the next i and the process is repeated.
Thus, the double summation points to the count of all j’s for each i, over all i’s. Note, the count
does not include itself, only other points.
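
The counting procedure in equation 5.17 can be written directly as a double loop. The Python sketch below is a minimal, uncorrected illustration; the function name `ripley_k` is hypothetical, direct (Euclidean) distance is assumed, and treating a point exactly at distance t_s as inside the circle is an assumption.

```python
import math

def ripley_k(points, area, radius):
    """Uncorrected K for one radius: (A / N^2) times the pair count (eq. 5.17)."""
    n = len(points)
    count = 0
    for i, p in enumerate(points):          # place the circle over each point i
        for j, q in enumerate(points):      # count the other points j inside it
            if i != j and math.dist(p, q) <= radius:
                count += 1
    return area / n ** 2 * count

# Four hypothetical points in a study area of 36 square units
pts = [(0, 0), (1, 0), (0, 1), (5, 5)]
print(ripley_k(pts, area=36.0, radius=1.0))
print(ripley_k(pts, area=36.0, radius=1.5))
```

Each pair is counted twice, once from each end, which matches the text's note that the sum allows for duplication. The double loop takes time proportional to N squared for every radius, so it is only practical for small data sets.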

After this process is completed, the radius of the circle is increased, and the entire
process is repeated. Typically, the radii of circles are increased in small increments so that
there are 50-100 intervals by which the statistic can be counted. In CrimeStat, 100 intervals
(radii) are used, based on

$$t_s = \frac{R}{100} \tag{5.18}$$

where R is the radius of a circle whose area is equal to the study area (i.e., the area
entered on the measurement parameters page).

One can graph K(t_s) against the distance, t_s, to reveal whether there is any clustering
at certain distances or any dispersion at others (if there is clustering at some scales, then
there must be dispersion at others). Such a plot is non-linear, however, typically increasing
exponentially (Kaluzny et al, 1998). Consequently, K(t_s) is transformed into a square root
function, L(t_s), to make it more linear. L(t_s) is defined as:

$$L(t_s) = \sqrt{\frac{K(t_s)}{\pi}} - t_s \tag{5.19}$$

That is, K(t_s) is divided by π and then the square root is taken. Then the distance
interval (the particular radius), t_s, is subtracted from this.⁶ In practice, only the L statistic is
used even though the name of the statistic K is based on the K derivation.
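
Under complete spatial randomness the transformation yields zero at every distance. A short worked substitution shows why, using only equation 5.19 and the earlier statement that K(t_s) is the area of a circle of radius t_s when the distribution is spatially random:

```latex
\begin{aligned}
K(t_s) &= \pi t_s^2 \qquad \text{(complete spatial randomness)} \\
L(t_s) &= \sqrt{\frac{K(t_s)}{\pi}} - t_s
        = \sqrt{\frac{\pi t_s^2}{\pi}} - t_s
        = t_s - t_s = 0
\end{aligned}
```

Values of L above zero therefore correspond to more points within t_s than randomness would predict (clustering), and values below zero to fewer (dispersion).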

Because L(t_s) is a measure of second-order clustering, it is usually analyzed for
only a short distance. In CrimeStat III, the distance is set at one-third the side of a square
defined by the area (SQRT[A]/3).⁷ Figure 5.6 shows a graph of L(t) against distance for 1996
robberies in Baltimore County. As can be seen, L(t) increases up to a distance of about 3
miles whereupon it decreases again. A “pure” random distribution, known as complete spatial
randomness (CSR), is shown as a horizontal line at L=0.

Comparison to A Spatially Random Distribution

To understand whether an observed K distribution is different from chance, one
typically uses a random distribution. Because the sampling distribution of L(t_s) is not known,
a simulation can be conducted by randomly assigning points to the study area. Because any
one simulation might produce a clustered or dispersed pattern strictly by chance, the
simulation is repeated many times, typically 100 or more. Then, for each random simulation,
the L statistic is calculated for each distance interval. Finally, after all simulations have been
conducted, the highest and lowest L-values are taken for each distance interval. This is called
an envelope. Thus, by comparing the distribution of L to the random envelope, one can assess
whether the particular observed pattern is likely to be different from chance.⁸ In figure 5.6,
the L envelope of random data is much less concentrated than that for robberies, indicating
that it is highly unlikely the concentration of robberies was due to chance.

Figure 5.6: "K" Statistic For 1996 Robberies
Compared to Random and 2000 Population Distributions
L(t) = Sqrt[K(t)/pi] - t
[Graph: L(t) (vertical axis) plotted against distance between points in miles (horizontal axis) for robbery, population, CSR (L = 0), and the envelope of 100 random simulations.]



Specifying simulations

Because simulations can take a long time, particularly if the data sets are large, the
default number of simulations is 0. However, a user can conduct simulations by writing a
positive number (e.g., 10, 100, 300). If simulations are selected, CrimeStat will conduct the
number of simulations specified by the user and will calculate the upper and lower limits for
each distance interval, as well as the 0.5%, 2.5%, 5%, 95%, 97.5% and 99% intervals; these
latter statistics only make sense if many simulation runs are conducted (e.g. 1000).

The way CrimeStat conducts the simulation is as follows. It takes the maximum
bounding rectangle of the distribution, that is, the rectangle formed by the maximum and
minimum X and Y coordinates respectively, and re-scales this (up or down) until the rectangle
has an area equal to the study area (defined on the measurement parameters page). It then
assigns N points, where N is the same number of points as in the incident distribution, using
a uniform random number generator to this rectangle and calculates the L statistic. It then
repeats the experiment for the number of specified simulations, and calculates the above
statistics. For example, with 1181 robberies for 1996, the Ripley’s K function calculates the
empirical L statistics for 100 distance intervals and compares this to a simulation of 1181
points randomly distributed over a rectangle k times, where k is a user-defined number.
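
The simulation steps described above can be sketched as follows. This Python example is an illustration under stated assumptions and is not CrimeStat's code: the function names are hypothetical, the radii are taken as t_s = s·R/100 (one reading of equation 5.18), no edge correction is applied, and the bounding rectangle is assumed to have non-zero width and height.

```python
import math
import random

def l_statistic(points, area, radius):
    """Uncorrected L for one radius (equations 5.17 and 5.19)."""
    n = len(points)
    count = sum(1 for i, p in enumerate(points)
                for j, q in enumerate(points)
                if i != j and math.dist(p, q) <= radius)
    k = area / n ** 2 * count
    return math.sqrt(k / math.pi) - radius

def simulation_envelope(points, area, n_sims, n_intervals=100, seed=0):
    """Lowest and highest simulated L per distance interval."""
    rng = random.Random(seed)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    # Re-scale the bounding rectangle so that its area equals the study area
    scale = math.sqrt(area / (width * height))
    w, h = width * scale, height * scale
    big_r = math.sqrt(area / math.pi)   # radius of a circle with area A
    radii = [big_r * s / n_intervals for s in range(1, n_intervals + 1)]
    lows = [math.inf] * n_intervals
    highs = [-math.inf] * n_intervals
    for _ in range(n_sims):
        # Same number of points as the incident file, uniformly placed
        sim = [(rng.uniform(0, w), rng.uniform(0, h)) for _ in points]
        for s, t in enumerate(radii):
            value = l_statistic(sim, area, t)
            lows[s] = min(lows[s], value)
            highs[s] = max(highs[s], value)
    return radii, lows, highs

incidents = [(0, 0), (4, 0), (0, 2), (4, 2), (2, 1)]   # hypothetical data
radii, lows, highs = simulation_envelope(incidents, area=8.0, n_sims=5,
                                         n_intervals=4)
print(list(zip(radii, lows, highs)))
```

An observed L value that falls above `highs[s]` or below `lows[s]` for a given interval lies outside the simulated envelope at that distance.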

In practice, the simulation test also has biases associated with edges. Unlike the
theoretical L under uniform conditions of complete spatial randomness (i.e., stretching in all
directions well beyond the study area) where L is a straight horizontal line, the simulated L
also declines with increasing distance separation between points. This is a function of the
same type of edge bias.

Comparison to Baseline Populations

For most social distributions, such as crime incidents, randomness is not a very
meaningful baseline. Most social characteristics are non-random. Consequently, to find that
the amount of clustering that is occurring is greater than what would be expected on the basis
of chance is not very useful for crime analysts. However, it is possible to compare the
distribution of L for crime incidents with the distribution of L for various baseline
characteristics, for example, for the population distribution or the distribution of employment.
In almost all metropolitan areas, population is more concentrated towards the center than at
the periphery; the drop-off in population density is very sharp as was shown in the last
chapter. All other things being equal, one would expect more incidents towards the
metropolitan center than at the periphery; consequently, the average distance between
incidents will be shorter in the center than farther out. This is nothing more than a
consequence of the distribution of people. However, to say something about concentrations of
incidents above-and-beyond that expected by population requires us to examine the pattern of
population as well as of crime incidents.

CrimeStat allows the use of intensity and weighting variables in the calculation of the
K statistic. The user must define an intensity or a weight (or both in special circumstances)
on the primary file page. The K routine will then use the intensity (or weight) in the
calculation of L. In figure 5.6 above, there is an envelope produced from 100 random
simulations as well as the L distribution from the 2000 population; the latter variable was
obtained by taking the centroid of traffic analysis zones from the 2000 census and using
population as the intensity variable. As can be seen, the amount of clustering for robberies is
greater than both the random envelope and the distribution of population. The robbery
function is higher than the population function up to about 6 miles. This indicates that
robberies are more concentrated than what would be expected from the population
distribution for a fairly large area.

In other words, robberies are more clustered together than even what would be
expected on the basis of the population distribution and this holds for distances up to about 6
miles, whereupon the distribution of robberies is indistinguishable from a random
distribution. For larger distance separations, the L function has little utility since it is
usually used to understand localized spatial autocorrelation (Bailey and Gattrell, 1995).

For comparison, figure 5.7 below shows the distribution of 1996 burglaries, again
compared to a random envelope and the distribution of population. We find that burglaries
are more clustered than population, but less so than for robberies; the L value is higher for
robberies than for burglaries for near distances but becomes more dispersed at about 3 miles;
it is still more concentrated than a random distribution, however, as seen by the random
envelope. Thus, the distribution of L confirms the result that burglaries tend to be spread
over a much larger geographical area in smaller clusters than street robberies, which tend to
be more concentrated in large clusters. In terms of looking for ‘hot spots’, one would expect to
find more with robberies than with burglaries.

Edge Corrections for Ripley’s K

The L statistic is prone to edge effects just like the nearest neighbor statistic. That is,
for points located near the boundary of the study area, the number enumerated by any circle
for those points will, all other things being equal, necessarily be less than points in the center
of the study area because points outside the boundary are not counted. Further, the greater
the distance between points that are being tested (i.e., the greater the radius of the circle
placed over each point), the greater the bias. Thus, a plot of L against distance will show a
declining curve as distance increases as figures 5.6 and 5.7 show.

There are various adjustments to the function to help correct the bias. One is a ‘guard
rail’ within the study area so that points outside the guard rail, but inside the study area, can
only be counted for points inside the guard rail, but cannot be used for enumerating other
points within a circle placed over them (that is, they can only be j’s and not i’s, to use the
language of equation 5.17). Such an operation, however, requires manually constructing
these guard rails and enumerating whether each point can be both an enumerator and a
recipient or a recipient only. For complex boundaries, such as are found in most police
departments, this type of operation is extremely tedious and difficult.⁹

Figure 5.7: "K" Statistic For 1996 Burglaries
Compared to Random and 2000 Population Distributions
L(t) = Sqrt[K(t)/pi] - t
[Graph: L(t) (vertical axis) plotted against distance between points in miles (horizontal axis) for burglary, population, CSR (L = 0), and the envelope of 100 random simulations.]



Similarly, Ripley has proposed a simple weighting to account for the proportion of the
circle placed over each point that is within the study area (Venables and Ripley, 1997). Thus,
equation 5.17 is re-written as:

$$K(t_s) = \frac{A}{N^2} \sum_i \sum_{j \neq i} W_{ij}^{-1} I(t_{ij}) \tag{5.20}$$

where W_ij^-1 is the inverse of the proportion of the circumference of a circle of radius, t_s, placed
over each point that is within the total study area. Thus, if a point is near the study area
border, it will receive a greater weight because a smaller proportion of the circle placed over it
will be within the study area. An alternative weighting scheme can be found in Marcon and
Puech (2003).
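
To make the weighting idea concrete, the Python sketch below estimates the inverse proportion of a circle's circumference that lies inside a rectangular study area by sampling evenly spaced angles. It is a numerical approximation written for illustration, not the formula used by Ripley or by CrimeStat; the function name `edge_weight`, the rectangular boundary, and the number of sampled angles are all assumptions.

```python
import math

def edge_weight(x, y, radius, xmin, ymin, xmax, ymax, n_angles=3600):
    """Approximate inverse proportion of the circumference inside the rectangle."""
    inside = 0
    for a in range(n_angles):
        theta = 2.0 * math.pi * a / n_angles
        px = x + radius * math.cos(theta)
        py = y + radius * math.sin(theta)
        if xmin <= px <= xmax and ymin <= py <= ymax:
            inside += 1
    if inside == 0:
        raise ValueError("no part of the circumference lies inside the area")
    return n_angles / inside

# Hypothetical 10 x 10 study area with a search radius of 1
print(edge_weight(5, 5, 1, 0, 0, 10, 10))   # point in the interior
print(edge_weight(0, 5, 1, 0, 0, 10, 10))   # point on one edge
print(edge_weight(0, 0, 1, 0, 0, 10, 10))   # point on a corner
```

An interior point receives a weight of 1, while points on an edge or a corner receive weights near 2 and 4 respectively, since only about one-half or one-quarter of the circumference remains inside the area.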

In CrimeStat, two possible corrections are conducted. One assumes that the study
area is a rectangle while the other assumes that it is a circle.

Rectangular correction

In the rectangular correction for Ripley’s K, the search circle radius, R_j, is compared to
the edge of an assumed rectangle with area, A, centered at the mean center. First, the area to
be analyzed is defined. If the user has specified a study area on the measurement parameters
page, then that value for A is taken. The maximum bounding rectangle is taken (i.e., the
rectangle defined by the minimum and maximum X/Y values) and proportionately re-scaled so
that the area of the rectangle is equal to A. If the user does not specify an area on the
measurement parameters page, then the bounding rectangle defined by the minimum and
maximum X/Y values is taken for A.

Secon d, for ea ch p oin t , t h e m in im u m dis t a n ce to t h e n ea r est edge of t h is r ecta n gle is


calcula t ed in both t h e h orizont a l a n d ver t ical dir ection s, d (min R X ) a n d d (m in R Y ). Th ir d, ea ch
of th e minimu m dista nces is compa red to th e sear ch circle ra dius, R j.
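The first two steps can be sketched as follows. This is a rough illustration, not the program's implementation: the function names are hypothetical, the bounding rectangle is assumed to have non-zero width and height, and clamping negative edge distances to zero (for points left outside a rectangle that was re-scaled downward) is an assumption of this sketch.

```python
import math

def study_rectangle(points, area=None):
    """Return (xmin, xmax, ymin, ymax) of the assumed study rectangle."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    if area is None:
        # No user-specified area: the bounding rectangle itself is used
        return xmin, xmax, ymin, ymax
    width, height = xmax - xmin, ymax - ymin
    # Scale both sides by the same factor so that the new area equals `area`
    scale = math.sqrt(area / (width * height))
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)  # mean center
    half_w, half_h = scale * width / 2.0, scale * height / 2.0
    return cx - half_w, cx + half_w, cy - half_h, cy + half_h

def edge_distances(point, rect):
    """Minimum distance from a point to the nearest edge in X and in Y."""
    x, y = point
    xmin, xmax, ymin, ymax = rect
    # Assumption: a point outside the rectangle is treated as lying on its edge
    d_x = max(0.0, min(x - xmin, xmax - x))
    d_y = max(0.0, min(y - ymin, ymax - y))
    return d_x, d_y
```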

1. If neither the minimum distance in the X-direction, $d(\min R_X)$, nor the
minimum distance in the Y-direction, $d(\min R_Y)$, is less than the search circle
radius, $R_j$, then the circle falls entirely within the rectangle and E = 1;

2. If either the minimum distance in the X-direction, $d(\min R_X)$, or the minimum
distance in the Y-direction, $d(\min R_Y)$, but NOT BOTH, is less than the search
circle radius, $R_j$, then part of the search circle falls outside the rectangle and an
adjustment is necessary. An approximate adjustment is made that is inversely
proportional to the area of the search circle within the rectangle. The values of
E will vary between 1 and 2 since up to one-half of the search circle could fall
outside the rectangle;

3. If both the minimum distance in the X-direction, $d(\min R_X)$, and the minimum
distance in the Y-direction, $d(\min R_Y)$, are less than the search circle radius, $R_j$,
then a greater adjustment is required: E can vary between 1 and 4 since
up to three-fourths of the search circle could fall outside the rectangle.

The formulas used to calculate the rectangular weights are:

Radius does not extend beyond the rectangle:

$$W_{ij}^{-1} = k = 1 \qquad (5.21)$$

Radius extends beyond one edge of the rectangle (but not two):

$$W_{ij}^{-1} = k = \frac{2\pi}{2\pi - 2\cos^{-1}\left[ d(\min R) / R_j \right]} \qquad (5.22)$$

Radius extends beyond two edges of the rectangle:

$$W_{ij}^{-1} = k = \frac{2\pi}{1.5\pi - \cos^{-1}\left[ d(\min R_X) / R_j \right] - \cos^{-1}\left[ d(\min R_Y) / R_j \right]} \qquad (5.23)$$

While intuitive, this weight, $W_{ij}^{-1}$, is prone to cause upward 'drift' in the K function, so
a log transformation is used:

$$W'^{-1}_{ij} = \ln\left( W_{ij}^{-1} \right) + 1 \qquad (5.24)$$

This has the effect of tempering the drift somewhat.¹⁰
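Equations 5.21 through 5.24 translate directly into a short function. The sketch below is illustrative only: the names `rectangular_weight` and `log_weight` are hypothetical, and the edge distances are assumed to be non-negative.

```python
import math

def rectangular_weight(d_x, d_y, radius):
    """Rectangular edge weight for one point and one search radius.

    d_x, d_y: minimum distances to the nearest edge in X and Y (assumed >= 0).
    """
    beyond_x = d_x < radius
    beyond_y = d_y < radius
    if not beyond_x and not beyond_y:
        return 1.0  # equation 5.21: circle entirely inside the rectangle
    if beyond_x and beyond_y:
        # equation 5.23: circle extends beyond two edges
        return 2 * math.pi / (1.5 * math.pi
                              - math.acos(d_x / radius)
                              - math.acos(d_y / radius))
    # equation 5.22: circle extends beyond exactly one edge
    d = d_x if beyond_x else d_y
    return 2 * math.pi / (2 * math.pi - 2 * math.acos(d / radius))

def log_weight(w):
    """Equation 5.24: log transformation that tempers the upward drift."""
    return math.log(w) + 1.0
```

A point exactly on one edge receives a raw weight of 2, and a point exactly on a corner receives 4, which matches the ranges given in the list above.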

Circular correction

In the circular correction for Ripley's K, the search circle radius, r, is compared to the
edge of an assumed circle with area, A, centered at the mean center. First, the area to be
analyzed is defined. If the user has specified a study area on the measurement parameters
page, then that value for A is taken. The radius of the circle, R, is calculated by equation 5.8
above. If the user has not specified a study area on the measurement parameters page, then
A is calculated from the maximum bounding rectangle and the radius of the circle is
calculated by equation 5.8 above.

Second, for each point, the distance from that point to the mean center, $R_j$ (written $t_C$ in
equation 5.26), is calculated. The nearest distance from the point to the circle's edge is given by

$$R_{jC} = R - R_j \qquad (5.25)$$

Third, the search circle radius, r, is compared to the distance to the nearest edge of the circle, $R_{jC}$,
and the weight will vary from 1 (point and radius totally within the study area) to 2.3834
(point is located exactly on the boundary of the area circle). The formulas for the circular correction
are:

$$\theta = \cos^{-1}\left\{ \frac{r^2 + t_C^2 - R^2}{2 \, r \, t_C} \right\} \qquad (5.26)$$

$$W_{ij}^{-1} = k = \frac{\pi}{\theta} \qquad (5.27)$$

where r is the radius of the search circle, R is the radius of the circular study area, and $t_C$ is
the distance from the point to the center of the circular study area.
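A minimal sketch of equations 5.25 through 5.27 follows. The function name is hypothetical, and the sketch assumes that the point lies inside the study circle and that the search radius does not exceed the study circle's radius.

```python
import math

def circular_weight(r, t_c, study_radius):
    """Circular edge weight for one point and one search radius.

    r: search circle radius; t_c: distance from the point to the center;
    study_radius: radius R of the circular study area.
    """
    # Equation 5.25: nearest distance from the point to the circle's edge
    nearest_edge = study_radius - t_c
    if r <= nearest_edge:
        return 1.0  # search circle lies entirely inside the study circle
    # Equation 5.26: angle from the law of cosines
    cos_theta = (r * r + t_c * t_c - study_radius * study_radius) / (2 * r * t_c)
    theta = math.acos(cos_theta)
    # Equation 5.27
    return math.pi / theta
```

With a point on the boundary and a search radius equal to half the study radius, the function returns approximately 2.3834, the upper limit quoted in the text.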

For either correction

During the calculation of Ripley's K, each point is multiplied by E (aside from W or I)
and the K and L statistics are calculated as before (see chapter 5). The simulation of random
point distributions is treated in an analogous way.

While intuitive, this weight, $W_{ij}^{-1}$, is prone to cause upward 'drift' in the K function, so
a log transformation is used:

$$W'^{-1}_{ij} = \ln\left( W_{ij}^{-1} \right) + 1 \qquad (5.28)$$

This has the effect of tempering the drift somewhat.

Figure 5.8 below shows a Ripley's K distribution for 1996 Baltimore County burglaries,
with and without edge corrections. As can be seen, the uncorrected L distribution decreases
and falls below the theoretical random count (complete spatial randomness, L=0) after about
7 miles whereas neither the L distribution with the rectangular correction nor the L
distribution with the circular correction does so. As expected, the rectangular correction
produces the most concentration.

Output Intermediate Results

There is a box labeled “Output intermediate results”. If checked, a separate dbf file
will be output that lists the intermediate calculations. The file will be called
“RipleyTempOutput.dbf”. There are five output fields:

1. The point number (POINT), starting at 0 (for the first point) and proceeding to
N–1 (for the Nth point)
2. The search radius in meters (SEARCHRADI)
3. The count of the number of other points that are within the search radius
(COUNT)
4. The weight assigned, calculated from equations 5.24 or 5.28 above (WEIGHT)
5. The count times the weight (CTIMESW)
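The five fields can be mimicked for a single search radius with a few lines of Python. This is a hedged sketch of what such a record might contain, not a reader or writer for the actual dbf file; `intermediate_rows` and the `weight` callback are hypothetical names, and distances are assumed to be Euclidean and already in meters.

```python
import math

def intermediate_rows(points, radius, weight):
    """Build one record per point with the five field names listed above."""
    rows = []
    for i, (xi, yi) in enumerate(points):
        # Number of other points within the search radius of point i
        count = sum(1 for j, (xj, yj) in enumerate(points)
                    if j != i and math.hypot(xi - xj, yi - yj) <= radius)
        w = weight((xi, yi), radius)  # e.g. a log-transformed edge weight
        rows.append({"POINT": i, "SEARCHRADI": radius, "COUNT": count,
                     "WEIGHT": w, "CTIMESW": count * w})
    return rows
```

Swapping in a different `weight` callback is one way to try out an alternative weighting scheme on the same counts.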

Figure 5.8:
"K" Statistic For 1996 Burglaries
With Different Types of Corrections
[Figure: L(t) = Sqrt[K(t)/pi] - t plotted against distance between points (miles), with curves
for the rectangular correction, the circular correction, and no correction.]


K-Function Analysis to Determine Clustering in the
Police Confrontations Dataset in
Buenos Aires Province, Argentina: 1999

Gastón Pezzuchi, Crime Analyst


Buenos Aires Province Police Force
Buenos Aires, Argentina

Sometimes crime analysts tend to produce beautiful hot spot maps without
any formal evidence that clustering is indeed present in the data. One excellent and
powerful tool that CrimeStat provides is the computation of the K function, which
summarizes spatial dependence over a wide range of scales, and uses the
information of all events.

We computed the K function using 1999 police confrontations data (mostly
shootings) within our study area¹ and ran 100 Monte Carlo simulations in order to
test for spatial randomness² (see figure below); the K function showed clustering up
to about 30 km. Yet, spatial randomness is not a particularly meaningful hypothesis
to test considering that the “population at risk” is highly clustered. Hence we used
police deployment data as a base population and calculated the K function for that
data set. As can be seen, the amount of clustering for the confrontation dataset is much
greater than both the random envelope and the distribution of police officers.

K Statistic for the 1999 Dataset
[Figure: L(d) = Sqrt[K(d)/π] - d plotted against distance between points (km, 0 to 40), showing
the observed L(d), the base-population L(d), the CSR line, and the minimum and maximum of
the 100-simulation envelope.]

¹ A year's worth of events occurring within a 9,500 km² area around the Federal Capital (29
counties).
² Remember that Pr( L(d) > Lmax) = Pr( L(d) < Lmin) = 1 / (m + 1) where m is the number of
independent simulations.
This output can be useful for examining the counts for specific points or for trying out
alternative weighting schemes.

Some Cautions in Using Ripley's K

While Ripley's K is a powerful tool for analyzing spatial autocorrelation (usually
clustering, rather than dispersion), like any statistic it is prone to biases. We've discussed
edge biases above. But there are others. First, there is a sample size issue. The routine
calculates 100 separate L(t) values, one for each distance bin. However, the precision of any
one L(t) value is dependent on the sample size. With a small sample, there is insufficient data
to estimate 100 independent values of L(t). While the Monte Carlo simulation can partly
account for that bias, it has to be realized that the precision of the interpretation is suspect.
For example, in comparing two similar distributions, say robberies and burglaries, unless the
sample size is large, differences for any one bin could easily be due to chance. One would need
a very different type of procedure to estimate the 'standard error' of two functions with a small
sample. But, I would suspect that there would be many bins for which they would be
indistinguishable (shown as the two functions criss-crossing each other).

In previous versions of CrimeStat, there was a restriction of at least 100 data points to
display the entire 100 L(t) estimates; otherwise, they were truncated. In this version, all 100
intervals are allowed for any size sample. However, there is a strict warning. Users should
be very cautious in drawing conclusions about differences in the L function with small
samples. Even with sample sizes greater than 100, the imprecision of any one L(t) value is
considerable. Until the sample sizes get into the hundreds, precision is an issue for specific
L(t) values.

A second caution has to do with the scale of the interpretation. Data sets with strong
first-order properties (i.e., a high degree of central concentration of incidents) will exert bias
on Ripley's K statistic. Thus, any data set that is correlated with human populations will
most likely have a very strong 'central tendency'. Thus, there will be a high degree of
concentration in the L values for even near distances. This was seen in the robbery and
burglary data shown above. The K statistic was created to estimate second-order spatial
autocorrelation, namely localized clustering. However, if the first-order effect is so dominant,
then it's hard to disentangle it from a second-order effect. That is, it's often not clear whether
the clustering that is observed in Ripley's K is due to primary, first-order clustering or actual
localized, second-order clustering. That's why it is generally wise to use the K statistic for
short distance ranges and not for larger distance separations. For larger distance separations,
it is almost impossible to tell whether the effect is due to the large central concentration of the
population or whether there are interactions between neighborhoods at a large scale.

There are different ways to handle this problem, none of which is perfect. For example,
one can estimate a first-order concentration effect and then apply Ripley's K to the residuals.
Or, alternatively, one can use a baseline population to calculate a rate and test for
concentration only in the rates, not the volumes of incidents. In chapters 6 and 8, there will be
a discussion of using a baseline population to control for first-order effects. But, whether this
is done or not, the user should be aware of the interaction between first-order and second-order
(or localized) effects.

The third caution has to do with the shape of the boundaries in interpreting the K
statistic. This is particularly true when an edge correction is applied. Unless the study area
was an actual rectangle, the correction may alter the interpretation compared to the
uncorrected L. There are some subtle differences between the two, however, so some care
should be used. The empirical L is obtained from the points within the study area, the
geography of which is usually irregular. The random L, however, is calculated from a
rectangle or a circle. Thus, the differences in the shape comparisons may account for some
variations.

The realism of the corrected function depends on the validity of the underlying
assumptions. If it is likely that there are points outside the study area, then a weighting may
produce a more realistic interpretation of the L function. On the other hand, if the density of
the points outside the study area is lower (e.g., if the study area is a metropolitan area, then
the area outside is more likely to be suburban or rural and of low population density), then the
weighting will exaggerate the function relative to what it should be. In the extreme case, if
the study area is an island (e.g., Honolulu), then there are no points outside the study area
and no weighting is justified. Even when weighting would be justified, the actual boundary is
probably not a rectangle or a circle so that the geometric correction above may distort the L
function, too. In short, some understanding of the basis for weighting is necessary to produce
a reasonable L function.

Assign Primary Points to Secondary Points

This routine will assign each primary point to a secondary point and then will sum by
the number of primary points assigned to each secondary point. The routine is useful for
summarizing data. For example, if the primary file represents the number of robberies and
the secondary file represents the centroids of census tracts, then the routine will assign all
robberies to a census tract and will then sum the number of robberies in each census tract.
The result is a count of the number of primary points for each secondary point (zone). Other
examples might be to assign students to the nearest school or to assign patients to the nearest
hospital. There are many uses for summarizing data by another data reference. In the Trip
Generation module (under Crime Travel Demand - see chapter 13), a model is developed for
the number of crimes originating in each zone and a separate model for the number of crimes
ending in each zone. The “Assign primary points to secondary points” routine is a good way to
summarize the number of crimes by zones.

There are two methods for assigning the primary points to the secondary points.

Nearest neighbor assignment

This routine assigns each primary point to the secondary point to which it is closest. It
goes through all the primary points and sums the number assigned to each secondary point.
Thus, the logical operation is 'nearest to'. If there are two or more secondary points that are
exactly equal in distance, the assignment goes to the first one on the list.
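The 'nearest to' logic, including the tie rule, can be sketched as below. The function name is hypothetical and straight-line (Euclidean) distance is assumed for simplicity.

```python
import math

def assign_nearest(primary, secondary):
    """Count the primary points assigned to each secondary point."""
    freq = [0] * len(secondary)
    for px, py in primary:
        best_index, best_distance = 0, float("inf")
        for k, (sx, sy) in enumerate(secondary):
            d = math.hypot(px - sx, py - sy)
            # Strict '<' keeps the first secondary point on the list when tied
            if d < best_distance:
                best_index, best_distance = k, d
        freq[best_index] += 1
    return freq
```

In the usage below, the point (1, 0) is equally close to the first two secondary points and is therefore counted for the first.

```python
counts = assign_nearest([(0, 0), (1, 0), (9, 0)], [(0, 0), (2, 0), (10, 0)])
```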

Point-in-polygon assignment

This routine assigns each primary point to the secondary point for which it falls within
its polygon (zone). The point-in-polygon assignment reads a zonal boundary file (in ArcView
'shp' format) and determines which zone each primary point falls within. In this case, the
logical operation is 'belongs to'. A zone (polygon) shape file must be provided and the routine
checks which secondary zone each primary point falls within.
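The 'belongs to' logic can be illustrated with a standard ray-casting test. This sketch does not read a shape file: zones are assumed to be supplied already as lists of (x, y) vertices keyed by a zone identifier, and both function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a horizontal ray running right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_by_polygon(primary, zones):
    """Count the primary points falling within each zone polygon."""
    freq = {zone_id: 0 for zone_id in zones}
    for x, y in primary:
        for zone_id, polygon in zones.items():
            if point_in_polygon(x, y, polygon):
                freq[zone_id] += 1
                break  # a point is counted for at most one zone
    return freq
```

Points that fall in no zone are simply left uncounted in this sketch.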

Most GIS packages can do a point-in-polygon operation but few allow a nearest
neighbor assignment. In general, the two are similar though there will be differences due to
the irregular shape of zone boundaries. For example, figure 5.9 below shows an incident that
is within Traffic Analysis Zone (TAZ) 0546, but is actually closer to the centroid of TAZ 0547.
The characteristics associated with this incident are more likely to be associated with the
characteristics of the second zone than the zone to which it belongs. The decision on which
criterion to use in assigning the incident to a zone depends on how integral the zone to which
it belongs is. If the zones are bounded by major arterials, then travel behavior within the zone
will be defined by those arterials; in this case, it would probably be prudent to use the
point-in-polygon assignment. On the other hand, if the zone boundaries are not a fundamental
separation, then the nearest neighbor assignment would probably produce a better fit to the
incident since the characteristics of the closer zone are liable to hold for the incident. In short,
the user must decide on which theoretical basis to assign points.

Zone file

A zonal file must be provided. This is a polygon file that defines the zones to which the
primary points are assigned. The zone file should be the same as the secondary file (see
Secondary file). For each point in the primary file, the routine identifies which polygon (zone)
it belongs to and then sums the number of points per polygon.

Name of assigned variable

Specify the name of the summed variable. The default name is FREQ.

Use weighting file

The primary file records can be weighted by another file. This would be useful for
correcting the totals from the primary file. For example, if the primary file were robbery
incidents from an arrest record, the sum of this variable (i.e., the total number of robberies)
may produce a biased distribution over the secondary file zones because the primary file was
not a random sample of all incidents (e.g., if it came from an arrest record where the
distribution of robbery arrests is not the same as the distribution of all robbery incidents).

Figure 5.9:
Incident Assignment
Point in Relation to Traffic Analysis Zone Boundaries and Centroids
[Map: one incident shown against Traffic Analysis Zone boundaries and the centroids of zones
0542, 0543, 0545, 0546, 0547 and 0548; scale bar 0 to 1 miles.]
The secondary file or another file can be used to adjust the summed total. The
weighting file should have a field that identifies the ratio of the true to the measured
count for each zone. A value of 1 indicates that the summed value for a zone is equal to the
true value; hence no adjustment is needed. A value greater than 1 indicates that the summed
value needs to be adjusted upward to equal the true value. A value less than 1 indicates that
the summed value needs to be adjusted downward to equal the true value.

If another file is to be used for weighting, indicate whether it is the secondary file or, if
another file, the name of the other file.
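One reading of this adjustment is a simple multiplication of each zone's summed count by its ratio. The sketch below makes that assumption explicit; the function name is hypothetical, and treating a zone with no ratio as having a ratio of 1 is also an assumption of the sketch rather than documented behavior.

```python
def adjusted_frequency(freq, ratio):
    """Multiply each zone's summed count by its true-to-measured ratio."""
    # Assumption: a zone missing from `ratio` needs no adjustment (ratio of 1)
    return {zone: count * ratio.get(zone, 1.0) for zone, count in freq.items()}
```

For example, a zone with 3 assigned incidents and a ratio of 1.5 would receive an adjusted frequency of 4.5.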

Name of assigned weighted variable

For a weighted sum, specify the name of the variable. The default will be ADJFREQ.

Save result to

For both routines, the output is a 'dbf' file. Define the file name. Note: be careful about
using the same name as the secondary file as the saved file will have the new variable. It is
best to give it a new name.

A new variable will be added to this file that gives the number of primary points in
each secondary file zone and, if weighting is used, a second variable will be added which
has the adjusted frequency.

Example: Assigning Robberies to Zones

To illustrate the routine, table 5.4 shows the results of summarizing 1,181 robberies
that occurred in Baltimore in 1997 into 325 Traffic Analysis Zones. The two methods are
compared. Only the first 30 assignments are shown. In general, they give similar results.
However, there are differences due to the method. One is that the nearest neighbor method
will assign points on the basis of proximity while the point-in-polygon method will not. In the
case of the Baltimore County robberies, some of these were assigned to a City of Baltimore
TAZ because those TAZs were closer, rather than to a Baltimore County TAZ. Another is that
if a zone is very irregular, points may be assigned to it under the point-in-polygon method
which may be quite far away.

Thus, the user has to decide which method makes the most sense. If the purpose is to
assign incidents to the zone to which they are most likely to be related, for example, when
developing a data set for zonal modeling (see chapters 12 and 13), then the nearest neighbor
method may produce a better representation. The incidents are then assigned to a zone which has
characteristics that probably will be related to the factors causing the incidents in the first
place. On the other hand, if the object is to assign incidents on the basis of membership (e.g.,
assigning crimes to police precincts), then the point-in-polygon method will be the most
accurate.

Table 5.4

Assigning Incidents to Zones
1181 1997 Robberies and 325 Traffic Analysis Zones

TAZ  Point-in-Polygon  Nearest Neighbor
0401 0 0
0402 0 0
0403 1 1
0404 0 0
0405 0 0
0406 0 0
0407 0 0
0408 0 0
0409 0 0
0410 0 0
0411 0 0
0412 0 0
0413 0 0
0414 1 1
0415 0 0
0416 0 0
0417 0 0
0418 0 0
0419 0 0
0420 0 0
0421 0 0
0422 0 1
0423 0 0
0424 1 0
0425 3 0
0426 2 2
0427 3 2
0428 0 0
0429 5 5
0430 0 0

Distance Analysis II

The remaining distance analysis routines are on the Distance Analysis II page. Figure
5.10 shows the page.

Distance Matrices

CrimeStat has the capability for outputting distance matrices. There are four
types of matrices that can be output.

Figure 5.10: Distance Analysis II Screen
1. First, the distance between every point in the primary file and every other point
can be calculated in miles, nautical miles, feet, kilometers or meters. This is
called the Within File Point-to-Point matrix (Matrix).

2. Second, if there is also a secondary file, CrimeStat can calculate the distance
from every point in the primary file to every point in the secondary file, again in
miles, nautical miles, feet, kilometers or meters. This is called the From
Primary File Points to Secondary File Points matrix (Imatrix).

3. Third, if there is a reference file defined, the distance from each primary point
to each grid cell can be computed. This is called the From Primary File Points to
Grid matrix (PGMatrix).

4. Fourth, if there is also a secondary file and a reference file, the distance from
each secondary point to each grid cell can be computed. This is called the From
Secondary File Points to Grid matrix (SGMatrix).
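All four matrices have the same underlying structure: one row per 'from' point and one column per 'to' point. The following sketch shows that structure with Euclidean distances in the units of the input coordinates. The function name is hypothetical, and the grid case assumes that the cell centers have already been generated as a list of points.

```python
import math

def distance_matrix(from_points, to_points):
    """Rows follow the order of from_points, columns the order of to_points."""
    return [[math.hypot(fx - tx, fy - ty) for tx, ty in to_points]
            for fx, fy in from_points]

primary = [(0.0, 0.0), (3.0, 4.0)]
secondary = [(0.0, 4.0)]

within_file = distance_matrix(primary, primary)             # point-to-point
primary_to_secondary = distance_matrix(primary, secondary)  # primary to secondary
```

The number of entries is the product of the two file sizes, which is why these matrices grow quickly.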

Each of these types of matrices can be displayed or saved to an ASCII text file for import
into another program. Each matrix defines incidents by the order in which they occur in the
files (i.e., record number 1 is listed as '1'; record number 2 is listed as '2'; and so forth). Only a
subset of each matrix is displayed on the results tab. However, there are horizontal and
vertical slider bars that allow the user to scroll through the matrix. The user should move the
vertical slider bar first to an approximate proportion of the matrix and click the Go button.
The matrix will scroll through the rows of the matrix to a place which represents the
proportion indicated on the slider bar. The user can then scroll across the rows with the upper
slider bar.

The matrices can be used for various purposes. The within file point-to-point matrix
can be used to examine distances between particular incidents. The saved ASCII '.txt' matrix
can also be imported into a network program for estimating transportation routes. The
primary-to-secondary file matrix can be used in optimization routines, for example in trying to
assess optimal allocation of police cars in order to minimize response time in a police district.
The distances to the grid cells can be used to compare the distances for different distributions
to a central location (e.g., a police station). There are many applications where distances are
the primary unit of analysis. However, the user will need other software to read the files.

Be careful in outputting distances, though, because the files will generally be very
large. For example, a primary file of 1000 incidents when interpolated to 9000 grid cells (100
columns x 90 rows) will produce 9 million paired comparisons. Such a file will take a lot of
disk space. For that reason, we only allow output to an ASCII text file.

This concludes the discussion of second-order properties. The next two chapters will
discuss the identification of 'hot spots' with CrimeStat.

Endnotes for Chapter 5

1. There is also a mean random distance for a dispersed pattern, called the mean
dispersed distance (Ebdon, 1988). It is defined as

$$d(dis) = \frac{\sqrt{2}}{3^{1/4} \sqrt{N/A}}$$

A nearest neighbor index can be set up comparing the observed mean neighbor
distance with that expected for a dispersed pattern. CrimeStat only provides the
traditional nearest neighbor index, but it does output the mean dispersed distance.
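The formula in endnote 1 is easy to evaluate directly. In the sketch below the function names are hypothetical, and the 'dispersed' index is shown only as the simple ratio the endnote describes, since CrimeStat itself does not output such an index.

```python
import math

def mean_dispersed_distance(n, area):
    """Mean dispersed distance: sqrt(2) / (3^(1/4) * sqrt(N / A))."""
    return math.sqrt(2) / (3 ** 0.25 * math.sqrt(n / area))

def dispersed_nn_index(observed_mean_distance, n, area):
    """Ratio of the observed mean nearest neighbor distance to the dispersed expectation."""
    return observed_mean_distance / mean_dispersed_distance(n, area)
```

Because the expected distance depends on the square root of density, quadrupling the number of points in the same area halves it.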

2. Unfortunately, the term ‘order’ when used in the context of nearest neighbor analysis
has a slightly different meaning than when used as first-order compared to second-order
statistics. In the nearest neighbor context, ‘order’ really means neighbor
whereas in the type of statistics context, ‘order’ means the scale of the statistics,
global or local. The use of the terms is historical.

3. It might be possible to test with a Monte Carlo simulation. That is, two separate
random samples of 1181 'robberies' and 6051 'burglaries' respectively would be
drawn. The nearest neighbor distance for each of these samples would be calculated
and the ratio of the two would be taken. This experiment would be repeated many
times (e.g., 1000 or more) to yield an approximate 95% confidence interval of the
ratio.
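The experiment described in endnote 3 might be sketched as follows. Several details are assumptions of this sketch, not statements from the text: the random samples are drawn uniformly from a unit square, the nearest neighbor search is brute force (slow for samples in the thousands), the interval is taken from simple percentiles of the simulated ratios, and all names are hypothetical.

```python
import math
import random

def mean_nn_distance(points):
    """Average distance from each point to its nearest neighbor (brute force)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)

def simulate_ratio_interval(n1, n2, runs=1000, seed=None):
    """Approximate 95% interval for the ratio of two mean nearest neighbor distances."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(runs):
        # Assumption: both samples are uniform random points in a unit square
        a = [(rng.random(), rng.random()) for _ in range(n1)]
        b = [(rng.random(), rng.random()) for _ in range(n2)]
        ratios.append(mean_nn_distance(a) / mean_nn_distance(b))
    ratios.sort()
    low = ratios[int(0.025 * (runs - 1))]
    high = ratios[int(0.975 * (runs - 1))]
    return low, high
```

An observed ratio falling outside the simulated interval would suggest that the difference between the two samples is larger than sample size alone would produce.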

4. There is not a hard-and-fast rule about how many K-order nearest neighbor
distances may be calculated. Cressie (1991, p. 613) shows that error increases with
increasing order and the degree of divergence from an edge-corrected measure
increases over time. In a test case of 584 point locations, he shows that even after
only 25 nearest neighbors, the uncorrected measure yields opposite conclusions
about clustering from the corrected measures. So, as a rough approximation, orders
no greater than 2.5% of the cases should be calculated.

5. Because CrimeStat uses indirect distance for the linear nearest neighbor index (i.e.,
measurement only in a horizontal or vertical direction), there is a slight distortion
that can occur if the incidents are distributed in a diagonal manner, such as with
State Highways 26 and 150 in Figure 5.4. The distortion is very small, however.
For example, with the incidents along State Highway 26, after rotating the incident
points so that they fell approximately in a horizontal orientation, the observed
average linear nearest neighbor distance decreased slightly from 0.05843 miles to
0.05061 miles and the linear nearest neighbor index became 0.8354 (t=-.91; not
significant). In other words, the effects of the diagonal distribution lengthened the
estimate for the average linear nearest neighbor distance by about 41 feet compared
to the actual distances between incidents. For a small sample size, this could be
relevant, but for a larger sample it generally will be a small distortion. However, if
a more precise measure is required, then the user should rotate the distribution
so that the incidents have as closely as possible a horizontal or vertical orientation.
An alternative is to calculate the regular nearest neighbor distance but use a
network for distance calculations (see chapter 3).

6. This form of the L(t_s) is taken from Cressie (1991). In Ripley’s original formulation
(Ripley, 1976), distance is not subtracted from the square root function. The
advantage of the Cressie formulation is that a completely random distribution will be
a straight line that is parallel to the X-axis.
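
As a worked illustration of the difference between the two formulations, the expressions below use the standard definitions, which are an assumption here rather than a quotation of either source; t stands for the distance t_s and K(t) for the K statistic.

```latex
% Ripley's form (no distance subtracted) and the Cressie form
\[
L_{\mathrm{Ripley}}(t) = \sqrt{\frac{K(t)}{\pi}},
\qquad
L_{\mathrm{Cressie}}(t) = \sqrt{\frac{K(t)}{\pi}} - t .
\]
% Under complete spatial randomness, K(t) = \pi t^2, so
\[
L_{\mathrm{Ripley}}(t) = \sqrt{\frac{\pi t^{2}}{\pi}} = t,
\qquad
L_{\mathrm{Cressie}}(t) = t - t = 0 .
\]
```

With the Cressie form, a random pattern plots as the horizontal line L = 0, so clustering or dispersion can be read directly as height above or below that line.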

7. In earlier versions of CrimeStat, the distance was half the side of an assumed
square. It has been reduced in CrimeStat III to emphasize the near distances to
points. The statistic doesn’t make much sense over a larger study region.

8. Note that since there is not a formal test of significance, the comparison with an
envelope produced from a number of simulations provides only approximate
confidence about whether the distribution differs from chance or not. That is, one
cannot say that the likelihood of obtaining this result by chance is less than 5%, for
example.
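
The idea of a simulation envelope can be sketched as follows. This is only an illustration under stated assumptions, not CrimeStat's routine: the function names are hypothetical, the study area is a unit square, no edge correction is applied, K is estimated as A/N² times the number of ordered pairs within distance t, and the envelope is simply the minimum and maximum L value across the simulations at each distance.

```python
import math
import random

def ripley_l(points, distances, area):
    """L(t) = sqrt(K(t)/pi) - t with no edge correction (a simplification)."""
    n = len(points)
    # Distance for every unordered pair of points
    pair_d = [math.dist(points[i], points[j])
              for i in range(n) for j in range(i + 1, n)]
    result = []
    for t in distances:
        # Each unordered pair counts twice (i to j and j to i)
        count = 2 * sum(1 for d in pair_d if d <= t)
        k = area * count / (n * n)
        result.append(math.sqrt(k / math.pi) - t)
    return result

def simulation_envelope(n_points, distances, n_sims=100, seed=0):
    """Lowest and highest L(t) at each distance over random simulations."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        pts = [(rng.random(), rng.random()) for _ in range(n_points)]
        sims.append(ripley_l(pts, distances, area=1.0))
    lows = [min(run[k] for run in sims) for k in range(len(distances))]
    highs = [max(run[k] for run in sims) for k in range(len(distances))]
    return lows, highs

distances = [0.02 * k for k in range(1, 11)]
lows, highs = simulation_envelope(50, distances, n_sims=100)
for t, lo, hi in zip(distances, lows, highs):
    print(f"t={t:.2f}  envelope=({lo:.4f}, {hi:.4f})")
```

An observed L value above the upper bound suggests more clustering than the simulated random patterns produced, but, as the note says, this is approximate guidance and not a formal probability statement.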

9. The ‘guard rail’ concept, while frequently used, is poor methodology because it
involves ignoring data near the boundary of a study area. That is, points within the
guard rail are only allowed to be selected by other points and not, in turn, be
allowed to select others. This has the effect of throwing out data that could be very
important. It is analogous to the old, but fortunately now discarded, practice of
throwing out ‘outliers’ in regression analysis because the outliers were somehow
seen as ‘not typical’. The guard rail concept is also poor policing practice since
incidents occurring near a border may be very important to a police department and
may require coordination with an adjacent jurisdiction. In short, use mathematical
adjustments for edge corrections or, failing that, leave the data as it is.
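
The data loss described in this note can be made concrete with a small sketch. The function names, the five incident coordinates, the 10 x 10 study area, and the guard rail width of 1.0 are all hypothetical; the code only mimics the rule that points inside the guard rail may be chosen as neighbors but may not choose neighbors themselves.

```python
import math

def nn_distances(points, selectors):
    """Nearest neighbor distance for each selecting point.

    Any other point, including one in the guard rail, may be the neighbor.
    """
    out = []
    for i in selectors:
        out.append(min(math.dist(points[i], q)
                       for j, q in enumerate(points) if j != i))
    return out

def inside_guard_rail(point, bounds, width):
    """True if the point lies within `width` of the rectangular boundary."""
    x, y = point
    xmin, ymin, xmax, ymax = bounds
    return (x - xmin < width or xmax - x < width or
            y - ymin < width or ymax - y < width)

# Hypothetical incidents in a 10 x 10 study area
points = [(0.5, 5.0), (1.5, 5.0), (5.0, 5.0), (5.5, 5.5), (9.8, 9.9)]
bounds = (0.0, 0.0, 10.0, 10.0)

# Only points outside the guard rail are allowed to select a neighbor
interior = [i for i, p in enumerate(points)
            if not inside_guard_rail(p, bounds, width=1.0)]

all_d = nn_distances(points, range(len(points)))
guarded_d = nn_distances(points, interior)
print(len(all_d), len(guarded_d))  # distances used without and with the guard rail
```

In this example two of the five incidents contribute no nearest neighbor distance of their own once the guard rail is imposed, which is the loss of information the note objects to.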

10. The use of a log function for the weight is different than in previous versions of
CrimeStat, both for the rectangular and circular corrections.
