Distance Analysis I and II: Nearest Neighbor Index (Nna)
There are two distance analysis pages. In Distance analysis I, various second-order
statistics are provided, including:
1. NN
2. Linear NN
3. Ripley
4. Assign primary points to secondary points
The nearest neighbor index compares the distances between nearest points and
distances that would be expected on the basis of chance. It is an index that is the ratio of
two summary measures. First, there is the nearest neighbor distance. For each point (or
incident location) in turn, the distance to the closest other point (nearest neighbor) is
calculated and averaged over all points.
Figure 5.1: Distance Analysis I Screen
$$ \text{Nearest Neighbor Distance} = d(NN) = \sum_{i=1}^{N} \left[ \frac{\min(d_{ij})}{N} \right] \tag{5.1} $$
where Min(d_ij) is the distance between each point and its nearest neighbor and N is the
number of points in the distribution. Thus, in CrimeStat, the distance from a single point
to every other point is calculated and the smallest distance (the minimum) is selected.
Then, the next point is taken and the distance to all other points (including the first point
measured) is calculated with the nearest being selected and added to the first minimum
distance. This process is repeated until all points have had their nearest neighbor selected.
The total sum of the minimum distances is then divided by N, the sample size, to produce
an average minimum distance.
$$ \text{Mean Random Distance} = d(ran) = 0.5 \sqrt{\frac{A}{N}} \tag{5.2} $$
The nearest neighbor index is the ratio of the observed nearest neighbor distance to
the mean random distance
$$ \text{Nearest Neighbor Index} = NNI = \frac{d(NN)}{d(ran)} \tag{5.3} $$
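
Equations 5.1 to 5.3 can be illustrated with a short calculation. The following is a minimal sketch, not CrimeStat's implementation: the function names and the sample coordinates are hypothetical, A is taken to be the area of the study region, and distances are direct (Euclidean).

```python
import math

def mean_nearest_neighbor_distance(points):
    """Equation 5.1: average distance from each point to its closest other point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)

def mean_random_distance(n, area):
    """Equation 5.2: expected nearest neighbor distance for n random points."""
    return 0.5 * math.sqrt(area / n)

def nearest_neighbor_index(points, area):
    """Equation 5.3: ratio of the observed to the mean random distance."""
    return mean_nearest_neighbor_distance(points) / mean_random_distance(len(points), area)

# Hypothetical data: four points on the corners of a unit square,
# inside a study area assumed to be 4 square units.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
nni = nearest_neighbor_index(points, area=4.0)
print(nni)
```

In this made-up case every point is exactly one unit from its nearest neighbor while the mean random distance is 0.5, so the index is above 1, which points towards dispersion rather than clustering.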
Testing the Significance of the Nearest Neighbor Index
Some differences from 1.0 in the nearest neighbor index would be expected by
chance. Clark and Evans (1954) proposed a Z-test to indicate whether the observed
average nearest neighbor distance was significantly different from the mean random
distance (Hammond and McCullagh, 1978; Ripley, 1981). The test is between the observed
nearest neighbor distance and that expected from a random distribution and is given by
$$ Z = \frac{d(NN) - d(ran)}{SE_{d(ran)}} \tag{5.4} $$
$$ SE_{d(ran)} = \sqrt{\frac{(4 - \pi) A}{4 \pi N^{2}}} \approx \frac{0.26136}{\sqrt{N^{2} / A}} \tag{5.5} $$
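
As a worked illustration of equations 5.4 and 5.5, the sketch below computes the standard error and the Z value. The function names and the input values are hypothetical and chosen only to make the arithmetic easy to follow; this is not the program's own code.

```python
import math

def nn_standard_error(n, area):
    """Equation 5.5: standard error of the mean random distance."""
    return math.sqrt((4.0 - math.pi) * area / (4.0 * math.pi * n ** 2))

def nn_z_test(d_nn, n, area):
    """Equation 5.4: Z = (d(NN) - d(ran)) / SE."""
    d_ran = 0.5 * math.sqrt(area / n)  # equation 5.2
    return (d_nn - d_ran) / nn_standard_error(n, area)

# Hypothetical inputs: 100 points in an area of 10,000 square units,
# with an observed mean nearest neighbor distance of 4.0 units.
se = nn_standard_error(100, 10000.0)
z = nn_z_test(d_nn=4.0, n=100, area=10000.0)
print(se, z)
```

With these inputs N squared equals A, so the standard error reduces to the constant 0.26136 in equation 5.5, and the negative Z value reflects an observed distance shorter than the mean random distance.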
Calculating the statistics
Once nearest neighbor analysis has been selected, the user clicks on Compute to run
the routine. The program outputs 11 statistics:
1. The sample size
2. The mean nearest neighbor distance
3. The standard deviation of the nearest neighbor distance
4. The minimum distance
5. The maximum distance
6. The mean random distance for both the bounding rectangle and the user
input area, if provided
7. The mean dispersed distance for both the bounding rectangle and the user
input area, if provided
8. The nearest neighbor index for both the bounding rectangle and the user
input area, if provided
9. The standard error of the nearest neighbor index for both the maximum
bounding rectangle and the user input area, if provided
10. A significance test of the nearest neighbor index (Z-test)
11. The p-values associated with a one tail and two tail significance test.
Example 1: The nearest neighbor index for street robberies
Table 5.1
Nearest Neighbor Statistics for
1996 Street Robberies in Baltimore County
(N=1181)
Example 2: The nearest neighbor index for residential burglaries
SARS and the Distribution of Passengers on an Airplane
Marta A. Guerra
Senior Staff Epidemiologist,
Centers for Disease Control and Prevention
Atlanta, GA
The nearest neighbor index was used to compare the distances between the
seats of passengers on this flight to distances expected on the basis of chance. A grid
(7 m x 32 m) was superimposed on the airline seat configuration, and each seat was
assigned an X, Y coordinate based on the width (x) and the length (y) of the airplane.
In the diagram below, the seat location of the SARS index case is indicated by an X,
and the passengers’ seat locations are shaded in black.
The nearest neighbor index of passengers’ seats was 0.931 indicating that the
distribution was random, not clustered. This preliminary analysis was important in
order to establish that the seating arrangement of the passengers was random and
independent, and that the passengers’ seats were not clustered around the SARS
case. Therefore, if any passengers have positive serum samples for SARS, we would
be able to evaluate their locations in relation to the SARS case and assess patterns
of transmission. In this survey, however, there was no evidence of transmission
since none of the passengers had positive serum samples for SARS.
Table 5.2
Nearest Neighbor Statistics for
1996 Residential Burglaries in Baltimore County
(N=6051)
The distribution of residential burglaries is also highly significant. Now, suppose
we want to compare the distribution of street robberies (table 5.1) with that of residential
burglaries (table 5.2). The significance test is not very useful for the comparison because
the sample sizes are so large (1181 v. 6051); the much higher Z-value for residential
burglaries indicates primarily that there was a larger sample size to test it. However,
comparing the relative nearest neighbor indices can be meaningful.
$$ \text{Relative Nearest Neighbor Comparison} = \frac{NNI(A)}{NNI(B)} \tag{5.6} $$
where NNI(A) is the nearest neighbor index for one group (A) and NNI(B) is the nearest
neighbor index for another group (B). Thus, comparing street robberies with residential
burglaries, we have
In other words, the distribution of street robberies relative to an expected random
distribution appears to be more concentrated than that of burglaries relative to an
expected random distribution. There is not a simple significance test of this comparison
since the standard error of the joint distributions is not known. But the relative index
suggests that robberies are more concentrated than burglaries and, hence, are more likely
to have 'hot spots' or 'hot zones' where they are particularly concentrated. This index, of
course, does not prove that there are 'hot spots', but only points us towards the higher
concentration of robberies relative to burglaries. In the previous chapter, it was shown
that robberies had a smaller dispersion than burglaries. Here, however, the analysis is
taken a step further to suggest that robberies are more concentrated than burglaries.
Use of Network Distance
In calculating the nearest neighbor index, network distance can be used to calculate
the distance between points (see chapter 3). However, unless the data set is very small or
you have a lot of patience, I highly recommend that you don't do this. Network
calculations are very slow and will take a long time to complete for a large file.
K-Order Nearest Neighbors
As mentioned above, the nearest neighbor index is only an indicator of first-order
spatial randomness. It compares the average distance for the nearest neighbor to an
expected random distance. But what about the second nearest neighbor? Or the third
nearest neighbor? Or the Kth nearest neighbor? CrimeStat constructs K-order nearest
neighbor indices. On the distance analysis page, the user can specify the number of
nearest neighbor indices to be calculated.
The K-order nearest neighbor routine returns four columns:
1. The order, starting from 1
2. The mean nearest neighbor distance for each order (in meters)
3. The expected nearest neighbor distance for each order (in meters)
4. The nearest neighbor index for each order
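
The observed column (the mean nearest neighbor distance for each order) can be sketched as follows. This is an illustrative brute-force calculation under the assumption of direct distances; the function name and the sample points are hypothetical, and the expected distance for each order is deliberately left out.

```python
import math

def k_order_mean_distances(points, max_order):
    """Mean distance to the 1st, 2nd, ..., max_order-th nearest neighbor."""
    if max_order > len(points) - 1:
        raise ValueError("max_order cannot exceed the number of other points")
    sums = [0.0] * max_order
    for i, (xi, yi) in enumerate(points):
        # Sorted distances from point i to every other point
        dists = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        for k in range(max_order):
            sums[k] += dists[k]
    return [s / len(points) for s in sums]

# Hypothetical data: five points spaced one unit apart along a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
means = k_order_mean_distances(points, max_order=2)
print(means)
```

Sorting all pairwise distances for every point is simple but slow for large files; a spatial index would be the usual choice for anything beyond a small example.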
For each order, CrimeStat calculates the Kth nearest neighbor distance for each
observation and then takes the average. The expected nearest neighbor distance for each
order is calculated by:
Nevertheless, the K-order nearest neighbor distance and index can be useful for
understanding the overall spatial distributions. Figure 5.2 compares the K-order nearest
neighbor index for street robberies with that of residential burglaries. The output was
Figure 5.2: K-Order Nearest Neighbor Indices, 1996 Street Robberies and Residential Burglaries
[Graph: nearest neighbor index (vertical axis) by order of nearest neighbor (horizontal axis).]
James L. LeBeau
Administration of Justice
Southern Illinois University-Carbondale
A comparison was made of Man with a Gun calls for the weekend in which
Hurricane Hugo hit the North Carolina coast (September 22-24) with the
following New Year’s Eve weekend (December 29-31, 1989). There were 146 Man
with a Gun calls during the Hurricane Hugo weekend compared to 137 calls for New
Year’s Eve.
[Graph: nearest neighbor index (clustered to dispersed) by order.]
In other words, even though there is not a good significance test for the K-order
nearest neighbor index, a graph of the K-order indices (or the K-order distances) can give a
picture of how clustered the distribution is as well as allow comparisons in clustering
between the different types of crimes (or the same crime at two different time periods).
On the output page, there is a quick graph function that displays a curve similar to
figure 5.2. This is useful for quickly examining the trends. However, a better graph is
made by importing the 'dbf' file output into a spreadsheet or graphics program.
Edge Effects
Nearest Neighbor Edge Corrections
CrimeStat has two different edge corrections. Because CrimeStat is not a GIS
package, it cannot locate the actual border of a study area. One would need a topological
GIS package in which the distance from each point to the nearest boundary is calculated.
Instead, there are two different geometric models that can be applied. The first assumes
that the study area is a rectangle while the second assumes that the study area is a circle.
Depending on the shape of the actual study area, one or the other of these models may be
appropriate.
Rectangular study area
Circular study area
$$ R = \sqrt{A / \pi} \tag{5.8} $$

$$ R_{iC} = R - R_{i} \tag{5.9} $$
Fifth, for each point, i, the observed minimum distance is compared to the nearest
edge of the circle, R_iC. If the observed nearest neighbor distance for point i is equal to or
less than the distance to the nearest edge, it is retained. On the other hand, if the
observed nearest neighbor distance for point i is greater than the distance to the nearest
edge, the distance to the border is used as a proxy for the true nearest neighbor distance of
point i.
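
The circular correction can be sketched as follows, using equations 5.8 and 5.9 and the retention rule just described. Several details are assumptions made for illustration only: the circle is centered on the mean center of the points, R_i is read as the distance of point i from that center, and a point falling outside the assumed circle is given an edge distance of zero. The function name and data are hypothetical.

```python
import math

def circular_edge_corrected_mean(points, area):
    """Mean nearest neighbor distance with a circular edge correction."""
    n = len(points)
    # Assumption: the circle is centered on the mean center of the points
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    radius = math.sqrt(area / math.pi)  # equation 5.8
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        nn = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        # Equation 5.9; clamping at zero for points outside the circle is an assumption
        edge = max(radius - math.hypot(xi - cx, yi - cy), 0.0)
        # Keep the observed distance unless the border is closer
        total += nn if nn <= edge else edge
    return total / n

# Hypothetical data: three central points and two points near the border
# of a circle of radius 5 (area = 25 * pi).
points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 4.0), (0.0, -4.0)]
corrected = circular_edge_corrected_mean(points, area=25.0 * math.pi)
print(corrected)
```

In this example the two outer points are 4 units from their nearest neighbor but only 1 unit from the border, so the border distance replaces the observed distance for both of them.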
The linear nearest neighbor index is a variation on the nearest neighbor routine, but
one applied to a street network. All distances along this network are assumed to travel
along a grid, hence indirect distances are used. Whereas the nearest neighbor routine
calculates the distance between each point and its nearest neighbor using direct distances,
the linear nearest neighbor routine uses indirect ('Manhattan') distances (see chapter 3).
Similarly, whereas the nearest neighbor routine calculates the expected distance between
neighbors in a random distribution of N points using the geographical area of the study
region, the linear nearest neighbor routine uses the total length of the street network.
Figure 5.3:
[Graph: nearest neighbor index (concentrated, random at 1, dispersed) by order, with no correction and with rectangular correction.]
The theory of linear nearest neighbors comes from Hammond and McCullagh
(1978). The observed linear nearest neighbor distance, Ld(NN), is calculated by CrimeStat
as the average of indirect distances between each point and its nearest neighbor. The
expected linear nearest neighbor distance is given by
$$ Ld(ran) = 0.5 \left[ \frac{L}{N - 1} \right] \tag{5.10} $$
where L is the total length of street network and N is the sample size (Hammond and
McCullagh, 1978, 279). Consequently, the linear nearest neighbor index is defined as
$$ \text{Linear Nearest Neighbor Index} = LNNI = \frac{Ld(NN)}{Ld(ran)} \tag{5.11} $$
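
A small sketch of equations 5.10 and 5.11, using indirect ('Manhattan') distances, is given below. The function names, the coordinates, and the network length are hypothetical values for illustration, not output from the program.

```python
def manhattan(p, q):
    """Indirect distance: travel along a grid between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def linear_nearest_neighbor_index(points, network_length):
    """Return observed Ld(NN), expected Ld(ran) (eq. 5.10), and LNNI (eq. 5.11)."""
    n = len(points)
    observed = sum(
        min(manhattan(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 * network_length / (n - 1)
    return observed, expected, observed / expected

# Hypothetical data: four incidents and a street network 12 units long.
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0), (3.0, 2.0)]
observed, expected, lnni = linear_nearest_neighbor_index(points, network_length=12.0)
print(observed, expected, lnni)
```

Here the observed and expected distances happen to be equal, so the index is 1, which is the value associated with a random arrangement along the network.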
Testing the Significance of the Linear Nearest Neighbor Index
Since the theoretical standard error for the random linear nearest neighbor distance
is not known, the author has constructed an approximate standard deviation for the
observed linear nearest neighbor distance:
where Min(d_ij) is the nearest neighbor distance for point i and Ld(NN) is the average linear
nearest neighbor distance. This is the standard deviation of the linear nearest neighbor
distances. The standard error is calculated by
$$ SE_{Ld(NN)} = \frac{S_{Ld(NN)}}{\sqrt{N}} \tag{5.13} $$

$$ t = \frac{Ld(NN) - Ld(ran)}{SE_{Ld(NN)}} \tag{5.14} $$
where Ld(NN) is the average linear nearest neighbor distance, Ld(ran) is the expected
linear nearest neighbor distance (equation 5.10), and SE_Ld(NN) is the approximate standard
error of the linear nearest neighbor distance (equation 5.13). Since the empirical standard
deviation of the linear nearest neighbor is being used instead of a theoretical value, the
test is a t-test rather than a Z-test.
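
The t-test in equations 5.13 and 5.14 can be sketched as below. The standard deviation here uses an N - 1 denominator, which is an assumption, since the exact form of the standard deviation is not reproduced in this text. The function name and the input values are hypothetical.

```python
import math

def linear_nn_t_test(nn_distances, network_length):
    """t = (Ld(NN) - Ld(ran)) / SE, from equations 5.10, 5.13 and 5.14."""
    n = len(nn_distances)
    mean = sum(nn_distances) / n                 # Ld(NN)
    expected = 0.5 * network_length / (n - 1)    # Ld(ran), equation 5.10
    # Assumption: sample standard deviation with an N - 1 denominator
    s = math.sqrt(sum((d - mean) ** 2 for d in nn_distances) / (n - 1))
    se = s / math.sqrt(n)                        # equation 5.13
    return (mean - expected) / se                # equation 5.14

# Hypothetical linear nearest neighbor distances for four incidents
# on a street network 18 units long.
t = linear_nn_t_test([1.0, 2.0, 3.0, 2.0], network_length=18.0)
print(t)
```

A negative t value, as in this example, means the incidents are closer together along the network than the expected random spacing.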
Calculating the statistics
On the measurements parameters page, there are two parameters that are input,
the geographical area of the study region and the length of street network. At the bottom
of the page, the user must select which type of distance measurement to use, direct or
indirect. If the measurement type is direct, then the nearest neighbor routine returns the
standard nearest neighbor analysis (sometimes called areal nearest neighbor). On the
other hand, if the measurement type is indirect, then the routine returns the linear nearest
neighbor analysis. To calculate the linear nearest neighbor index, therefore, distance
measurement must be specified as indirect and the length of the street network must be
defined.
Once nearest neighbor analysis has been selected, the user clicks on Compute to run
the routine. The Lnna routine outputs 10 statistics:
1. The sample size
2. The mean linear nearest neighbor distance
3. The minimum linear distance between nearest neighbors
4. The maximum linear distance between nearest neighbors
5. The mean linear random distance
6. The linear nearest neighbor index
7. The standard deviation of the linear nearest neighbor distance
8. The standard error of the linear nearest neighbor distance
9. A significance test of the nearest neighbor index (t-test)
10. The p-values associated with a one tail and two tail significance test.
Example 3: Auto thefts along two highways
The linear nearest neighbor index is useful for analyzing the distribution of crime
incidents along particular streets. For example, in Baltimore County, state highway 26 in
the western part and state highway 150 in the eastern part have high concentrations of
motor vehicle thefts (figure 5.4). In 1996, there were 87 vehicle thefts on highway 26 and
47 on highway 150. A GIS can be used with the linear nearest neighbor index to indicate
whether these incidents are greater than what would be expected on the basis of chance.
Figure 5.4:
[Map: State Highway 26 and State Highway 150, with a scale in miles.]
Table 5.3

Highway 26             10.42 mi
Highway 150             7.79 mi
All Major Arterials   241.04 mi
All Roads            3333.54 mi

Random Expected Distance Between Incidents = 0.44 miles

Where                Number     Expected   "Relative to   Average Linear    Average Random     "Relative to Itself"
Incidents            of         Number     Random" Ratio  Nearest Neighbor  Linear Nearest     Linear Nearest
Occurred             Incidents  If Random  of Frequency   Distance          Neighbor Distance  Neighbor Index
All Major Arterials  607        272.8      2.2            0.13 mi           0.20               0.64 (p≤.001)
seen, the distribution of motor vehicle thefts is not random. On all major arterial roads, there
are 2.2 times as many thefts as would be expected by a random spatial distribution. In fact,
in 1996, of 28,551 road segments in Baltimore County, only 7791 (27%) had one or more motor
vehicle thefts occur on them; most of these are major roads. Further, on highway 26 there
were 7.4 times as many and on highway 150 there were 5.3 times as many as would be
expected if the distribution was random. Clearly, these two highways had more than their
share of auto thefts in 1996.
But what about the distribution of the incidents along each of these highways? If
there were any pattern, for example, most of the incidents clustering on the western edge or
in the center, then police could use that information to more efficiently deploy vehicles to
respond quickly to events. On the other hand, if the distribution along these highways were
no different than a random distribution, then police vehicles must be positioned in the middle,
since that would minimize the distance to all occurring incidents.
K-Order Linear Nearest Neighbors
There is also a K-order linear nearest neighbor analysis, as with the areal nearest
neighbors. The user can specify how many additional nearest neighbors are to be calculated.
The linear K-order nearest neighbor routine returns four columns:
1. The order, starting from 1
2. The mean linear nearest neighbor distance for each order (in meters)
3. The expected linear nearest neighbor distance for each order (in meters)
4. The linear nearest neighbor index for each order
Since the expected linear nearest neighbor distance has not been worked out for orders
higher than one, the calculation produced here is a rough approximation. It applies equation
5.10 only adjusting for the decreasing sample size, N_k, which occurs as degrees of freedom are
lost for each successive order. In this sense, the index is really the k-order linear nearest
neighbor distance relative to the expected linear neighbor distance for the first order. It is not
a strict nearest neighbor index for orders above one.
Nevertheless, like the areal k-order nearest neighbor index, the k-order linear nearest
neighbor index can provide insights into the distribution of the points, even if the first-order
is random. Figure 5.5 shows a graph of 50 linear nearest neighbors for 1996 residential
burglaries and street robberies for Baltimore County. As with the areal k-order nearest
neighbors (see figure 5.3) both burglaries and robberies show evidence of clustering. For both,
the first nearest neighbors are closer together than a random distribution. Similarly, over the
50 orders, street robberies are more clustered than burglaries. However, measuring distance
on a grid shows that for burglaries, there is only a small amount of clustering. After the
fourth order neighbor, the distribution for burglaries is more dispersed than a random
distribution. An interpretation of this is that there are a small number of burglaries which are
clustered, but the clusters are relatively dispersed. Street robberies, on the other hand, are
highly clustered, up to over 30 nearest neighbors.
The linear k-order nearest neighbor distribution gives a slightly different perspective
on the distribution than the areal. For one thing, the index is slightly biased as the
denominator, the K-order expected linear neighbor distance, is only approximated. For
another thing, the index measures distance as if the streets follow a true grid, oriented in an
east-west and north-south direction. In this sense, it may be unrealistic for many places,
especially if streets traverse in diagonal patterns; in these cases, the use of indirect distance
measurement will produce greater distances than what actually occur on the network. Still,
the linear nearest neighbor index is an attempt to approximate travel along the street
network. To the extent that a particular jurisdiction's street pattern falls in this manner, it
can provide useful information.
On the output page, there is a quick graph function that displays a curve similar to
figure 5.5 below. This is useful for quickly examining the trends.
Ripley's K Statistic
Consider a spatially random distribution of N points. If circles of radius, t_s, are drawn
around each point, where s is the order of radii from the smallest to the largest, and the
number of other points that are found within the circle are counted and then summed over all
points (allowing for duplication), then the expected number of points within that radius are
Figure 5.5
[Graph: linear nearest neighbor index by order; the labeled series is residential burglaries.]
$$ E(\#\ \text{under csr}) = \frac{N}{A} \pi t_{s}^{2} \tag{5.16} $$
$$ K(t_{s}) = \frac{A}{N^{2}} \sum_{i} \sum_{j \neq i} I(t_{ij}) \tag{5.17} $$
where I(t_ij) is the number of other points, j, found within distance, t_s, summed over all points,
i. That is, a circle of radius, t_s, is placed over each point, i. Then, the number of other points,
j, within the circle is counted. The circle is moved to the next i and the process is repeated.
Thus, the double summation points to the count of all j's for each i, over all i's. Note, the count
does not include itself, only other points.
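
The counting procedure behind equation 5.17 can be sketched as follows, without any edge correction. The function name and sample data are hypothetical, and counting a point that lies exactly at distance t_s as inside the circle is an assumption.

```python
import math

def ripley_k(points, area, t):
    """Equation 5.17: K(t) = (A / N^2) * count of ordered pairs (i, j), j != i, within t."""
    n = len(points)
    count = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            # A point is never counted as its own neighbor
            if i != j and math.hypot(xi - xj, yi - yj) <= t:
                count += 1
    return area / n ** 2 * count

# Hypothetical data: four points on the corners of a unit square,
# inside a study area assumed to be 4 square units.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_near = ripley_k(points, area=4.0, t=1.2)  # reaches the two adjacent corners
k_far = ripley_k(points, area=4.0, t=1.5)   # also reaches the diagonal corner
print(k_near, k_far)
```

Each pair is counted twice, once from each end, which corresponds to the "allowing for duplication" wording in the description of the expected count.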
there are 50-100 intervals by which the statistic can be counted. In CrimeStat, 100 intervals
(radii) are used, based on
$$ t_{s} = \frac{R}{100} \tag{5.18} $$
$$ L(t_{s}) = \sqrt{\frac{K(t_{s})}{\pi}} - t_{s} \tag{5.19} $$
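
The transformation in equation 5.19 is easy to check numerically. Summing the expected count in equation 5.16 over all N points and applying equation 5.17 gives K(t) = pi * t^2 under complete spatial randomness, so L should be zero in that case. The sketch below illustrates this; the function name and the values are hypothetical.

```python
import math

def l_statistic(k_value, t):
    """Equation 5.19: L(t) = sqrt(K(t) / pi) - t."""
    return math.sqrt(k_value / math.pi) - t

t = 2.0
k_csr = math.pi * t ** 2      # K expected under complete spatial randomness
k_clustered = 2.0 * k_csr     # hypothetical K with twice as many neighbors as expected

print(l_statistic(k_csr, t))        # approximately zero
print(l_statistic(k_clustered, t))  # positive, indicating clustering at this distance
```

Values of L above zero therefore indicate more points within the search radius than a random distribution would give, and values below zero indicate fewer.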
Comparison to a Spatially Random Distribution
Figure 5.6: "K" Statistic For 1996 Robberies Compared to Random and 2000 Population Distributions
[Graph: L(t) = Sqrt[K(t)/pi] - t (vertical axis) by distance (horizontal axis), with curves for robbery, population, and CSR.]
In practice, the simulation test also has biases associated with edges. Unlike the
theoretical L under uniform conditions of complete spatial randomness (i.e., stretching in all
directions well beyond the study area) where L is a straight horizontal line, the simulated L
also declines with increasing distance separation between points. This is a function of the
same type of edge bias.
Comparison to Baseline Populations
For most social distributions, such as crime incidents, randomness is not a very
meaningful baseline. Most social characteristics are non-random. Consequently, to find that
the amount of clustering that is occurring is greater than what would be expected on the basis
of chance is not very useful for crime analysts. However, it is possible to compare the
distribution of L for crime incidents with the distribution of L for various baseline
characteristics, for example, for the population distribution or the distribution of employment.
In almost all metropolitan areas, population is more concentrated towards the center than at
the periphery; the drop-off in population density is very sharp as was shown in the last
chapter. All other things being equal, one would expect more incidents towards the
metropolitan center than at the periphery; consequently, the average distance between
incidents will be shorter in the center than farther out. This is nothing more than a
consequence of the distribution of people. However, to say something about concentrations of
incidents above-and-beyond that expected by population requires us to examine the pattern of
population as well as of crime incidents.
calculation of L. In Figure 5.6 above, there is an envelope produced from 100 random
simulations as well as the L distribution from the 2000 population; the latter variable was
obtained by taking the centroid of traffic analysis zones from the 2000 census and using
population as the intensity variable. As can be seen, the amount of clustering for robberies is
greater than both the random envelope as well as the distribution of population. The robbery
function is higher than the population function up to about 6 miles. This indicates that
robberies are more concentrated than what would be expected from the population
distribution for a fairly large area.
For comparison, figure 5.7 below shows the distribution of 1996 burglaries, again
compared to a random envelope and the distribution of population. We find that burglaries
are more clustered than population, but less so than for robberies; the L value is higher for
robberies than for burglaries for near distances but becomes more dispersed at about 3 miles;
it is still more concentrated than a random distribution, however, as seen by the random
envelope. Thus, the distribution of L confirms the result that burglaries tend to be spread
over a much larger geographical area in smaller clusters than street robberies, which tend to
be more concentrated in large clusters. In terms of looking for 'hot spots', one would expect to
find more with robberies than with burglaries.
Edge Corrections for Ripley's K
The L statistic is prone to edge effects just like the nearest neighbor statistic. That is,
for points located near the boundary of the study area, the number enumerated by any circle
for those points will, all other things being equal, necessarily be less than points in the center
of the study area because points outside the boundary are not counted. Further, the greater
the distance between points that are being tested (i.e., the greater the radius of the circle
placed over each point), the greater the bias. Thus, a plot of L against distance will show a
declining curve as distance increases as figures 5.6 and 5.7 show.
Figure 5.7: "K" Statistic For 1996 Burglaries Compared to Random and 2000 Population Distributions
[Graph: L(t) = Sqrt[K(t)/pi] - t (vertical axis) by distance (horizontal axis), with curves for burglary, population, and CSR.]
$$ K(t_{s}) = \frac{A}{N^{2}} \sum_{i} \sum_{j} W_{ij}^{-1} I(t_{ij}) \tag{5.20} $$
where W_ij^-1 is the inverse of the proportion of the circumference of a circle of radius, t_s, placed
over each point that is within the total study area. Thus, if a point is near the study area
border, it will receive a greater weight because a smaller proportion of the circle placed over it
will be within the study area. An alternative weighting scheme can be found in Marcon and
Puech (2003).
In CrimeStat, two possible corrections are conducted. One assumes that the study
area is a rectangle while the other assumes that it is a circle.
In the rectangular correction for Ripley's K, the search circle radius, R_j, is compared to
the edge of an assumed rectangle with area, A, centered at the mean center. First, the area to
be analyzed is defined. If the user has specified a study area on the measurement parameters
page, then that value for A is taken. The maximum bounding rectangle is taken (i.e., the
rectangle defined by the minimum and maximum X/Y values) and proportionately re-scaled so
that the area of the rectangle is equal to A. If the user does not specify an area on the
measurement parameters page, then the bounding rectangle defined by the minimum and
maximum X/Y values is taken for A.
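The re-scaling step can be sketched as follows. This is only an illustration under stated assumptions: both sides of the bounding rectangle are multiplied by the same factor so that the product equals A, and the rectangle is centered at the mean center whether or not an area is supplied. The function name `assumed_rectangle` is hypothetical.

```python
import math

def assumed_rectangle(points, area=None):
    """Return (center_x, center_y, width, height) of the assumed study rectangle."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    if width <= 0 or height <= 0:
        raise ValueError("bounding rectangle has zero area")
    if area is not None:
        # Scale both sides by the same factor so that width * height == area.
        scale = math.sqrt(area / (width * height))
        width, height = width * scale, height * scale
    # Assumption: the rectangle is centered at the mean center in both cases.
    center_x = sum(xs) / len(xs)
    center_y = sum(ys) / len(ys)
    return center_x, center_y, width, height

pts = [(0, 0), (4, 0), (4, 2), (0, 2)]
rescaled = assumed_rectangle(pts, area=32.0)   # bounding box 4 x 2 scaled to 8 x 4
unscaled = assumed_rectangle(pts)              # bounding box used as is
```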
then a greater adjustment is required since E could vary between 1 and 4 since
up to three-fourths of the search circle could fall outside the rectangle.
$$W_{ij}^{-1} = k = 1 \tag{5.21}$$

$$W_{ij}^{-1} = k = \frac{2\pi}{2\pi - 2\cos^{-1}\left[ d(\min R) / R_i \right]} \tag{5.22}$$

$$W_{ij}^{-1} = k = \frac{2\pi}{1.5\pi - \cos^{-1}\left[ d(\min R_x) / R_i \right] - \cos^{-1}\left[ d(\min R_y) / R_i \right]} \tag{5.23}$$
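The three rectangular weights can be tried out numerically with the sketch below. The conditions that select each equation are assumptions made for illustration: equation 5.21 when the search circle lies wholly inside the rectangle, 5.22 when the point is closer than the search radius to one edge, and 5.23 when it is closer than the search radius to both a vertical and a horizontal edge. The constant in the equations is read as pi. The function name `rectangular_weight` is hypothetical.

```python
import math

def rectangular_weight(radius, dist_x, dist_y):
    """Edge weight k for a point dist_x from the nearest vertical edge
    and dist_y from the nearest horizontal edge (both non-negative)."""
    near_x = dist_x < radius
    near_y = dist_y < radius
    if not near_x and not near_y:
        return 1.0  # equation 5.21: no adjustment needed
    if near_x and near_y:
        # equation 5.23: the circle crosses two edges
        return 2 * math.pi / (
            1.5 * math.pi
            - math.acos(dist_x / radius)
            - math.acos(dist_y / radius)
        )
    # equation 5.22: the circle crosses only the nearest edge
    nearest = min(dist_x, dist_y)
    return 2 * math.pi / (2 * math.pi - 2 * math.acos(nearest / radius))

inside = rectangular_weight(1.0, 5.0, 5.0)
on_edge = rectangular_weight(1.0, 0.0, 5.0)
in_corner = rectangular_weight(1.0, 0.0, 0.0)
```

Under these assumptions a point exactly on one edge gets a weight of 2 and a point exactly in a corner gets a weight of 4, which agrees with the stated range of 1 to 4.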
In the circular correction for Ripley's K, the search circle radius, R_j, is compared to the
edge of an assumed circle with area, A, centered at the mean center. First, the area to be
analyzed is defined. If the user has specified a study area on the measurement parameters
page, then that value for A is taken. The radius of the circle, R_j, is calculated by equation 5.8
above. If the user has not specified a study area on the measurement parameters page, then
A is calculated from the maximum bounding rectangle and the radius of the circle is
calculated by equation 5.8 above.
$$R_{jC} = R - R_j \tag{5.25}$$
Third, the search circle radius, R_j, is compared to the nearest edge of the circle, R_iC,
and the weight will vary from 1 (point and radius totally within the study area) to 2.3834
(point is located exactly on the boundary of the area circle). The formulas for the circular
correction are:
$$W_{ij}^{-1} = k = \pi / 2 \tag{5.27}$$
Figure 5.8 below shows a Ripley's K distribution for 1996 Baltimore County burglaries,
with and without edge corrections. As can be seen, the uncorrected L distribution decreases
and falls below the theoretical random count (complete spatial randomness, L=0) after about
7 miles whereas neither the L distribution with the rectangular correction nor the L
distribution with the circular correction does so. As expected, the rectangular correction
produces the most concentration.
There is a box labeled “Output intermediate results”. If checked, a separate dbf file
will be output that lists the intermediate calculations. The file will be called
“RipleyTempOutput.dbf”. There are five output fields:
Figure 5.8: "K" Statistic For 1996 Burglaries With Different Types of Corrections
[Plot of L(t) = Sqrt[K(t)/pi] - t against distance; series shown: Rectangular correction,
Circular correction, No correction]
Sometimes crime analysts tend to produce beautiful hot spot maps without
any formal evidence that clustering is indeed present in the data. One excellent and
powerful tool that CrimeStat provides is the computation of the K function, which
summarizes spatial dependence over a wide range of scales, and uses the
information of all events.
[Figure: L(d) plotted against Distance Between Points [km]; series shown: Observed L(d),
Base-Population L(d), CSR, L(d)_MIN, L(d)_MAX]
1 A one-year dataset of events occurring within a 9,500 km2 area around the Federal Capital (29
counties).
2 Remember that Pr( L(d) > Lmax) = Pr( L(d) < Lmin) = 1 / (m + 1) where m is the number of
independent simulations,
This output can be useful for examining the counts for specific points or for trying out
alternative weighting schemes.
Some Cautions in Using Ripley’s K
While Ripley’s K is a powerful tool for analyzing spatial autocorrelation (usually
clustering, rather than dispersion), like any statistic it is prone to biases. We’ve discussed
edge biases above. But there are others. First, there is a sample size issue. The routine
calculates 100 separate L(t) values, one for each distance bin. However, the precision of any
one L(t) value is dependent on the sample size. With a small sample, there is insufficient data
to estimate 100 independent values of L(t). While the Monte Carlo simulation can partly
account for that bias, it has to be realized that the precision of the interpretation is suspect.
For example, in comparing two similar distributions, say robberies and burglaries, unless the
sample size is large, differences for any one bin could easily be due to chance. One would need
a very different type of procedure to estimate the ‘standard error’ of two functions with a small
sample. But, I would suspect that there would be many bins for which they would be
indistinguishable (shown as the two functions criss-crossing each other).
In previous versions of CrimeStat, there was a restriction of at least 100 data points to
display the entire 100 L(t) estimates; otherwise, they were truncated. In this version, all 100
intervals are allowed for any size sample. However, there is a strict warning. Users should
be very cautious in drawing conclusions about differences in the L function with small
samples. Even with sample sizes greater than 100, the imprecision of any one L(t) value is
considerable. Until the sample sizes get into the hundreds, precision is an issue for specific
L(t) values.
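One way to get a feel for this imprecision is to simulate a random envelope for a single distance bin. The sketch below is only illustrative: it assumes complete spatial randomness means uniform points on a square study area, it applies no edge correction, and the names `l_value` and `csr_envelope` are hypothetical. With m independent simulations, the chance that an observed L(t) falls above the simulated maximum (or below the minimum) purely by chance is 1 / (m + 1), so 19 simulations correspond to roughly 5 percent in each tail.

```python
import math
import random

def l_value(points, t, area):
    """Uncorrected L(t) = sqrt(K(t) / pi) - t for one distance t."""
    n = len(points)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= t:
                count += 2  # each close pair is counted once in each direction
    k = area * count / (n * n)
    return math.sqrt(k / math.pi) - t

def csr_envelope(n, t, side, m, seed=0):
    """Minimum and maximum L(t) over m simulated random patterns of n points."""
    rng = random.Random(seed)
    sims = []
    for _ in range(m):
        pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
        sims.append(l_value(pts, t, side * side))
    return min(sims), max(sims)

# Small sample: 30 points, one distance bin, 19 simulations.
low, high = csr_envelope(n=30, t=1.0, side=10.0, m=19)
tail_probability = 1 / (19 + 1)
```

Re-running `csr_envelope` with a larger `n` should show the gap between the minimum and maximum narrowing, which is the sample size effect described above.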
is done or not, the user should be aware of the interaction between first-order and second-
order (or localized) effects.
Nearest neighbor assignment
Thus, the logical operation is ‘nearest to’. If there are two or more secondary points that are
exactly the same distance away, the assignment goes to the first one on the list.
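The ‘nearest to’ rule, including the tie-break in favor of the first secondary point on the list, can be sketched as below. This is an illustration rather than CrimeStat's code; it assumes straight-line distance on projected coordinates, and the function names are hypothetical.

```python
import math

def assign_nearest(primary, secondary):
    """For each primary point, return the index of the nearest secondary point."""
    assignments = []
    for p in primary:
        best_index, best_dist = None, math.inf
        for idx, s in enumerate(secondary):
            d = math.dist(p, s)
            if d < best_dist:  # strict '<' keeps the first point among exact ties
                best_index, best_dist = idx, d
        assignments.append(best_index)
    return assignments

def count_by_secondary(assignments, n_secondary):
    """Sum the number of primary points assigned to each secondary point."""
    counts = [0] * n_secondary
    for idx in assignments:
        counts[idx] += 1
    return counts

incidents = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
zone_points = [(0.0, 0.0), (2.0, 0.0)]
assigned = assign_nearest(incidents, zone_points)   # middle incident is a tie
totals = count_by_secondary(assigned, len(zone_points))
```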
Point-in-polygon assignment
Most GIS packages can do a point-in-polygon operation but few allow a nearest
neighbor assignment. In general, the two are similar though there will be differences due to
the irregular shape of zone boundaries. For example, figure 5.9 below shows an incident that
is within Traffic Analysis Zone (TAZ) 0546, but is actually closer to the centroid of TAZ 0547.
The characteristics associated with this incident are more likely to be associated with the
characteristics of the second zone than the zone to which it belongs. The decision on which
criterion to use in assigning the incident to a zone depends on how integral the zone to which
it belongs is. If the zones are bounded by major arterials, then travel behavior within the zone
will be defined by those arterials; in this case, it would probably be prudent to use the point-
in-polygon assignment. On the other hand, if the zone boundaries are not a fundamental
separation, then the nearest neighbor assignment would probably produce a better fit to the
incident since the characteristics of the closer zone are liable to hold for the incident. In short,
the user must decide on which theoretical basis to assign points.
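The difference between the two rules can be reproduced with a toy example. The sketch below uses a standard ray-casting test for point-in-polygon membership and a straight-line distance to zone centroids; the zone shapes, centroids and labels are made up for illustration and do not correspond to the TAZs in figure 5.9.

```python
import math

def point_in_polygon(x, y, polygon):
    """Ray-casting test: count crossings of a ray running to the right of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical zones: a long thin zone "A" next to a small square zone "B".
zones = {
    "A": [(0, 0), (10, 0), (10, 2), (0, 2)],
    "B": [(10, 0), (12, 0), (12, 2), (10, 2)],
}
centroids = {"A": (5.0, 1.0), "B": (11.0, 1.0)}
incident = (9.5, 1.0)

by_polygon = [z for z, poly in zones.items() if point_in_polygon(*incident, poly)]
by_nearest = min(centroids, key=lambda z: math.dist(incident, centroids[z]))
```

Here the incident lies inside zone A but is closer to the centroid of zone B, so the two assignment rules disagree, as in the figure 5.9 example.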
Zone file
Name of assigned variable
Use weighting file
Figure 5.9: Incident Assignment
Point in Relation to Traffic Analysis Zone Boundaries and Centroids
[Map showing TAZs 0542, 0543, 0545, 0546, 0547 and 0548 with their centroids and the
incident location; scale bar 0 to 1 miles]
The secondary file or another file can be used to adjust the summed total. The
weighting file should have a field that identifies the ratio of the true to the measured
count for each zone. A value of 1 indicates that the summed value for a zone is equal to the
true value; hence no adjustment is needed. A value greater than 1 indicates that the summed
value needs to be adjusted upward to equal the true value. A value less than 1 indicates that
the summed value needs to be adjusted downward to equal the true value.
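The adjustment amounts to multiplying each zone's summed count by its ratio. A minimal sketch, assuming that zones missing from the weighting data keep a ratio of 1; the zone labels, counts and ratios below are made up for illustration.

```python
def adjust_counts(summed, ratios):
    """Multiply each zone's summed count by its true-to-measured ratio."""
    # Assumption: a zone with no ratio is left unadjusted (ratio of 1).
    return {zone: count * ratios.get(zone, 1.0) for zone, count in summed.items()}

summed = {"Z1": 3, "Z2": 2, "Z3": 4}
ratios = {"Z1": 1.0, "Z2": 1.5, "Z3": 0.5}   # hypothetical weighting field
adjusted = adjust_counts(summed, ratios)
```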
If another file is to be used for weighting, indicate whether it is the secondary file or, if
another file, the name of the other file.
Save result to
Example: Assigning Robberies to Zones
To illustrate the routine, table 5.4 shows the results of summarizing 1181 robberies
from 1997 that occurred in Baltimore to 325 Traffic Analysis Zones. The two methods are
compared. Only the first 30 assignments are shown. In general, they give similar results.
However, there are differences due to the method. One is that the nearest neighbor method
will assign points on the basis of proximity while the point-in-polygon method will not. In the
case of the Baltimore County robberies, some of these were assigned to a City of Baltimore
TAZ because those TAZs were closer, rather than to a Baltimore County TAZ. Another is that
if a zone is very irregular, points that are quite far away may be assigned to it under the
point-in-polygon method.
Thus, the user has to decide which method makes the most sense. If the purpose is to
assign incidents to the zone to which they are most likely to be related, for example, when
developing a data set for zonal modeling (see chapters 12 and 13), then the nearest neighbor
method may produce a better representation. The incidents are then assigned to a zone which
has characteristics that probably will be related to the factors causing the incidents in the first
place. On the other hand, if the object is to assign incidents on the basis of membership (e.g.,
assigning crimes to police precincts), then the point-in-polygon method will be the most
accurate.
Table 5.4
TAZ Point-in-Polygon Nearest Neighbor
0401 0 0
0402 0 0
0403 1 1
0404 0 0
0405 0 0
0406 0 0
0407 0 0
0408 0 0
0409 0 0
0410 0 0
0411 0 0
0412 0 0
0413 0 0
0414 1 1
0415 0 0
0416 0 0
0417 0 0
0418 0 0
0419 0 0
0420 0 0
0421 0 0
0422 0 1
0423 0 0
0424 1 0
0425 3 0
0426 2 2
0427 3 2
0428 0 0
0429 5 5
0430 0 0
Distance Analysis II
Distance Matrices
Figure 5.10: Distance Analysis II Screen
1. First, the distance between every point in the primary file and every other point
can be calculated in miles, nautical miles, feet, kilometers or meters. This is
called the Within File Point-to-Point matrix (Matrix).
Each of these types of matrices can be displayed or saved to an ASCII text file for import
into another program. Each matrix defines incidents by the order in which they occur in the
files (i.e., record number 1 is listed as ‘1’; record number 2 is listed as ‘2’; and so forth). Only a
subset of each matrix is displayed on the results tab. However, there are horizontal and
vertical slider bars that allow the user to scroll through the matrix. The user should move the
vertical slider bar first to an approximate proportion of the matrix and click the Go button.
The matrix will scroll through the rows of the matrix to a place which represents the
proportion indicated on the slider bar. The user can then scroll across the rows with the upper
slider bar.
The matrices can be used for various purposes. The within file point-to-point matrix
can be used to examine distances between particular incidents. The saved ASCII ‘.txt’ matrix
can also be imported into a network program for estimating transportation routes. The
primary-to-secondary file matrix can be used in optimization routines, for example in trying to
assess optimal allocation of police cars in order to minimize response time in a police district.
The distances to the grid cells can be used to compare the distances for different distributions
to a central location (e.g., a police station). There are many applications where distances are
the primary unit of analysis. However, the user will need other software to read the files.
Be careful in outputting distances, though, because the files will generally be very
large. For example, a primary file of 1000 incidents when interpolated to 9000 grid cells (100
columns x 90 rows) will produce 9 million paired comparisons. Such a file will take a lot of
disk space. For that reason, we only allow output to an ASCII text file.
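The size of these matrices is easy to underestimate. The sketch below builds a within-file point-to-point matrix for a tiny point set and counts the cells in the grid example above. It assumes straight-line distance on projected coordinates, and the 12 characters per value used for the rough file size is a guess for illustration only, not a CrimeStat figure.

```python
import math

def within_file_matrix(points):
    """Distance from every point to every other point (zeros on the diagonal)."""
    return [[math.dist(p, q) for q in points] for p in points]

def primary_to_grid_cells(n_primary, n_columns, n_rows):
    """Number of paired comparisons between primary points and grid cells."""
    return n_primary * n_columns * n_rows

matrix = within_file_matrix([(0.0, 0.0), (3.0, 4.0)])
pairs = primary_to_grid_cells(1000, 100, 90)

# Assumption: about 12 characters per distance value in a text file.
approx_megabytes = pairs * 12 / 1_000_000
```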
This concludes the discussion of second-order properties. The next two chapters will
discuss the identification of ‘hot spots’ with CrimeStat.
Endnotes for Chapter 5
$$d(dis) = \frac{\sqrt{2}}{3^{1/4} \sqrt{N/A}}$$
2. Unfortunately, the term order when used in the context of nearest neighbor analysis
has a slightly different meaning than when used as first-order compared to second-
order statistics. In the nearest neighbor context, order really means neighbor
whereas in the type of statistics context, order means the scale of the statistics,
global or local. The use of the terms is historical.
3. It might be possible to test with a Monte Carlo simulation. That is, two separate
random samples of 1181 ‘robberies’ and 6051 ‘burglaries’ respectively would be
drawn. The nearest neighbor distance for each of these samples would be calculated
and the ratio of the two would be taken. This experiment would be repeated many
times (e.g., 1000 or more) to yield an approximate 95% confidence interval of the
ratio.
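The experiment described in this endnote could be sketched as follows. The code assumes that the random samples are drawn uniformly over a square study area, ignores edge effects, and takes the interval from the 2.5th and 97.5th percentiles of the simulated ratios. The function names are hypothetical, and the brute-force nearest neighbor search is slow at the full sample sizes (1181 and 6051), so the usage line runs a scaled-down version.

```python
import math
import random

def mean_nn_distance(points):
    """Average distance from each point to its nearest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def nn_ratio_interval(n_a, n_b, side, reps, seed=0):
    """Approximate 95% interval for the ratio of two random nearest neighbor distances."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        a = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_a)]
        b = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_b)]
        ratios.append(mean_nn_distance(a) / mean_nn_distance(b))
    ratios.sort()
    lower = ratios[int(0.025 * (reps - 1))]
    upper = ratios[int(0.975 * (reps - 1))]
    return lower, upper

# Scaled-down run; the endnote suggests 1000 or more repetitions.
lower, upper = nn_ratio_interval(n_a=20, n_b=40, side=10.0, reps=50)
```

An observed ratio falling outside the simulated interval would suggest that the two distributions differ by more than chance alone would produce, subject to the assumptions above.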
5. Because CrimeStat uses indirect distance for the linear nearest neighbor index (i.e.
measurement only in a horizontal or vertical direction), there is a slight distortion
that can occur if the incidents are distributed in a diagonal manner, such as with
State Highways 26 and 150 in Figure 5.4. The distortion is very small, however.
For example, with the incidents along State Highway 26, after rotating the incident
points so that they fell approximately in a horizontal orientation, the observed
average linear nearest neighbor distance decreased slightly from 0.05843 miles to
0.05061 miles and the linear nearest neighbor index became 0.8354 (t=-.91; not
significant). In other words, the effects of the diagonal distribution lengthened the
estimate for the average linear nearest neighbor distance by about 41 feet compared
to the actual distances between incidents. For a small sample size, this could be
relevant, but for a larger sample it generally will be a small distortion. However, if
a more precise measure is required, then the user should rotate the distribution
so that the incidents have as closely as possible a horizontal or vertical orientation.
An alternative is to calculate the regular nearest neighbor distance but use a
network for distance calculations (see chapter 3).
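The effect of a diagonal orientation on indirect distance, and the rotation suggested in this endnote, can be illustrated with two points. The sketch assumes that indirect distance is the sum of the horizontal and vertical separations and that rotation is about the origin; the function names are hypothetical.

```python
import math

def indirect_distance(p, q):
    """Distance measured only along horizontal and vertical directions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def rotate(points, angle_degrees):
    """Rotate points counterclockwise about the origin."""
    a = math.radians(angle_degrees)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

diagonal = [(0.0, 0.0), (1.0, 1.0)]      # two incidents along a 45 degree line
before = indirect_distance(*diagonal)    # horizontal plus vertical legs
after = indirect_distance(*rotate(diagonal, -45))

# Arithmetic check of the figure quoted in the endnote (5280 feet per mile).
difference_feet = (0.05843 - 0.05061) * 5280
```

In this extreme case the indirect distance falls from 2 to about 1.41 once the points are rotated onto the horizontal axis. The Highway 26 figures in the endnote show a much smaller gap, about 41 feet.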
8. Note that, since there is not a formal test of significance, the comparison with an
envelope produced from a number of simulations provides only approximate
confidence about whether the distribution differs from chance or not. That is, one
cannot say that the likelihood of obtaining this result by chance is less than 5%, for
example.
9. The ‘guard rail’ concept, while frequently used, is poor methodology because it
involves ignoring data near the boundary of a study area. That is, points within the
guard rail are only allowed to be selected by other points and not, in turn, be
allowed to select others. This has the effect of throwing out data that could be very
important. It is analogous to the old, but fortunately now discarded, practice of
throwing out ‘outliers’ in regression analysis because the outliers were somehow
seen as ‘not typical’. The guard rail concept is also poor policing practice since
incidents occurring near a border may be very important to a police department and
may require coordination with an adjacent jurisdiction. In short, use mathematical
adjustments for edge corrections or, failing that, leave the data as it is.