
in conjunction with

Data Management & Warehousing

http://www.datamgmt.com

What is the Spatial Module?
• It’s the ability to analyse information in a geographic context:
– Where is the nearest petrol station?
– Which road am I on?
– How many ATMs are in this area?

• It’s not maps and images
– These come later with tools that help present the information
Wednesday, July 28, 2010 © 2010 Data Management & Warehousing

The three types of data & many questions
• Points
– OS Grid
– Latitude & Longitude

• Lines
– Pairs of points
– e.g. Road Segments

• Polygons
– A series of points that define a boundary
– e.g. Postcode Boundaries

• How close are two points?
• Does a point touch a line?
• Is a point inside or outside a polygon?
• Does a line cross a polygon?
• How many points are in a polygon?
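Two of the questions above ("Is a point inside or outside a polygon?" and "How many points are in a polygon?") can be sketched with the classic ray-casting test. This is only an illustration in plain Python on flat x/y coordinates; the function names are hypothetical and it is not how any particular spatial database implements the check.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (easting, northing)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray running right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap round to close the boundary
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x position where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def count_points_in_polygon(points: List[Point], polygon: List[Point]) -> int:
    """Answer 'how many points are in a polygon?' by testing each point in turn."""
    return sum(point_in_polygon(p, polygon) for p in points)


# Hypothetical sample data: a 4 x 4 square and three points.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
sample_points = [(1.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
hits = count_points_in_polygon(sample_points, square)
```

Points that lie exactly on the boundary are not treated specially here, so a 'touches' test would need separate handling.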

Using Spatial Data Is Complex
• Different distances between points at different longitudes and latitudes
• Measurement over a curved irregular surface
• Multiple input and output formats
• Multiple co-ordinate systems; see: A Guide to Coordinate Systems in Great Britain
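The first point can be made concrete with the haversine formula, which gives the great-circle distance between two latitude/longitude pairs. The sketch below assumes a perfectly spherical Earth with a mean radius of 6371 km, so it ignores the "irregular" part of the surface; the function and variable names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude covers a different distance at different latitudes.
at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)      # roughly 111 km
at_60_north = haversine_km(60.0, 0.0, 60.0, 1.0)   # roughly half of that
```

The same one-degree step in longitude is about half as long at 60° north as at the equator, which is why raw degree differences cannot be compared directly as distances.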

Sources of Information – GPS
• In Car Device
– Sends frequent data sets to processing centre
– Point Data
  • Speed, Direction, Location and G-force
– Aggregate Data
  • Speed and Direction

• Other Devices
– Sat Nav Systems
– Smart Phone Apps e.g. ‘GPS Tracker’
– Cameras

Sources of Information – Ordnance Survey
• Integrated Road Network: A series of 3 million ‘linestrings’ and 17 million points that describe every road in the UK
• Linestrings have between 2 and 655 points; most have fewer than 10
• 23 points for this picture
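A linestring can be pictured as an ordered list of points, and matching a GPS position to a road then comes down to finding the linestring closest to that position. The sketch below is a brute-force illustration on flat coordinates in metres (such as OS eastings and northings); the road names and coordinates are made up, and at the scale of 3 million linestrings a spatial index or the database's own functions would be needed instead.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (easting, northing) in metres


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the straight segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # a and b are the same point
    # Project p onto the line through a and b, then clamp to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_linestring_distance(p: Point, linestring: List[Point]) -> float:
    """Distance from p to the nearest segment of a linestring."""
    return min(point_segment_distance(p, a, b) for a, b in zip(linestring, linestring[1:]))


def nearest_road(p: Point, roads: Dict[str, List[Point]]) -> str:
    """Name of the road whose linestring passes closest to p."""
    return min(roads, key=lambda name: point_linestring_distance(p, roads[name]))


# Hypothetical roads, each a linestring of two or more points.
roads = {
    "High Street": [(0.0, 0.0), (100.0, 0.0), (200.0, 50.0)],
    "Mill Lane": [(0.0, 100.0), (0.0, 300.0)],
}
matched = nearest_road((50.0, 10.0), roads)
```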

Sources of Information – Post Office/GAdm
• Postal Address File: A series of c.1.75M UK postcodes
– Postcode Boundaries
– Over 28M complete addresses

• Global Admin Boundaries
– National and regional boundaries for c.245 countries
– http://www.gadm.org

Data Layers – Enriching what you have
• Data Layers are sets of information tied to a geographic point
– Road Speed for a given road segment
– ATM Location
– House Price for a postcode

• Where data has location information it is known as ‘Geo-tagged’


Data Layer Sources (1)
• Ordnance Survey
– Road Types, Limits, Closures, etc.

• Government
– UK Government now providing masses of geo-tagged info (http://data.gov.uk)

• Met Office / HM Nautical Almanac Office
– Weather, Daylight to Postcode Level


Data Layer Sources (2)
• Wikipedia
– Geo-tag Access API – what’s nearby?

• Google Maps
– Road level photographic images

• Commercial Sources
– Fast Food Outlets, Supermarkets, Petrol Stations, ATMs, etc.

• Massive growth in both commercial and public domain geo-tagged data


Issues with Geo-tagged data
• Geo-tagging uses different formats
– Longitude & Latitude, OS Grid Reference, etc.

• Geo-tagging at different levels
– Data for a postcode or an entire county, which makes it difficult to compare

• Geo-tagging coverage is patchy and/or historic
– Rate of change of fine detail data is very high
– e.g. OS issues monthly updates to the UK mapping

• Multiple standards and formats
– XML & CSV, different file formats, etc.
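As one small example of bringing differently formatted geo-tags onto a common footing, the sketch below converts a lettered OS National Grid reference into numeric eastings and northings in metres. It covers only this one conversion (turning grid coordinates into latitude and longitude needs a full datum transformation, which is not shown), and the function name is illustrative.

```python
from typing import Tuple

GRID_LETTERS = "ABCDEFGHJKLMNOPQRSTUVWXYZ"  # the National Grid alphabet omits "I"


def grid_ref_to_easting_northing(grid_ref: str) -> Tuple[int, int]:
    """Convert e.g. 'TQ 300 800' to (easting, northing) in metres.

    The result is the south-west corner of the square the reference names.
    """
    ref = grid_ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or any(c not in GRID_LETTERS for c in letters):
        raise ValueError(f"bad grid letters in {grid_ref!r}")
    if len(digits) % 2 != 0 or len(digits) > 10 or (digits and not digits.isdigit()):
        raise ValueError(f"bad grid digits in {grid_ref!r}")

    # The two letters select a 100 km square.
    l1, l2 = GRID_LETTERS.index(letters[0]), GRID_LETTERS.index(letters[1])
    e100km = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100km = (19 - (l1 // 5) * 5) - (l2 // 5)

    # The digits split evenly into an easting part and a northing part.
    half = len(digits) // 2
    scale = 10 ** (5 - half)  # metres represented by one unit of each part
    e_part = int(digits[:half]) * scale if half else 0
    n_part = int(digits[half:]) * scale if half else 0
    return e100km * 100_000 + e_part, n100km * 100_000 + n_part


central_london = grid_ref_to_easting_northing("TQ 300 800")
```

Once every source is expressed in the same numeric system, data tagged at the same level can be compared directly; differences in level (postcode versus county) still need separate handling.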


Our Model For Delivering Spatial Data
1. Load Multiple File Formats
2. Standardise Geo-Tagging
3. Extract & Load CSVs
4. Perform Spatial Analysis
5. Create User Access Area

[Diagram: six Sources; (Small) Postgres Database; Netezza; Spatial Analysis (Proximity, Contains, Excludes); Spatial Presentation (Sets of data with spatial attributes); Query & Presentation Tools (Tableau, Google Maps, etc.)]
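The loading and extraction steps (reading several file formats, then writing CSVs) might look something like the sketch below. It uses only the Python standard library; the input layouts, column names and the `point` element are invented for illustration and do not describe the actual feeds or the loaders used in this model.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List

FIELDS = ["name", "latitude", "longitude"]  # assumed common record layout


def read_csv_points(text: str) -> List[Dict]:
    """Read a hypothetical CSV feed with columns name, lat, lon."""
    return [
        {"name": row["name"], "latitude": float(row["lat"]), "longitude": float(row["lon"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def read_xml_points(text: str) -> List[Dict]:
    """Read a hypothetical XML feed of <point name=".." lat=".." lon=".."/> elements."""
    root = ET.fromstring(text)
    return [
        {"name": p.get("name"), "latitude": float(p.get("lat")), "longitude": float(p.get("lon"))}
        for p in root.iter("point")
    ]


def write_csv(records: List[Dict]) -> str:
    """Write the standardised records as one CSV ready for bulk loading."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Made-up sample inputs in two different formats.
csv_text = "name,lat,lon\nATM 1,51.5,-0.12\n"
xml_text = '<points><point name="ATM 2" lat="53.4" lon="-2.24"/></points>'
combined = write_csv(read_csv_points(csv_text) + read_xml_points(xml_text))
```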

Netezza Spatial Value Add
• Netezza Spatial is fast
– Analysis
  • Look up a typical 18 point trip in the 3M linestrings to find the roads that the vehicle was on in less than 1 second
  • Overnight batch process matching 300,000 points to road names in under 30 minutes
– Presentation
  • Tools rely on fast query access to render any queried map with sub-second response times

• Netezza Spatial is easy
– Distance and proximity calculations are simple
– ‘Touches’, ‘Overlaps’ & ‘Contains’ queries allow instant value add

• Netezza Spatial integrates
– Works well with Tableau
– Easy to generate KML for use with Google Earth and Google Maps
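To give a feel for the KML point, the sketch below turns a list of named positions into a minimal KML document using only the Python standard library. It is an illustration of the file format rather than of any Netezza feature, and the function name and sample place are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"  # KML 2.2 namespace


def points_to_kml(points: Iterable[Tuple[str, float, float]]) -> str:
    """Build a KML document from (name, latitude, longitude) tuples."""
    ET.register_namespace("", KML_NS)  # serialise without a namespace prefix
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML lists coordinates as longitude,latitude (the reverse of the usual spoken order).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


kml_text = points_to_kml([("London", 51.5074, -0.1278)])
```

The resulting text can be saved with a `.kml` extension and opened in Google Earth.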


Netezza Spatial Limitations
• Fails the Slartibartfast Test:
– Polygons for very detailed maps are too big to be loaded as Netezza limits the maximum block size to 64000 characters
– Named after the Hitch-Hikers Guide to the Galaxy coastline designer responsible for the twiddly bits around the Norwegian fjords

• Work-around:
– Use regional boundaries (e.g. UK Counties, US States, etc.) and then aggregate into national boundaries
– If a point is in Berkshire then by definition it is also in England

[Images: Norway; Slartibartfast]
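The limitation and its work-around can be sketched as two small checks: whether a boundary written out as text stays within 64000 characters, and how region-level matches roll up to nations. The sketch assumes the limit applies to the polygon's WKT text, which the slide does not state explicitly, and the region-to-nation table is a hypothetical lookup.

```python
from collections import Counter
from typing import Iterable, List, Tuple

MAX_CHARS = 64_000  # limit quoted above; assumed here to apply to the WKT text

# Hypothetical lookup from regional boundary to nation.
REGION_TO_NATION = {"Berkshire": "England", "Surrey": "England", "Fife": "Scotland"}


def polygon_wkt(ring: List[Tuple[float, float]]) -> str:
    """Write a boundary as WKT, repeating the first point to close the ring."""
    closed = list(ring) if ring[0] == ring[-1] else list(ring) + [ring[0]]
    coords = ", ".join(f"{x} {y}" for x, y in closed)
    return f"POLYGON (({coords}))"


def fits_limit(wkt: str) -> bool:
    """True if the boundary text is short enough to load."""
    return len(wkt) <= MAX_CHARS


def nations_for(regions: Iterable[str]) -> Counter:
    """Roll region-level matches up to nations: in Berkshire implies in England."""
    return Counter(REGION_TO_NATION[r] for r in regions)


small = polygon_wkt([(0, 0), (1, 0), (1, 1)])
detailed = polygon_wkt([(i, i) for i in range(10_000)])  # stand-in for a very detailed coastline
rolled_up = nations_for(["Berkshire", "Fife", "Surrey"])
```

Here `small` fits within the limit while `detailed` does not, which is the situation the regional work-around avoids.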


Current Uses …
• M/A/B road driving profiles
• Time of day driving profiles
• Speed Limits vs. Driven Speed
• Matching GPS positions to road names
• Out of bounds driving
• Customer Demographic Profiles

… but this is only the start in a very short time
