
Spatial Big Data

Shashi Shekhar
McKnight Distinguished University Professor
Department of Computer Science and Engineering, University of Minnesota
www.cs.umn.edu/~shekhar

AAG-NIH Symp. on Enabling a National Geospatial Cyberinfrastructure for Health Research (July 2012)

More details in S. Shekhar et al., Spatial Big Data Challenges Intersecting Mobility and Cloud Computing, ACM
SIGMOD Workshop on Data Engineering for Wireless and Mobile Access, 2012.
Research Theme 1: Spatial Databases

• Evacuation Route Planning
• Parallelize Range Queries
• Shortest Paths
• Storing graphs in disk blocks

[Figure legend: Only in old plan; Only in new plan; In both plans]
Theme 2: Spatial Data Mining

• Location prediction: nesting sites
• Spatial outliers: sensor (#9) on I-35
• Co-location Patterns
• Teleconnections

[Figure panels: Nest locations; Distance to open water; Vegetation durability; Water depth]


Outline

• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
• Conclusions

Big Data

Mining and analyzing these big new data sets can open
the door to a new wave of innovation, accelerating
productivity and economic growth. Some economists,
academics and business executives see an opportunity to
move beyond the payoff of the first stage of the Internet,
which combined computing and low-cost
communications to automate all kinds of commercial
transactions.

Estimated value: > USD 1 trillion per year by 2020

Location-based service: USD 600 B
Health Informatics: USD 300 B
Manufacturing:

Spatial Big Data Definitions
• Spatial datasets exceeding capacity of current computing systems
• To manage, process, or analyze the data with reasonable effort
• Due to Volume, Velocity, Variety, …

• SBD Components
• Data-intensive Computing: Cloud Computing
• Middleware, e.g., Map-Reduce, Pregel, Big-Table, …
• Big-Data analytics, e.g., data mining, machine learning, computational statistics, …
• Big Data science and societal applications
• Ex. Social media datasets, e.g., Google Flu Trends
• Which patterns may be detected in these datasets?
• Flu outbreaks ?

Traditional Spatial Data
• Spatial attribute:
  Neighborhood and extent
  Geo-Reference: longitude, latitude, elevation
• Spatial data genre
  Raster: geo-images, e.g., Google Earth
  Vector: point, line, polygons
  Graph, e.g., roadmap: node, edge, path

[Figures: Raster Data for UMN Campus (Courtesy: UMN); Graph Data for UMN Campus (Courtesy: Bing); Vector Data for UMN Campus (Courtesy: MapQuest)]
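
The three genres above can be sketched as simple data structures. This is only a minimal illustration: every name and value here (`raster`, `Point`, `road_graph`, `path_cost`, the coordinates and edge weights) is hypothetical and not taken from the talk.

```python
from dataclasses import dataclass

# Raster: a regular grid of cell values (e.g., pixel intensities).
raster = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]

# Vector: geometric objects defined by coordinates (longitude, latitude).
@dataclass(frozen=True)
class Point:
    lon: float
    lat: float

campus_point = Point(-93.2277, 44.9740)                    # hypothetical point
road_line = [Point(-93.23, 44.97), Point(-93.22, 44.98)]   # line string
block_polygon = [Point(0, 0), Point(1, 0), Point(1, 1), Point(0, 1)]  # ring of vertices

# Graph: nodes and weighted edges, stored as an adjacency mapping.
road_graph = {
    "A": {"B": 1.0, "C": 4.0},  # weights are hypothetical travel costs
    "B": {"C": 2.0},
    "C": {},
}

def path_cost(graph, path):
    """Sum the edge weights along a path given as a list of node ids."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))
```

For example, `path_cost(road_graph, ["A", "B", "C"])` adds the weights of edges A-B and B-C, which is the kind of query a raster or a bare vector layer cannot answer without first deriving the network.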
Raster SBD

• Data Sets >> Google Earth
  Geo-videos from UAVs, security cameras
  Satellite Imagery (periodic scan), LiDAR, …
  Geo-sensor networks
  Climate simulation, EPA Air Quality
• Example use cases
  Patterns of Life
  Change detection, Feature extraction, Urban terrain

[Figures: LiDAR & Urban Terrain; Average Monthly Temperature; Feature Extraction (Courtesy: Prof. V. Kumar); Change Detection]
Use Case: Patterns of Life, e.g., activity space
• Weekday GPS track for 3 months
  Patterns of life
  Activity Space: Usual places and visits
  Rare places, Rare visits

[Figure labels: Work, Farm, Home, Club]

Visits by place and time of day (columns: Morning 7am – 12noon, Afternoon 12noon – 5pm, Evening 5pm – 12midnight, Midnight 12midnight – 7am, Total):

Home 10 2 15 29 54

Work 19 20 10 1 50

Club 4 5 4 15
Farm 1 1

Total 30 30 30 30 120
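
A table like the one above could be tallied from place-labeled GPS visits roughly as follows. This is a sketch under stated assumptions: the bin boundaries (7am, 12 noon, 5pm, 12 midnight), the function names, and the sample visits are all hypothetical and do not reproduce the data in the slide.

```python
from collections import Counter
from datetime import datetime

def time_bin(ts: datetime) -> str:
    """Map a timestamp to one of four assumed time-of-day bins."""
    h = ts.hour
    if 7 <= h < 12:
        return "Morning"
    if 12 <= h < 17:
        return "Afternoon"
    if 17 <= h < 24:
        return "Evening"
    return "Midnight"  # 12 midnight up to 7am

def activity_space(visits):
    """Count visits per (place, time bin); `visits` yields (place, datetime)."""
    return Counter((place, time_bin(ts)) for place, ts in visits)

def rare_places(counts, max_visits=1):
    """Places whose total visit count is at most `max_visits`."""
    totals = Counter()
    for (place, _), n in counts.items():
        totals[place] += n
    return sorted(p for p, n in totals.items() if n <= max_visits)

# Hypothetical sample of labeled visits.
visits = [
    ("Home", datetime(2012, 7, 2, 1, 30)),
    ("Work", datetime(2012, 7, 2, 9, 0)),
    ("Work", datetime(2012, 7, 2, 14, 0)),
    ("Home", datetime(2012, 7, 2, 19, 0)),
    ("Farm", datetime(2012, 7, 3, 10, 0)),
]
counts = activity_space(visits)
```

Frequent cells of `counts` approximate the usual places and visits, while `rare_places(counts)` flags the rarely visited locations.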

Vector SBD from Geo-Social Media
• Vector data sub-genre
  Point: location of a tweet, Ushahidi report, check-in, …
  Line-strings, Polygons: roads in OpenStreetMap
• Use cases: Persistent Surveillance
  Outbreaks of disease, Disaster, Unrest, Crime, …
  Hot-spots, emerging hot-spots
  Spatial Correlations: co-location, teleconnection

Persistent Surveillance at American Red Cross

• Even before cable news outlets began reporting the tornadoes that ripped through Texas on
Tuesday, a map of the state began blinking red on a screen in the Red Cross' new social
media monitoring center, alerting weather watchers that something was happening in the
hard-hit area. (AP, April 16th, 2012)

Graph SBD: Temporally Detailed
• Spatial Graphs, e.g., Roadmaps, Electric grid, Supply Chains, …
  Temporally detailed roadmaps [Navteq]
• Use cases: Accessibility by time of week, Best start time, Best route at different start-times
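
The "best start time" and "best route at different start-times" use cases can be illustrated with a time-dependent variant of Dijkstra's algorithm. This is a minimal sketch, not the method used in the talk: it assumes every edge is FIFO (leaving later never means arriving earlier), and the network, the `rush_hour` travel-time model, and all function names are hypothetical.

```python
import heapq

def earliest_arrival(graph, source, target, start_time):
    """Time-dependent Dijkstra.

    graph[u] is a list of (v, travel_time_fn); travel_time_fn(t) is the
    travel time on edge (u, v) when leaving u at time t. Assumes FIFO edges.
    """
    best = {source: start_time}
    heap = [(start_time, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            return t
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, travel_time in graph.get(u, []):
            arrival = t + travel_time(t)
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                heapq.heappush(heap, (arrival, v))
    return float("inf")

def best_start_time(graph, source, target, candidate_starts):
    """Candidate start time with the shortest total travel time."""
    return min(candidate_starts,
               key=lambda s: earliest_arrival(graph, source, target, s) - s)

def rush_hour(base, peak):
    """Hypothetical FIFO travel time (minutes); t is minutes after midnight."""
    def travel_time(t):
        if t < 420:            # before 7:00
            return base
        if t < 540:            # 7:00 to 9:00
            return peak
        return max(base, peak - (t - 540))  # congestion clears gradually
    return travel_time

graph = {
    "Home": [("Highway", rush_hour(10, 30)), ("Side", rush_hour(15, 18))],
    "Highway": [("Work", rush_hour(10, 25))],
    "Side": [("Work", rush_hour(20, 22))],
}
```

In this toy network the highway is the faster route at 6:00 but the side street wins at 7:30, so both the best route and the best start time depend on the time of day.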

Outline

• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
• Conclusions

Big Data and Science
Science in the Petabyte Era –
• Increasing Volume
• Heightened Complexity
• Demands for Interoperability

Nature, 7209(4), September 4, 2008

"Above all, data on today's scale require scientific and
computational intelligence. Google may now have its critics, but no
one can deny its impact, which ultimately stems from the cleverness
of its informatics. The future of science depends in part on such
cleverness again being applied to data for their own sake,
complementing scientific hypotheses as a basis for exploring today's
information cornucopia."
Preparing Science for Big-Data
Nature, 7209(4), September 4, 2008

Big Data Translates into Big Opportunities... and Big Responsibilities

Sudden influxes of data have transformed researchers' understanding of
nature before — even back in the days when 'computer' was still a job
description.

Unfortunately, the institutions and culture of science remain rooted
in that pre-electronic era. Taking full advantage of electronic data
will require a great deal of additional infrastructure, both technical
and cultural.
Models in Science

Science: understand natural world
Subjective → Objective (transparent, reproducible)
Methods: Forward models, Backward models
Engineering: Solve problems optimizing cost, efficiency, etc.

Models: Manual vs. Assisted by computers
• Manual: Paper, Pencil, Slide-rules, log-tables, …
• Assisted by computers: HPCC, cyber-infrastructure, data-intensive, big-data
• Forward models
  • Manual: Differential Equations (D.E.), Algebraic equations
  • Assisted by computers: Computational Simulations using D.E.s, Agent-based models, etc.
• Backward models
  • Manual: Parametric models, e.g., Regression, Correlations, sampling, Experiment design, Hypothesis testing, …
  • Assisted by computers:
    • Bayesian: resampling, local regression, MCMC, kernel density estimation, neural networks, generalized additive models, …
    • Frequentist: frequent patterns, Model ensembles, hypothesis generation, …
    • Exploratory Data Analysis: data visualization, visual analytics, geographic information science, spatial data mining, …
Outline

• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
• SBD Infrastructure
• Conclusions

Pre-Electronic Era Models: Example 1

• 1854 Cholera in London
  Deaths clustered around the Broad St. water pump, except at a brewery
• Recent Decades
  Proximity vs. Accessibility

From Hotspots To Mean Streets
• Complicating Dimensions
  • Spatial Networks
  • Time
• Challenges: Trade-off between
  • Semantic richness and
  • Scalable algorithms

Innovative Technique: K Main Routes (KMR)
Summarizes Urban Activities

[Figure legend: KMR Routes (10) – thick lines; CrimeStat K-Means (10) – ellipses; Roads – gray lines; Burglaries – points]
Pre-Electronic Models: Example 2
• Location Prediction
  Models to predict location, time, path, …
  Nest sites, minerals, earthquakes, tornadoes, …
• Pre-electronic models, e.g., Regression
  • Assumed i.i.d.
    • To simplify parameter estimation
    • Least squares – easy to hand-compute
  • Alternatives: $y = \rho W y + x\beta + \varepsilon$
    • Spatial Autoregression
    • Geographically Weighted (Local) Regression
    • Parameter estimation is compute-intensive!

$$\ln(L) = \ln\lvert I - \rho W \rvert - \frac{n \ln(2\pi)}{2} - \frac{n \ln(\sigma^2)}{2} - SSE$$

• Next
  • Non-i.i.d. errors: distance-based
  • Spatio-temporal vector fields (e.g., flows, motion)
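
To make the compute-intensive estimation step concrete, here is a minimal NumPy sketch of maximum-likelihood estimation of ρ for the spatial autoregression model by brute-force search. It rests on assumptions not stated in the slide: Gaussian errors, β and σ² profiled out by least squares for each candidate ρ, and the slide's SSE term read as the sum of squared errors scaled by 1/(2σ²). The function names, the ring-shaped neighbor matrix `W`, and the simulated data are hypothetical.

```python
import numpy as np

def sar_log_likelihood(rho, y, X, W):
    """Log-likelihood of y = rho*W*y + X*beta + eps with beta, sigma^2 profiled out."""
    n = len(y)
    A = np.eye(n) - rho * W
    Ay = A @ y
    beta, *_ = np.linalg.lstsq(X, Ay, rcond=None)  # least squares given rho
    resid = Ay - X @ beta
    sse = float(resid @ resid)
    sigma2 = sse / n
    sign, logdet = np.linalg.slogdet(A)            # ln|I - rho*W|
    if sign <= 0:
        return -np.inf
    return (logdet - n * np.log(2 * np.pi) / 2
            - n * np.log(sigma2) / 2 - sse / (2 * sigma2))

def estimate_rho(y, X, W, grid=np.linspace(-0.9, 0.9, 37)):
    """Grid search over rho; every candidate needs a log-determinant."""
    return max(grid, key=lambda r: sar_log_likelihood(r, y, X, W))

# Hypothetical data: n locations on a ring, each with two neighbors.
rng = np.random.default_rng(0)
n = 50
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5    # row-normalized weights
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=0.5, size=n)
y = np.linalg.solve(np.eye(n) - 0.6 * W, X @ np.array([1.0, 2.0]) + eps)

rho_hat = estimate_rho(y, X, W)
```

With ρ = 0 the log-determinant vanishes and the expression reduces to the ordinary least-squares likelihood, which is why classical regression is easy to compute by hand. For a dense n × n matrix W, each log-determinant costs on the order of n³ operations, and the search repeats it for every candidate ρ.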
Example 3: Global vs. Local Regression
• Example: Lilac Phenology data
  Yearly date of first leaf and first bloom
  1126 locations in US & Canada
• "Global" regression model shows a mystery
  Positive slope => blooms delayed in recent years!
• Spatial decomposition solves the mystery
  East of Mississippi, West of Mississippi
  Each half has a negative slope => blooms earlier in recent years!
  However, slopes differ between east and west
  More reports in west in recent years
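
The sign reversal described above can be reproduced on synthetic data. The numbers below are invented purely for illustration and are not the lilac dataset: both regions bloom earlier over time, but the west blooms later in the year and contributes most of the recent reports, so the pooled regression slopes upward.

```python
def slope(records):
    """Least-squares slope of bloom day against year for (year, day) records."""
    xs, ys = zip(*records)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in records)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical east: one report per year, blooms 0.5 days earlier each year.
east = [(year, 100 - 0.5 * (year - 1960)) for year in range(1960, 2000)]

# Hypothetical west: later bloom dates, 0.2 days earlier each year,
# five reports per year but only from 1985 onward.
west = [(year, 140 - 0.2 * (year - 1960))
        for year in range(1985, 2000) for _ in range(5)]

global_slope = slope(east + west)  # pooled ("global") model
east_slope = slope(east)           # local model, east
west_slope = slope(west)           # local model, west
```

Here `east_slope` and `west_slope` are both negative while `global_slope` is positive, which is why decomposing the data spatially before fitting changes the conclusion.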

[Figure legend: River, Station]
Outline

• Motivation
• What is Spatial Big Data (SBD)?
• SBD and Science
• SBD Analytics
• Conclusions

Spatial Big Data (SBD) Summary
• SBD are becoming available
  Geo-social Media, Geo-Sensor Networks, Geo-Simulations, VGI, …
• Big Opportunities
  Data: Quicker detection of disease outbreaks, e.g., Google Flu Trends
    • Multi-decade large-area studies, e.g., Gulf Study, Exposomics, …
  Intervention:
    • How can geo-social networks induce desired behavior?
    • Health effects of friends, e.g., smoking, drinking, exercise, nutrition, optimism, …
  Large-scale Collaboration on Complex Questions
    • Studies with thousands of doctors and a hundred million humans
• ... and Big Responsibilities
  Institutions and culture of science remain rooted in the pre-electronic era.
    • Ex. Hotspots to Mean Streets
  Big data exceeding capacity of traditional systems

CCC Workshop: Spatial Computing Visioning (9/10-11/2012)
http://cra.org/ccc/spatial_computing.php
