
Location Analytics for your Data Lake

Driving New Business Insights and Outcomes

Philip Russom, Ph.D.


Senior Research Director, TDWI

November 30, 2017


SPONSOR
PHILIP RUSSOM
Senior Research Director for
Data Management,
TDWI
Agenda – Location Analytics for your Data Lake
• Background
– Defining the Data Lake
– Market Drivers
– Benefits
• Location Data
– New data sources
– Role of the Data Lake
• Location Analytics
– Use cases
– Role of the Data Lake
• Summary and Recommendations
– Links to free, recent publications from TDWI
DEFINING
The Data Lake
• Method for organizing large volumes of highly diverse data
– For broad data exploration and discovery, plus advanced analytics
– Depending on the platform (usually Hadoop), the lake may handle many data structures
• Tends to persist data in its original, raw, detailed state
– So data can be repurposed repeatedly for operations and/or analytics
– For analytics needing detailed source: mining, statistics, machine learning
• Supports multiple use cases, in multiple data architectures
– Marketing, sales, healthcare, logistics // EDW, data integration, analytics
– Addresses key data domains: customers, products, location data, etc.
Market Drivers for
The Data Lake
• Many organizations have new
data sources coming online
– Internet of Things (IoT): Sensors on machinery, vehicles, pallets, mobile devices
– Customer channels and touch points. External partners and data aggregators
• Technical teams are under pressure to capture new data and big
data, then develop their business value
– For both analytics and reactive operations
• The data lake captures big data, new data, other diverse sources
– Enables many forms of analytics and self-service data-driven practices
– Provides more granular and immediate monitoring of business operations
BENEFITS OF
The Data Lake
• More complete views of customers and other key business entities
– Location data enriches customer views, event records, operational processes…
• Operations move or react faster, based on fresher, more detailed data
• Self-service data exploration, discovery, prep, visualization…
• Depending on your lake platform, business value from non-relational data
• Leveraging big data and new data sources, instead of hoarding them
• Data lakes provide infrastructure and methods for specific data domains
– Data about customers, logistics, business processes, locations and geocodes…
The Data Lake provides Solutions for the
Challenges of Location Data & its Analytics
LOCATION DATA CHALLENGES | DATA LAKE SOLUTIONS
Strewn across many systems | Central pool for data containing location info; trusted, governed, shared, single truth
Massive volumes, diverse structures | Hadoop-based lake for complex calculations
Hard to relate to business processes | More data for richer cross-source correlations of same/similar locations
From batch to real-time ingestion & use | Integration infrastructure for all latencies
Costly, time-consuming data integration | Quickly copy data into lake, mostly unaltered
Proprietary data formats and structures | Store original state of data and fix on the fly; Hadoop-based lake handles diverse data
Excessive normalization processing | Standardize and integrate at read time

SOURCE: TDWI 2017


INTRODUCING
The Location Data Lake
• It’s a data lake, probably implemented on Hadoop,
that serves as a central, shared data store for
diverse data that includes location information.
• Data may come from traditional enterprise
applications, the web, IoT, third parties, etc.
• It may also manage generated datasets: geocodes,
sandboxes, mailing lists, views
• Use cases – analytics (that correlate locations) and
operations (location awareness, operational reports)
Integrate your Location Data Lake with
other Enterprise Systems.
• A data lake rarely exists in a vacuum.
– Most are integrated with larger enterprise data architectures
• A location data lake may be integrated with:
– Data warehouses and/or analytic databases
– Omnichannel marketing and its hybrid data environment
– Digital supply chain, procurement, shipping…
– Asset management, fleet mgt, travel systems, logistics…
• Integration requirements for a location data lake:
– Hefty infrastructure for data and application integration
– Hefty data quality functions and 3rd party location databases
Use Cases for Location Data & Analytics
• Multichannel or Omnichannel Marketing
– The trend is toward collecting more information about each
customer and prospect, to better target and personalize
marketing campaigns, customer service, customer retention,
account growth, and so on. Location data enriches your
understanding of customer behaviors, proximity preferences,
prime locations for specific pitches and recommendations…
• Logistics
– Via location data, shippers can make routes more efficient,
deliveries faster and more accurate, delivery costs lower, and
bottlenecks apparent (so you can avoid them). And it enables
the self-service tracking of shipments by customers.
More Use Cases for Location Data
• First Responders
– Precise geocoding and GPS analytics that dynamically plot
directions can quickly get emergency medical teams and other
first responders to the location where they are desperately needed.
• Smart Cities
– Pilot programs are in place in cities around the world, to analyze
traffic patterns and generate models for redirecting traffic to ease
congestion or clear a path for emergency vehicles.
– Other programs make citizens safer and healthier by analyzing
locations (correlated with other data) to understand crime,
vagrancy, graffiti, pollution, noise, drainage, commuter patterns...
Yet More Use Cases for Location Data
• Communications service provider
– Via location data, can verify coverage for a new customer,
optimize tower placement, assess new potential service area
• Insurer
– Via location data, can make enlightened decisions about
underwriting (or create more accurate risk calculations and
actuarial tables), based on characteristics of property, owner,
and points of interest (POIs) in proximity (crime, flood plain)
– Via location data, can detect potential fraud cases, as seen
when a single person or vehicle, etc. is in proximity to
multiple loss and claim locations
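The proximity-based fraud check described above can be sketched in a few lines. This is a minimal illustration, not any vendor's method: it assumes locations are latitude/longitude pairs, uses the haversine great-circle distance on a spherical Earth, and flags an actor (person or vehicle) seen within a radius of several distinct claim locations. The names `Point`, `haversine_m`, and `flag_repeat_proximity`, the 500 m radius, and the two-claim threshold are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical-Earth assumption)

@dataclass
class Point:
    lat: float  # degrees
    lon: float  # degrees

def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two lat/lon points, in meters."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def flag_repeat_proximity(sightings, claims, radius_m=500.0, min_claims=2):
    """Return IDs of actors observed near at least `min_claims` distinct claim locations.

    Hypothetical helper; radius_m and min_claims are illustrative thresholds.
    sightings: iterable of (actor_id, Point), e.g., observed locations of a person or vehicle
    claims: dict mapping claim_id -> Point (loss/claim locations)
    """
    near = {}  # actor_id -> set of claim IDs within the radius
    for actor_id, loc in sightings:
        for claim_id, claim_loc in claims.items():
            if haversine_m(loc, claim_loc) <= radius_m:
                near.setdefault(actor_id, set()).add(claim_id)
    return {actor for actor, ids in near.items() if len(ids) >= min_claims}
```

The loop compares every sighting with every claim, which is fine for a sketch; at data lake scale, a spatial index or grid-based partitioning would be needed to keep the comparison tractable.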
DATA QUALITY AND STANDARDIZATION FOR
The Location Data Lake
• Data lakes store RAW data
– Standardize on the fly, not upfront (usually)
• Some location data needs light standardization for
– Higher match rates across location fields
– Speed with operational use cases or self-service access
– Compatibility with embedding location awareness in apps
– Corrections for known bugs creating bogus location data
– Precise point-level geocoding
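As an illustration of light standardization for higher match rates, the sketch below upper-cases an address line, strips punctuation, and maps a few common street suffixes and directions to abbreviations, so that differently typed versions of the same address compare equal. The abbreviation tables are a small hypothetical subset; production address standardization relies on full postal standards (for example, USPS Publication 28 in the US) plus reference data, and this naive token mapping would, for instance, wrongly abbreviate "North" when it is part of a street name.

```python
import re

# Hypothetical, deliberately tiny abbreviation tables (illustration only).
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD", "DRIVE": "DR"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}

def standardize_address(raw: str) -> str:
    """Lightly standardize an address line so the same place matches across sources."""
    text = raw.upper()
    text = re.sub(r"[.,#]", " ", text)  # drop punctuation that varies between sources
    tokens = text.split()               # split() also collapses repeated whitespace
    # Map known suffixes/directions to abbreviations; leave other tokens unchanged.
    tokens = [SUFFIXES.get(t, DIRECTIONS.get(t, t)) for t in tokens]
    return " ".join(tokens)

print(standardize_address(" 123  n. main st., "))  # 123 N MAIN ST
```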
Not all geocodes are created equal.
• Geocodes are the preferred standard for describing locations
– Like GPS technology, based on spatial coordinates for latitude and longitude
– Geocodes are the foundation for data quality with location info
• Geocodes add precision – when created with precision
– Codes are imprecise when generated from ZIP, town name, street
address – but precise enough for assigning demographics, etc.
– Codes are precise when a domain expert (i.e., human) checks
them and registers them in a commercially available database
• Some use cases need more than ZIP or street address
– Delivery entrance of factory. Condo within gated community.
– Many rural or undeveloped locations have no street address
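One way to make geocode precision explicit is to store a precision level alongside the coordinates and let each use case state the minimum precision it accepts. The sketch below is an assumption, not a standard: the `Precision` levels, their ordering (town, postal code, street, point), and the `best_geocode` helper are hypothetical, and real geocoders report precision in their own terms.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

class Precision(IntEnum):
    """Hypothetical precision levels, ordered from coarsest to finest."""
    TOWN = 1         # centroid of a town or city
    POSTAL_CODE = 2  # centroid of a ZIP/postal code
    STREET = 3       # interpolated along a street segment
    POINT = 4        # verified point-level location (e.g., a delivery entrance)

@dataclass
class Geocode:
    lat: float
    lon: float
    precision: Precision

def best_geocode(candidates, required: Precision = Precision.TOWN) -> Optional[Geocode]:
    """Pick the most precise candidate; return None if none meets the use case's requirement."""
    best = max(candidates, key=lambda g: g.precision, default=None)
    if best is None or best.precision < required:
        return None
    return best
```

A demographic assignment might accept `Precision.POSTAL_CODE`, while delivery to a factory's delivery entrance would demand `Precision.POINT` and reject anything coarser.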
A Few Requirements for Location Analytics
• Geometric functions to define geographic areas & grids
• Segmentation and aggregation based on X-Y data points
– E.g., boundaries like highways, rivers, topography, districts
– Plotting points of interest. Proximity metrics and triggers.
• Automatic recalculations as geometric areas change
– E.g., demographic averages of an area
– Distance and time calculations. Efficiency or risk metrics.
• Correlate and cluster locations that repeat in the data
– Concentrations of customers, accidents, crimes, tornados, diseases
• Geographic maps and analytic visualizations in layers
– Export to tools for GIS, BI, analytics, mapping, mobile…
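A minimal sketch of two of these requirements: a geometric function that tests whether an X-Y point lies inside a polygon (the classic ray-casting algorithm), and segmentation that counts points per named area. It assumes planar coordinates (treating longitude/latitude as flat X-Y, which is only reasonable for small areas), and a point lying exactly on a shared boundary may be assigned to either side. The function names are hypothetical; production systems would use a spatial library rather than this hand-rolled version.

```python
from collections import Counter

Point = tuple[float, float]  # (x, y), e.g., (longitude, latitude) treated as planar

def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray from pt toward +x crosses an edge."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        if (y1 > y) != (y2 > y):       # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_by_area(points, areas):
    """Segment points by named polygon and count them; points outside every area count under None."""
    counts = Counter()
    for pt in points:
        name = next((n for n, poly in areas.items() if point_in_polygon(pt, poly)), None)
        counts[name] += 1
    return counts

areas = {"A": [(0, 0), (10, 0), (10, 10), (0, 10)], "B": [(10, 0), (20, 0), (20, 10), (10, 10)]}
print(count_by_area([(1, 1), (5, 5), (15, 5), (25, 5)], areas))
```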
SUMMARY & RECOMMENDATIONS
Location Analytics for your Data Lake
• Take advantage of location data
– For business value from new, big data
• Create a strategy for managing location data
– Consider the data lake as a scalable data store
– Apply hefty data quality and integration functions
• Embrace use cases in location analytics
– Both operational and analytic use cases
– Leverage new location data; extend existing apps with location data
RECOMMENDED READING
TDWI Publications about Data Lakes
• TDWI 2017 Best Practices Report on Data Lakes
– Bit.ly/DataLakeRpt
• The Data Lake is a Method that Cures Hadoop Madness
– Bit.ly/HadoopMadness
• Busting 5 Myths about Data Lakes
– Bit.ly/LakeMyths
• The Data Lake Manifesto
– Bit.ly/DLManifesto
DAN KERNAGHAN

Big Data Evangelist,


Pitney Bowes
Introduction to the Spatial Data Lake

Dan Kernaghan
Big Data Evangelist, Pitney Bowes

Spatial Data Organized by Location
Unlock new insights at massive scale with native Hadoop processing capabilities.

Organize. Enrich. Access.

• Organize by location to transform chaos into clarity.
• Increase data insight through deep location enrichment.
• Enable operational and analytic access to deep location insights.
Typical Spatial Data Lake Deployments
Need to handle two types of locations: address-based and exact location

Addresses
• Used for locations traced to an address
• Attributes, limits, transactions
• Events affecting a location

Exact Location
• Used for locations traced to a device
• Devices, emitters, metrics
• Events originating at a location
Address-Based Spatial Data Lake
• Start with a core address file
– This could be a Customer File or a comprehensive set of address data
– Add generated Hashkeys
• Add address-based data
– Standardize, geocode, and generate a hashkey for joins
• Add boundary-based data
– Spatially join all polygons to generated hashkeys for standardized addresses
[Diagram: a Core Address File surrounded by data layers: Census and Property Attributes; Admin Boundaries; Neighborhoods and Schools; Insurance Risk; Appraisals, Inspections, Title, Tax Info; MLS Data]
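The hashkey join on the address-based slide might look like the following sketch. It assumes the hashkey is a SHA-256 digest of a lightly standardized address string; the function names, the standardization rule, and the dictionary layout are hypothetical and are not Pitney Bowes' implementation. Boundary-based layers would additionally need each address's geocode and a point-in-polygon test, which is not shown.

```python
import hashlib
import re

def address_hashkey(address: str) -> str:
    """Hypothetical hashkey: SHA-256 of an upper-cased, punctuation-free, whitespace-collapsed address."""
    normalized = " ".join(re.sub(r"[.,#]", " ", address.upper()).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def build_address_lake(core_addresses, *layers):
    """Join address-based layers to a core address file on the generated hashkey.

    core_addresses: list of address strings (e.g., a customer file)
    layers: dicts mapping a raw address -> dict of attributes (e.g., property or risk data)
    """
    lake = {address_hashkey(a): {"address": a} for a in core_addresses}
    for layer in layers:
        for raw_address, attrs in layer.items():
            key = address_hashkey(raw_address)
            if key in lake:  # inner-join semantics: ignore addresses not in the core file
                lake[key].update(attrs)
    return lake
```

Hashing the standardized form, rather than the raw text, is what lets "12 Main St." in one source and "12 MAIN ST" in another land on the same key.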
Grid-Based Spatial Data Lake
• Start with a global or regional grid
– Layer 10: 10m Grid
– Layer 09: 50m Grid
– Layer 08: 250m Grid
– And so on
• Add Location-based data
– Standardize, geocode and align to grid
segments
• Add boundary-based data
– Spatially join all polygons to grid segments
to desired granularity
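Aligning a geocoded point to grid segments at several layers can be sketched as below. This assumes a simple square grid in Web Mercator (EPSG:3857) coordinates using the 10 m, 50 m, and 250 m cell sizes from the slide; the names `LAYER_CELL_M` and `grid_cell` are hypothetical, and the actual grid scheme of any product is not specified here. Because Web Mercator inflates distances away from the equator, a "10 m" cell covers less ground at higher latitudes; an equal-area grid would avoid that distortion.

```python
import math

# Hypothetical layer -> cell size (meters), following the slide: layer 10 = 10 m, 09 = 50 m, 08 = 250 m.
LAYER_CELL_M = {10: 10.0, 9: 50.0, 8: 250.0}
R = 6_378_137.0  # WGS84 semi-major axis, as used by Web Mercator (EPSG:3857)

def to_web_mercator(lat: float, lon: float) -> tuple[float, float]:
    """Project latitude/longitude (degrees) to Web Mercator x/y (meters)."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

def grid_cell(lat: float, lon: float, layer: int) -> tuple[int, int, int]:
    """Return (layer, column, row) of the grid segment containing the point."""
    size = LAYER_CELL_M[layer]
    x, y = to_web_mercator(lat, lon)
    return layer, math.floor(x / size), math.floor(y / size)
```

Because each coarser cell size is an integer multiple of the finer one (50 = 10 × 5, 250 = 50 × 5), the 50 m column and row equal the 10 m indexes floor-divided by 5, so coarser layers can be derived from finer ones without re-projecting.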
Pitney Bowes Data Products
Pitney Bowes Big Data SDKs
DEBASHIS RANA

Chief Solution Architect,


RCG Global Services
Building the Spatial Data Lake
& Putting it to Use
DEBASHIS RANA
CHIEF SOLUTION ARCHITECT
30 NOVEMBER 2017

© 2017 RCG. ALL RIGHTS RESERVED. PROPRIETARY AND CONFIDENTIAL.


Building the Spatial Data Lake

• Ingest: variety, velocity, volume
• Curate: conform, refine, publish
• Deliver: visualize, expose, provision

AT SCALE
Conceptual Architecture
Spatial Data Lake
• Ingest: enterprise data (databases, flat files) and external data land in the RAW zone
• Curate: promote data through the CONFORMED, REFINED, and PUBLISHED zones, progressing from source-like data to cleansed, object-structured data with an enterprise ID, then enriched and purpose-specific data
• Deliver: analytics & BI, data science, applications
• Foundation: RCG|enable™ Accelerator Frameworks; Pitney Bowes Big Data SDKs; Pitney Bowes Data Products; Data Management, Governance & Security
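The zone progression in this architecture (RAW, CONFORMED, REFINED, PUBLISHED) can be illustrated with a toy pipeline. This is a hypothetical sketch, not RCG's accelerator framework: the cleansing rule, the `customer_no` field used to look up an enterprise ID, and the enrichment lookup are all assumptions chosen for illustration. Each step also records lineage, as a hint of what the governance layer tracks.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    zone: str                                    # RAW, CONFORMED, REFINED, or PUBLISHED
    data: dict
    lineage: list = field(default_factory=list)  # zones this record has passed through

def ingest(source: dict) -> Record:
    """RAW: land the source record unaltered (source-like)."""
    return Record("RAW", dict(source), ["RAW"])

def conform(rec: Record) -> Record:
    """CONFORMED: cleanse; here, lower-case/trim field names and trim string values (hypothetical rule)."""
    data = {k.strip().lower(): (v.strip() if isinstance(v, str) else v) for k, v in rec.data.items()}
    return Record("CONFORMED", data, rec.lineage + ["CONFORMED"])

def refine(rec: Record, enterprise_ids: dict, enrichment: dict) -> Record:
    """REFINED: attach an enterprise ID, then enrich with location attributes keyed by that ID."""
    eid = enterprise_ids.get(rec.data.get("customer_no"))  # "customer_no" is a hypothetical key
    data = {**rec.data, "enterprise_id": eid, **enrichment.get(eid, {})}
    return Record("REFINED", data, rec.lineage + ["REFINED"])

def publish(rec: Record, fields: list) -> Record:
    """PUBLISHED: project only the fields a specific purpose (report, app, model) needs."""
    return Record("PUBLISHED", {f: rec.data.get(f) for f in fields}, rec.lineage + ["PUBLISHED"])
```

Chaining `publish(refine(conform(ingest(raw)), ids, enrichment), fields)` yields a purpose-specific record while the RAW copy stays available for repurposing, which mirrors the data lake's "store raw, standardize later" approach.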



Insurance – Commercial Underwriting
REPRESENTATIVE USE CASE
CAPABILITIES
• Single source for all data
• Deeper quote history
• Nearby risk
• Enrichment data
BENEFITS
• Better risk accuracy
• Fewer physical inspections
• Faster quote turnaround



Insurance – Claims
REPRESENTATIVE USE CASE
CAPABILITIES
• Single source for all data
• Event analysis
• Actor analysis
BENEFITS
• Holistic understanding
• Better disposition
• Enhanced fraud detection
• Shorter processing time



Insurance – Loss Prevention
REPRESENTATIVE USE CASE
CAPABILITIES
• Single source for all data
• Multiple risks
• Event monitoring & analysis
• Proactive alerts
BENEFITS
• Fewer claims
• Greater customer satisfaction
• Auto-generated claims



Retail
REPRESENTATIVE USE CASES
CAPABILITIES
• Single source for all data
• Locations
• Transportation & logistics
• Inventory & pricing
• In-store locator
• Social
BENEFITS
• Operational efficiency
• Actual foot traffic data
• Richer customer interactions
Entertainment & Hospitality
REPRESENTATIVE USE CASES
CAPABILITIES
• Single source for all data
• Locations
• Transportation & logistics
• Inventory & pricing
• In-resort locator
BENEFITS
• Operational efficiency
• Actual foot traffic data
• Richer guest interactions
Healthcare
REPRESENTATIVE USE CASES
CAPABILITIES
• Single source for all data
• Patient monitoring
• Plan administration
• Benchmarks
• Network coverage
BENEFITS
• Holistic patient view
• Avoid/reduce readmissions
• Provider capacity planning
Guiding Principles & Lessons Learned
1. Don’t create data silos and interoperability issues
2. Evolve via “outside-in” increments that solve business problems
3. Leverage accelerators for fast, easy, secure data management
4. Establish new data management processes for success
5. Strengthen governance processes organically
6. Integrate location data into the overall enterprise architecture
7. Use advanced analytics and machine learning
8. Focus on strategy, organization, and culture to foster adoption



Our Promise
• Our reputation is built upon the premise that we are a company that listens
• We bring a creative view to your business initiative
• We are collaborative and accountable as we jointly create your solution
• We continuously innovate from concept to result and help you effect business change
• There will be no surprises.
Ideas. Realized.®
QUESTIONS?

tdwi.org
CONTACT INFORMATION
If you have further questions or comments:

Philip Russom, TDWI


prussom@tdwi.org
Dan Kernaghan, Pitney Bowes
daniel.kernaghan@pb.com
Debashis Rana, RCG Global Services
debashis.rana@rcggs.com

UPCOMING TDWI EVENTS

DECEMBER EVENTS
TDWI ORLANDO 2017
Royal Pacific Resort, Orlando, FL
December 3‒8

LEADERSHIP SUMMIT ORLANDO


Emerging Trends and Leadership for Advanced
Analytics
Royal Pacific Resort, Orlando, FL
December 4‒5

TRANSFORMING DATA WITH INTELLIGENCE™
