
DMM117 SAP HANA Processing Services: Text, Spatial, Graph, Series, and Predictive

Public
Speakers
Las Vegas, Sept 19 - 23: Anthony Waite, May Chen
Bangalore, October 5 - 7: Priyanka Nalakath, M S Poornapragna
Barcelona, Nov 8 - 10: Markus Fath, Anthony Waite

2016 SAP SE or an SAP affiliate company. All rights reserved. Public 2


Disclaimer

The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of
SAP. Except for your obligation to protect confidential information, this presentation is not subject to your license agreement or
any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this
presentation or any related document, or to develop or release any functionality mentioned therein.

This presentation, or any related document and SAP's strategy and possible future developments, products and/or platform
directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice.
The information in this presentation is not a commitment, promise or legal obligation to deliver any material, code or functionality.
This presentation is provided without a warranty of any kind, either express or implied, including but not limited to, the implied
warranties of merchantability, fitness for a particular purpose, or non-infringement. This presentation is for informational
purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this
presentation, except if such damages were caused by SAP's intentional or gross negligence.

All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially
from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only
as of their dates, and they should not be relied upon in making purchasing decisions.



Agenda

Introduction: a platform to analyze various data types


Text
Spatial
Graph
Series
Numbers



Introduction

Example scenarios

Public Security: Generate real-time intelligence from multiple sources
Case management, activities, master data

Insurance: Analyze the impact of natural disasters from many perspectives
Policy data, locations

Data sources: News/media, Social media, Satellite imagery, Phone monitoring, Business networks, Traffic data



SAP HANA: The Platform Powers the Digital Transformation

SAP HANA PLATFORM
ON-PREMISE | CLOUD | HYBRID



Text

What types of text processing capabilities are supported?

Full-text search
In addition to string matching, SAP HANA features full-text search, which works on content stored in tables or exposed via
views. Just like searching on the Internet, full-text search finds terms wherever they occur in a document, regardless of their exact position or word order.

Text analysis
Capabilities range from basic tokenization and stemming to more complex semantic analysis in the form of entity and fact
extraction. Text analysis applies within individual documents and is the foundation for both full-text search and text mining.

Text mining
Text mining makes semantic determinations about the overall content of documents relative to other documents. Capabilities
include key term identification and document categorization. Text mining is complementary to text analysis.
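The tokenization and stemming mentioned above can be illustrated with a deliberately naive Python sketch. The suffix list and the helper names (`tokenize`, `stem`, `analyze`) are hypothetical; SAP HANA's text analysis uses its own language-specific linguistic processing, not these rules.

```python
import re

# Hypothetical, deliberately naive suffix list; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Tokenize, then stem each token."""
    return [stem(token) for token in tokenize(text)]


print(analyze("Searching indexed documents"))  # ['search', 'index', 'document']
```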



Full-text search

SAP HANA provides an in-database search engine
Supports 32 languages and handles binary file formats
Modeling tools for search
Search queries via built-in procedure, SQL, and OData
Linguistic and fuzzy (error-tolerant) search



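Full-text index and full-text search

-- Table storing binary documents together with their MIME type
CREATE COLUMN TABLE "RESEARCH_PAPERS" (
  "ID" INTEGER PRIMARY KEY,
  "AUTHOR" NVARCHAR(200),
  "MIMETYPE" NVARCHAR(200),
  "DOCUMENT" BLOB
);

-- Full-text index on the binary document column
CREATE FULLTEXT INDEX "FTI_RESEARCH_PAPERS_DOCUMENT"
  ON "RESEARCH_PAPERS"("DOCUMENT");

-- Error-tolerant search in both columns; 0.8 is the minimum fuzzy score
SELECT "ID", "AUTHOR", "DOCUMENT"
  FROM "RESEARCH_PAPERS"
  WHERE CONTAINS(
    ("AUTHOR", "DOCUMENT"),
    'roberd software', FUZZY(0.8)
  );

[Figure: rows inserted into the table (ID, DOC) are processed by full-text indexing into a full-text index]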
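The `FUZZY(0.8)` option lets "roberd" still match "robert". As a rough analogy only, the sketch below scores term similarity with Python's `difflib.SequenceMatcher` and applies the same 0.8 threshold. SAP HANA computes fuzzy scores with its own algorithm, so its scores will differ; the rule that every query term must match some token is also an assumption of this sketch.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1] (difflib ratio, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_contains(query: str, text: str, threshold: float = 0.8) -> bool:
    """True if every query term matches some token of the text with at least `threshold`."""
    tokens = text.split()
    return all(
        any(similarity(term, token) >= threshold for token in tokens)
        for term in query.split()
    )


doc = "Robert Smith on software architecture"
print(similarity("roberd", "robert"))          # 0.8333333333333334
print(fuzzy_contains("roberd software", doc))  # True
print(fuzzy_contains("roberd hardware", doc))  # False
```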
Search models

In a search model you define the structure of your search object and how it is exposed to an application:
Access
Tables and joins
Columns
Default columns for search
Weights for ranking
Fuzziness
Default columns for facets

[Figure: a search model defined on top of tables]
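How column weights influence ranking can be sketched as a simple weighted sum. The `WEIGHTS` mapping, the `TITLE`/`ABSTRACT` columns, and the scoring rule are hypothetical illustrations, not the ranking formula SAP HANA applies to a search model.

```python
# Hypothetical search-model settings: which columns are searched and their ranking weights.
WEIGHTS = {"TITLE": 2.0, "AUTHOR": 1.0, "ABSTRACT": 0.5}


def score(row: dict[str, str], terms: list[str]) -> float:
    """Add a column's weight once per query term that appears as a word in that column."""
    total = 0.0
    for column, weight in WEIGHTS.items():
        words = row.get(column, "").lower().split()
        total += weight * sum(term.lower() in words for term in terms)
    return total


rows = [
    {"ID": "1", "TITLE": "Graph processing", "AUTHOR": "Smith", "ABSTRACT": "spatial data"},
    {"ID": "2", "TITLE": "Spatial analytics", "AUTHOR": "Jones", "ABSTRACT": "graph data"},
]
ranked = sorted(rows, key=lambda r: score(r, ["spatial"]), reverse=True)
print([r["ID"] for r in ranked])  # ['2', '1']
```

A match in the heavily weighted title outranks a match in the lightly weighted abstract.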



Search models and data access

CALL ESH_SEARCH (query, ?);
Built-in procedure to search on multiple search models with an OData query and a JSON response

CALL ESH_CONFIG (config);
Built-in procedure to add search annotations (request/response, facets, UI areas etc.) to views

[Figure: a UI exchanges JSON with ESH_SEARCH; search annotations are added via SQL to *any* view built on tables]



Text analysis

SAP HANA provides in-database text analysis


Linguistic analysis
Entity extraction
e.g. persons, organizations
Fact extraction
e.g. sentiments, mergers & acquisitions
Grammatical role analysis
subject-predicate-object
Custom dictionaries and rules for domain adaptation
e.g. chemical substances, product launch



Text analysis

Text Analysis as an optional processing step on top of full-text indexing: documents (ID, DOC) inserted into a table are processed by full-text indexing with text analysis (TA), which produces both the full-text index and a text analysis results table.

Text Analysis on non-persisted data: text sent via SAP HANA Extended Application Services is analyzed and the text analysis results are returned to the caller.



Text analysis
advanced configuration options

Custom dictionaries for domain-specific entity extraction
Dictionaries are stored in the repository
Updates to dictionaries take effect immediately

Standard Form | Variant | Type
Arnold Schwarzenegger | Arnie | American Film Actor
Sylvester Stallone | Sly | American Film Actor
SAP SE | SAP AG | Company
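A lookup table is enough to illustrate how a custom dictionary maps variants to a standard form and an entity type. The sketch uses the three sample entries above; the helper names are hypothetical, matching is case-sensitive and whole-word only, and real SAP HANA dictionaries are defined in the repository and applied by the text analysis engine.

```python
import re

# The sample dictionary: (standard form, variants, entity type).
ENTRIES = [
    ("Arnold Schwarzenegger", ["Arnie"], "American Film Actor"),
    ("Sylvester Stallone", ["Sly"], "American Film Actor"),
    ("SAP SE", ["SAP AG"], "Company"),
]


def build_lookup(entries):
    """Map every surface form (standard form and variants) to (standard form, type)."""
    lookup = {}
    for standard, variants, entity_type in entries:
        for form in [standard, *variants]:
            lookup[form] = (standard, entity_type)
    return lookup


def extract(text, lookup):
    """Return (matched text, standard form, type) for each whole-word dictionary hit."""
    # Try longer forms first so a long form wins over a shorter overlapping one.
    forms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, forms)) + r")\b")
    return [(m.group(0), *lookup[m.group(0)]) for m in pattern.finditer(text)]


lookup = build_lookup(ENTRIES)
for hit in extract("Arnie and Sly met executives of SAP AG.", lookup):
    print(hit)
```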



Text analysis
advanced configuration options

Custom rules for domain-specific fact extraction
Rules are stored in the repository
Updates to rules take effect immediately

Rule elements
Tokens, stems, part-of-speech tags
Iteration operators
Wildcards, alternation, negation
Character classifiers (case sensitivity)
Grouping and containment (regex)

Example rule: type: company | stem: acquire, buy | type: company | for: currency
Matches: "SAP acquired Sybase for $5.8 billion", "IBM buys Softlayer for $2 billion"
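The sample rule (a company, a form of "acquire" or "buy", a company, "for", a currency amount) can be approximated with a regular expression over raw text. This is a sketch under loud assumptions: capitalized words stand in for the company entity type, a hand-written alternation stands in for stemming, and the pattern names are hypothetical. Real extraction rules operate on tokens, stems, and entity types produced by linguistic analysis.

```python
import re

# Assumption: a capitalized word stands in for the "company" entity type.
COMPANY = r"(?P<{name}>[A-Z][A-Za-z]*)"
# Assumption: hand-written inflections stand in for "stem: acquire, buy".
ACQUIRE = r"(?:acquire[sd]?|acquiring|buys?|bought|buying)"
# Assumption: a dollar amount with an optional magnitude stands in for "currency".
CURRENCY = r"(?P<amount>\$\d+(?:\.\d+)?(?: (?:million|billion))?)"

RULE = re.compile(
    COMPANY.format(name="buyer") + r"\s+" + ACQUIRE + r"\s+"
    + COMPANY.format(name="target") + r"\s+for\s+" + CURRENCY
)


def extract_acquisitions(text):
    """Return one dict (buyer, target, amount) per rule match."""
    return [m.groupdict() for m in RULE.finditer(text)]


for sentence in ["SAP acquired Sybase for $5.8 billion",
                 "IBM buys Softlayer for $2 billion"]:
    print(extract_acquisitions(sentence))
```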



Text analysis
using text analysis results

Search-based applications
Include text analysis results in a search model for navigation and filtering

Analytics
Simple calculations like term frequencies and co-occurrence
Clustering, topic modeling or other text mining techniques
R, Predictive Analysis Library (PAL) functions

Geotagging
Assign longitude/latitude coordinates to location entities

Graph Analysis
Store co-occurrences or semantic triples as a graph for pattern matching, reasoning etc.
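Simple calculations on text analysis results, such as how often entities occur and co-occur, can be sketched in Python. The `doc_entities` data below is invented and stands in for the entities extracted per document; in SAP HANA, equivalent counts could be computed from the text analysis results table.

```python
from collections import Counter
from itertools import combinations

# Hypothetical text analysis output: the set of entities extracted from each document.
doc_entities = {
    1: {"SAP", "Sybase"},
    2: {"SAP", "Sybase", "IBM"},
    3: {"IBM", "Softlayer"},
}

# Number of documents in which each entity occurs.
doc_freq = Counter(entity for ents in doc_entities.values() for entity in ents)

# Number of documents in which each (alphabetically sorted) pair of entities occurs together.
co_occurrence = Counter(
    pair for ents in doc_entities.values() for pair in combinations(sorted(ents), 2)
)

print(doc_freq["SAP"], doc_freq["Softlayer"])  # 2 1
print(co_occurrence.most_common(1))            # [(('SAP', 'Sybase'), 2)]
```

Pairs like these are also a natural input for the graph analysis mentioned above, with entities as vertices and co-occurrence counts as edge weights.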



Text mining

SAP HANA provides in-database text mining
Identify similar documents
Identify key terms of a document
Identify related terms
Categorize new documents based on a training corpus

Scenarios
Highlight the key terms when viewing a patent document
Identify similar incidents for faster problem solving
Categorize new scientific papers along a hierarchy of topics

[Figure: documents d1, d2 represented as vectors over terms t1 to tn]



Text mining

The text mining table is built from the results of linguistic analysis.
Essentially, it is a large term-document matrix.
The matrix is fully accessible for custom algorithms.

[Figure: documents (ID, DOC) inserted into a table pass through full-text indexing with text analysis (TA) and text mining (TM), producing a full-text index, a text analysis table, and a text mining table]
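Because the text mining table is essentially a term-document matrix, "identify similar documents" can be sketched as cosine similarity between document vectors. The sketch assumes raw term counts, whitespace tokenization, and invented documents; SAP HANA's text mining applies its own term weighting, so actual scores will differ.

```python
import math
from collections import Counter

# Hypothetical documents.
docs = {
    "d1": "graph engine shortest path",
    "d2": "graph shortest path algorithm",
    "d3": "spatial cluster kmeans",
}

# Term-document matrix: for each document, the count of each term.
matrix = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_documents(doc_id: str) -> list[tuple[str, float]]:
    """All other documents, most similar first."""
    scores = [(other, cosine(matrix[doc_id], vec)) for other, vec in matrix.items() if other != doc_id]
    return sorted(scores, key=lambda s: s[1], reverse=True)


print(related_documents("d1"))  # [('d2', 0.75), ('d3', 0.0)]
```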



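Text mining

Text mining functions
Related documents
Relevant terms
Related terms
Classify (kNN)
and more

[Figure: the functions are exposed through a Text Mining .js API in SAP HANA Extended Application Services and through text mining (TM) SQL, both operating on the text mining tables in SAP HANA]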
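The kNN classification listed above assigns a new document the majority category among its k most similar training documents. A minimal sketch, assuming raw term counts, cosine similarity, and an invented five-document training corpus; it is not SAP HANA's implementation.

```python
import math
from collections import Counter

# Hypothetical training corpus of (text, category) pairs.
TRAINING = [
    ("shortest path between two vertices", "Graph"),
    ("connected components of a graph", "Graph"),
    ("neighbors of a vertex in a graph", "Graph"),
    ("polygon contains a point", "Spatial"),
    ("distance between two points", "Spatial"),
]


def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_knn(text: str, k: int = 3) -> str:
    """Majority vote among the k training documents most similar to `text`."""
    query = vectorize(text)
    neighbors = sorted(TRAINING, key=lambda item: cosine(query, vectorize(item[0])), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


print(classify_knn("shortest path in a graph"))  # Graph
print(classify_knn("distance to a point"))       # Spatial
```

In the second call, one of the three nearest neighbors is a "Graph" document, but the two "Spatial" neighbors outvote it.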



Spatial

Related session: DMM270 (H2), Spatial Analytics with SAP HANA

SAP HANA provides native spatial data processing


Store 2D and 3D vector datatypes
50+ geospatial functions and algorithms
Geocoding and reverse geocoding
Geo content (GAB) and mapping services
Open standards (OGC, 1999 SQL/MM)
SDK for custom geospatial algorithms
Bulk and streaming data integration capabilities

Integration with Esri, Pitney Bowes, HERE and more



Geographic data
Categories

Vector data
Point, Linestring, Polygon, MultiPoint, Networks, Topologies, Point Clouds
Metadata: spatial reference systems (SRS), units of measure (UOM)

Raster data
Gridded data, e.g. digital terrain elevation, weather information
Image data, e.g. created from optical or spectral sensors
Metadata: raster and grid information, spatial and band reference systems

[Figure: example Point, Linestring, Polygon, and CircularString geometries and a small grid of raster values]



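Spatial predicates

Notation: $I(g)$ interior and $E(g)$ exterior of geometry $g$; $\emptyset$ the empty set.

g1.ST_Equals(g2): $g_1 = g_2$
g1.ST_Disjoint(g2): $g_1 \cap g_2 = \emptyset$
g1.ST_Intersects(g2): $g_1 \cap g_2 \neq \emptyset$
g1.ST_Touches(g2): $(g_1 \cap g_2 \neq \emptyset) \land (I(g_1) \cap I(g_2) = \emptyset)$
g1.ST_Within(g2): $(g_1 \cap g_2 = g_1) \land (I(g_1) \cap I(g_2) \neq \emptyset)$
g1.ST_Contains(g2): $(g_1 \cap g_2 = g_2) \land (I(g_1) \cap I(g_2) \neq \emptyset)$
g1.ST_Covers(g2) *: $g_1 \cap g_2 = g_2$
g1.ST_Crosses(g2): $(\dim(I(g_1) \cap I(g_2)) < \max(\dim I(g_1), \dim I(g_2))) \land (I(g_1) \cap I(g_2) \neq \emptyset) \land (g_1 \cap g_2 \neq g_1) \land (g_1 \cap g_2 \neq g_2)$
g1.ST_Overlaps(g2): $(I(g_1) \cap I(g_2) \neq \emptyset) \land (I(g_1) \cap E(g_2) \neq \emptyset) \land (E(g_1) \cap I(g_2) \neq \emptyset)$

* No OGC standard

[Figure: example geometries g1 and g2 illustrating each predicate]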
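For the special case of a point g1 and a simple polygon g2, ST_Within can be approximated with the classic ray-casting test sketched below. This is an illustration, not SAP HANA's implementation: it assumes planar coordinates and does not treat points on the boundary according to the definitions above.

```python
Point = tuple[float, float]


def point_within_polygon(p: Point, polygon: list[Point]) -> bool:
    """Ray casting: count how often a ray from p towards +x crosses the polygon's edges.

    An odd count means p lies inside. Points exactly on the boundary are not handled.
    """
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_within_polygon((2, 2), square))  # True
print(point_within_polygon((5, 2), square))  # False
```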



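Spatial clustering and joins

Clustering: grid, k-means, DBSCAN

[Figure: spherical clusters and non-spherical clusters]

-- Group the organizations' locations into 5 k-means clusters;
-- return each cluster's ID, centroid, and number of members
SELECT ST_ClusterId() AS CID, ST_ClusterCentroid() AS CENTROID, COUNT(*) AS C
  FROM "RESEARCH_ORGANIZATIONS"
  GROUP CLUSTER BY "LON_LAT"
  USING KMEANS CLUSTERS 5;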
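The KMEANS clustering in the query above can be illustrated with Lloyd's algorithm. A minimal sketch, assuming planar (x, y) coordinates (treating longitude/latitude as planar is only reasonable for small areas) and, for determinism, using the first k points as initial centroids; SAP HANA's own initialization and distance handling may differ.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's k-means on planar (x, y) points; initial centroids are simply the first k points."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, 2)
print(clusters)
```

k-means tends to find compact, roughly spherical clusters; density-based methods such as DBSCAN are the usual choice for non-spherical shapes.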

Join

-- All pairs of organizations and project locations less than 100 km apart
SELECT *
  FROM "RESEARCH_ORGANIZATIONS" AS T1,
       "PROJECT_LOCATION" AS T2
  WHERE T2."LON_LAT".ST_Distance(T1."LON_LAT", 'kilometer') < 100;
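ST_Distance with the unit 'kilometer' returns a distance on the earth's surface. The haversine formula below gives a spherical approximation with an assumed mean earth radius of 6371 km; SAP HANA uses its own round-earth calculations, so results will differ slightly. The nested-loop `within_distance` helper and the sample coordinates are hypothetical and only mirror the join condition of the query.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius (spherical approximation)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(orgs, projects, max_km=100.0):
    """Naive nested-loop join: all (organization, project) pairs closer than max_km."""
    return [
        (org_id, proj_id)
        for org_id, (olon, olat) in orgs.items()
        for proj_id, (plon, plat) in projects.items()
        if haversine_km(olon, olat, plon, plat) < max_km
    ]


orgs = {"A": (0.0, 0.0)}
projects = {"P1": (0.5, 0.0), "P2": (1.0, 0.0)}
print(within_distance(orgs, projects))  # [('A', 'P1')]
```

One degree of longitude on the equator is about 111 km, so P1 (about 56 km away) qualifies and P2 does not. A database would use a spatial index rather than comparing every pair.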



Spatial joins in Calculation View modeler



Spatial
Geocoding

SAP HANA supports geocoding, reverse geocoding, and address cleansing.

This data transformation/enrichment can run either locally (reference data is stored in SAP HANA) or via a remote geocoding service, e.g. HERE.

Local geocoding and address cleansing is handled by SAP HANA smart data quality.

[Figure: an address is processed by the Geocode data transform or a geocode index, using geocode reference data in SAP HANA, to produce longitude and latitude]



Spatial
Geo content and services

SAP HANA includes HERE mapping content and services
Mapping services API/SDK
Map content for generalized administrative boundaries (GAB) and postcode areas (POC)

[Figure: mapping service and map content inside SAP HANA]



Sample spatial clients

[Figure: sample clients of SAP HANA spatial: Esri ArcGIS Desktop (query layer via ODBC), Esri ArcGIS Portal (map service via Esri ArcGIS Server and ODBC), SAP BusinessObjects, a native SAP UI5 cloud app on SAP HANA Extended Application Services, and shapefile upload]



Graph

Related session: DMM212 (L1), SAP HANA Graph Processing: Information and Demonstration

SAP HANA provides a native graph engine


property graph model
full transactional (ACID) properties
basic graph functions like shortest path and strongly
connected components
native graph viewer
tightly integrated in SAP HANA operations (security, backup
etc.)

Benefits
Store and analyze graph data in real-time
Tools and graph algorithms to navigate and extract insight
from relationship data
Combine text, spatial, and advanced analytics with
relationship intelligence



Property graph

Powerful and flexible property graph model
vertices (nodes) and edges (relationships) tables
vertices connected via multiple edges of any type
dynamic graph workspace view

Up-to-date insights without replicating data
Enrich graph semantics by adding new attributes to vertices and edges

Workspace

Vertices
Key | Name | Birthdate
Herman | Herman Hesse | 19270530
Samuel | Samuel Becket | 19281001

Edges
Key | Source | Target | Type
1 | Maria | Herman | hasSon
2 | Maria | Samuel | hasSon
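Vertex and edge tables like the ones above translate directly into an adjacency list, on which a neighborhood search is a breadth-first traversal. A minimal sketch, assuming outgoing edges only and using the two sample edges; the helper names are hypothetical and unrelated to SAP HANA's graph engine API.

```python
from collections import defaultdict, deque

# Rows shaped like the sample edge table: (Key, Source, Target, Type).
EDGES = [
    (1, "Maria", "Herman", "hasSon"),
    (2, "Maria", "Samuel", "hasSon"),
]


def adjacency(edges):
    """Map each source vertex to its list of (target, edge type) pairs."""
    out = defaultdict(list)
    for _key, source, target, edge_type in edges:
        out[source].append((target, edge_type))
    return out


def neighborhood(edges, start, max_depth):
    """Breadth-first search over outgoing edges; returns {vertex: hop distance} up to max_depth."""
    adj = adjacency(edges)
    depth = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if depth[v] == max_depth:
            continue
        for w, _edge_type in adj[v]:
            if w not in depth:
                depth[w] = depth[v] + 1
                queue.append(w)
    return depth


print(neighborhood(EDGES, "Maria", 1))  # {'Maria': 0, 'Herman': 1, 'Samuel': 1}
```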



Graph algorithms

Neighborhood Search, Shortest Path, Strongly Connected Components, Pattern Matching

[Figure: example graph with vertices Cronus, Hera, Aphrodite, Artemis, Gaia, Poseidon, Hades, Leto]
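Shortest path on a weighted graph is commonly computed with Dijkstra's algorithm. The sketch below is a textbook version on hypothetical vertices and weights; it assumes non-negative edge weights and makes no claim about how SAP HANA's graph engine implements its shortest path function.

```python
import heapq


def shortest_path(edges, start, goal):
    """Dijkstra on a directed graph given as (source, target, weight) triples.

    Returns (list of vertices, total weight), or (None, inf) if goal is unreachable.
    """
    adj = {}
    for source, target, weight in edges:
        adj.setdefault(source, []).append((target, weight))
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == goal:
            break
        if d > dist[v]:
            continue  # stale queue entry
        for w, weight in adj.get(v, []):
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                prev[w] = v
                heapq.heappush(heap, (nd, w))
    if goal not in dist:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


# Hypothetical weighted edges.
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)]
print(shortest_path(edges, "a", "d"))  # (['a', 'b', 'c', 'd'], 4.0)
```

The direct edge a to c (weight 5) loses to the detour through b (weight 3), which is exactly the kind of decision the priority queue makes.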



Graph modeler

In a calculation view, you can use a graph node that triggers a graph algorithm.
The algorithm is executed whenever data is retrieved from the calculation view.

SELECT * FROM GET_SHORTEST_PATHS
  ORDER BY "WEIGHT"
  WITH PARAMETERS (
    'placeholder' = ('$start$', ['zeus']),
    'placeholder' = ('$level$', '5'));



Series

Series data

SAP HANA provides native support for series data


Store and generate series data
SQL integration for query processing
Detect and correct errors or anomalies
Horizontal aggregation/disaggregation (e.g. hourly to daily)
Series analysis (similarity, regression, smoothing, binning etc.)
Benefits
Efficient, scalable storage of series data
Simple and concise SQL interface
Optimized series algorithms
Seamless integration into existing database



Series table

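CREATE COLUMN TABLE "WEATHER" (
  "STATION_ID" VARCHAR(3) NOT NULL REFERENCES "WEATHER_STATION",
  "DATE" DATE NOT NULL,
  "MAXTEMP" DECIMAL(3,1),
  PRIMARY KEY ("STATION_ID", "DATE")
) SERIES (
  -- one series per weather station
  SERIES KEY ("STATION_ID")
  -- exactly one row per day, no gaps allowed
  EQUIDISTANT INCREMENT BY INTERVAL 1 DAY MISSING ELEMENTS NOT ALLOWED
  -- "DATE" marks the start of each period; there is no separate end column
  PERIOD FOR SERIES ("DATE", NULL)
);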



Series data functions

Functions that make it easier to manipulate series data


SERIES_GENERATE Generate a complete series

SERIES_DISAGGREGATE Move from coarse units (day) to finer (hour)

SERIES_ROUND Convert a single value to a coarser resolution

SERIES_PERIOD_TO_ELEMENT Convert a timestamp in a series to its offset from start

SERIES_ELEMENT_TO_PERIOD Convert an integer to the associated period
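For an equidistant series such as the daily WEATHER table, converting between periods and element offsets is simple arithmetic. The sketch mirrors the idea behind SERIES_PERIOD_TO_ELEMENT, SERIES_ELEMENT_TO_PERIOD, SERIES_GENERATE, and SERIES_ROUND in Python; the start date is hypothetical, offsets are 0-based here (SAP HANA's numbering convention may differ), and rounding always goes down to the first of the month.

```python
from datetime import date, timedelta

START = date(2016, 1, 1)       # hypothetical series start
INCREMENT = timedelta(days=1)  # equidistant increment, as in the WEATHER table


def period_to_element(day: date) -> int:
    """Offset of a period from the series start, in increments (0-based here)."""
    return (day - START) // INCREMENT


def element_to_period(element: int) -> date:
    """Inverse of period_to_element."""
    return START + element * INCREMENT


def series_generate(first: date, last: date) -> list[date]:
    """All periods from first to last (inclusive), one per increment."""
    count = (last - first) // INCREMENT + 1
    return [first + i * INCREMENT for i in range(count)]


def round_down_to_month(day: date) -> date:
    """Convert a daily value to a coarser (monthly) resolution by rounding down."""
    return day.replace(day=1)


print(period_to_element(date(2016, 2, 1)))  # 31
print(element_to_period(31))                # 2016-02-01
print(len(series_generate(date(2016, 1, 30), date(2016, 2, 2))))  # 4
print(round_down_to_month(date(2016, 2, 17)))  # 2016-02-01
```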



Analytical functions

Functions for analyzing series data:


LINEAR_APPROX Replace NULL values by interpolating adjacent non-NULL values

CUBIC_SPLINE_APPROX Replace NULL values by interpolating adjacent non-NULL values with a cubic spline

CORR Pearson product-moment correlation coefficient

CORR_SPEARMAN Spearman rank correlation

DFT Compute the discrete Fourier transform

MEDIAN Compute the median value

AUTO_CORR Correlation of a (sub-)series with itself at varying lags
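Two of the analytical functions are easy to sketch in Python: linear interpolation of missing values in the spirit of LINEAR_APPROX, and the Pearson coefficient computed by CORR. These are textbook versions, not SAP HANA's implementations; in particular, leading and trailing NULL (None) values are left unchanged here.

```python
import math


def linear_approx(values):
    """Replace interior None values by linear interpolation between the nearest non-None neighbors.

    Leading and trailing None values are left as they are.
    """
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result


def corr(xs, ys):
    """Pearson product-moment correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(linear_approx([10.0, None, None, 16.0, 15.0]))  # [10.0, 12.0, 14.0, 16.0, 15.0]
print(round(corr([1, 2, 3, 4], [2, 4, 6, 8]), 6))     # 1.0
```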



Advanced Analytics

Related sessions: DMM271 (H2), Introduction to Predictive Modeling and Application Deployment for SAP HANA; BA101 (L1)

SAP HANA provides in-database data mining


Application Function Library (AFL) contains packages
for data mining and predictive analysis, e.g. Predictive
Analysis Library (PAL)
Native algorithms for advanced analysis
In-database processing for fast results
Support for common data mining tasks like clustering,
classification, association, time series etc.
R integration for SAP HANA
use the R open source environment in the context of SAP HANA
R integration via fast, parallelized connection
R script is embedded within SAP HANA SQLScript



Advanced Analytics

[Figure: the SAP HANA Platform with SAP Predictive Analytics and SAP applications on top; engines for Text Analysis, Spatial, Text Mining, R, Rules Engine, Graph, and the Application Function Library (APL, BFL, PAL, UDF, OFL, etc.); tooling and integration through SAP HANA Studio & Application Function Modeler, Smart Data Access, Smart Data Integration, Event Stream Processing, and embedded predictive integration services; source data: transaction, location, text, machine, and other data]



Advanced Analytics
Predictive Analysis Library (PAL)

SAP HANA In-Memory Predictive Analytics
SAP HANA embeds multiple advanced analytics function libraries, optimized for massively parallel in-memory processing.

Predictive Analysis Library
A core of numerous powerful, native predictive algorithms for in-database and in-memory processing that fully exploit the power of SAP HANA, resulting in quicker insight and faster implementations

Content and Usage
The library includes common as well as specialized algorithms targeting various data mining and machine learning areas
Leveraged and embedded in native SAP applications, and usable from SAP HANA development tools as well as SAP Predictive Analytics

Scenarios & Use Cases
Continuous growth and enhancements
Various LoB / industry scenarios making use of Association Analysis, Time Series Forecasting, Link Prediction, Predictive Modeling, etc.



Advanced Analytics
Predictive Analysis Library (PAL)

Association Analysis: Apriori, Apriori Lite, FP-Growth, KORD Top K Rule Discovery
Classification Analysis: CART, C4.5 Decision Tree Analysis, CHAID Decision Tree Analysis, K Nearest Neighbor, Logistic Regression (incl. SGD), Neural Network, Naïve Bayes, Random Forest, Support Vector Machine
Parameter Selection / Model Evaluation: Confusion Matrix, Area Under Curve
Regression: Multiple Linear Regression, Polynomial Regression, Exponential Regression, Bi-Variate Geometric Regression, Bi-Variate Logarithmic Regression
Cluster Analysis: ABC Classification, DBSCAN, K-Means, K-Medoid Clustering, K-Medians, Kohonen Self Organized Maps, Agglomerate Hierarchical, Affinity Propagation, Latent Dirichlet Allocation (LDA), Gaussian Mixture Model (GMM), Cluster Assignment
Time Series Analysis: Single/Double/Triple Exponential Smoothing, Forecast Smoothing, ARIMA/Seasonal ARIMA, Brown Exponential Smoothing, Croston Method, Linear Regression with Damped Trend and Seasonal Adjust, Forecast Accuracy Measures, Tests for White Noise, Trend, and Seasonality
Probability Distribution: Distribution Fit, Cumulative Distribution Function, Quantile Function, Kaplan-Meier Survival Analysis
Outlier Detection: Inter-Quartile Range Test (Tukey's Test), Variance Test, Anomaly Detection, Grubbs Outlier Test
Link Prediction: Common Neighbors, Jaccard's Coefficient, Adamic/Adar, Katz
Data Preparation: Sampling, Random Distribution Sampling, Binning, Scaling, Partitioning, Principal Component Analysis (PCA)
Statistic Functions (Univariate): Mean, Median, Variance, Standard Deviation, Kurtosis, Skewness
Statistic Functions (Multivariate): Covariance Matrix, Pearson Correlations Matrix, Chi-squared Tests (Test of Quality of Fit, Test of Independence), F-test (variance equal test)
Other: Weighted Scores Table, Substitute Missing Values
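Apriori, the first entry under Association Analysis, finds itemsets that occur in at least a minimum share of transactions by growing candidates level by level. The compact textbook version below is not PAL's implementation, and the basket data is invented for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / len(baskets)

    frequent = {}
    level = {frozenset([item]) for basket in baskets for item in basket}
    level = {s for s in level if support(s) >= min_support}
    k = 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
result = apriori(baskets, min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The pair {milk, butter} appears in only one of four baskets, so the triple containing it is pruned without ever being counted.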



Demo


SAP TechEd Online

Continue your SAP TechEd education after the event!
Access replays of
Keynotes
Demo Jam
SAP TechEd live interviews
Select lecture sessions
Hands-on sessions



Further information

Related SAP TechEd sessions:


DMM212 - SAP HANA Graph Processing: Information and Demonstration (L1)
DMM270 - Spatial Analytics with SAP HANA (H2)
DMM271 - Introduction to Predictive Modeling and Application Deployment for SAP HANA (H2)

SAP Public Web


scn.sap.com
www.sap.com

SAP Education and Certification Opportunities


www.sap.com/education

Watch SAP TechEd Online


www.sapteched.com/online



Feedback
Please complete your session evaluation for DMM117.

Contact information:
Markus Fath
markus.fath@sap.com



2016 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate
company) in Germany and other countries. Please see http://www.sap.com/corporate-en/about/legal/copyright/index.html for additional trademark information and notices.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.

National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its
affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and
services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as
constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop
or release any functionality mentioned therein. This document, or any related presentation, and SAP SE's or its affiliated companies' strategy and possible future
developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time
for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-
looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place
undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
