Efficient Data Integration for High-Performance Analytics
Karl Krycha,
EMEA SAS Teradata CoE, Teradata

Copyright © 2012, SAS Institute Inc. All rights reserved. #analytics2012

Agenda
• Case Studies Big Analytics and HPA: In Database, In Memory, Social Media
• Various Dimensions of “Big Data”: Large scale (data volume) analytics, Emerging new data types, New (non-SQL) analytics
• Big Data and High Performance Analytics: Motivation, Traditional v Big Analytics, Potential Use Cases
• Big Data and High Performance Architecture: Integration Architecture Options, Hadoop and Aster, SAS High-Performance Analytics
• Summary

Case Study: Big Analytics In Database
BoA cuts AML transaction processing time by ten hours
• Following successive mergers, Bank of America was gathering enormous amounts of data but was not integrating it effectively in its existing Teradata warehouse (40 data models).
• A centralised 100-terabyte SAS server serviced all of the business, using virtually every tool that SAS has available.
• The existing process ran for about 14 hours and was expected to take 28 hours with the anticipated increase in transaction volume.
• SAS-Teradata in-database processing was introduced to eliminate these problems: external data stores were no longer required, and the Informatica system that had been used to extract and load data from Teradata into SAS was no longer needed.
• Improved processing: the AML processing time was estimated to fall to between 5 and 8 hours. The process in fact now runs in only 4 hours.

Case Study: Customer Case Study In Memory
High-Performance Analytics process: Data Exploration → Model Development → Model Deployment
Accelerated modeling: 167 hours … 84 seconds!
SAS In-Memory Analytics for Teradata delivered game-changing results!
Bottom-line impact: tens of millions of dollars
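
Taking the two run times on this slide at face value, and assuming both refer to the same modeling workload, they correspond to a speed-up of roughly 7,000 times:

```latex
\frac{167\ \text{h} \times 3600\ \text{s/h}}{84\ \text{s}} = \frac{601\,200\ \text{s}}{84\ \text{s}} \approx 7157
```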

Complementary Offerings: Options to manage the entire analytical process
In-Database: "Bring SAS analytics to the data"
• Minimizes data replication
• Minimizes data movement
• Leverages Teradata database and MPP capability
• Data preparation and data exploration in-database
• 'In-database' scoring for SAS models
• Supports select SAS procedures
In-Memory: "Accelerate SAS analytics with MPP technology"
• Moves the analytic dataset to a dedicated high-performance analytic sandbox for improved performance
• Gives the analyst full platform control
• Leverages 'in-memory' MPP execution speed
• Supports select SAS procedures
• Supports complex advanced analytics
• Speeds up the model development phase
• Complex models on large datasets
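
To make "bring SAS analytics to the data" and in-database scoring concrete, here is a minimal sketch assuming a simple linear scoring model. The coefficients, the `transactions` table and its columns are hypothetical, and Python's standard `sqlite3` module stands in for Teradata. The model is translated into a SQL expression, so the database computes the scores and only `(customer_id, score)` pairs leave it. This illustrates the idea only; it is not how the SAS Scoring Accelerator itself is implemented.

```python
import sqlite3

# Hypothetical linear scoring model; coefficients are illustrative only,
# not taken from any SAS Enterprise Miner model.
INTERCEPT = -2.0
COEFFICIENTS = {"txn_amount": 0.001, "txn_count": 0.05}


def model_to_sql(table: str, key: str) -> str:
    """Translate the linear model into a SELECT that the database evaluates."""
    terms = " + ".join(f"{coef!r} * {col}" for col, coef in COEFFICIENTS.items())
    return f"SELECT {key}, {INTERCEPT!r} + {terms} AS score FROM {table} ORDER BY {key}"


def score_in_database(conn: sqlite3.Connection) -> list[tuple[int, float]]:
    # Only (key, score) pairs leave the database; the raw rows stay where they are.
    return conn.execute(model_to_sql("transactions", "customer_id")).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (customer_id INTEGER, txn_amount REAL, txn_count INTEGER)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                     [(1, 500.0, 10), (2, 3000.0, 40)])
    for customer_id, score in score_in_database(conn):
        print(customer_id, round(score, 3))
```

The design choice is the point of the in-database option: moving a few coefficients to the data is far cheaper than moving every row to the model.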

SAS Analytics for Teradata: Process Evolution, Reduced Time-to-Intelligence
Nicolas Adamek, EMEA SAS Teradata COE Lead

What is Big Data?

What is Big Data? Growing Data Volumes
It's growing. And it's everywhere. Quickly.

What is Big Data? New kinds of data
[Chart: structured data vs. unstructured data growth]

What is Big Data?
• Big Data = Large scale (data volume) analytics
  – MPP SQL databases have delivered large scale analytics for over a decade. Teradata has been the leader in large scale SQL analytics, with over 16 customers holding a petabyte or more of data.
• Big Data = Emerging new data types
  – New multi-structured data types with unknown relationships that require processing of data regardless of size to discover insights. Examples include web logs, social networks, sensor networks and text.
• Big Data = New (non-SQL) analytics
  – New analytic frameworks that provide parallel processing on semi-structured data, leveraging the power of MapReduce: Teradata SQL MapReduce, SAS MapReduce (SAS/ACCESS to Hadoop).
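
The MapReduce programming model mentioned above can be illustrated with the classic word count. This is a single-process Python sketch of the map, shuffle and reduce phases; it is not the API of Teradata SQL MapReduce, Hadoop or SAS/ACCESS to Hadoop, and the sample web-log lines are made up. In a real cluster each phase runs in parallel across nodes and the shuffle moves intermediate pairs over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    # Emit (token, 1) for every token in one input record, e.g. a web-log line.
    for token in record.lower().split():
        yield token, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(map_reduce(["GET /home", "GET /cart", "POST /cart"]))
```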

The Many Dimensions of Big Data (Source: BI Research 2012)
[Diagram: labels include workload agility, analytic complexity, workload complexity, data & analysis latency (real-time, near-real-time, intra-day, daily), analytic capabilities used (data/text mining, predictive, advanced statistics, pre-built functions), query mix, concurrent data acquisition & analysis, volume (generation, update and accumulation rate), variety (structured, multi-structured), data volatility, validity and quality level.]

Key Points: Big Data
• Large scale SQL analytics (Volume)
  – Teradata has over 25 customers in the Petabyte club
• Emerging new data types (Variety, Velocity, Complexity)
  – New multi-structured data types with unknown relationships that require processing of data regardless of size to discover insights
• Big Analytics: new non-SQL analytics
  – Leveraging the power of MapReduce for new methods of efficiently analyzing data

Big Data Analytics: Do we really need Big Data?
• For consumers
  – Better understanding of own behavior
  – Integration of activities
  – Gamification: turn behavior into enjoyment
  – Influence: involvement and recognition
• For companies
  – Real behavior: what do people do, and what do they value?
  – Faster interaction
  – Better targeted offers
  – Customer understanding

Big Data Analytics: Potential Use Cases for Big Data Analytics (Source: IDC)

Big Data Analytics: New Capabilities
New data + new analysis = new capabilities
1. New Data: relational plus new non-relational data sources
   – Machine Data: Click Stream Files, System Log Files
   – Sensor Data: Telecommunications Network Data Records, Electric Grid Data
   – Micro-transactions: Financial Services Electronic, Mobile Transactions
   – Customer Interaction Graphs: Social Network Connections
2. New Analysis: requiring more than SQL (i.e. MapReduce)
   – Iterative analysis of data (data exploration and investigative analytics)
   – On-the-fly pattern matching and path analysis
   – Graph analysis
   – Text analysis
3. New Capabilities: merge the BI and Data Scientist worlds
   – Data Scientist / Data Ninja / Analytics Developers / Quants
   – Embrace new analytics techniques, e.g. MapReduce
   – High Performance Analytics
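
On-the-fly pattern matching and path analysis is easiest to picture on click-stream data. The sketch below is plain Python, not any Teradata or Aster function; it assumes clicks arrive as hypothetical `(user, timestamp_seconds, page)` tuples and that 30 minutes of inactivity closes a session. It splits each user's clicks into sessions and then checks which sessions visit `home` and later `checkout`.

```python
from itertools import groupby
from typing import Iterable, Sequence

# Assumed inactivity gap (seconds) that closes a session; not from the slides.
SESSION_TIMEOUT = 30 * 60

Click = tuple[str, int, str]  # (user_id, timestamp_seconds, page)


def sessionize(clicks: Iterable[Click]) -> list[tuple[str, list[str]]]:
    """Split each user's time-ordered clicks into sessions of page names."""
    sessions: list[tuple[str, list[str]]] = []
    for user, events in groupby(sorted(clicks), key=lambda c: c[0]):
        pages: list[str] = []
        last_ts = None
        for _, ts, page in events:
            if last_ts is not None and ts - last_ts > SESSION_TIMEOUT:
                sessions.append((user, pages))  # gap too long: close the session
                pages = []
            pages.append(page)
            last_ts = ts
        sessions.append((user, pages))
    return sessions


def follows_path(pages: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the pattern's pages occur in order, not necessarily adjacent."""
    remaining = iter(pages)  # `in` consumes the iterator, enforcing order
    return all(step in remaining for step in pattern)


clicks = [
    ("u1", 0, "home"), ("u1", 60, "product"), ("u1", 120, "checkout"),
    ("u1", 5000, "home"),
    ("u2", 10, "home"), ("u2", 20, "cart"),
]
converted = [user for user, pages in sessionize(clicks)
             if follows_path(pages, ["home", "checkout"])]
print(converted)
```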

Big Data Analytics: Unstructured Data Is Not Analysed
• "Big Data" is about more than the ability to store data
• The ability to quickly structure and analyse data is required to gain value
Data is prepared and structure is applied before analytics take place, for example:
• Sentiment analysis & word …
• Fingerprints & polygons
• Scoring
"…the fact is that virtually no analytics directly analyze unstructured data. Unstructured data may be an input to an analytic process, but when it comes time to do any actual analysis, the unstructured data itself isn't utilized." (Bill Franks, International Institute for Analytics)
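
The point that structure is applied before analysis can be shown with the simplest form of sentiment analysis: each free-text post is reduced to a structured row of counts that a standard analytic procedure can then consume. The lexicon below is a tiny hypothetical word list, and lexicon counting is a deliberately simple stand-in for production text analytics.

```python
import re
from dataclasses import dataclass

# Hypothetical, tiny sentiment lexicon for illustration only.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "slow", "broken"}


@dataclass
class SentimentRow:
    """One structured record derived from one unstructured post."""
    doc_id: int
    positive: int
    negative: int

    @property
    def score(self) -> int:
        return self.positive - self.negative


def structure(doc_id: int, text: str) -> SentimentRow:
    # Tokenize the free text, then count lexicon hits per polarity.
    tokens = re.findall(r"[a-z']+", text.lower())
    return SentimentRow(doc_id,
                        sum(t in POSITIVE for t in tokens),
                        sum(t in NEGATIVE for t in tokens))


posts = ["Love the new app, great and fast!",
         "Checkout is slow and the cart is broken."]
rows = [structure(i, post) for i, post in enumerate(posts)]
for row in rows:
    print(row.doc_id, row.positive, row.negative, row.score)
```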

Key Points: Big Data Analytics
• What is Big Data?
  – Big Data is challenging our current pattern of thought
  – Cost-effective computing and storage: everything can be stored, and cheap large-scale computing power is readily available
  – Data explosion: data everywhere (structured, semi-structured, unstructured, machine-generated data, geo-location data, …)
• Big Data Analytics
  – "Big data is the next wave of new data sources that will drive analytic innovation in business, government, and academia. The analysis that big data enables will lead to decisions that are more informed and, in some cases, different from what they are today." (Bill Franks, Taming The Big Data Tidal Wave)

Integrated Data Warehouse: Our Preferred, Advocated Solution
• Integrated Data Lab enables rapid experimentation with new data
• Teradata Data Lab: integral part of the Analytic Advantage Program
• Viewpoint portlets that enable Data Labs (sandboxing)

However, this integrated DWH approach needs to be constantly extended and improved.

New Business Opportunities: Social Media Integration
• Brand understanding: "Do people like me?"
• Market understanding: "What are the hot topics?"
• Influencer analysis: "Who is important?"
• Social network analysis
• Add context to customer information: "What drives actions?"
• Data mining
• Customer segmentation
• Service-led social media strategy: "Help me"
• Marketing social media strategy
• Creating an interaction framework

Growing Marketing Capabilities: Social Media Integration
• Marketers need solid metrics that are meaningful
• Data needs to be analyzed, not just reported
• Not just # of fans, followers & likes
• Typical reporting tools provide basics with no context
From easy to hardest:
• Easy: direct sales from Facebook
• Hard: knowing the sentiment of posts and responding quickly
• Harder: conversion rate from people in various social channels
• Hardest: knowing when top social influencers come to the site, and showing customized messaging to encourage evangelism

Social Media Integration: Big Data Architecture
Complementary Technologies, One Vision
[Diagram: Enterprise Transactional Data Sources and Marketing Channels (Direct Mail, Retailer, Partner/Dist, Email, Web Responses) feed the Integrated Data Warehouse.]

SAS-Aster Data Integration
SAS/ACCESS for Aster nCluster
• Transparent connection between SAS and Aster Data
• Makes MapReduce processing easily accessible to SAS developers
• Enables the SAS system to access big data sets
• High-performance bulk Aster Load Utility support
• Seamlessly integrates SAS programs against Aster
SAS Scoring Accelerator for Aster
• Pushes down and processes SAS Enterprise Miner models inside Aster Data
• Native SAS parallelization for fast scaling and high performance
• Currently in Limited Availability
Results
• Fast scoring for SAS Enterprise Miner models (e.g. fraud detection)
• Faster data mining process
• Lower IT and development costs
[Diagram: Base SAS sends SQL queries on big data to the Aster Data Analytic Platform; SAS models run inside Aster Data nCluster.]

Teradata SAS Analytic Process Flow: In-Database / High Performance Analytics
Stages: Data Understanding → Data Preparation → Model Development → Model Deployment → Model Execution
[Diagram: numbered steps (1, 2, 3, 3a, 3b, 4, 5) connect Teradata SQL, Data Set Builder for Analytics, Analytics Accelerator, Enterprise Miner, High Performance Analytics, Model Manager, Scoring Accelerator and SAS Scoring. Source tables ORDER, ORDER ITEM and CUSTOMER (order number, status and date; item backordered and shipped quantities, ship date, description; customer number, name, address, city, state, post code, phone and fax) are combined into an ADS and a modeling ADS.]
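
The data preparation stage turns normalized source tables such as ORDER and CUSTOMER into an analytic data set (ADS) with one row per customer. A minimal SQL sketch of that idea follows, with Python's `sqlite3` standing in for Teradata. The simplified schema and the derived columns (`n_orders`, `total_quantity`, `n_backordered`) are hypothetical, and the order table is named `orders` because ORDER is a reserved word in SQL.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the CUSTOMER and ORDER source tables.
SCHEMA = """
CREATE TABLE customer (customer_number INTEGER PRIMARY KEY, customer_city TEXT);
CREATE TABLE orders (order_number INTEGER PRIMARY KEY, customer_number INTEGER,
                     order_status TEXT, item_quantity INTEGER);
"""

# Flatten to one row per customer: the shape a modeling procedure expects.
ADS_SQL = """
SELECT c.customer_number,
       c.customer_city,
       COUNT(o.order_number)             AS n_orders,
       COALESCE(SUM(o.item_quantity), 0) AS total_quantity,
       SUM(CASE WHEN o.order_status = 'BACKORDERED' THEN 1 ELSE 0 END) AS n_backordered
FROM customer AS c
LEFT JOIN orders AS o ON o.customer_number = c.customer_number
GROUP BY c.customer_number, c.customer_city
ORDER BY c.customer_number
"""


def build_ads(conn: sqlite3.Connection) -> list[tuple]:
    # The aggregation runs inside the database; only the compact ADS is returned.
    return conn.execute(ADS_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Wien"), (2, "Graz")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(10, 1, "SHIPPED", 3), (11, 1, "BACKORDERED", 2)])
for row in build_ads(conn):
    print(row)
```

The `LEFT JOIN` keeps customers without orders in the ADS with zero counts, so the modeling population is not silently reduced.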

Teradata Appliance for SAS HPA in the Teradata Ecosystem (Appendix Slides)
• Teradata appliance running SAS in-memory
  – Not a data warehouse
• Focused on model development
  – Not data prep or scoring
• Built for unique SAS analytic modeling
  – High volumes and performance
• Integrated in the Teradata Analytical Ecosystem
  – Key differentiator

Teradata Appliance for SAS High-Performance Analytics: Model 700
• Purpose-built appliance optimized specifically for SAS In-Memory Analytics
• Executes SAS HPA on a Teradata appliance, co-resident with the database
  – Utilizes Teradata for data storage and management, and to supply data to the HPA routines
  – Leverages SAS In-Memory Procedures for data analysis, model development and model scoring
• Intended to be a stand-alone, dedicated system, not the EDW or a mixed-workload data mart
• Orders-of-magnitude performance gains by leveraging the MPP architecture and executing SAS in-memory analytics in parallel
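
Executing in-memory analytics in parallel on an MPP system relies on a data-parallel pattern: each node summarizes its own partition in memory, and only small partial results are combined. The sketch below shows that pattern for a least-squares line fit. It is an illustration under stated assumptions, not how SAS HPA procedures are implemented: the `nodes` parameter is hypothetical, and threads stand in for MPP nodes (Python threads do not give CPU parallelism, so this shows the structure, not the speed).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Partial:
    """Sufficient statistics for a least-squares line, mergeable across nodes."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def __add__(self, other: "Partial") -> "Partial":
        return Partial(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                       self.sxx + other.sxx, self.sxy + other.sxy)


def summarize(rows: Sequence[tuple[float, float]]) -> Partial:
    # Runs on one "node" against its in-memory partition only.
    p = Partial()
    for x, y in rows:
        p = p + Partial(1, x, y, x * x, x * y)
    return p


def fit_line(rows: Sequence[tuple[float, float]], nodes: int = 4) -> tuple[float, float]:
    partitions = [rows[i::nodes] for i in range(nodes)]  # round-robin distribution
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(summarize, partitions))
    t = sum(partials, Partial())  # combine step: five numbers per node
    slope = (t.n * t.sxy - t.sx * t.sy) / (t.n * t.sxx - t.sx ** 2)
    intercept = (t.sy - slope * t.sx) / t.n
    return slope, intercept


data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
print(fit_line(data))
```

Because the partial statistics add up exactly, the result does not depend on how rows are distributed across nodes.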

Teradata Appliance for SAS High-Performance Analytics: Model 700
[Diagram: Model 700 software stack with SAS/STAT® Software, SAS/ETS® Software and SAS® Enterprise Miner™ on the Teradata Appliance for SAS High-Performance Analytics Software.]

Summary
• Business approach: identify business processes that you could do more efficiently with the help of big data and high-performance analytics 1)
• Deliver value as you go: it will take a lot of effort to figure out how to apply a source of big data to your business. An organization's analytic professionals and their business sponsors must be sure to look for ways to deliver small, quick wins as they go 2)
• Analytical ecosystem: acquire or grow the needed technology and analytical skills 1)
1) Gartner
2) Bill Franks, Taming The Big Data Tidal Wave

THANK YOU!
Karl Krycha
Managing Consultant, Teradata EMEA Advanced Analytics PS COE
Storchengasse 1, 1150 Wien, Austria
karl.krycha@teradata.com