with Hadoop
Fern Halper
@fhalper
TDWI Research Director for Advanced Analytics
April 17, 2014
Sponsor
Speakers
Fern Halper, Research Director for Advanced Analytics, TDWI
Tapan Patel, Product Marketing Manager, SAS
Agenda
The evolving big data ecosystem
Status of big data, analytics, and Hadoop
Considerations for getting started
New TDWI Checklist
Free to download
http://tdwi.org/research/list/tdwi-checklist-reports.aspx
An evolving ecosystem
Hadoop
Big data
Advanced analytics
In-memory
Examining the pieces: Big Data
Social
M2M/IoT
Text
Mobile/Location
Volume
Formats
70% of respondents currently using predictive analytics are utilizing big data
(Source: TDWI Predictive Analytics Best Practices Report, 2014)
Examining the pieces: Analytics
The Analytics Spectrum
Excel
Dashboards and reports
Other BI visualization
Advanced analytics
Advanced Analytics
Advanced analytics provides algorithms for complex analysis of either structured or unstructured data. It includes sophisticated statistical models, machine learning, text analytics, advanced visualization, and other advanced data mining techniques.
Examining the pieces: Hadoop
HDFS/MapReduce
Schema on read
Ecosystem of tools
Commercial distributions
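To make "HDFS/MapReduce" and "schema on read" concrete, here is a minimal single-process Python sketch, not Hadoop code. Raw log lines are kept exactly as written (as they might sit in HDFS), a schema is imposed only when the map step reads them, and a shuffle and reduce step then count actions. The log format and all function names are hypothetical; a real job would run many map and reduce tasks in parallel across the cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical raw records, stored as-is: no schema is enforced at write time.
RAW_LINES = [
    "2014-04-17T10:00:00,alice,click",
    "2014-04-17T10:00:05,bob,view",
    "2014-04-17T10:00:09,alice,view",
    "malformed line",
    "2014-04-17T10:01:00,carol,click",
]


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Apply the schema at read time and emit (action, 1) pairs."""
    fields = line.split(",")
    if len(fields) != 3:
        return  # record doesn't fit the read-time schema: skip it
    _timestamp, _user, action = fields
    yield action, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(run_job(RAW_LINES))  # {'click': 2, 'view': 2}
```

Because the schema lives in the reader rather than in the storage layer, a different job could parse the same raw lines with a different schema (for example, counting by user instead of by action).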
In-memory analytics
Performance
Interactivity
Status: Evolving architectures
Source: TDWI, Evolving Data Warehouse Architectures in the Age of Big Data, 2014 (n=1,688 responses)
What technical issues or practices are driving change in your DW architecture?
Select all that apply.
Status: Big data pieces
Status: Analytics pieces
Considerations
Defining the problem
Data preparation
Analyzing the data
Making it work (i.e., the team)
Governance
Data preparation
ETL vs. ELT
Data quality
Metadata
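One way to picture the data quality item is a simple profiling pass that counts common problems before any analysis. The Python sketch below is hypothetical: the field names, the list of valid states, and the 0 to 120 age range are all assumptions. The same checks could run before loading (ETL) or after raw data has landed in Hadoop (ELT).

```python
from typing import Any

# Hypothetical customer records; None marks a missing value.
RECORDS: list[dict[str, Any]] = [
    {"id": 1, "age": 34, "state": "NC"},
    {"id": 2, "age": None, "state": "CA"},
    {"id": 3, "age": 212, "state": "ca"},
    {"id": 3, "age": 41, "state": "NY"},
    {"id": 5, "age": None, "state": "TX"},
]

VALID_STATES = {"NC", "CA", "NY"}  # assumed reference list


def profile(records: list[dict[str, Any]]) -> dict[str, int]:
    """Count missing, out-of-range, invalid, and duplicate values."""
    ids = [r["id"] for r in records]
    return {
        "missing_age": sum(r["age"] is None for r in records),
        "age_out_of_range": sum(
            r["age"] is not None and not 0 <= r["age"] <= 120 for r in records
        ),
        "invalid_state": sum(r["state"] not in VALID_STATES for r in records),
        "duplicate_ids": len(ids) - len(set(ids)),
    }


if __name__ == "__main__":
    print(profile(RECORDS))
```

Recording these counts alongside the data is also a small piece of metadata: it tells later users how clean the source was when it was loaded.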
Data exploration
Query
Visualization
Descriptive statistics
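Descriptive statistics are usually the first pass over a new data set. The sketch below uses Python's standard `statistics` module on a small, hypothetical sample of session lengths in minutes. On a cluster the same summaries would be computed in a distributed way, but the quantities are the same.

```python
import statistics

# Hypothetical sample: session lengths in minutes.
session_minutes = [3.0, 7.5, 2.0, 12.0, 7.5, 4.0, 30.0]


def describe(values: list[float]) -> dict[str, float]:
    """Return a basic five-number summary plus mean and standard deviation."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }


if __name__ == "__main__":
    for name, value in describe(session_minutes).items():
        print(f"{name:>6}: {value:.2f}")
```

Here the mean (about 9.43) sits well above the median (7.5) because of the single 30-minute session. That is the kind of skew that exploration is meant to surface before modeling.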
Analysis
Data mining
Supervised
Unsupervised
Other analytics
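The supervised/unsupervised split comes down to whether labels are available. The minimal Python sketch below uses one hypothetical feature (monthly spend). The supervised side learns one centroid per given label (a nearest-centroid classifier). The unsupervised side has no labels and finds two groups with a basic one-dimensional k-means. Both techniques are illustrative choices, not a method prescribed by the slides.

```python
from statistics import mean

# Hypothetical data: monthly spend, with and without labels.
labeled = [(10.0, "low"), (12.0, "low"), (15.0, "low"), (80.0, "high"), (95.0, "high")]
unlabeled = [11.0, 14.0, 13.0, 85.0, 90.0, 100.0]


def train_nearest_centroid(data: list[tuple[float, str]]) -> dict[str, float]:
    """Supervised: labels are given, so learn one centroid per class."""
    classes = {label for _, label in data}
    return {c: mean(x for x, label in data if label == c) for c in classes}


def predict(centroids: dict[str, float], x: float) -> str:
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def kmeans_1d(values: list[float], k: int = 2, iterations: int = 10) -> list[float]:
    """Unsupervised: alternate assigning points to centers and recomputing centers."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]  # simple deterministic start
    for _ in range(iterations):
        clusters: list[list[float]] = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)


if __name__ == "__main__":
    model = train_nearest_centroid(labeled)
    print(predict(model, 20.0), predict(model, 70.0))  # low high
    print(kmeans_1d(unlabeled))
```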
Operationalize
Business process
In-database scoring
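In-database (or in-Hadoop) scoring moves the model to the data rather than the data to the model. The Python sketch below assumes a logistic regression model has already been trained elsewhere. The coefficients, field names, and partition layout are hypothetical. Each partition is scored where it sits, and only record IDs and scores are returned.

```python
import math

# Hypothetical coefficients from a previously trained logistic regression model.
INTERCEPT = -3.0
WEIGHTS = {"visits": 0.4, "days_since_purchase": -0.05}


def score_row(row: dict[str, float]) -> float:
    """Apply the logistic function to the linear predictor."""
    z = INTERCEPT + sum(w * row[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_partition(partition: list[dict[str, float]]) -> list[tuple[int, float]]:
    """Runs next to the data; only (id, score) pairs leave the partition."""
    return [(int(row["id"]), round(score_row(row), 4)) for row in partition]


# Two lists standing in for data partitions spread across cluster nodes.
partitions = [
    [{"id": 1, "visits": 10, "days_since_purchase": 20}],
    [{"id": 2, "visits": 2, "days_since_purchase": 60}],
]

if __name__ == "__main__":
    scores = [s for p in partitions for s in score_partition(p)]
    print(scores)
```

The design point is data movement: the model is a handful of numbers, while the partitions may be very large, so shipping the coefficients is far cheaper than shipping the rows.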
Skills
Computing
Analytic modeling
Creative thinker
Communicator
Big Data:
The Big Data Maturity Model
Poll Question
Are you making use of Hadoop for advanced analytics?
Yes
No, but we're thinking about it
No, and no plans to do so
Don't know
Copyright 2014, SAS Institute Inc. All rights reserved.
UTILIZING BIG DATA ANALYTICS WITH HADOOP
TAPAN PATEL, PRODUCT MARKETING MANAGER, SAS
DATA TO DECISION LIFECYCLE
PREPARE DATA → EXPLORE DATA → DEVELOP MODELS → DEPLOY & MONITOR
(a continuous cycle centered on COMPETITIVE ADVANTAGE)
ACCESS TO HADOOP
SAS SERVER → HiveQL → HADOOP
Push some SAS processing to Hadoop
Key Offerings:
SAS/Access to Hadoop
SAS/Access to Cloudera Impala
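The "push some processing to Hadoop" idea can be shown with any SQL engine. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for Hive; it is not SAS/ACCESS code, and the `sales` table and its columns are hypothetical. Pulling raw rows moves every record to the client. Pushing the GROUP BY to the engine returns only one summary row per group, with the same totals.

```python
import sqlite3

# sqlite3 stands in for Hive here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0), ("west", 25.0), ("west", 20.0)],
)

# Pull: move every row to the client, then aggregate there.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
pulled: dict[str, float] = {}
for region, amount in rows:
    pulled[region] = pulled.get(region, 0.0) + amount

# Push: send the aggregation to the data source; only summary rows come back.
pushed_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
pushed = dict(pushed_rows)

print(len(rows), len(pushed_rows))  # 5 2
```

With five rows the difference is trivial. At Hadoop scale, the rows transferred under the pull approach grow with the data, while the pushed result grows only with the number of groups.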
EMBEDDED PROCESS FRAMEWORK
SAS SERVER → SAS DATA step & DS2 → HADOOP
Push SAS processing to Hadoop with MapReduce 2
Key Offerings:
SAS Scoring Accelerator for Hadoop
SAS Data Quality Accelerator for Hadoop
SAS Code Accelerator for Hadoop
SAS Data Management
SAS LASR ANALYTIC SERVER
Architecture: SAS IN-MEMORY processes alongside HADOOP, serving WEB CLIENTS and APPLICATIONS
Data sources: ERP, SCM, CRM, images, audio and video, machine logs, text, web and social
Capabilities: data discovery and visualization; statistics and predictive analytics; data management; text analytics
SAS VISUAL STATISTICS
INTERACTIVE PREDICTIVE ANALYTICS
EXPLORE AND DISCOVER → PREDICT AND REFINE → DEPLOY AND MONITOR
SAS IN-MEMORY STATISTICS FOR HADOOP
WHAT IS IT?
Provides a single interactive programming environment
for Hadoop to perform:
analytical data manipulation
variable transformations
exploratory analysis
statistical modeling and machine learning
integrated model comparison and scoring
Takes advantage of distributed in-memory computing
optimized for analytical workloads
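"Integrated model comparison and scoring" means fitting several candidate models, comparing them on held-out data, and scoring new records with the winner. The small Python sketch below does this in a single process. The data, the two candidate models (a mean-only baseline and an ordinary least-squares line), and the mean-squared-error criterion are all illustrative assumptions, not how the SAS product is implemented.

```python
from statistics import mean
from typing import Callable

# Hypothetical (ad_spend, revenue) pairs, split into training and holdout sets.
train = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8), (5.0, 11.1)]
holdout = [(6.0, 13.0), (7.0, 15.2)]

Model = Callable[[float], float]


def fit_baseline(data: list[tuple[float, float]]) -> Model:
    """Predict the training mean for every input."""
    m = mean(y for _, y in data)
    return lambda x: m


def fit_linear(data: list[tuple[float, float]]) -> Model:
    """Ordinary least squares for y = a + b*x."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def mse(model: Model, data: list[tuple[float, float]]) -> float:
    """Mean squared error of a model on a data set."""
    return mean((y - model(x)) ** 2 for x, y in data)


candidates = {"baseline": fit_baseline(train), "linear": fit_linear(train)}
errors = {name: mse(model, holdout) for name, model in candidates.items()}
best = min(errors, key=errors.get)

if __name__ == "__main__":
    print(errors, best)
    # Score new records with the winning model.
    print([round(candidates[best](x), 2) for x in (8.0, 9.0)])
```

Comparing on a holdout set rather than on the training data guards against picking a model that merely memorizes the sample.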
MANIPULATE DATA → EXPLORE DATA → DEVELOP MODELS → SCORE
SAS