
IBM Big Data Platform


Martin Pavlk
+420 731 435 691

January 2013 © 2013 IBM Corporation

Big Data is a Hot Topic Because Technology Makes it
Possible to Analyze ALL Available Data
Cost-effectively manage and analyze all available data in its native form: unstructured, structured, streaming

Website Social Media

ERP Network Switches
BIG DATA is not just HADOOP
Understand and navigate federated big data sources: Federated Discovery and Navigation
Manage & store huge volume of any data: Hadoop File System, MapReduce
Structure and control data: Data Warehousing
Manage streaming data: Stream Computing
Analyze unstructured data: Text Analytics Engine
Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM


Business-Centric Big Data Enables You to Start With a Critical Business
Pain and Expand the Foundation for Future Requirements

Big data isn't just a technology; it's a business strategy for capitalizing on information
Getting started is crucial
Success at each entry point is accelerated by products within the Big Data platform
Build the foundation for future requirements by expanding further into the big data platform


1 Unlock Big Data
Customer need
Understand existing data sources
Search and navigate data within existing systems
No copying of data

Value statement
Get up and running quickly
Discover and retrieve big data
Business users can work even with big data sources

Vivisimo Velocity renamed to IBM InfoSphere Data Explorer


2 Analyze Raw Data
Customer need
Ingest data as-is into Hadoop
Combine it with data from the DWH
Process very large volumes of data

Value statement
Gain new insight
Overcome the high cost of converting data from unstructured to structured
Experiment with analysis on different data and combine them with other data

IBM InfoSphere BigInsights


Merging the Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
Business users: determine what question to ask
IT: structures the data to answer that question
Examples: monthly sales reports, profitability analysis, customer surveys

Big Data Approach: Iterative & Exploratory Analysis
IT: delivers a platform to enable creative discovery
Business: explores what questions could be asked
Examples: brand sentiment, product strategy, maximum asset utilization


InfoSphere BigInsights is more than just HADOOP

IBM InfoSphere BigInsights is much more than Hadoop
The IBM Big Data platform includes much more than IBM InfoSphere BigInsights


Hadoop: open-source software framework from Apache
Inspired by Google MapReduce and GFS (Google File System)



InfoSphere BigInsights
Can run also on top of
Platform for volume, variety, velocity
Enhanced Hadoop foundation

Analytics
Text analytics & tooling
Application accelerators

Usability
Web console
Spreadsheet-style tool
Eclipse development tools
Ready-made apps

Enterprise Class
Storage, security, cluster management
Connectivity to Netezza, DB2, JDBC databases, etc.

Editions, ordered by enterprise class and breadth of capabilities
Enterprise Edition: application accelerators, pre-built applications, text analytics, spreadsheet-style tool, RDBMS and warehouse connectivity, administrative tools, security, Eclipse development tools, performance enhancements, ...
Basic Edition: free download, integrated install, online InfoCenter, BigData Univ.
Apache Hadoop
Spreadsheet-style Analysis
Web-based analysis and visualization
Define and manage long-running data collection jobs
Analyze content of the text on the pages that have been retrieved


Build a Big Data Program: MapReduce example
Eclipse tools for Jaql, Hive, Pig, Java MapReduce, BigSheets plug-ins, text analytics, etc.
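
The MapReduce example itself did not survive on this slide. As a stand-in, here is a minimal sketch of the MapReduce idea (map, shuffle, reduce) in plain Python. It is not Hadoop or BigInsights code. The function names and the sample sentences are assumptions made for illustration, and everything runs in a single process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input
    sample = ["big data is not just hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions run in parallel on many nodes and the shuffle moves data between them. The sketch only shows the division of work.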


Jaql: IBM's programming language in the Hadoop world
Jaql is a complete solutions environment supporting all other BigInsights components

Integration point for various analytics
Text analytics (BigInsights Text Analytics)
Statistical analysis (R module)
Machine learning
Ad-hoc analysis

Integration point for various data sources
Local and distributed file systems
NoSQL databases
Content repositories
Relational sources (DB2, Netezza, operational databases)

Jaql building blocks: Jaql Core Operators, Jaql Modules, Jaql I/O (DFS, NoSQL, RDBMS)
BigInsights and the data warehouse
Diagram labels: Traditional, Big Data applications, Data warehouse
Filter, Transform, Aggregate
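
The Filter, Transform, Aggregate labels suggest that raw data is reduced before it reaches the data warehouse. Here is a minimal sketch of such a pipeline in plain Python. The record layout (`customer`, `status`, `bytes`) and the sample events are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical raw events, e.g. parsed web server log lines
RAW_EVENTS: List[dict] = [
    {"customer": "A", "status": 200, "bytes": 2048},
    {"customer": "B", "status": 500, "bytes": 512},
    {"customer": "A", "status": 200, "bytes": 1024},
    {"customer": "B", "status": 200, "bytes": 4096},
]


def summarize(events: Iterable[dict]) -> Dict[str, float]:
    """Return kilobytes served per customer for successful requests."""
    # Filter: keep only successful requests
    successful = (e for e in events if e["status"] == 200)
    # Transform: keep the two needed fields and convert bytes to kilobytes
    pairs = ((e["customer"], e["bytes"] / 1024) for e in successful)
    # Aggregate: one total per customer, ready to load into a warehouse table
    totals: Dict[str, float] = defaultdict(float)
    for customer, kilobytes in pairs:
        totals[customer] += kilobytes
    return dict(totals)


if __name__ == "__main__":
    print(summarize(RAW_EVENTS))
```

Only the small aggregated result would be loaded into the warehouse. The raw events stay in the cheaper store.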


3 Simplify your warehouse
Customer need
Make performance of DWH better
Reduce DWH administration costs

Value statement
Speed: 10-100x better performance
Simplicity: administration costs reduced by 75%-90%
Smart system: in-database analytics, out-of-the-box integration with SPSS

IBM Netezza renamed to
PureData System for Analytics


Analyst: I need to evaluate the possible relationship between client salary and ...
IT: OK. We have to evaluate a lot of statistics, set the correct db indexes and db partitioning. It will take us 5 days.



IT: Done. You can run your analytical query.
Analyst: Great. Thanks a lot. I'm going to check the results.


IT: Ohhh, welcome dear friend.
Analyst: Great. I can see here some nice correlations. Now I need to look at it from a different perspective.
IT: Noooo!!! It's impossible to work here! Understood. So, it's another 5 days of our work.


And now with Netezza ...


Analyst: I need to evaluate the possible relationship between client salary and ... I will use Netezza.



Analyst: Great. I can see here some nice correlations. Now I need to look at it from a different perspective. With Netezza I can run the query immediately. The response time will be the same.

IT can do something else much more useful

Built-In Expertise Makes This as Simple as an Appliance

Dedicated device
Optimized for purpose
Complete solution
Fast installation
Very easy operation
Standard interfaces
Low cost


In October 2012

IBM Netezza was renamed to IBM PureData System for Analytics


Genesis in T-Mobile CZ

Proof-Of-Concept Project
New Enterprise Data Warehouse platform selection
Comparison of existing and other platforms

Selection Criteria
Operational Savings

...and the winner was: Netezza


Netezza Genesis in T-Mobile CZ
Significant response improvement:
Faster platform means better reports response

Direct Data Availability

Higher trust in data, one version of truth
Aggregation reduction
Any attribute available

Operational Benefits
Storage savings (no data replicas)
Administration costs reduction (DBA)

Infrastructure Simplification
Lower environment complexity


Netezza Genesis in T-Mobile CZ
Project Implementation
EDW platform migration
Netezza platform implementation
ETL graphs/processes redesign

BI Front-End Tool Migration

SAP BusinessObjects implementation
All reports redesign

Main Integration Partner: T-Systems CZ


Netezza Genesis in T-Mobile CZ
Current Status
All relevant ETL processing redesigned
Parallel run on the original and Netezza platforms finished
Netezza is now the only primary platform


Real Netezza experience from T-Mobile Czech Rep.

Report                                          Original    Netezza
Workflow Reporting                              2 hours     1 minute
Invoicing and Payments reporting
  Payment discipline of current month invoices  33 minutes  17 seconds
  Overdue Debt of Invoices in Current Month     10 hours    23 seconds
  Average Monthly Invoice Figures               50 minutes  38 seconds
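
To compare the timings above as speedup factors, the durations can be converted to seconds and divided. A minimal sketch in Python follows. The figures are taken from the table, while the dictionary name `TIMINGS` and the helper `speedup` are only illustrative.

```python
from typing import Dict, Tuple

# (original platform, Netezza) run times in seconds, taken from the table
TIMINGS: Dict[str, Tuple[int, int]] = {
    "Workflow Reporting": (2 * 3600, 60),
    "Payment discipline of current month invoices": (33 * 60, 17),
    "Overdue Debt of Invoices in Current Month": (10 * 3600, 23),
    "Average Monthly Invoice Figures": (50 * 60, 38),
}


def speedup(original_seconds: float, netezza_seconds: float) -> float:
    """How many times faster the Netezza run was than the original run."""
    return original_seconds / netezza_seconds


if __name__ == "__main__":
    for report, (original, netezza) in TIMINGS.items():
        print(f"{report}: {speedup(original, netezza):.0f}x")
```

For these four reports the factors range from roughly 79x to roughly 1,565x.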


4 Reduce costs with Hadoop
Customer need
Too much data => too expensive to store and to maintain
A big portion is kept just in case
Data amount is still growing => it's getting more expensive
=> too expensive to have all data in a standard DWH

Value statement
Leverage the parallel processing architecture of Hadoop
Hadoop uses cheap commodity HW
Enable business users to keep working in the same or similar way

IBM InfoSphere BigInsights


BigInsights and the data warehouse
Traditional analytic tools
Big Data analytic applications
From Cognos BI via Hive JDBC


Query-ready archive for cold warehouse data

Data Warehouse


Future: The SQL interface ...
Rich SQL query capabilities
SQL '92 and 2011 features
Correlated subqueries
Windowed aggregates

SQL access to all data stored in InfoSphere BigInsights
Robust JDBC/ODBC support

Take advantage of key features of each data source
Leverage MapReduce
Achieve low latency

Architecture: Application (SQL language) -> JDBC / ODBC Driver -> JDBC / ODBC Server -> SQL interface Engine -> Data Sources (Hive Tables, HBase, CSV Files) in InfoSphere BigInsights
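
To make two of the listed SQL features concrete, here is a minimal sketch of a correlated subquery and a windowed aggregate. It uses Python's built-in `sqlite3` module purely as a stand-in engine and says nothing about the BigInsights SQL interface itself. The `invoices` table and its rows are hypothetical, and window functions need SQLite 3.25 or newer.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a small hypothetical invoices table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (customer TEXT, month INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [("A", 1, 100.0), ("A", 2, 300.0), ("B", 1, 50.0), ("B", 2, 70.0)],
    )
    return conn


# Correlated subquery: invoices above the average of the same customer.
# The inner query refers to the row of the outer query (i.customer).
CORRELATED = """
SELECT customer, month, amount
FROM invoices AS i
WHERE amount > (SELECT AVG(amount) FROM invoices AS j
                WHERE j.customer = i.customer)
ORDER BY customer, month
"""

# Windowed aggregate: running total per customer, one output row per invoice.
WINDOWED = """
SELECT customer, month,
       SUM(amount) OVER (PARTITION BY customer ORDER BY month) AS running_total
FROM invoices
ORDER BY customer, month
"""

if __name__ == "__main__":
    conn = build_db()
    print(conn.execute(CORRELATED).fetchall())
    if sqlite3.sqlite_version_info >= (3, 25, 0):
        print(conn.execute(WINDOWED).fetchall())
```

In a client application the same statements would travel through a JDBC or ODBC driver instead of a local connection.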


5 Analyze Streaming Data
Customer need
Process and leverage streaming data
Select valuable data from the data stream for future processing
Quickly process data that becomes useless if it is not processed immediately

Value statement
React in real time to seize an opportunity before it expires
Periodically adjust streaming models based on analysis of data at rest

IBM InfoSphere Streams


Why and when to use InfoSphere Streams?
Applications needing on-the-fly processing, filtering and analysis of streaming data
Environmental, industrial, GPS
Images, videos
Network data
Data exhaust: system logs (web server, app server)
Financial transactions
High-rate transaction data

At least 2 criteria from the list below should be fulfilled
Processing in isolation or in limited windows (time / number of records)
Non-traditional formats included: spatial data, images, text, voice, ...
Integration challenges: different connection methods, different data rates, different processing requirements
Multiple processing nodes: volume / rate very high => scalability required
Sub-millisecond latency: immediate analysis and response
Store & mine approach doesn't work: because of very high volume of data (and its rates)
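
The first criterion, processing in limited windows, can be illustrated with a count-based sliding window. Here is a minimal sketch in plain Python. It is not InfoSphere Streams code, and the function names, window size and threshold are assumptions chosen for the example.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def windowed_average(stream: Iterable[float], size: int) -> Iterator[float]:
    """Yield the average of the last `size` values after each new value.

    Only the current window is kept in memory. Older values are dropped,
    so the whole stream is never stored.
    """
    window: Deque[float] = deque(maxlen=size)
    for value in stream:
        window.append(value)  # the oldest value falls out automatically
        yield sum(window) / len(window)


def alerts(
    stream: Iterable[float], size: int, threshold: float
) -> Iterator[Tuple[int, float]]:
    """Yield (position, average) whenever the window average exceeds threshold."""
    for position, average in enumerate(windowed_average(stream, size)):
        if average > threshold:
            yield position, average


if __name__ == "__main__":
    # Hypothetical sensor readings
    readings = [10, 10, 50, 50, 10]
    print(list(alerts(readings, size=2, threshold=40.0)))
```

A time-based window works the same way, except that values are evicted by age instead of by count.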


Streams and BigInsights - Integrated Analytics on Data in
Motion & Data at Rest
Real-time and historical visualization
InfoSphere Streams (data in motion): data ingest, online analysis, model validation
InfoSphere BigInsights and databases (data at rest): data integration, data mining, machine learning, statistical modeling
Flows between them: 1. Data Ingest, 2. Bootstrap/Enrich, 3. Adaptive Analytics Model


The Platform Advantage

Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics

IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance

Increase over time: by moving from entry to a 2nd and 3rd project
Lowering deployment costs: shared components
Points of leverage: shared text analytics for Streams and BigInsights; HDFS connectors (data integration (ETL, ...))
Build across multiple

