The evolution of technology, users, and value
[Figure: technology eras plotted against number of users (data sources) and business value]

- Mainframe (1960s-1970s; e.g. OS/360): ~10^2 users; few employees; back-office automation
- Client-server (1980s): ~10^4 users; many employees; front-office productivity
- Web (1990s): ~10^6 users; customers/consumers; e-commerce
- Cloud (2007): ~10^7 users; business ecosystems; line-of-business self-service
- Social (2011): ~10^9 users; communities & society; social engagement
- Internet of Things (2014): ~10^11 users; devices & machines; real-time optimization

Data sources grow with each era, from professional and culture documents to devices and machines.
Financial Services
Proactive Customer
Engagement,
Location Based
Services
Manufacturing
Public Sector
Connected Vehicle,
Predictive Maintenance
Predicting Patient
Outcomes,
Total Cost of Care
Drug Discovery
Health Insurance
Exchanges,
Public Safety,
Tax Optimization
Fraud Detection
Anatomy of a big data project
[Figure: end-to-end data integration architecture]

- Sources: relational and mainframe systems; documents and emails; machine/device and cloud data
- Data integration & quality: parse, ETL, cleanse, match; replicate, stream, archive
- Delivery: load into the data warehouse; services, events, and topics via a data integration hub; data virtualization and data federation; embedded data quality in apps
- Consumers: analytics teams; analytics and operational dashboards; mobile apps; alerts
- Spans development and deployment across cloud, desktop, server, and Hadoop
A brief history of analytics infrastructure:
- Online Analytical Processing (OLAP) & DW appliances: Teradata, Redbrick, EssBase, Sybase IQ, Netezza, Exadata
- High-speed data ingestion and extraction: HANA, Greenplum, DataAllegro, Asterdata, Vertica, Paraccel
- Hadoop and the cloud
PowerCenter Big Data Edition highlights:
- Business-IT collaboration
- Unified administration
- Complex data parsing on Hadoop
- ETL on Hadoop
- Cloud connectivity: Salesforce.com, Concur, Google App Engine, Amazon
- Web data: web applications, blogs, discussion forums, communities, partner portals
- Machine and scientific data: clickstream, image/text, genomic/pharma, medical devices, sensors/meters, RFID tags, CDR/mobile
- Lets affordable PowerCenter developers handle data preparation
From data to business value
[Figure: roles and workflow around the big data platform]

- Analyst: prioritize goals (P&L goals)
- Data scientist: generate insights, validate hypotheses
- Developer: make it operational
- Business: inspire action
- Big data platform: refine & enrich, explore & curate, distribute & manage

Example value chain:
- Data: customer orders, social data, web logs, market data
- Information: customers likely to churn, next best offers, optimal channels, optimal pricing models
- Value: increase customer loyalty, build sustainable relationships, increase marketing ROI, increase market share
Agile analytics reference architecture
[Figure]

- Data sources: transactions, OLTP, OLAP; social media, web logs; machine/device and scientific data; applications
- Data ingestion: batch load, replication, change data capture, data streaming
- Data management: data integration & data quality, data governance, data security, MDM/PIM, event-based processing, archiving; data warehouse
- Data delivery: data integration hub, Virtual Data Machine, real-time alerts
- Visualization: analytics & operational dashboards, mobile apps
- Advanced analytics: machine learning
Case study
- The solution: manage data integration and load of 10+ billion records from multiple disparate data sources (mainframe, RDBMS, unstructured data, EDW) on a traditional grid, with data virtualization in front of the data warehouse
- The result: business reports delivered from the data warehouse
Case study
- The solution: profile, parse, and ETL feeding BI/analytics, visualization & reporting
- The result:
  - Comprehensive data integration platform to integrate large volumes of data from 18+ systems
  - Ability to use existing skill sets and make them more productive
  - Lowest risk, as industry leader
Case study
- The challenge: data increasing 20x every year, with costs rising from $17K per day to $50K per day within 6 months; time to deliver information taking too long
- The solution: PowerCenter Big Data Edition on a traditional grid, loading transactions from 70 data centers, in-store POS data, and B2B data exchange into a 172 TB data warehouse, with data validation, feeding business reports
- Expected result: reduce time to deliver information to the business from 48 hours to 15 minutes
[Figure: operational systems (transactions/OLTP; documents and email; machine/device and scientific data) feed analytical systems (data warehouse, MDM, data mart, ODS), which deliver data products]
Processing stages:
1. Access & ingest
2. Parse & prepare
3. Discover & profile
4. Transform & cleanse
5. Extract & deliver
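The five stages above can be sketched as a small pipeline. This is an illustration of the pattern only; all function and field names are hypothetical, not PowerCenter APIs.

```python
# Illustrative sketch of the five-stage pipeline; names are hypothetical.

def access_and_ingest(lines):
    """Access & ingest: read raw records from a source."""
    return [line.rstrip("\n") for line in lines]

def parse_and_prepare(records):
    """Parse & prepare: split delimited records into fields."""
    return [r.split(",") for r in records]

def discover_and_profile(rows):
    """Discover & profile: gather simple stats (row and field counts)."""
    return {"rows": len(rows), "fields": max((len(r) for r in rows), default=0)}

def transform_and_cleanse(rows):
    """Transform & cleanse: trim whitespace and drop empty rows."""
    return [[f.strip() for f in r] for r in rows if any(f.strip() for f in r)]

def extract_and_deliver(rows):
    """Extract & deliver: emit records to a target (here, a list of dicts)."""
    header, *data = rows
    return [dict(zip(header, r)) for r in data]

raw = ["id,name\n", "1, BLAKE \n", "2,JONES\n", "\n"]
records = parse_and_prepare(access_and_ingest(raw))
profile = discover_and_profile(records)
clean = transform_and_cleanse(records)
out = extract_and_deliver(clean)
```

Each stage consumes the previous stage's output, so stages can be developed and tested independently.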
Connectivity:
- Relational and flat files: Oracle, DB2 UDB, DB2/400, SQL Server, Sybase, Informix, Teradata, Netezza, ODBC, JDBC
- Mainframe and midrange: ADABAS, Datacom, DB2, IDMS, IMS, VSAM, C-ISAM, binary flat files, tape formats
- Unstructured data and files: Word, Excel, PDF, StarOffice, WordPerfect, email (POP, IMAP), HTTP, flat files, ASCII reports, HTML, RPG, ANSI, LDAP
- MPP appliances: Pivotal, Vertica, Netezza, Teradata, Aster
- Messaging and middleware: WebSphere MQ, JMS, MSMQ, SAP NetWeaver XI, web services, TIBCO, webMethods
- Packaged applications: JD Edwards, SAP NetWeaver, Lotus Notes, SAP NetWeaver BI, Oracle E-Business, SAS, PeopleSoft, Siebel
- SaaS/BPO: Salesforce CRM, Force.com, RightNow, NetSuite, ADP, Hewitt, SAP By Design, Oracle OnDemand
- Industry standards: EDI X12, EDIFACT, RosettaNet, HL7, HIPAA, AST, FIX, SWIFT, Cargo IMP, MVR
- XML standards: XML, LegalXML, IFX, cXML, ebXML, HL7 v3.0, ACORD (AL3, XML)
- Social media: Facebook, Twitter, LinkedIn, Kapow, Datasift
- Devices: handhelds, smart meters, etc.; discrete data messages; Internet of Things sensor data

Streaming ingestion (VDS)
[Figure: VDS nodes move events from sources to targets]
- Sources: web servers, operations monitors, rsyslog, SLF4J, etc.
- Transport: publish/subscribe across VDS nodes, coordinated by ZooKeeper
- Targets: Hadoop HDFS, HBase; real-time analysis / complex event processing; NoSQL databases (Cassandra, Riak, MongoDB)
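The source-to-target fan-out in the streaming diagram follows a publish/subscribe pattern. Below is a minimal single-process sketch of that pattern; VDS itself is a distributed agent system coordinated by ZooKeeper, and the `Broker` class, topic name, and event fields here are invented for illustration.

```python
# Minimal publish/subscribe sketch of the source -> node -> targets flow.
# Single-process and queue-free; illustrates the pattern only, not VDS.
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Deliver each event to every subscriber of the topic.
        for cb in self.subscribers[topic]:
            cb(event)

broker = Broker()
hdfs_sink, alert_sink = [], []

# Targets: e.g. an HDFS archive and a real-time analysis consumer.
broker.subscribe("weblogs", hdfs_sink.append)
broker.subscribe("weblogs", lambda e: alert_sink.append(e) if e["status"] >= 500 else None)

# Sources: e.g. web servers emitting log events.
broker.publish("weblogs", {"path": "/home", "status": 200})
broker.publish("weblogs", {"path": "/buy", "status": 503})
```

The archive target receives every event, while the real-time consumer filters for server errors, mirroring how one stream can feed both batch storage and complex event processing.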
Data Transformation (DT) engine notes

- Productivity: a visual parsing environment plus predefined translations (e.g. for delimited data) result in reduced development and maintenance times and a lower impact of change.
- The engine is invoked dynamically in-process and does not need to be started up or maintained externally, avoiding the cost of passing data between processes.
- DT is a GUI transformation widget in PowerCenter which wraps around the DT API and engine. A good example is support of PowerCenter partitioning: though not shown below, the engine fully supports multiple input and output files or buffers as needed to scale up processing.
- On the output side, DT can also write directly to the filesystem; for other targets, the API layer (Java, C++, C, .NET, web services) can be used directly.
- Deployment: the developer deploys the transformation directly to the server's service repository. NOTE: if the file system is mountable from the developer machine directly, then step 2 is not needed; otherwise files are moved across from the respective design environments.
- Handles device/sensor, scientific, and social network data, and fits any DI/BI architecture.
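The partitioned, in-process invocation idea above can be sketched as a parse function applied to several input buffers in parallel, then merged. This illustrates the pattern only; `parse_buffer` and the pipe-delimited format are invented for the example, and this is not the DT API.

```python
# Sketch of partitioned parsing: one in-process parse call per input
# buffer, run in parallel, mirroring one engine instance per partition.
from concurrent.futures import ThreadPoolExecutor

def parse_buffer(buffer):
    """Parse one delimited input buffer into records (stand-in for the engine)."""
    return [line.split("|") for line in buffer.splitlines() if line]

partitions = [
    "1|BLAKE\n2|JONES\n",
    "3|KING\n",
]

# Each partition is parsed independently; no external engine process
# needs to be started up or maintained.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(parse_buffer, partitions))

# Merge partition outputs in order.
records = [row for part in results for row in part]
```

`ThreadPoolExecutor.map` preserves input order, so the merged output is deterministic regardless of which partition finishes first.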
Build on Hadoop without hand coding:
- No-code visual development environment instead of hand-written Hive-QL, Pig, or MapReduce UDFs
- Preview results at any point in the data flow
- Accelerate development: reuse and import PowerCenter metadata
- Connects to the EDW and MDM
Profiling Hadoop data:
1. Profiling stats (min/max values, NULLs, inferred data types, etc.) to identify outliers and anomalies in data
2. Value & pattern analysis of Hadoop data
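The profiling stats listed above can be computed with a simple pass over the data. The `profile` and `infer_type` functions and the sample columns below are illustrative, not a product API.

```python
# Sketch of column profiling: min/max, NULL counts, inferred data types.

def infer_type(values):
    """Infer a crude data type from the non-null values of a column."""
    non_null = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in non_null):
        return "numeric"
    return "string"

def profile(rows, columns):
    """Compute per-column stats over a list of row tuples."""
    stats = {}
    for i, col in enumerate(columns):
        values = [r[i] for r in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "min": min(non_null),
            "max": max(non_null),
            "nulls": len(values) - len(non_null),
            "type": infer_type(values),
        }
    return stats

rows = [(1, "BLAKE"), (2, None), (97, "KING")]
stats = profile(rows, ["id", "name"])
```

An unexpectedly wide min/max range or a high NULL count in `stats` is exactly the kind of outlier signal the slide describes.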
Dynamic data masking (DDM) with PowerCenter Big Data Edition
[Figure: a security policy drives a data-masking transformation, applied in Hadoop via a UDF]

- Private information is stored in full (BLAKE, JONES, KING)
- Values presented on business user application screens, and on application screens and tools used by production support, DBAs, or an outsourced/unauthorized workforce, are masked per the security policy:
  BLAKE -> BL****
  JONES -> JO****
  KING  -> KI****
- Equivalent query: SELECT SUBSTRING(name, 1, 2) || '***' FROM table1
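The masking rule shown in the screen values (keep the first two characters, pad the rest) is easy to express directly. The `mask` helper below is a hypothetical stand-in for the masking UDF, not the product's API.

```python
# Sketch of the substring masking rule: expose the first `visible`
# characters and replace the remainder with a fixed pad.

def mask(value, visible=2, pad="****"):
    """Return the first `visible` characters followed by a fixed pad."""
    return value[:visible] + pad

names = ["BLAKE", "JONES", "KING"]
masked = [mask(n) for n in names]
```

A fixed-length pad (rather than one star per hidden character) also avoids leaking the true length of the value.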
Archive to Hadoop: compression extends Hadoop cluster capacity
- Without INFA optimized archive compression: 10 TB of data, replicated 3x by HDFS = 30 TB stored
- With: 10 TB compressed to 500 GB, replicated 3x = 1.5 TB stored
- 20x less I/O bandwidth required
- Response time: 1 min vs. 20 min
- Backup window: 24 min vs. 8 hours
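The capacity arithmetic behind those figures is straightforward: a ~20x compression ratio applies before HDFS's 3x replication, so the stored footprint shrinks by the same 20x.

```python
# Capacity arithmetic from the slide: 10 TB compressed ~20x to 500 GB,
# then replicated 3x by HDFS, versus replicating the raw data 3x.
raw_tb = 10.0
compression_ratio = 20.0
replication = 3

without_infa = raw_tb * replication                      # raw, replicated
with_infa = (raw_tb / compression_ratio) * replication   # compressed, replicated
savings = without_infa / with_infa                       # footprint reduction
```

Because compression happens before replication, the 20x saving carries through to storage, I/O bandwidth, and backup time alike.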
Unified administration: a single place to manage and monitor
- Full traceability from workflow to MapReduce jobs
- View generated Hive scripts

Learn more at http://bit.ly/powercenterbde