Created by C. M. Saracco, IBM Silicon Valley Lab
June 2016

IBM BigInsights:
Bringing you big value from Big Data

© 2016 IBM Corporation


IBM Disclaimer

Information regarding potential future products is intended to outline our
general product direction and it should not be relied on in making a purchasing
decision. The information mentioned regarding potential future products is not
a commitment, promise, or legal obligation to deliver any material, code or
functionality. Information about potential future products may not be
incorporated into any contract. The development, release, and timing of any
future features or functionality described for our products remains at our sole
discretion.



Agenda

• The big picture about Big Data
• IBM’s approach
  – Portfolio overview
  – BigInsights
    – Open source core platform with Apache Hadoop
    – IBM technologies for enhanced analytics
    – How BigInsights fits within a broader IT infrastructure
• How IBM can help you get off to a quick start


The Big Picture about Big Data



Information is at the center of a new wave of opportunity … and organizations need deeper insights

• 2.5 million items per minute
• 5 TB per flight
• 300,000 tweets per minute
• > 1 PB per day (gas turbines)
• 200 million emails per minute
• 220,000 photos per minute

• 1 in 3: Business leaders frequently make decisions based on information they don’t trust, or don’t have
• 1 in 2: Business leaders say they don’t have access to the information they need to do their jobs
• 83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions

1 ZB = 1 billion TB
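
The equivalence quoted on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) prefixes, which is an assumption of this example rather than something the slide states; the "flights per petabyte" figure simply reuses the slide's "5 TB per flight" number for illustration.

```python
# Decimal (SI) byte units; assumed here, the slide does not say which convention it uses.
TB = 10**12   # terabyte
PB = 10**15   # petabyte
ZB = 10**21   # zettabyte

# How many terabytes fit in one zettabyte?
tb_per_zb = ZB // TB

# Illustrative only: at 5 TB per flight, how many flights produce 1 PB?
flights_per_pb = PB // (5 * TB)

print(f"1 ZB = {tb_per_zb:,} TB")
print(f"1 PB = {flights_per_pb} flights at 5 TB each")
```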



Big Data presents big opportunities
Extract insight from a high volume, variety and velocity of data in a
timely and cost-effective manner

Variety: Manage and benefit from diverse data types and data structures

Velocity: Analyze streaming data and large volumes of persistent data

Volume: Scale from terabytes to zettabytes


What we hear from customers . . . .

• Lots of potentially valuable data is dormant or discarded due to size/performance issues
• Large volume of unstructured or semi-structured data is not worth integrating fully (e.g. Tweets, logs, . . .)
• Not clear what should be analyzed (exploratory, iterative)
• Information distributed across multiple systems and/or Internet
• Some data has a short useful lifespan
• Volumes can be extremely high
• Query-ready resource for “cold” historic data needed (prevent unwieldy growth of data warehouses)
• Analysis needed in the context of existing information (not stand alone).


Merging the traditional and Big Data approaches
Traditional Approach: Structured & Repeatable
• Business Users determine what question to ask
• IT structures the data to answer that question
• Examples: Monthly sales reports, Profitability analysis, Customer surveys

Big Data Approach: Iterative & Exploratory
• IT delivers a platform to enable creative discovery
• Business explores what questions could be asked
• Examples: Brand sentiment, Product strategy, Maximum asset utilization


Why invest in analytics?

• Analytics pay back $13.01 for every dollar spent [1]
• 69% created significant positive impact on business outcomes [2]
• 60% created significant positive impact on revenues [2]
• 53% created significant competitive advantage [2]

[1] “Analytics Pays Back $13.01 for Every Dollar Spent” Nucleus Research, September 2014
[2] “Analytics: The speed advantage” IBM Institute for Business Value, 2014


Big Data scenarios span many industries

• Multi-channel customer sentiment and experience analysis
• Detect life-threatening conditions at hospitals in time to intervene
• Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
• Make risk decisions based on real-time transactional data
• Identify criminals and threats from disparate video, audio, and data feeds

IBM Big Data and analytics sample architecture
[Architecture diagram with four columns: All Data Sources, Big Data Platform Capabilities, Advanced Analytics / New Insights, and New / Enhanced Applications.
 All Data Sources: streaming data, text data, applications data, time series, geo spatial, video & image, relational, social network.
 Big Data Platform Capabilities: information ingest, real time analytics, warehouse & data marts, analytic appliances. Zones shown: real-time analytics zone; ingestion zone and operational information; landing and archive zone; enterprise warehouse and mart; analytic appliances; exploration and discovery.
 Advanced Analytics / New Insights: Watson cognitive (Learn dynamically?), prescriptive (Best outcomes?), predictive (What could happen?), descriptive (What has happened?), exploration and discovery (What do you have?).
 New / Enhanced Applications: alerts, automated process, case management, analytic applications, cloud services, ISV solutions.
 Information governance, security and business continuity span the platform.]



Big Data use expanding rapidly

Big data adoption over time, as reported by respondents:

Stage     Activity                                                         2012 to 2014   2015   Change
Execute   Using big data and analytics pervasively across the enterprise   5%-6%          13%    210% increase
Engage    Implementing infrastructure and running pilot activities         22%-27%        25%    0% change
Explore   Exploring internal use cases and developing a strategy           43%-47%        53%    125% increase
Educate   Learning about big data capabilities                             24%-26%        10%    250% decrease

2015 IBV study “Analytics: The Upside of Disruption” (ibm.biz/w3_2015analytics)

Big Data technologies pay off

2015 IBV study “Analytics: The Upside of Disruption” (ibm.biz/w3_2015analytics)


Big Data ROI often < 18 months

Return on investment period for big data and analytics projects


as reported by respondents

2015 IBV study “Analytics: The Upside of Disruption” (ibm.biz/w3_2015analytics)


Big Data in practice: focus areas

Survey summaries from Forbes, May 2015


IBM’s approach



IBM analytics platform strategy for Big Data

• Integrate and manage the full variety, velocity and volume of Big Data
• Apply advanced analytics
• Visualize all available data for ad-hoc analysis
• Support workload optimization and scheduling
• Provide for security and governance
• Integrate with enterprise software

IBM ANALYTICS PLATFORM: Built on Spark. Hybrid. Trusted.
[Platform diagram, top to bottom:
 Discovery & Exploration | Predictive Analytics | Prescriptive Analytics | Content Analytics
 Business Intelligence
 Data Mgmt | Content Mgmt | Hadoop & NoSQL | Data Warehouse
 Information Integration & Governance
 Spark Analytics Operating System, with Machine Learning; On premises and On cloud
 Data at rest & In-motion. Inside & outside the firewall. Structured & unstructured.]


IBM BigInsights for Apache Hadoop and Spark
• Analytical platform for persistent Big Data
  – 100% open source core with IBM add-ons for analysts, data scientists, and admins
  – On premise or cloud
• Distinguishing characteristics
  – Built-in analytics . . . . Enhances business knowledge
  – Enterprise software integration . . . . Complements and extends existing capabilities
  – Production-ready . . . . Speeds time-to-value
• IBM advantage
  – Combination of software, hardware, services and research


Overview of BigInsights

Free Quick Start (non production):
• IBM Open Platform
• IBM added value features
• Community support

IBM-specific BigInsights features
• Big SQL (industry standard SQL)
• Text analytics
• BigSheets (spreadsheet-style tool)
• Big R (R support)
• IBM Streams, Cognos (limited use licenses)

IBM Open Platform
• 100% open source platform compliant with ODPi
• Apache Hadoop ecosystem
• Apache Spark ecosystem


BigInsights ISV Partner Ecosystem



A Closer Look at IBM BigInsights . . . .




IBM Open Platform foundational components

• Apache Hadoop
  – Distributed file system, popular API (MapReduce) for clustered computing
  – Originally designed for batch processing of massive data volumes, varied data formats

• Apache Spark
  – General purpose, high-speed data processing engine for clustered computing
  – In-memory processing, popular built-in libraries (e.g., machine learning)
  – No built-in storage. Attaches to other data stores (e.g., Hadoop Distributed File System)
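
To make the MapReduce programming model mentioned above more concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to word counting. It is plain Python, not the Hadoop or Spark API, and the function names are hypothetical; a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big value", "Big Data"])
print(counts)
```

Because each map call sees only one line and each reduce call sees only one key, both steps can be distributed without coordination, which is the property the batch-processing design relies on.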



IBM Open Platform: a closer look
• Timely updates as new open source versions released
• Install only those components you want / need
• Compliant with ODPi runtime

Component                            Version
Ambari                               2.2
Flume                                1.6.0
Hadoop (includes MapReduce, YARN)    2.7.2
HBase                                1.2.0
Hive                                 1.2.1
Kafka                                0.9.0.1
Knox                                 0.7.0
Oozie                                4.2.0
Parquet                              2.2
Phoenix                              4.6.1
Pig                                  0.15.0
Ranger                               0.5.2
Slider                               0.90.2
Solr                                 5.5
Spark                                1.6.1
Sqoop                                1.4.6
Titan                                1.0.0
ZooKeeper                            3.4.6


What is ODPi?
• Non-profit organization accelerating the delivery of Big Data solutions by powering a platform called ODPi Core.
• The ODPi Core focuses on a small but critical set of projects
• Goal: enables a rapid start and an industry driven definition

• ODPi has an open governance model. Developers form a Technical Steering Committee
• All members have an equal vote on ODPi Core decisions.
• ODPi has a Board of Directors responsible for the financial, legal and promotional aspects of ODPi.

ODPi & Apache Software Foundation (ASF)
• ODPi supports the ASF mission
• ASF provides governance around individual projects without looking at ecosystem and collections of projects
• ODPi provides a vendor-led consistent packaging model and certification for Big Data components as an ecosystem - Test once; Run anywhere for big data applications

ODPi Members include: Ampool, Altiscale, ArenaData, AsiaInfo, Capgemini, DataTorrent, EMC, GE,
Hortonworks, IBM, Infosys, NEC, Pivotal, PLDT, SAS, Squid Solutions, SyncSort, Telstra, Toshiba, UNIFi,
VMware, WANdisco, Xiilab, zData and Zettaset.




SQL for Hadoop (Big SQL)
• Comprehensive, standard SQL
  – SELECT: joins, unions, aggregates, subqueries . . .
  – GRANT/REVOKE, INSERT … INTO
  – UPDATE / DELETE (HBase)
  – Procedural logic in SQL
  – Stored procs, user-defined functions
  – IBM data server JDBC and ODBC drivers

• Optimization and performance
  – IBM MPP engine (C++) replaces Java MapReduce layer
  – Continuous running daemons (no start up latency)
  – Message passing allows data to flow between nodes without persisting intermediate results
  – In-memory operations with ability to spill to disk (useful for aggregations, sorts that exceed available RAM)
  – Cost-based query optimization with 140+ rewrite rules

• Various storage formats supported
  – Data persisted in DFS, Hive, HBase
  – No IBM proprietary format required

• Integration with RDBMSs via LOAD, query federation

[Diagram: SQL-based application, through the IBM data server client, to the Big SQL Engine (SQL MPP run-time) over data storage (DFS) in BigInsights]
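
As a rough illustration of the kind of standard SQL listed above (a join, an aggregate, and a subquery in one statement), the sketch below runs a query against an in-memory SQLite database using Python's built-in `sqlite3` module. SQLite is only a stand-in so the example is runnable anywhere; it is not Big SQL, and the table and column names are hypothetical. Against Big SQL, a similar statement would be submitted through the JDBC or ODBC drivers mentioned on the slide.

```python
import sqlite3

# In-memory stand-in database; table and column names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0), (13, 3, 5.0);
""")

# Join + aggregate + subquery: total of above-average orders per region.
query = """
    SELECT c.region, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.amount >= (SELECT AVG(amount) FROM orders)
    GROUP BY c.region
    ORDER BY c.region
"""
rows = conn.execute(query).fetchall()
print(rows)
```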

Big SQL query federation = virtualized data access
Transparent
• Appears to be one source
• Programmers don’t need to know how / where data is stored

Heterogeneous
• Accesses data from diverse sources

High Function
• Full query support against all data
• Capabilities of sources as well

Autonomous
• Non-disruptive to data sources, existing applications, systems.

High Performance
• Optimization of distributed queries

[Diagram: SQL tools and applications access virtualized data, which maps to the underlying data sources]
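
The "appears to be one source" idea can be sketched in miniature with SQLite's `ATTACH DATABASE`, which lets one connection query two separate databases in a single statement. This is only an analogy chosen because it runs with Python's standard library: it is not Big SQL federation, both stores here are SQLite, and the schema and data are hypothetical.

```python
import sqlite3

# The connection's own database plays the role of one data store...
conn = sqlite3.connect(":memory:")
# ...and a second, separate in-memory database plays the role of a remote source.
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

conn.executescript("""
    CREATE TABLE main.clicks (customer_id INTEGER, page TEXT);
    CREATE TABLE warehouse.customers (id INTEGER, name TEXT);
    INSERT INTO main.clicks VALUES (1, 'home'), (1, 'pricing'), (2, 'home');
    INSERT INTO warehouse.customers VALUES (1, 'Acme'), (2, 'Globex');
""")

# One query spans both stores; the caller only sees schema-qualified table names.
rows = conn.execute("""
    SELECT cu.name, COUNT(*) AS clicks
    FROM main.clicks AS cl
    JOIN warehouse.customers AS cu ON cu.id = cl.customer_id
    GROUP BY cu.name
    ORDER BY cu.name
""").fetchall()
print(rows)
```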




Text analytics
• Distills structured info from unstructured text
  – Sentiment analysis
  – Consumer behavior
  – Illegal or suspicious activities
  – …
• Parses text and detects meaning with annotators
• Understands the context in which the text is analyzed
• Features pre-built extractors for names, addresses, phone numbers, etc.

Sample input text:

I had an iphone, but it's dead @JoaoVianaa. (I've no idea where it's) !Want a Galaxy now !!!

@rakonturmiami im moving to miami in 3 months. i look foward to the new lifestyle

I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Des Moines) w/ 2 others http://4sq.com/gbsaYR
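
To give a feel for what "distilling structured info from unstructured text" means, the sketch below applies a few regular-expression extractors to two of the sample posts above and returns labeled records. This is a deliberately simple stand-in written in plain Python; it is not the BigInsights text analytics engine or its extractor language, and the extractor names and patterns are assumptions made for this example.

```python
import re

# Hypothetical extractors: each label maps to a pattern that recognizes one kind of entity.
EXTRACTORS = {
    "mention": re.compile(r"@\w+"),
    "url": re.compile(r"https?://\S+"),
    "duration": re.compile(r"\b\d+\s+(?:days?|weeks?|months?|years?)\b"),
}


def extract(text):
    """Return (label, matched_text, start, end) records sorted by position."""
    records = []
    for label, pattern in EXTRACTORS.items():
        for match in pattern.finditer(text):
            records.append((label, match.group(), match.start(), match.end()))
    return sorted(records, key=lambda record: record[2])


posts = [
    "@rakonturmiami im moving to miami in 3 months. i look foward to the new lifestyle",
    "I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Des Moines) "
    "w/ 2 others http://4sq.com/gbsaYR",
]
results = [extract(post) for post in posts]
for post_records in results:
    print([(label, text) for label, text, _, _ in post_records])
```

Each record carries its character offsets, so later steps can relate extracted items to one another by position in the text.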



Extracting information from text

Flow: Text (single column or document) → Text preparation → Classified words / attributes

Information Extraction (IE)
• Recognize via lexical analysis
  – language detection
  – sentence segmentation
  – tokenization
  – part-of-speech tagging
• Describe via deep linguistic analysis
  – verb-centric abstraction
  – noun-centric abstraction
  – shallow parsing
  – …
• Analyze via extractors
  – extraction operations
  – span operations
  – join operations
  – consolidations
  – …

[Diagram labels: Tagged syntax; Primitives; Entity Recognition; Sentiment; Entity Analytics; Preventative Maintenance; Machine Data; Customer Segmentation; Sentiment Affinity; …]
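
Three of the steps named on this slide (sentence segmentation, tokenization, and consolidation of overlapping matches) can be sketched in a few lines. The versions below are naive, regex-based illustrations in plain Python under stated assumptions; they are not the algorithms BigInsights uses, and the function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    """Tokenization: return (token, start, end) spans for runs of word characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def consolidate(spans):
    """Consolidation: drop any (start, end) span fully contained in another kept span."""
    kept = []
    # Visit spans by start offset, longest first, so containers are seen before contents.
    for span in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        if not any(k[0] <= span[0] and span[1] <= k[1] for k in kept):
            kept.append(span)
    return kept


sentences = split_sentences("im moving to miami in 3 months. i look foward to the new lifestyle")
tokens = tokenize("Des Moines")
merged = consolidate([(0, 3), (0, 10), (4, 10), (12, 15)])
print(sentences, tokens, merged, sep="\n")
```

Consolidation matters when several extractors fire on the same text, for example one matching "Des" and another matching "Des Moines"; keeping only the widest span avoids reporting the same entity twice.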



Text analytics tooling

• Web-based tool to define rules to extract data and derive information from unstructured text
• Graphical interface to describe structure of various textual formats – from log file data to natural language


Pre-built text extractors
• The extractor library contains a rich set of pre-built extractors
  – Finance actions
  – Named Entities
  – Generic
  – Machine Data
  – Sentiment Analysis

• You can control output properties
  – Output columns and names
  – Row filters

• Some pre-built extractors can be customized
  – Add / remove dictionary terms
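
The idea of customizing an extractor by adding or removing dictionary terms can be illustrated with a small dictionary-based tagger. This is a hypothetical plain-Python class written for this example, not the BigInsights extractor library; the class name, method names, and term lists are all assumptions.

```python
import re


class DictionaryExtractor:
    """Tags tokens that appear in a term dictionary; terms can be added or removed."""

    def __init__(self, label, terms):
        self.label = label
        self.terms = {term.lower() for term in terms}

    def add_term(self, term):
        self.terms.add(term.lower())

    def remove_term(self, term):
        # discard() is a no-op if the term is absent, so removal never raises.
        self.terms.discard(term.lower())

    def extract(self, text):
        """Return (label, token) pairs for every token found in the dictionary."""
        tokens = re.findall(r"[a-z']+", text.lower())
        return [(self.label, token) for token in tokens if token in self.terms]


negative = DictionaryExtractor("negative", ["dead", "broken"])
before = negative.extract("I had an iphone, but it's dead")

# Customize the dictionary: drop one term, add another.
negative.remove_term("dead")
negative.add_term("lost")
after = negative.extract("I had an iphone, but it's dead")
print(before, after)
```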




Spreadsheet-style analysis (BigSheets)
• Web-based analysis and visualization
• Spreadsheet-like interface
  – Explore, manipulate data without writing code
  – Invoke pre-built functions
  – Generate charts
  – Export results of analysis
  – Create custom plug-ins
  – ...



What is Big R?
“End-to-end integration of R-Project with BigInsights”
1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm (no MapReduce code)

2. Scale out R
   • Partitioning of large data (“divide”)
   • Parallel cluster execution of pushed down R code (“conquer”)
   • All of this from within the R environment (Jaql, Map/Reduce are hidden from you)
   • Almost any R package can run in this environment

3. Scalable machine learning
   • A scalable statistics engine that provides canned algorithms, and an ability to author new ones, all via R

[Diagram: R clients with R packages work against the data sources in three ways: (1) pull data (summaries) to the R client; (2) or, push R functions right on the data (embedded R execution with R packages); (3) use the scalable statistics engine]
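
The "divide" and "conquer" steps described above follow a general pattern: split the data into partitions, run the same function on each partition in parallel, then combine the small partial results. The sketch below shows that pattern in Python with a thread pool on one machine. It is not Big R or R code, the function names are hypothetical, and it assumes a non-empty list of numbers.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Divide: split a non-empty list into at most n_parts roughly equal chunks."""
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_summary(chunk):
    """Conquer: the function 'pushed down' to run on each partition."""
    return {"count": len(chunk), "sum": sum(chunk), "min": min(chunk), "max": max(chunk)}


def combine(summaries):
    """Merge the per-partition summaries into one overall summary."""
    count = sum(s["count"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    return {
        "count": count,
        "mean": total / count,
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
    }


def summarize(data, n_parts=4):
    chunks = partition(data, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        summaries = list(pool.map(partial_summary, chunks))
    return combine(summaries)


result = summarize(list(range(1, 101)))
print(result)
```

Only the small summaries travel back to the caller, which mirrors the slide's point about pulling summaries to the client rather than the full data set.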

Limited use license: IBM Streams
• Platform for real-time Big Data analytics
• “Data in motion”
  – Gigabytes+ per second or more
  – Terabyte+ per day
• All kinds of data
• Insights in microseconds
• Connectivity to varied data sources

[Diagram: sensor, video, audio, text, and relational data sources feed powerful analytics handling millions of events per second at microsecond latency, producing just-in-time decisions; results can persist to BigInsights, Hadoop]
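
Analyzing "data in motion" usually means computing results incrementally as each event arrives rather than after the data has been stored. The sketch below shows one common building block, a sliding time-window average, as a plain Python generator. It is not the IBM Streams API, the function name and sample readings are hypothetical, and it assumes events arrive in non-decreasing timestamp order.

```python
from collections import deque


def windowed_average(events, window_seconds):
    """Yield (timestamp, average) over a sliding time window as each event arrives.

    events: iterable of (timestamp, value) pairs in non-decreasing timestamp order.
    The window covers the interval (timestamp - window_seconds, timestamp].
    """
    window = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


# Hypothetical sensor readings: (seconds, value)
readings = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
averages = list(windowed_average(readings, window_seconds=3))
print(averages)
```

Because the generator keeps only the events inside the current window, memory use stays bounded no matter how long the stream runs.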
Limited use license: Cognos BI
• Model, explore, analyze data from many sources
• Visualize and report on results
• Connection to BigInsights via Big SQL
• In-memory dynamic views cache data in Cognos for quick data access
• Part of IBM BigInsights for Apache Hadoop

Demo: https://www.youtube.com/watch?v=yxnoGrK6PSY


Thinking cloud? Think IBM!

• FASTER INNOVATION
• BETTER ECONOMICS
• LOWER RISK OF FAILURE

Lower Skill + Less Cost
Buy only what you need. Start small and grow.


IBM BigInsights on cloud

Build
• Ready-to-run Hadoop clusters in the cloud
• IBM Open Platform - 100% open source Hadoop; will align with ODP
• Based on proven, performant reference architectures

Manage
• Key platform components monitored for availability
• Hadoop, OS and BigInsights patched and maintained
• Ambari cluster manager for complete control

Support
• 24x7 cloud operations and support team
• Access to deep Hadoop expertise
• Faster time to problem resolution

Protect
• Deployed in world-class, secure SoftLayer data centers
• Dedicated physical machines
• Certified SSAE SOC2 Type 1, ISO 27001

http://www.ibm.com/cloud
http://www.bluemix.net

Summary and Fast Start



IBM investing heavily in Big Data and analytics

• $24B investment in both organic development and 30+ acquisitions
• $100M announced investment in IBM Interactive Experience, creating 10 new labs worldwide
• $1B to bring cognitive services and applications to market
• 9 Analytics Solution Centers
• Developing curriculum and training for analytics with 1,000 universities


Spark investments: community, core, and consumption
3500+ IBM developers and researchers

Community: Growing Spark knowledge & expertise
• Spark Technology Center
• Big Data University

Core: Accelerating Spark capabilities
• SystemML open source contribution
• 30+ research initiatives

Consumption: Using Spark within IBM & partner products
• Spark stand-alone
• Hadoop distribution
• IBM portfolio


The bottom line about IBM and Big Data

• Big Data is a strategic initiative for IBM
  – Significant investments across software, hardware and services.

• BigInsights
  – Enables firms to exploit growing variety, velocity, and volume of data
  – Delivers diverse range of analytics
  – Leverages and extends open source
  – Provides enterprise-class features and supporting services
  – Complements existing software investments and commercial offerings

• IBM advantage
  – Full solution spanning software, hardware & services
  – Rapid technology advances through partnerships with IBM Research
  – Global reach


Jump start your efforts with IBM Analytics Stampede
Leading the charge for your analytics success

• IBM’s Expertise - takes the guesswork out and delivers savings in time and cost for your early enablement and success
• IBM’s Analytics Solution - provides unmatched capabilities for processing and analyzing all types of data
• Skills & Knowledge Transfer - ensures knowledge transfer and training roadmap for skills enablement in your organization for new analytics requirements

[Chart: time to insights. Standard path: Research, Use Case Selection, Product Selection, Skills & Knowledge, Services Roadmap, Solution Success. Stampede path: IBM Expertise, BVA / Roadmaps, Analytics Prototypes, Knowledge Transfer, Solution Success.]

https://www-01.ibm.com/software/data/services/stampede.html



Want to learn more?
• Download Quick Start offering
• Follow tutorials, videos, and more
• Links all available from HadoopDev
  – https://developer.ibm.com/hadoop/
