
Hortonworks: Hadoop for the Enterprise

We Do Hadoop

Spring 2015
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hadoop emerged as foundation of new data architecture
Apache Hadoop is an open source data platform for managing large volumes of high velocity and variety of data
•  Built by Yahoo! to be the heartbeat of its ad & search business
•  Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises
•  Incredibly disruptive to current platform economics
[Diagram: Application; Batch Processing (MapReduce); Storage (HDFS)]

Traditional Hadoop Advantages
✓  Manages new data paradigm
✓  Handles data at scale
✓  Cost effective
✓  Open source

Modern Hadoop overcame limitations – NOW SOLVED
✓  Now on-line + real-time/streaming + batch (it was batch-only architecture)
✓  Now multi-tenancy cluster (it was single purpose cluster)
✓  Now fully integrated with other technologies (it was difficult to integrate with existing investments)
✓  Now enterprise ready (it was not enterprise-grade)



Traditional systems under pressure
1 Challenges
•  Constrains data to app
•  Can’t manage new data
•  Costly to Scale
2 New Data
[Chart: data volume grows from 2.8 Zettabytes (2012) to 40 Zettabytes (2020). Traditional: ERP, CRM, SCM. New: Clickstream, Geolocation, Web Data, Internet of Things, Docs, emails, Server logs. Business Value: LAGGARDS vs. INDUSTRY LEADERS.]
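The two data points on this slide (2.8 Zettabytes in 2012, 40 Zettabytes in 2020) imply a compound annual growth rate. A minimal sketch of that arithmetic follows. It assumes steady year-over-year compounding between the two figures, which the slide does not state, and the variable names are illustrative only.

```python
# Endpoints taken from the slide; steady compounding between them is an assumption.
start_zb, start_year = 2.8, 2012
end_zb, end_year = 40.0, 2020

years = end_year - start_year
growth_factor = end_zb / start_zb          # total growth over the period
cagr = growth_factor ** (1 / years) - 1    # compound annual growth rate

print(f"{growth_factor:.1f}x over {years} years, about {cagr:.0%} per year")
```

Under that assumption the data volume grows roughly 14-fold, or close to 40% per year.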



Modern Data Architecture emerges to unify data & processing
•  Enable applications to have access to all your enterprise data through an efficient centralized platform
•  Supported with a centralized approach to governance, security and operations
•  Versatile to handle any applications and datasets no matter the size or type
[Diagram: Modern Data Architecture. ANALYTICS: Data Marts, Business Analytics, Visualization & Dashboards. Applications. Existing Systems: EDW, MPP (Batch). YARN: Data Operating System running Batch, Interactive, Real-Time and Partner ISV workloads over HDFS (Hadoop Distributed File System). SOURCES: Existing Systems (ERP, CRM, SCM), Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured.]



Hadoop adoption follows a predictable journey
Cost Optimization, new analytic apps, and ultimately to a “data lake”



Hadoop Use Cases
1.  Cost Optimization
✓  Archive
✓  ETL Offload
✓  Enrich value of DW environment

2.  Advanced Analytics Applications
✓  Single View
✓  Predictive Analytics
✓  Data Discovery

3.  Centralized Data Architecture (Data Lake)
✓  Data Operating System
✓  Any App, Any Data
✓  Systems of Insight
[Diagram: ERP, CRM, SCM; DATA LAKE; Systems of Insight]
Hadoop Driver: Cost optimization
HDP helps you reduce costs and optimize the value associated with your EDW

Archive Data off EDW
Move rarely used data to Hadoop as active archive, store more data longer

Offload costly ETL process
Free your EDW to perform high-value functions like analytics & operations, not ETL

Enrich the value of your EDW
Use Hadoop to refine new data sources, such as web and machine data for new analytical context

[Diagram: ANALYTICS: Data Marts, Business Analytics, Visualization & Dashboards. DATA SYSTEMS: Enterprise Data Warehouse (MPP, In-Memory) for Hot data; HDP 2.2 for Cold Data, Deeper Archive & New Sources; ELT. SOURCES: Existing Systems (ERP, CRM, SCM), Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured.]





Hadoop Driver: Advanced analytic applications
Hadoop allows you to bring together new and existing data to capture new value through
advanced analytic applications. Many customers start with one of three app types:

Single View
•  Bring together data from many sources to gain a single view of a customer or product
•  Improve marketing and customer service interactions with customers
•  Identify cross sell and up sell opportunities, increase products per customer, and reduce churn
•  Create a single product catalog
•  …and more

Predictive Analytics
•  Analyze streams of data as they flow from machines and sensors to predict failure and avoid downtime
•  Monitor peak or low usage of a resource, optimize flow of raw resources and optimize supply chain
•  Monitor biometric data to deliver proactive care
•  Analyze transactions to identify fraud as it occurs
•  …and more

Data Discovery
•  Combine and explore large amounts of data from many sources to mine for insights
•  Analyze new data types – sentiment, clickstream, sensor, geolocation, logfile, unstructured – together to uncover patterns and opportunities
•  Look at data across longer time horizons to uncover new patterns
•  …and more
Hadoop Driver: Advanced analytic applications
Single View: Improve acquisition and retention
Predictive Analytics: Identify your next best action
Data Discovery: Uncover new findings

Industry | Single View | Predictive Analytics | Data Discovery
Financial Services | New Account Risk Screens | Trading Risk | Insurance Underwriting
Financial Services | Improved Customer Service | Insurance Underwriting | Aggregate Banking Data as a Service
Financial Services | Cross-sell & Upsell of Financial Products | Risk Analysis for Usage-Based Car Insurance | Identify Claims Errors for Reimbursement
Telecom | Unified Household View of the Customer | Searchable Data for NPTB Recommendations | Protect Customer Data from Employee Misuse
Telecom | Analyze Call Center Contacts Records | Network Infrastructure Capacity Planning | Call Detail Records (CDR) Analysis
Telecom | Inferred Demographics for Improved Targeting | Proactive Maintenance on Transmission Equipment | Tiered Service for High-Value Customers
Retail | 360° View of the Customer | Supply Chain Optimization | Website Optimization for Path to Purchase
Retail | Localized, Personalized Promotions | A/B Testing for Online Advertisements | Data-Driven Pricing, improved loyalty programs
Retail | Customer Segmentation | Personalized, Real-time Offers | In-Store Shopper Behavior
Manufacturing | Supply Chain and Logistics | Optimize Warehouse Inventory Levels | Product Insight from Electronic Usage Data
Manufacturing | Assembly Line Quality Assurance | Proactive Equipment Maintenance | Crowdsource Quality Assurance
Manufacturing | Single View of a Product Throughout Lifecycle | Connected Car Data for Ongoing Innovation | Improve Manufacturing Yields
Healthcare | Electronic Medical Records | Monitor Patient Vitals in Real-Time | Use Genomic Data in Medical Trials
Healthcare | Improving Lifelong Care for Epilepsy | Rapid Stroke Detection and Intervention | Monitor Medical Supply Chain to Reduce Waste
Healthcare | Reduce Patient Re-Admittance Rates | Video Analysis for Surgical Decision Support | Healthcare Analytics as a Service
Oil & Gas | Unify Exploration & Production Data | Monitor Rig Safety in Real-Time | Geographic exploration
Oil & Gas | DCA to Slow Well Decline Curves | Proactive Maintenance for Oil Field Equipment | Define Operational Set Points for Wells
Government | Single View of Entity | CBM & Autonomic Logistic Analysis | Sentiment Analysis on Program Effectiveness
Government | Prevent Fraud, Waste and Abuse | Proactive Maintenance for Public Infrastructure | Meet Deadlines for Government Reporting



Hadoop Driver: Enabling the data lake
Journey to the Data Lake with Hadoop

Goal:
•  Centralized Architecture
•  Data-driven Business

Drivers:
1.  Cost Optimization
2.  Advanced Analytic Apps

Data Lake Definition
•  Centralized Architecture
Multiple applications on a shared data set with consistent levels of service
•  Any App, Any Data
Multiple applications accessing all data affording new insights and opportunities.
•  Unlocks ‘Systems of Insight’
Advanced algorithms and applications used to derive new value and optimize existing value.

[Chart: SCALE vs. SCOPE, leading to DATA LAKE and Systems of Insight]
HDP: Any Data, Any Application, Anywhere

Any Data
Deploy applications fueled by clickstream, sensor, social, mobile, geo-location, server log, and other new paradigm datasets with existing legacy datasets.
[Sources: ERP, CRM, SCM, Clickstream, Web & Social, Geolocation, Internet of Things, Server Logs, Files, emails]

Any Application
•  Deep integration with ecosystem partners to extend existing investments and skills
•  Broadest set of applications through the stable of YARN-Ready applications
Over 70 Hortonworks Certified YARN Apps

Anywhere
Implement HDP naturally across the complete range of deployment options: commodity, appliance, cloud, hybrid



Case Study: 12 month Hadoop evolution at TrueCar

12 Month Results at TrueCar
•  Six Production Hadoop Applications
•  Sixty nodes/2PB data
•  Storage Costs/Compute Costs from $19/GB to $0.23/GB
“We addressed our data platform capabilities strategically as a pre-cursor to IPO.”

[Timeline: 12 months execution plan, plotted against Data Platform Capabilities]
June 2013: Begin Hadoop Execution
July 2013: Hortonworks Partnership
Aug 2013: Training & Dev Begins
Nov 2013: Production Cluster, 60 Nodes, 2 PB
Dec 2013: Three Production Apps (3 total)
Jan 2014: 40% Dev Staff Perficient
Feb 2014: Three More Production Apps (6 total)
May ‘14: IPO
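The per-GB cost figures quoted in the TrueCar results ($19/GB before, $0.23/GB after) can be turned into a reduction factor and a percentage saving. The sketch below shows only that arithmetic; the variable names are illustrative, and the slide does not say how the underlying costs were measured.

```python
# Per-GB cost figures quoted on the slide.
cost_before = 19.00   # USD per GB
cost_after = 0.23     # USD per GB

reduction_factor = cost_before / cost_after
percent_saved = (1 - cost_after / cost_before) * 100

print(f"{reduction_factor:.0f}x cheaper, {percent_saved:.1f}% lower cost per GB")
```

On those two figures alone, the cost per GB falls by a factor of roughly 83, a saving of just under 99%.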



Hortonworks Data Platform
Hadoop for the Enterprise



Hadoop for the Enterprise:
Implement a Modern Data Architecture with HDP

•  Founded in 2011
•  Original 24 architects, developers, operators of Hadoop from Yahoo!
•  600+ Employees
•  1000+ Ecosystem Partners

Customer Momentum
•  330+ customers (as of year-end 2014)

Hortonworks Data Platform
•  Completely open multi-tenant platform for any app & any data.
•  A centralized architecture of consistent enterprise services for resource management, security, operations, and governance.

Partner for Customer Success
•  Open source community leadership focus on enterprise needs
•  Unrivaled world class support
Hortonworks Differentiators
1.  Centralized Architecture
✓  No “siloed” architecture

2.  Partnerships
✓  Technology Partnership
✓  Customer Partnership

3. Open-Source + Open-Community
✓  Development Model
✓  Business Model - SUBSCRIPTION, NO LICENSES
✓  Leadership



Only HDP delivers a Centralized Architecture
HDP is uniquely built around YARN serving as a data operating system that provides multi-tenant Resource Management, consistent Governance & Security and efficient Operations services across Hadoop applications.

YARN: Data Operating System
•  A centralized architecture of consistent enterprise services for resource management, security, operations, and governance.
•  The versatility to support multiple applications and diverse workloads from batch to interactive to real-time, open source and commercial.

Key Benefits
•  Multiple applications on a shared data set with consistent levels of service: a multitenant data platform.
•  Provides a shared platform to enable new analytic applications.
•  Delivers maximum cost efficiency for cluster resource management. Fewer servers, fewer nodes.

[Diagram: Hortonworks Data Platform. Existing Applications, New Analytics, Partner Applications; Data Access: Batch, Interactive & Real-time; Resource Management; Governance; Security; Operations; YARN: Data Operating System; Storage]



HDP delivers a Centralized Architecture

Hortonworks Data Platform: A centralized architecture built on YARN
Single cluster, multiple applications
•  Efficient storage, processing
•  Centralized Security, Operations, Governance
•  Run a variety of applications simultaneously
[Diagram: Existing Applications, New Analytics, Partner Applications (i.e. SAS); Data Access: Batch, Interactive & Real-time; Resource Management; Governance; Security; Operations; YARN: Data Operating System; Storage]

Other Hadoop Vendors: A siloed “with” YARN architecture
Disjoint, Siloed Clusters
•  Inefficient use of resources, single tenant, duplicate storage & processing
•  Multiple implementations of governance, security and operations
•  New applications require new clusters
[Diagram: Cluster 1, Cluster 2, … Cluster N, one per workload (Batch, Interactive, Real-time), each with its own Application, Security, Governance, resource management (Dedicated Resource mgt or YARN), Storage and Operations]
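The contrast between a single shared cluster and disjoint, siloed clusters can be illustrated with a toy capacity model. This is a hedged sketch only: the hourly node demands below are made-up numbers, the function names are hypothetical, and real YARN scheduling involves much more than peak sizing. It shows why sizing one shared cluster for the combined peak can need fewer nodes than sizing one dedicated cluster per workload for its own peak.

```python
# Hypothetical hourly node demand per workload (illustrative numbers, not from the deck).
demand = {
    "batch":       [2, 2, 8, 8, 2, 2],
    "interactive": [6, 6, 2, 2, 6, 6],
    "realtime":    [3, 3, 3, 3, 3, 3],
}

def siloed_nodes(demand):
    """One dedicated cluster per workload, each sized for its own peak."""
    return sum(max(series) for series in demand.values())

def shared_nodes(demand):
    """One shared cluster, sized for the peak of the combined demand."""
    combined = [sum(hour) for hour in zip(*demand.values())]
    return max(combined)

print("siloed:", siloed_nodes(demand), "nodes")
print("shared:", shared_nodes(demand), "nodes")
```

With these illustrative numbers the siloed layout needs 17 nodes and the shared layout 13, because the workloads do not peak at the same time. If all workloads peaked together, the two figures would be equal.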
The value of HDP’s Centralized Architecture
Hortonworks Data Platform: A centralized architecture built on YARN
Other Hadoop Vendors: Siloed “with” YARN Architecture

Consistent Services (Governance, Security, Operations)
•  Hortonworks Data Platform: Shared services ensure consistent, effective application of the key policies implemented for governance & security. Single point for operations streamlines deployment
•  Other Hadoop Vendors: Fragmented and bolt-on services increase complexity and risk

Efficient Resources (Resource Mgmnt, Hardware, People)
•  Hortonworks Data Platform: Shared storage and processing creates efficiencies: less hardware, less cost.
•  Other Hadoop Vendors: Applications require separate clusters: lack of sharing creates duplication of storage and processing: more hardware, more data movement, and more cost

Ease of Expansion (Engines, Applications, Clusters)
•  Hortonworks Data Platform: Consistent approach to on-boarding: new engines and new applications deployed inherit shared services; add more resources to existing cluster - no need to spin up a new cluster
•  Other Hadoop Vendors: New applications and new engines require additional clusters: resulting in cluster sprawl, extended deployment time, and costly integration efforts



The value of HDP’s Centralized Architecture
Hortonworks Data Platform: A centralized architecture built on YARN
Other Hadoop Vendors: Disjoint “with” YARN Architecture

Area | Hortonworks Data Platform | Other Hadoop Vendors

Consistent Services
Governance | Shared metadata & management of lifecycle | Application specific, requires multiple definitions
Security | Comprehensive policy, consistent enforcement | Inconsistent app specific security policy, increases risk
Operations | Single configuration point, eases deployment, production | Configure multiple clusters, resource intense

Efficient Resources
Resource Mgmnt | Efficient sharing & predictable performance for all apps | Resources monopolized by specific apps and users
Hardware | Shared resources result in less hardware, less cost | Expensive, each cluster/app requires dedicated hardware
People | Single team to manage/maintain cluster | Multiple cluster, multiple teams to manage, added cost

Ease of Expansion
Engines | New engines slide on YARN seamlessly | New engines require costly integration effort
Applications | New applications inherit consistent services | Extended deployment time, re-implementing services
Clusters | More apps & data expand cluster. No new cluster. | More apps & data require new cluster and new costs



HDP delivers a completely open data platform
Hortonworks Data Platform provides Hadoop for the Enterprise: a centralized architecture of core enterprise services, for any application and any data.

Completely Open
•  HDP incorporates every element required of an enterprise data platform: data storage, data access, governance, security, operations
•  All components are developed in open source and then rigorously tested, certified, and delivered as an integrated open source platform that’s easy to consume and use by the enterprise and ecosystem.

[Diagram: Hortonworks Data Platform, HDP 2.2]
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig), SQL (Hive, HCatalog), NoSQL (HBase, Accumulo, Phoenix), Stream (Storm), Search (Solr), In-Memory (Spark), Others (ISV Engines); Tez, Slider; YARN: Data Operating System
SECURITY: Authentication, Authorization, Accounting, Data Protection. Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox; Cluster: Ranger
OPERATIONS: Provision, Manage & Monitor: Ambari, Zookeeper. Scheduling: Oozie
DATA MANAGEMENT: HDFS (Hadoop Distributed File System), nodes 1 … N
Deployment Choice: Linux, Windows, On-Premise, Cloud



Technology Partnerships matter
It is not just about packaging and certifying software…
Our joint engineering with our partners drives open source standards for Apache Hadoop
HDP is Apache Hadoop

[Table column headings: Apache Project; Hortonworks Relationship (Named Partner, Certified Solution, Resells, Joint Engr)]
Microsoft ◆ ◆ ◆ ◆
HP ◆ ◆ ◆ ◆
SAS ◆ ◆ ◆
SAP ◆ ◆ ◆ ◆
IBM ◆ ◆ ◆
Pivotal ◆ ◆ ◆
Redhat ◆ ◆ ◆
Teradata ◆ ◆ ◆ ◆
Informatica ◆ ◆ ◆
Oracle ◆ ◆



Customer Partnerships matter
Driving our innovation through Apache Software Foundation Projects
Hortonworkers are the architects and engineers that lead development of open source Apache Hadoop at the ASF

•  Expertise
Uniquely capable to solve the most complex issues & ensure success with latest features
•  Connection
Provide customers & partners direct input into the community roadmap
•  Partnership
We partner with customers with subscription offering. Our success is predicated on yours.

Apache Project | Committers | PMC Members
Hadoop | 27 | 21
Pig | 5 | 5
Hive | 18 | 6
Tez | 16 | 15
HBase | 6 | 4
Phoenix | 4 | 4
Accumulo | 2 | 2
Storm | 3 | 2
Slider | 11 | 11
Falcon | 5 | 3
Flume | 1 | 1
Sqoop | 1 | 1
Ambari | 34 | 27
Oozie | 3 | 2
Zookeeper | 2 | 1
Knox | 13 | 3
Ranger | 10 | n/a
TOTAL | 161 | 108
Source: Apache Software Foundation. As of 11/7/2014.

[Chart: 27; Yahoo: 10; Facebook: 5; IBM: 2; LinkedIn: 2; Cloudera: 11; Others: 23]
Open Source IS the standard for platform technology
Modern platform standards are defined by open communities
Roadmap matches user requirements not vendor monetization requirements
For Hadoop, the ASF provides guidelines and a governance framework and the open community defines the standards for Hadoop.

Hortonworks Open Source Development Model yields unmatched efficiency
•  Infinite number of developers under governance of ASF applied to problem
•  End users motivated to contribute to Apache Hadoop as they are consumers
•  IT vendors motivated to align with Apache Hadoop to capture adjacent opportunities

Hortonworks Open Source Business Model de-risks investments


•  Buying behavior changed: enterprise wants support subscription license
•  Vendor needs to earn your business, every year is an election year
•  Equitable balance of power between vendor and consumer
•  IT vendors want platform technologies to be open source to avoid lock-in
Supporting the full application lifecycle
Hadoop usage follows a consistent lifecycle
From architecture to expansion, all with a consistent support experience

Most Common Support Issues by Project Phase
Issues addressed by Hortonworks Support by type for the past year

Issue Type | # tickets
Architecture | 7%
Application Development | 10%
Installation | 10%
Performance | 5%
Configuration | 25%
Executing Jobs | 20%
Cluster Administration | 18%
HDP Upgrades | 3%
Enhancement Requests | 3%
TOTAL | 100%

Full Lifecycle Subscription Support
Support through EVERY phase of adoption of your Hadoop project to ensure your success

[Diagram: Hortonworks Support spanning Project 2, Project 3, … Project N across the phases Architecture & Development, Implementation, Production, Expansion]



A Leader in Hadoop
“Hortonworks loves and lives open source innovation”
The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014

World Class Support and Services.
Hortonworks' Customer Support received a maximum score and was significantly higher than both Cloudera and MapR



Hadoop is a Platform Decision
Adoption follows a consistent journey
Data architecture efficiencies, new analytic apps, and ultimately to a “data lake”.

HDP subscription supports entire lifecycle
World class experience to ensure success from architecture to production to expansion.

HDP: A completely open data platform
Platforms are ultimately defined by open communities.

HDP: A centralized architecture built on YARN
Any application, any data, anywhere.



HDP is deeply integrated in the data center

Deep Partnerships
Hortonworks engages in deep engineered relationships with the leaders in the data center, such as HP, Microsoft, Redhat, SAP, SAS & Teradata

Broad Partnerships
Over 600 partners work with us to certify their applications to work with Hadoop so they can extend big data to their users

[Diagram: APPLICATIONS (BusinessObjects BI); DEV & DATA TOOLS; OPERATIONAL TOOLS; DATA SYSTEM: RDBMS, EDW, MPP, HANA and HDP 2.2 (Governance & Integration, Data Access, Security, Operations, YARN, Data Management); INFRASTRUCTURE; SOURCES: EXISTING Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured]



Hortonworks Technology Partnerships
[Diagram: partner logos grouped under ANALYTIC TOOLS & APPLICATIONS (BusinessObjects BI), DEV & DATA TOOLS, OPERATIONAL TOOLS, DATA SYSTEMS (RDBMS, EDW, MPP, Microsoft Analytics Platform System) and INFRASTRUCTURE]



Hadoop at Scale

•  Yahoo – 34,000 nodes, 478 PB
•  Facebook – 300 PB, 600 TB loaded / day
•  eBay – 10,000 nodes, 150 PB
•  Twitter – 3,500 nodes, 50 PB, crunches 6 PB/day
•  LinkedIn – 5,000 nodes, 170 TB logs (Kafka)
•  Spotify – 700 nodes, 15 PB of data
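For the clusters above that list both a node count and a total data volume, the average data held per node follows by simple division. The sketch below is illustrative only: it assumes decimal units (1 PB = 1,000 TB), which the slide does not specify, and it leaves out the Facebook and LinkedIn entries because they do not give both figures.

```python
# (nodes, petabytes) as listed on the slide; entries lacking either figure are omitted.
clusters = {
    "Yahoo": (34_000, 478),
    "eBay": (10_000, 150),
    "Twitter": (3_500, 50),
    "Spotify": (700, 15),
}

TB_PER_PB = 1_000  # decimal units: an assumption, not stated on the slide

per_node_tb = {
    name: pb * TB_PER_PB / nodes for name, (nodes, pb) in clusters.items()
}

for name, tb in per_node_tb.items():
    print(f"{name}: {tb:.1f} TB per node")
```

Under that assumption the four clusters average roughly 14 to 21 TB per node.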



Cautionary Statement Regarding Forward-Looking Statements
This presentation contains forward-looking statements involving risks and uncertainties.
Such forward-looking statements in this presentation generally relate to future events, our
ability to increase the number of support subscription customers, the growth in usage of the
Hadoop framework, our ability to innovate and develop the various open source projects
that will enhance the capabilities of the Hortonworks Data Platform, anticipated customer
benefits and general business outlook. In some cases, you can identify forward-looking
statements because they contain words such as “may,” “will,” “should,” “expects,” “plans,”
“anticipates,” “could,” “intends,” “target,” “projects,” “contemplates,” “believes,” “estimates,”
“predicts,” “potential” or “continue” or similar terms or expressions that concern our
expectations, strategy, plans or intentions. You should not rely upon forward-looking
statements as predictions of future events. We have based the forward-looking statements
contained in this presentation primarily on our current expectations and projections about
future events and trends that we believe may affect our business, financial condition and
prospects. We cannot assure you that the results, events and circumstances reflected in the
forward-looking statements will be achieved or occur, and actual results, events, or
circumstances could differ materially from those described in the forward-looking
statements.

The forward-looking statements made in this prospectus relate only to events as of the date
on which the statements are made and we undertake no obligation to update any of the
information in this presentation.

Trademarks
Hortonworks is a trademark of Hortonworks, Inc. in the United States and other jurisdictions.
