
WHITE PAPER

Cloudera Enterprise:
Evaluating the True Value
of a Modern Data Platform

Version: 102

Cloudera Enterprise: Evaluating the True Value of a Modern Data Platform
Data has the potential to be your most important asset. The right modern data strategy can open
up limitless possibilities for your business, allowing you to drive new insights, improve operating
efficiency, and even lower overall business risk. However, simply having the data isn't enough.
Your business needs to actually turn this data into real value. And that requires a modern data
platform that can not only support today's strategy, but also evolve into tomorrow's.
Apache Hadoop is at the core of this. Hadoop is a new type of data platform: one place to store,
process, and analyze unlimited amounts of data of any type, all with unprecedented flexibility, scale,
and cost savings. This concept has proved to be a drastic shift in which data could be unlocked for
the business. Suddenly, new unstructured data sources can be tapped and paired with structured
data, and all data can be kept online for as long as needed. While Hadoop may have removed the
barriers to data, that alone doesn't ensure value. That's where you need a modern data platform,
such as Cloudera Enterprise, to put data at the center of your architecture and enable an enterprise
data hub.
An enterprise data hub serves as a flexible repository to collect and keep unlimited data, whether
for compliance purposes or for sophisticated applications such as real-time anomaly detection. It
speeds up business intelligence reporting and analytics to deliver markedly better throughput on
key service-level agreements. And it increases the availability and accessibility of data for the activities
that support business growth, providing a full view of your operations to enable process innovation.
Most importantly, centralizing and enabling analytics across all your data unlocks new data-driven
business opportunities that were previously too expensive or complex for most enterprises. An
enterprise data hub delivers advanced capabilities (such as synchronous customer models based
on social networks and offline behaviors, truly real-time analysis of streaming data in motion,
and proactive security against fraud and cyberattacks) with a unified, flexible, scalable platform
that's quick to implement and easy to grow with your business, all at lower cost per terabyte
than ever before possible.
Powered by the world's most popular Hadoop distribution, Cloudera Enterprise provides a fast, easy,
and secure platform that supports this breadth of capabilities across the business for a complete
enterprise data hub solution.

Fast
Data is the new normal. We have entered an age where we can measure anything and everything.
This is pervasive across industries and your competitors. The competitive advantage of data comes
from not only the insights we can gain from it, but also how quickly we can do so. You need to not
only understand what happened, but have the ability to understand why and how to change what
happens in the future, all in near real-time. Only a modern data platform such as Cloudera Enterprise
can support this new analytics paradigm, with the fastest time-to-insights.
This all starts with the ability to handle both data at-rest and data in-motion. With the rise of the
Internet of Things and sensor data, data is being generated and collected faster than ever before
and the ability to tap into the value of this data is critical. Your platform needs to support not only
the ability to ingest and process this streaming data in real-time, but also make it available for
analytics and data applications for immediate business value.
Once the data is available in the platform, insights cannot then be limited to a select few or siloed
off for different departments. From processing to serving, and all the analytics in between, a modern
data platform needs to support the full cycle of insights, all within a single enterprise data hub to
deliver the fastest time-to-value.

For the data engineers, brittle ETL pipelines and missed SLAs can become a thing of the past.
Cloudera Enterprise is designed to handle large-scale, batch processing workloads over flexible
data types. This means workloads will run orders of magnitude faster, cutting days down to
minutes, and scale to support more data sources and outputs, so data is always available for
reporting or other workload needs right when your business needs it.

British Telecom Increased Data Velocity by 15x


With £18 billion (about US$30 billion) in revenue in 2014, BT is one of the largest
telecommunications providers in the world. The company serves more than 18 million
consumers and nearly three million businesses. In its legacy environment, business
client records were spread across multiple databases. They needed to be reconciled
and updated daily with Dun & Bradstreet data in order to provide business units with
the most relevant and up-to-date information. With nearly one billion records being
compared and reconciled daily, BT's legacy ETL platform (built on a traditional relational
database) couldn't keep up any more. At any given point in time, its business units
were working with day-old data. "It wasn't processing all the data we wanted to process,
and it was taking more than 24 hours to process 24 hours of the data," said Phillip
Radley, chief data architect at BT.
Moving its ETL platform to Hadoop enabled BT to accelerate data velocity. "We were
able to increase data velocity by a factor of 15," said Radley. "We're processing five
times the data in a third of the time. The business sponsors don't know that we moved
to Hadoop and they don't care. All they know is that they're now working with today's
data instead of yesterday's."
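
The two figures in Radley's quote are consistent with each other, which a short calculation can confirm. The sketch below treats "data velocity" as data processed per unit of time, relative to the legacy system; that definition and the function name `velocity_gain` are assumptions made for illustration, not terms defined by BT.

```python
from fractions import Fraction


def velocity_gain(data_factor, time_factor):
    """Relative throughput: data processed divided by time taken,
    both expressed as multiples of the legacy system's figures."""
    return Fraction(data_factor) / Fraction(time_factor)


# Five times the data in a third of the time.
gain = velocity_gain(5, Fraction(1, 3))
print(gain)  # 15
```

Exact fractions are used instead of floating point so that the result is exactly 15 rather than a rounded approximation.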
Business intelligence and analytics continue to be the lifeblood of the business. But too often,
these critical insights are too little, too late. Today, business users are stuck running canned
reports on limited data, or must funnel through a lengthy IT bottleneck to answer new questions,
request additional data, and wait. This slow chain of events means no one is happy, leaving
IT constantly backlogged and overworked, and the business frustrated with the lack of agility.
Cloudera Enterprise makes self-service BI and analytics a reality by removing the restrictions
on business agility, allowing IT to empower the business instead of encumbering it. Whether
they prefer SQL, or any of the leading BI tools, these analysts can explore data freely and faster
than ever before with Apache Impala (incubating) supporting high-performance analytic
queries, even for highly concurrent user access.

Spotlight on Apache Impala (incubating)


Apache Impala (incubating) is the leading analytic SQL engine running natively in Hadoop,
combining the power of a modern analytic database with the flexibility and scale of Hadoop.
It opens up big data to all business analysts and SQL developers with the interactive performance
required for self-service BI and exploratory analytics, even during times of high user
concurrency, along with compatibility with SQL and all the leading BI tools.
"Impala provides analysts with near-Netezza speeds, but on the Hadoop cluster. Analysts
benefit from having access to all [datasets], using the SQL skills and tools they're already
used to, and getting the same speed as they're used to," said Jon Gregg, senior analytics
engineer, Cox Automotive.


Data scientists have often been separated from the rest of the business, forced to work
with small data samples as they train and test models, with no clean path to pass them off for
production scoring and serving, and a latent feedback cycle. Cloudera Enterprise opens up the
power of big data to these users, while allowing them to work with their preferred tools and libraries.
Data scientists can now have direct access to data in its entirety, and the best-of-breed processing
tool, Apache Spark, for faster model development. Integrations with popular machine learning
libraries and preferred languages such as Python and R mean these users can be productive
out of the gate. Finally, as part of a single, unified platform that supports multiple applications,
these users can cleanly pass their models and applications to production for immediate results.

Spotlight on Apache Spark


Apache Spark is the open standard for flexible, in-memory data processing that enables
batch, streaming, and advanced modeling and analytics on the Hadoop platform. While
Spark is primed to replace MapReduce as the standard data processing engine in Hadoop,
due to its easy development and faster processing speeds, its flexible API extends the
capabilities of Spark well beyond batch processing (with Spark Streaming for stream
processing and MLlib for machine learning being just a few examples).
Using machine-learning algorithms in Spark, Cisco WebEx's modern data platform can
identify new fraud patterns as they evolve and automatically create the new rule sets to
keep up with fraudsters' evolving tactics. Previously, the organization had to manually
code rules based on detected fraud patterns. "When we compared both approaches, the
machine-learned rules did much better, helping us detect up to 17 times more fraud based
on historic data," said Joe Hsy, director, Development and Engineering, Cisco WebEx.
The last step in this is to operationalize your findings, whether to create new end-user applications
driven by this data, or to act in real-time based on insights. Supported by Apache HBase and Apache
Kudu, Cloudera Enterprise provides a simple architecture to let your business respond in real-time
based on behavior or immediate updates. Serve out product recommendations based on real-time
purchasing or detect threats before they happen through operationalized predictive modeling.

Spotlight on Apache HBase and Apache Kudu


Apache HBase is a high-performance, distributed datastore that provides flexible, integrated
data storage and real-time data access. This is ideal for use cases in which it's important
to quickly find and write to individual rows, a common requirement for operational databases.
In addition to fast random access, some use cases require fast scans across all data for
analytic purposes. Apache Kudu provides high throughput for large scans across updating
data for real-time insights. While previously available only through hybrid architectures,
Kudu can now serve as a consolidated storage layer for use cases requiring a simultaneous
combination of sequential and random reads and writes, such as for time series workloads,
machine data analytics, and online reporting.
"HBase is the means by which we are able to deliver our insights at scale, in real time,
to any user," says Eric Chang, technical lead for data services, Opower.
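
The difference between the two access patterns described above (finding or writing an individual row by key, versus scanning a range of rows in order) can be sketched with a toy in-memory table. This is purely illustrative: `ToyTable` and its methods are hypothetical names invented for this sketch, and nothing here reflects the actual HBase or Kudu client APIs or their storage internals.

```python
from bisect import bisect_left


class ToyTable:
    """In-memory stand-in for a keyed table; not an HBase or Kudu client."""

    def __init__(self):
        self._rows = {}   # key -> value, for point reads and writes
        self._keys = []   # keys kept sorted, for ordered range scans

    def put(self, key, value):
        """Random write: insert a new row or update an existing one in place."""
        if key not in self._rows:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._rows[key] = value

    def get(self, key):
        """Random read: look up a single row by its key."""
        return self._rows.get(key)

    def scan(self, start, stop):
        """Sequential read: yield (key, value) for start <= key < stop, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


# Time series example: keys are (sensor id, timestamp) pairs.
table = ToyTable()
table.put(("sensor-1", 2), 21.0)
table.put(("sensor-1", 1), 20.5)
table.put(("sensor-2", 1), 18.0)
table.put(("sensor-1", 2), 22.0)  # update an existing row

print(table.get(("sensor-2", 1)))
print(list(table.scan(("sensor-1", 0), ("sensor-1", 3))))
```

A time series workload mixes both patterns: new readings arrive as individual writes, while reporting reads a contiguous range of timestamps for one sensor.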
As a single, unified platform, Cloudera Enterprise provides the best-of-breed technologies to support
the entire business, from data engineering and data science to analytic workloads to operationalizing
insights, all over the same shared data for maximum efficiency and the fastest time-to-insights.

[Figure: Cloudera Enterprise workloads on a single platform]
Data Engineering & Science: Process data, develop & serve predictive models
Analytic Database: ELT, reporting, exploratory business intelligence
Operational Database: Build data-driven applications to deliver real-time insights
Platform layers: Process, Analyze, Serve; Unified Services; Store; Integrate; Operations; Data Management

"Cloudera Manager made it so easy to manage our clusters, was extremely
user-friendly, and dramatically reduced our [Hadoop] learning curve."
Kerry Shaffer, IT director at Magnify

Easy
Especially at scale, a modern data platform must have easy administration to keep mission-critical
applications up and running. Only Cloudera Enterprise provides the Operations Team with what
they need to focus on: new applications and results, not fighting fires. Supporting the largest scale
deployments and applications, Cloudera Manager is the most trusted tool for managing Hadoop in
production. Automated deployments and configurations let you get up and running quickly, and
fully customizable monitoring gives you the visibility and control you need to keep it running.
Whether you need to efficiently troubleshoot an issue, ensure optimal, multi-tenant performance,
or upgrade without downtime, Cloudera Manager is a single interface to manage it all with ease.
A direct connection to Cloudera's expert support is also built in to Cloudera Manager. Using their
own modern data platform, Cloudera Support can quickly analyze your diagnostic information
against known issues, best practices, similar deployments, and more, to not only resolve issues
35% faster but also provide proactive guidance and protection, preventing over 15% of issues
before they actually become issues.
For most enterprises, it's only a matter of time before they have a footprint in the public cloud, if
they don't already. In fact, a recent study by Gartner found that the average enterprise is using 4.6 public
cloud providers. It's critical that a modern data platform can be deployed anywhere, so the business
can get value from all its data, whether it's on-premises, in one or many cloud environment(s), or
all of the above. Cloudera Enterprise is the only hybrid platform that allows you to take advantage
of the scalability and flexibility of the cloud, while still getting the same high-performance,
enterprise-grade platform.
Using Cloudera Director, you can deploy how you want, when you want, and manage multiple
clusters across cloud providers from a single, unified interface. Additionally, Cloudera Director
makes it easy to reduce your overall operating costs, whether you want to orchestrate transient
workloads for efficient ETL and batch analytics, or support elastic demand for analytics and
reporting. Finally, by featuring native integration with cloud object stores, such as Amazon S3,
you can start getting value from your data immediately, no matter where it lives.

GoPro's Modern Data Platform in the Cloud


GoPro, the technology company that develops action cameras and video editing software,
needed to understand user interaction across the ecosystem to guide research and development
spend, as well as better tailor its marketing and predict revenue. The key for their Data Science
and Engineering (DS&E) team was to figure out how to take this data in, make sense of it,
and report their findings to executives. Since this data was already generated in the cloud,
they wanted to manage and process it where it already lives. That's why they built their
Hadoop-based data platform using Cloudera Enterprise running on Amazon Web Services
(AWS). AWS provides the necessary speed and flexibility, while Cloudera Enterprise provides
the best-of-breed technologies needed for results. With a single platform, they are now
able to do real-time streaming and processing of product logs, web analytics, channel
data, and ERP, while also enabling high-performance reporting over all their data for new
insights and intelligent feature development.


Secure and Governed


All the benefits of a modern data platform (more data of more types, unfettered access for all
users, real-time actions and updates) also make it difficult to secure. However, that doesn't mean
you can compromise and put your data at risk. Only Cloudera Enterprise balances even the most
stringent security needs with the ability to get business agility and value. With security built into
the core across multiple layers, and dedicated security expertise and innovation, Cloudera Enterprise
provides comprehensive, compliance-ready security and governance.

Perimeter: Guarding access to the cluster itself
Technical Concepts: Authentication, Network Isolation
CLOUDERA MANAGER

Access: Defining what users and applications can do with data
Technical Concepts: Permissions, Authorization
APACHE SENTRY & RECORD SERVICE

Visibility: Reporting on where data came from and how it's being used
Technical Concepts: Auditing, Lineage
CLOUDERA NAVIGATOR

Data: Protecting data in the cluster from unauthorized visibility
Technical Concepts: Encryption, Tokenization, Data Masking
NAVIGATOR ENCRYPT & KEY TRUSTEE | PARTNERS

Cloudera's platform ensures you have everything you need to protect your data and your customers,
optimized and automated for Hadoop scale. Even your most sensitive data can be used for analytics
with native, high-performance encryption that protects everything in your platform, without impacting
the time-to-insights. Paired with the only enterprise-grade key manager for Hadoop, you can rest
assured that your data and keys are protected.
Additionally, you can safely open up access to all users with uniformly enforced, role-based access
controls. No matter which platform tools they are using, they will get fine-grained access to the
data they need to do their job, without the manual burden on the Security Team.
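
As a rough illustration of what role-based access control means in practice, the sketch below grants privileges to roles rather than to individual users, then checks every request against the roles a user holds. All role names, user names, table names, and the `is_allowed` helper are hypothetical and chosen only for this example; this is not Apache Sentry's API or policy format.

```python
# Privileges are attached to roles, never directly to users.
ROLE_PRIVILEGES = {
    "analyst": {("sales", "SELECT")},
    "etl_engineer": {("sales", "SELECT"), ("sales", "INSERT")},
}

# Users only receive roles.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"etl_engineer"},
}


def is_allowed(user, table, action):
    """Return True if any role held by the user grants the action on the table."""
    roles = USER_ROLES.get(user, set())
    return any(
        (table, action) in ROLE_PRIVILEGES.get(role, set())
        for role in roles
    )


print(is_allowed("alice", "sales", "SELECT"))  # True
print(is_allowed("alice", "sales", "INSERT"))  # False
print(is_allowed("bob", "sales", "INSERT"))    # True
```

Because the same check runs regardless of which tool issues the request, enforcement stays uniform, and onboarding a new analyst means assigning a role rather than hand-editing per-table permissions.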
Finally, no modern data platform is complete without integrated data management and governance.
Not only is governance a critical aspect for any compliance audit, but it provides necessary visibility
and controls to make sure your platform and data are actually usable to the business. From a security
perspective, you automatically get full audit and lineage information to understand who is accessing
what, and how data is changing. When paired with metadata discovery and policy management,
this also allows data stewards to curate data for the business based on usage and enable new
insights on new, trusted data.
Cloudera has led the way when it comes to security in Hadoop. In fact, Cloudera Enterprise is the
only Hadoop distribution to have passed compliance audits with our most regulated financial services,
healthcare, and retail customers.

MasterCard Creates New Revenue Streams with an Advanced Anti-fraud Solution


MasterCard operates the world's fastest payments processing network, delivering the products
and services that make everyday commerce activities easier, more secure, and more efficient.
To help financial institutions better evaluate a merchant's credit risk, MasterCard created
an anti-fraud solution called MATCH (MasterCard Alert to Control High-risk Merchants).
The MATCH database maintains data on hundreds of millions of fraudulent businesses.
MasterCard acquirers submit nearly one million inquiries to the database each month.
Replatforming to Cloudera Enterprise helped MasterCard accelerate searches, incorporate
new data as industry trends and opportunities emerge, and expand its user base. With
improved platform scalability, performance, and accuracy, MasterCard can now also offer
its solution to new markets and build new revenue streams.

As this solution not only dealt with sensitive data, but also had to integrate with other
regulated internal systems, MasterCard had to ensure that the platform could meet its
high security standards and comply with PCI DSS (Payment Card Industry Data Security
Standards). With Cloudera's full security stack and industry expertise, they are able to
pass this compliance audit year after year.

Future-Proof and Open


If there is one constant, it's that things change. Within your business, your priorities will shift,
business demand for both access and scale will grow, and new use cases and possibilities will
emerge. You can't afford to be replatforming every few years just to keep up. A modern data
platform is designed to evolve with your business needs, letting you define what's next, versus
trying to just keep up.
Built on a foundation of open standards, Cloudera Enterprise is the modern data platform that
lets you take advantage of the lock-in free world of open source technology, while ensuring you
get the best-of-breed technologies that meet your enterprise requirements. With the most open
source experience, Cloudera not only creates the leading open source technologies to meet
ever-changing business needs (including Apache Impala (incubating) for high-performance SQL,
Apache Kudu for real-time analytics, and Apache Sentry for role-based access controls) but also
curates the broader open source ecosystem to discover and develop additional cutting-edge
technologies, such as being the first platform to ship and support Apache Spark for data processing.
All technologies within the platform meet the open standard guarantee of an active development
community for continued advancements, partner integrations and certifications for continued
business productivity, and multi-vendor support for portability. And, as part of Cloudera Enterprise,
we bring them all together within a single platform through rigorous quality assurance testing,
so everything simply works for you.
This open source foundation, rigorous testing, and modular platform design also ensure a
future-proof platform. Cloudera Enterprise is not only improving the capabilities and experience of use
cases today (such as data engineering, analytic database, and operational database workloads),
but also constantly extending the storage engines and access frameworks to better support new
use cases (such as for data science and real-time serving workloads), all within a single, multi-tenant
platform. Additionally, with a deep development partnership with Intel, Cloudera's platform not
only runs on today's hardware landscape, but will also take advantage of next-generation
hardware design.

[Figure: Cloudera Enterprise platform architecture]
PROCESS, ANALYZE, SERVE: Batch, Stream, SQL, Search, Other
UNIFIED SERVICES: Resource Management, Security
STORE: Filesystem, Relational, NoSQL, Other
INTEGRATE: Structured, Unstructured
Across all layers: Data Management, Operations
Hybrid Development Flexibility: Public Cloud, Private Cloud, Hybrid Environments

Conclusion


The right technology is key for removing the barriers to your data and turning it into business
value. You need a modern data platform built to handle any data, wherever it lives, while scaling
analytics and data science to the masses. Powered by Hadoop, Cloudera Enterprise is the fastest,
easiest, and most secure modern data platform that leading organizations trust to get the results
that drive their business. Contact us to get started.

About Cloudera
Cloudera delivers the modern platform for data management and analytics. Public sector
organizations trust Cloudera to help them apply data to the center of their missions with
Cloudera Enterprise, the fastest, easiest, and most secure platform built on Apache
Hadoop and the latest open source technologies. Agencies can efficiently capture, store,
process, and analyze vast amounts of data, empowering them to use advanced analytics
to drive business decisions quickly, flexibly, and at lower cost than has been possible
before. Focused on customer success, Cloudera offers comprehensive support, training,
and professional services. Learn more at cloudera.com.

cloudera.com
1-888-789-1488 or 1-650-362-0488
Cloudera, Inc. 1001 Page Mill Road, Palo Alto, CA 94304, USA
© 2016 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and
other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.
