
Data Warehousing

Solution Brief

Leveraging Hadoop to Save Millions in a Data Warehouse Environment

Background
The data warehouse team at a Fortune 100 company analyzes large sets of customer data obtained from various sources, such as billing and order management systems and customer-satisfaction surveys. The data warehouse architecture involves ingesting all of the data into a Teradata system, performing the various cleanup and transformation operations, and then loading query-ready data into master tables. Hundreds of users and applications across the business then access the master tables for various analytics and growth initiatives.

Business Challenge
As its customer base and associated data sources continue to grow, the company now collects larger volumes of data. However, accommodating this growth with larger Teradata servers is not viable because of the higher costs involved. Moving away from Teradata is not an option either, because of its tight integration with end users and applications. The company turned to Hadoop for a cost-effective storage solution that would also integrate well with the existing Teradata ETL workflows.

Why MapR?
The primary goal of obtaining cheaper storage is easily met by Hadoop, so any distribution could satisfy that need. However, the company chose the MapR Distribution for Hadoop for two reasons beyond cheaper storage. First, Direct Access NFS enables data to flow directly into and out of Hadoop, making it easy to integrate Hadoop with the existing ETL workflows. Second, MapR's enterprise-grade features, such as self-healing High Availability and Snapshots, make Hadoop reliable and fault-tolerant.

Results
The MapR Distribution for Hadoop replaced five of the seven steps in the existing ETL process, saving millions of dollars per year, providing a more flexible platform for advanced analytics, and supporting mission-critical applications.

The data warehouse team modified the architecture so that data is first ingested into Hadoop instead of Teradata. Unlike in the earlier process, they now store all data in Hadoop and discard nothing. The team then replicates the ETL steps on Hadoop to preserve existing functionality. As an added advantage, the team's data processing capabilities have grown substantially with the addition of MapReduce on Hadoop.

After ETL processing is complete, master-table data is ingested directly into Teradata using MapR's Direct Access NFS. This direct ingestion eliminates the costs associated with transferring data between Hadoop and Teradata, a capability available only with MapR. The presentation layer for end users has remained unchanged through all of these changes, so existing applications continue to function seamlessly.
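Because Direct Access NFS exposes the cluster as an ordinary mounted file system, an ETL step can, in principle, write master-table output with standard file I/O rather than a Hadoop-specific client. The following is a minimal sketch under stated assumptions, not the team's actual implementation: the mount point `/mapr/my.cluster.com/warehouse/master_tables`, the function name `export_master_table`, and the pipe-delimited file layout are all hypothetical, and the Teradata load that would consume the file is not shown.

```python
import csv
import os
from pathlib import Path

# Hypothetical NFS mount point for the cluster (assumption; MapR clusters are
# conventionally mounted under /mapr/<cluster-name>, but adjust to your setup).
DEFAULT_EXPORT_DIR = Path("/mapr/my.cluster.com/warehouse/master_tables")


def export_master_table(rows, columns, table_name, export_dir=DEFAULT_EXPORT_DIR):
    """Write query-ready rows as a pipe-delimited file using plain file I/O.

    If export_dir is an NFS-mounted cluster path, no Hadoop client library is
    involved; the same code runs unchanged against a local directory.
    """
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)
    final_path = export_dir / f"{table_name}.psv"
    tmp_path = export_dir / f".{table_name}.psv.tmp"

    # Write under a temporary name first, then rename, so a downstream loader
    # is unlikely to pick up a half-written file.
    with open(tmp_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="|")
        writer.writerow(columns)
        writer.writerows(rows)
    os.replace(tmp_path, final_path)
    return final_path
```

Taking the target directory as a parameter lets the same function be exercised against a local temporary directory; assuming the NFS mount is in place, pointing it at the cluster path is the only change needed.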

The data warehouse team is pleased with the simplicity and cost-effectiveness of the solution. They are also excited about providing newer and better services to their internal customers by using MapReduce on Hadoop.

For more information, please visit www.mapr.com.

© 2013 MapR Technologies. All rights reserved. Apache Hadoop and Hadoop are trademarks of the Apache Software Foundation and not affiliated with MapR Technologies.
