Cisco IT Case Study – June 2013: Big Data Analytics
EXECUTIVE SUMMARY
CHALLENGE
Unlock the business value of large data sets, including structured and unstructured information
Provide service-level agreements (SLAs) for internal customers using big-data analytics services
Support multiple internal users on the same platform
SOLUTION
Implemented enterprise Hadoop platform on Cisco Unified Computing System C240 M3 Rack Servers, each with 24 TB of local storage
Automated job scheduling and process orchestration using Cisco Tidal Enterprise Scheduler as an alternative to Oozie
RESULTS
Implemented the platform less than nine months after conceiving the idea
Analyzed service sales opportunities in one-tenth the time, at one-tenth the cost
Identified 235 million new sales opportunities the first day by analyzing 1.5 billion records
LESSONS LEARNED
Library of Hive and Pig user-defined functions (UDFs) increases developer productivity
Cisco TES simplifies job scheduling and process orchestration
Build internal Hadoop skills
Educate internal users about opportunities to use big-data analytics to improve decision making
NEXT STEPS
Apply big-data analytics to more Cisco programs
Evaluate NoSQL 
How Cisco IT Built Big-Data Analytics Platform to Transform Data Management
Enterprise Hadoop architecture, built on Common Platform Architecture, unlocks hidden business intelligence.
Challenge
At Cisco, very large datasets about customers, products, and network activity represent hidden business intelligence. The same is true of terabytes of unstructured data such as web logs, video, email, documents, and images.

These big-data stores are widely distributed. Cisco IT manages 38 global data centers comprising 334,000 square feet. Approximately 80 percent of applications in newer data centers are virtualized, and IT is working toward a goal of 95 percent virtualization.

To unlock the business intelligence hidden in globally distributed big data, Cisco IT wanted to use Hadoop, an open-source software framework that supports data-intensive, distributed applications. “Hadoop behaves like an affordable supercomputing platform,” says Piyush Bhargava, a Cisco IT distinguished engineer who focuses on big-data programs. “It moves compute to where the data is stored, which mitigates the disk I/O bottleneck and provides almost linear scalability. Hadoop would enable us to consolidate the islands of data scattered throughout the enterprise.”

To offer big-data analytics services to Cisco business teams, Cisco IT first needed to design and implement an enterprise platform that could support appropriate service-level agreements (SLAs) for availability and performance. “Our challenge was adapting the open-source Hadoop platform for the enterprise,” says Bhargava.

Technical requirements for the big-data architecture included:

Open-source components
Scalability and enterprise-class availability
Multitenant support so that multiple Cisco teams could use the same platform at the same time
Overcoming disk I/O speed limitations to accelerate performance
Integration with IT support processes
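
To make the compute-to-data model that Bhargava describes concrete, the sketch below shows a minimal Hadoop MapReduce job in Java, the classic word count. It is a generic illustration rather than one of Cisco IT's actual analytics jobs: the framework schedules each map task on a node that holds the input block, so the heavy disk I/O stays local, and only the much smaller intermediate results cross the network during the shuffle.

// Minimal Hadoop MapReduce job (classic word count), shown only to illustrate
// the compute-to-data model: map tasks run where the input blocks are stored.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs on the node holding the data block; emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word after the shuffle phase.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // combine locally to cut shuffle traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in the cluster
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Such a job is submitted against data already stored in the cluster, for example with hadoop jar wordcount.jar WordCount /data/weblogs /out/wordcount (the paths here are illustrative, not from the case study).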
 
 
Solution
Cisco IT built a Hadoop platform for big-data analytics using the Cisco® Common Platform Architecture (CPA), which is based on the Cisco Unified Computing System™ (Cisco UCS®), as shown in Figure 1. The Hadoop platform was ready for production less than nine months after Cisco IT first conceived the idea. Cisco makes the CPA available to other organizations as a reference architecture.
Figure 1. Cisco Hadoop Platform Physical Architecture
Cisco IT designed Cisco CPA to provide high performance in a multitenant environment, anticipating that internal users will continually find more use cases for big-data analytics. “CPA provides the capabilities we need to use big-data analytics for business advantage, including high performance, scalability, and ease of management,” says Jag Kahlon, Cisco IT architect.
Linearly Scalable Hardware with Very Large Onboard Storage Capacity
The Hadoop software framework operates on Cisco UCS C240 M3 Rack Servers, each with 16 cores, 256 GB of RAM, and 24 TB of local storage. Of the 24 TB, the Hadoop Distributed File System (HDFS) can use 22 TB, and the remaining 2 TB is available for the operating system.

“Cisco UCS C-Series Servers provide a large amount of onboard storage, the biggest factor in Hadoop performance, far more important than the number of CPUs,” says Virendra Singh, Cisco IT architect. “With onboard storage, processing can come to the data, avoiding the latency that occurs when data has to move over the network to the processors.”
[Figure 1 depicts a high-performance, multirack cluster of Cisco UCS C240 M3 servers connected through Cisco Nexus 2232PP 10 GE Fabric Extenders (per rack) and Cisco UCS 6248UP Fabric Interconnects (per domain), highlighting scalability, high performance, unified management, and operational simplicity. For high availability, three nodes apiece host ZooKeeper and WebServer, CLDB, and JobTracker; File Server and TaskTracker run across all nodes.]
 
 
The current architecture comprises four racks, each containing 18 server nodes. This configuration provides 432 TB of storage per rack, of which 396 TB is available to the application. “Each cluster can scale to 160 nodes with petabytes of capacity, and we can increase capacity by adding more clusters,” says Kahlon.
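
The per-rack figures follow directly from the node specifications given earlier: 18 nodes × 24 TB = 432 TB of raw storage, and 18 nodes × 22 TB = 396 TB usable by the application. As a minimal sketch (not part of the case study), an administrator could confirm the aggregate capacity the cluster actually reports using the standard Hadoop FileSystem API; this assumes the client machine has the cluster's Hadoop configuration on its classpath, and the exact figures reported depend on the distribution in use.

// Hypothetical capacity check, not from the case study: prints the aggregate
// capacity, used, and remaining space reported by the cluster's default file system.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class ClusterCapacityCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // reads core-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);       // the cluster's default file system
    FsStatus status = fs.getStatus();           // aggregate figures across all data nodes

    final double TB = 1024.0 * 1024 * 1024 * 1024;
    System.out.printf("Capacity:  %.1f TB%n", status.getCapacity() / TB);
    System.out.printf("Used:      %.1f TB%n", status.getUsed() / TB);
    System.out.printf("Remaining: %.1f TB%n", status.getRemaining() / TB);
  }
}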
 
Low-Latency, Lossless Network Connectivity
A redundant pair of Cisco UCS 6200 Series Fabric Interconnects connects all servers in the cluster to the access network and to external data sources. These fabric interconnects provide line-rate, low-latency, lossless 10 Gigabit Ethernet connectivity between all servers in the system. An active-active configuration provides high performance for up to 160 servers in a single switching domain, enabling Cisco IT to scale the cluster without adding new switching components or redesigning connectivity.

Each rack connects to the fabric interconnects through a redundant pair of Cisco Nexus® 2232PP Fabric Extenders, which behave like remote line cards.
Simple Management
Cisco IT server administrators manage all elements of the Cisco UCS (servers, storage access, networking, and virtualization) from a single interface, Cisco UCS Manager. “Cisco UCS Manager simplifies management of the Hadoop platform because we can manage all servers, fabric interconnects, and fabric extenders as a single entity,” Kahlon says. “It’s as easy to manage 1000 servers as one server, which will help us support more big-data analytics programs without increasing staffing.”

Using Cisco UCS Manager service profiles saved time and effort for server administrators by making it unnecessary to manually configure each server. Service profiles also eliminated configuration errors that could cause downtime.
Open, Enterprise-Class Hadoop Distribution
Cisco IT uses the MapR Distribution for Apache Hadoop, which speeds up MapReduce jobs with an optimized shuffle algorithm, direct access to the disk, built-in compression, and code written in advanced C++ rather than Java.
