Scaling Map-Reduce in New Hadoop Release

Scaling Hadoop Map-Reduce in version 0.23
As businesses grow more dependent on technology to gain competitive advantage, their systems generate large volumes of feedback and usage data. This data can unlock valuable, business-critical information and help fine-tune services, yet much of it lies unused on disks and tapes in backend systems. The main reason is that most of this data is unstructured and vast, and there are few well-established, cost-effective tools for analyzing it. This is where Hadoop comes to the rescue.

As Hadoop deployments grow larger, its main components, HDFS and Map-Reduce, are confronted with scalability concerns. The Hadoop team is addressing some of these in release 0.23, planned for early 2012. One of the major changes in this effort is a rearchitecture of the Hadoop Map-Reduce infrastructure, with a focus on debottlenecking the Job Tracker, one of the major components of that infrastructure. This blog compares the new Map-Reduce architecture with the existing one and highlights how it addresses the associated scale issues.

The image below shows the 4 main components of the Map-Reduce infrastructure.

[Figure: Client, Job Tracker, Task Trackers, and the Task Runners launched by each Task Tracker]

Client Creates a job, splits the data that needs Map-Reduce processing into smaller chunks, submits the job to the Job Tracker for execution, and polls the Job Tracker for job status.
Job Tracker (JT) Mainly responsible for:

o Cluster resource management: keeps track of the available Task Trackers in the cluster and each Task Tracker's capacity to execute concurrent map and reduce tasks.
o Job management: complete management of job state and execution, including creating and scheduling map/reduce tasks on Task Tracker nodes, guiding Task Trackers through the state transitions associated with task execution, tracking task progress, and running speculative tasks (speculative tasks are run in lieu of slow-running tasks).
o Monitoring the health of Task Trackers and recovering failed tasks/jobs when a Task Tracker fails in the middle of task execution.
o Maintaining job progress, history, and diagnostic information.

Task Tracker (TT) Responsible for initiating map/reduce task execution on a cluster node. It does so by spawning a new process for task execution, called a Task Runner.

Task Runner Executes the actual map/reduce tasks and reports progress to the Task Tracker, which in turn reports it to the Job Tracker; the Job Tracker guides the state transitions related to task management.
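The Job Tracker's capacity bookkeeping can be pictured with a small sketch. This is a toy model in Python, not Hadoop code: the names TaskTracker, free_slots, and assign_tasks are hypothetical, and the placement policy (first tracker with a free slot of the right kind) is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    """Toy stand-in for a task tracker with fixed map/reduce capacity."""
    name: str
    map_slots: int
    reduce_slots: int
    running: list = field(default_factory=list)  # (kind, task_id) pairs

    def free_slots(self, kind):
        capacity = self.map_slots if kind == "map" else self.reduce_slots
        used = sum(1 for k, _ in self.running if k == kind)
        return capacity - used


def assign_tasks(trackers, pending):
    """Place each pending (kind, task_id) on the first tracker with a free slot.

    Returns the tasks that could not be placed and have to wait.
    """
    waiting = []
    for kind, task_id in pending:
        target = next((t for t in trackers if t.free_slots(kind) > 0), None)
        if target is None:
            waiting.append((kind, task_id))
        else:
            target.running.append((kind, task_id))
    return waiting


trackers = [TaskTracker("tt1", map_slots=2, reduce_slots=1),
            TaskTracker("tt2", map_slots=1, reduce_slots=1)]
pending = [("map", f"m{i}") for i in range(4)] + [("reduce", "r0")]
waiting = assign_tasks(trackers, pending)
print("waiting:", waiting)
```

With three map slots in total, the fourth map task has to wait. In this model every placement decision for every job passes through one central loop, which is the kind of load the rest of this blog is concerned with.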
There can be only one instance of the Job Tracker in a Hadoop cluster, which makes it both a scalability bottleneck and a single point of failure. If the Job Tracker fails, all information about currently running jobs and tasks is lost, and all jobs must be executed again once the Job Tracker recovers. A large Hadoop cluster can have thousands of Task Trackers, and its size is limited mostly by how many Task Trackers a single Job Tracker instance can manage. According to some of the available industry benchmarks, a Job Tracker instance can currently manage at most 4000 Task Trackers; the new Hadoop version aims to scale this to tens of thousands of nodes. Let us see how the new version aims to achieve this scale.

First, the new roles in the new architecture. A significant share of the processing- and memory-intensive activity of the Job Tracker and Task Tracker moves into another entity named the Application Master (AM). The Job Tracker is left mainly responsible for resource management and for allocating resources when the AM asks for them. Resource management mainly includes:

o Tracking the nodes (NM; see below for the definition of NM) that are available for task execution
o Keeping track of the available resources on these nodes (at the moment, the memory available for task execution on each node)

The component that owns this reduced Job Tracker functionality is named the Resource Manager (RM) in the new architecture.
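The reduced role of the Resource Manager can be illustrated with a minimal sketch. It is a toy Python model rather than the actual RM: the class and method names (ResourceManager, register, allocate, release) are hypothetical, and the first-fit placement policy is an assumption. The only resource tracked is memory, as the text above describes.

```python
class ResourceManager:
    """Toy model: tracks registered nodes and their free memory in MB."""

    def __init__(self):
        self.free_mb = {}  # node name -> memory still available on that node

    def register(self, node, memory_mb):
        """Record a node and the memory it offers for task execution."""
        self.free_mb[node] = memory_mb

    def allocate(self, memory_mb):
        """Return the first node with enough free memory, or None if none fits."""
        for node, free in self.free_mb.items():
            if free >= memory_mb:
                self.free_mb[node] = free - memory_mb
                return node
        return None

    def release(self, node, memory_mb):
        """Give memory back when a task finishes on the node."""
        self.free_mb[node] += memory_mb


rm = ResourceManager()
rm.register("nm1", 2048)
rm.register("nm2", 4096)
first = rm.allocate(1536)   # fits on nm1
second = rm.allocate(1024)  # nm1 has only 512 MB left, so nm2 is chosen
third = rm.allocate(8192)   # no node is large enough
print(first, second, third)
```

Note what the model does not contain: no job state, no task progress, no retry logic. In this sketch the RM only answers the question "where is there room?", which is why it can be expected to serve many more nodes than a component that also manages every task.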
The Application Master (AM) is responsible for initiating and tracking map/reduce task execution. For every map/reduce task that needs to be executed, the AM asks the RM to allocate resources on any of the available task execution nodes (NM). When asking for resources for a task, the AM specifies the resource requirements of that task. For every job there is one and only one AM in the cluster.

The Task Tracker is left mainly responsible for allocating and deallocating Task Runners. A Task Runner is referred to generically as a Container in the new architecture. This reduced Task Tracker functionality is named the Node Manager (NM).
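To make the AM's side of the conversation concrete, here is a small sketch of the requests an AM might prepare, one per task, each carrying its own resource requirement. The ContainerRequest type, the build_requests helper, the task id format, and the memory figures are all hypothetical and chosen for illustration; they do not reflect the real Hadoop interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRequest:
    """Hypothetical request an AM might send to the RM for one task."""
    task_id: str
    kind: str       # "map" or "reduce"
    memory_mb: int  # resource requirement stated by the AM


def build_requests(job_id, num_splits, num_reducers, map_mb, reduce_mb):
    """Create one request per task: one map per input split, plus the reducers."""
    maps = [ContainerRequest(f"{job_id}_m_{i}", "map", map_mb)
            for i in range(num_splits)]
    reduces = [ContainerRequest(f"{job_id}_r_{i}", "reduce", reduce_mb)
               for i in range(num_reducers)]
    return maps + reduces


requests = build_requests("job42", num_splits=3, num_reducers=1,
                          map_mb=1024, reduce_mb=2048)
for req in requests:
    print(req.task_id, req.kind, req.memory_mb)
total_mb = sum(req.memory_mb for req in requests)
print("total requested MB:", total_mb)
```

Because each job has its own AM, this per-task bookkeeping is spread across the cluster instead of accumulating in one central process.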
The following image separates the roles of the Job Tracker and Task Tracker in its upper half and shows, in its lower half, how those roles map to the components of the new architecture.

[Figure: roles of the current components and their mapping to the new components.
Current Map Reduce Components: Client, Job Tracker, Task Tracker, Task Runner.
New Map Reduce Components: Client, RM, AM, NM, Container.
Roles shown:
o Tracks available TTs and their task execution capacity; assigns tasks to a TT based on available capacity.
o Maintains the progress and status of all running jobs and their associated tasks; interacts with TTs for task scheduling, task status and progress, and state transitions. This function consumes the majority of JT CPU and memory bandwidth.
o Coordinates and tracks the execution of tasks and their progress with the Task Runner and feeds the status back to the JT.
o Allocates and deallocates Task Runners based on task scheduling requests from the JT.
o Executes map/reduce tasks.]
The following image explains in detail how the components of the new architecture interact with each other to execute a Map-Reduce job.

[Figure: interaction between Client, RM, NM, AM, and Container, annotated with steps 1 to 10; each step is described in the numbered steps that follow]
1. As new NM instances are provisioned in the cluster, they register their presence with the RM along with the available resources (memory) on their host machine. Once registered, an NM periodically heartbeats the RM with its status and the status of the containers running on its host. If an NM fails to heartbeat within the expected time window, the RM considers it failed and starts the recovery procedures.
2. The Map-Reduce client defines the job execution environment (the AM executable, which in this case is the Map-Reduce executable; the resource requirements of the AM and of the map/reduce tasks; and the data to be analyzed), splits the data into smaller, independently analyzable chunks (typically each chunk is the size of an HDFS block), and submits the job to the RM for execution.
3. On receiving a job submission, the RM selects one of the registered NMs to host the AM for that job instance. There can be only one AM instance per job in the cluster. Once initiated, the AM is completely responsible for Map-Reduce execution for that job.
4. When instructed by the RM, the NM launches a Java process to run the AM executable.
5. A reference to the AM process is passed back to the Map-Reduce client, which uses the AM interface to administer the job and to track or query its status.
6. Each data split is mapped by a separate map task. For each map and reduce task, the AM asks the RM to allocate a container (a JVM that will execute the map/reduce task) in the cluster.
7. The RM selects appropriate NM hosts for container allocation based on the resources available at each NM and the resource requirements of the map/reduce tasks. The list of selected NM hosts is returned to the AM.
8. The AM coordinates with the NM to launch the container and passes on the details of the Map-Reduce executables and their execution environment.
9. The NM launches a new Map-Reduce task execution container based on the AM's specification.
10. Once the container is launched, the AM and the container interact directly with each other for task execution, status tracking, recovery from failed or slow tasks, etc.
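The liveness check in step 1 can be sketched as follows. This is a simplified Python illustration, not the RM's actual implementation: the HeartbeatMonitor class, its method names, and the 10-second window are assumptions, and timestamps are passed in explicitly so that the example is deterministic.

```python
class HeartbeatMonitor:
    """Toy model of heartbeat tracking: a node is failed if it stays silent too long."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # node name -> time of the most recent heartbeat

    def heartbeat(self, node, now):
        """Record a registration or heartbeat from a node at time `now` (seconds)."""
        self.last_seen[node] = now

    def expired(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)


monitor = HeartbeatMonitor(timeout_s=10)
monitor.heartbeat("nm1", now=0)   # nm1 registers
monitor.heartbeat("nm2", now=0)   # nm2 registers
monitor.heartbeat("nm1", now=8)   # only nm1 keeps heartbeating
failed = monitor.expired(now=12)  # nm2 has been silent for 12 s
print("failed nodes:", failed)
```

In a real system, the nodes returned by a check like this would be the ones for which recovery procedures start.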
With its reduced responsibility, the load on the RM (the erstwhile Job Tracker) is significantly lower; hence it can support a much larger cluster of NMs and more concurrent jobs.

Note that the AM is a generic entity that is not restricted to running Map-Reduce tasks. The interfaces are generic enough to support other styles of data processing, e.g. Master-Worker. In the new architecture, Map-Reduce becomes a user-land library that is executed by the AM.

There are many other changes in release 0.23. Some of them are HDFS federation (which aims at scaling the HDFS NameNode and reducing the impact of a NameNode failure), improved resource utilization in HDFS (e.g. connection pooling with Data Nodes), job recovery if the RM/AM fails mid-job, improved Map-Reduce performance, and node-local Map-Reduce tasks for smaller jobs. I will cover them in detail in other blogs.
