You are on page 1of 5

Scaling Hadoop Map-Reduce in version 0.

23
As businesses are getting more and more technology dependent to gain competitive advantage, their systems are generating lots of feedback, usage data. While this data can unlock hugely valued business critical information and help fine-tune services, this data is lying unused in backend system on disks and tapes. This is mainly because most of this data is unstructured, vast and there are not many well-established cost effective alternatives that help analyzing this information. This is where Hadoop comes to rescue. As Hadoop deployments are getting larger, its main components HDFS and Map-Reduce are confronted with scalability concerns. Hadoop team is working on addressing some of these aspects in release 0.23, planned for release in early 2012. One of the major change as a part of this effort is rearchitecture of Hadoop Map-Reduce infrastructure with focus on debottlenecking job tracker, one of the major component of Map-Reduce Infrastructure. This blog compares new Map-Reduce architecture with existing one and highlights how it addresses associated scale issues. Image below shows 4 main component of map reduce infrastructure Client creates a job, splits the data that needs Map-Reduce processing into smaller chunks, submit it to job tracker for execution and probe job tracker for job status. Task runner Task runner

Task Tracker

Client

Job Tracker

Task Tracker

Task runner Task runner

Job Tracker (JT) Mainly responsible for -

Cluster Resource management Keeps track of available task tracker in the cluster and each task trackers capacity to execute concurrent map and reduce tasks Job management Complete management of job state and its execution, including creating and scheduling of map/reduce task on task tracker nodes, guiding task tracker through various state transition associated with a task execution, tracking tasks progress and running speculative tasks (speculative tasks are run in lieu of slow running tasks). Monitoring the health of task trackers and recovery of failed task/jobs when a task tracker fails in the middle of tasks execution Maintain job progress, history and diagnostics information.

Task Tracker (TT) Responsible for initiating map/reduce task execution on a cluster node. It does so by spawning a new process for task execution, called task runner Task Runner Executes actual map/reduce tasks and reports progress to Task Tracker, which intern report it to Job Tracker and Job Tracker guides various state transition related to task management.

There can only be one instance of Job tracker in Hadoop cluster, which make it a single point of scale bottleneck and failure. If a job tracker fails, all information related to currently running job and tasks is lost and all jobs need to be executed again on recovery of Job tracker. A large Hadoop cluster could have 1000s of Task Tracker, and mostly limited by the capacity of how many task trackers can be managed by Job Tracker instance. As per some of industry benchmarks available, an instance of Job tracker is able to manage at max 4000 Task trackers at the moment and new Hadoop version aims to scale this to 10s of thousand of nodes. Let us see how new version aims to achieve this scale. Let us get introduced to new roles in new architecture Significant chunk of processing and memory intensive activity of Job tracker and task tracker is moved into another entity named Application Master (AM). Job tracker mainly left responsible for resource management and allocating resources when demanded by AM. Resource management mainly includes o Tracking nodes (NM see below for definition of NM) that are available for task executions o Keeping track of available resource (at the moment available memory for task execution on that node) on these nodes. The component owning this reduced functionality of Job Tracker is name as Resource Manager (RM) in new architecture.

Application Master (AM) is responsible for initiating and tracking the map/reduce task execution. For every map/reduce task that needs to be executed, AM asks RM to allocate resources on any for the available task execution nodes (NM). While asking for resource allocation for a task, AM specifies the resource requirement for that task. For every Job there is one and only one AM in cluster. Task tracker is mainly left responsible for allocation and deallocation of Task Runners. Task Runner is referred generically as Container in new architecture. This reduced functionality of Task tracker is named as Node Manager (NM).

Following image segregates different roles of Job Tracker and Task Tracker in upper half of the image and how those roles are mapped to components of new architecture in lower half of the image Allocates and deallocated Task Runner
based on the task scheduling request from JT

Tracks available TT and their task execution capacity. Assigns tasks to a TT based on available capacity Maintains the progress and status of all the running jobs, associated tasks capacity, interacts with TT for task scheduling and task status progress and state transition. This function consumes majority of JT CPU and memory bandwidth Coordinates and tracks the execution of tasks and its progress with Task Runner and feeds the status back into JT Executes map/reduce tasks

Current Map Reduce Components

Client

Job Tracker

Task Tracker

Task runner

New Map Reduce Components Container

Client

RM

NM AM

Following image explains in detail how various components in new architecture interact with each other to execute a Map-Reduce job.

5. Client interact with AM


to track the job status

4. NM launches 3. RM selects one of available NM to


host AM that will control the job execution. There will only be one instance of AM per job in cluster new JVM to run AM on RM req request

AM

1. NM registers its
presence with RM and regularly heartbeats RM with its status

6. AM requests RM to Client 2. Client splits the job


data into small chunks and submits the job to RM allocate containers to run map and reduce tasks.

10. Once container is RM 7. RM responds with list of NM


URLs that can host map/reduce tasks containers launched, AM and Container coordinate Map-Reduce task execution among themselves

8. AM coordinates with 1. NM registers its


presence with RM and regularly heartbeats RM with its status NM to launch the container on NM host machine

NM 9. NM
launches Container as per AM specification

Container
(map/Red Task Runner)

1. As new NM instances are provisioned in the cluster; they register their present with RM along with the available resources (memory) on their host machine. Once registered, NM periodically heartbeats RM with its status and status of container running on its host. If a NM fails to heartbeat within expected time window, RM considers it failed and starts the recovery procedures. 2. Map Reduce client defines the job execution environment (AM executable, which in this case is Map-Reduce executable, AM and map-reduce tasks resource requirements, data that needs to Map-Reduce analyzed), split the data into smaller independently analyzable chunks (typically each chunk size is equal to the size of HDFS block) and submits the job to RM for execution. 3. On receiving a job submission, RM selected one of the registered NM to host AM for that job instance. There can only be one AM instance per job in the cluster. Once initiated, AM is completely responsible for Map-Reduce execution for that job. 4. When instructed by RM, NM launches a java process to run AM executable. 5. Reference to AM process is passed back to map-reduce client, which uses AM interface to administrate and track/inquire the status of map-reduce job. 6. Each data split is Mapped by a separate Map task. For each Map & Reduce task, AM asks RM to allocate a container (JVM that will execute the Map/Reduce task) in the cluster. 7. RM selects the appropriate NM hosts for container allocation based on the resource availability at NM and the Map-Reduce tasks resource requirements. This list of selected NM hosts is returned to AM. 8. AM coordinates with NM for launching container and passes on details of MapReduce executables and their execution environment to NM 9. NM launches a new Map-Reduce task execution container based on AM specification. 10. Once Container is launched, AM and Container directly interact with each other for task execution, status tracking, recovery from failed/slow tasks etc. With reduced responsibility, the load on RM (erstwhile Job Tracker) is significantly lesser; hence it is able to support much larger cluster of NMs and concurrent jobs. Notice, AM is generic entity, which is not restricted to run only Map-Reduce tasks. Interfaces are generic enough to have other style of data processing e.g. Master-Worker. In new architecture Map-Reduce becomes a user land library that is executed by AM. There are many other changes in release 0.23. Some of those are HDFS federation (aims at scaling HSFS NameNode and reduce the impact of NameNode failure), improved resource utilization in HDFS (e.g. connection pooling with Data Nodes), Job recovery if RM/AM fails in the middle, improved Map-Reduce performance, Node local MapReduce tasks for smaller sized jobs. I will be covering them in detail in other blogs.

You might also like