
LESSON-7

YARN
Class will start with a Q&A session to refresh the previous class.
Today's topics:

• What is YARN
• Why YARN
• Basic components of YARN
• YARN workflow
• Illustrating WordCount
• Commands

What is YARN

YARN stands for “Yet Another Resource Negotiator”

YARN is the resource management layer of an enterprise Hadoop setup. Apart from resource
management, YARN also does job scheduling.

Why YARN

In Hadoop 1, MapReduce 1 performed both the processing and the resource management
functions. It relied on a Job Tracker, which was the single master. The Job Tracker allocated the
resources, performed scheduling, and monitored the processing of jobs.

But the Job Tracker became overloaded when handling a large number of tasks across a large cluster. The
fundamental idea behind YARN is to split up the work of the Job Tracker, so that resource management is
handled separately from job scheduling and monitoring and resources can be managed efficiently.

Basic Components of YARN

• Global Resource Manager
• Application Master (per application)
• Slave Node Manager (per node)
• Container (per application) running on a Node Manager, which is a bundle of computing resources (such as memory and CPU)

The Resource Manager and the Node Managers form the basis for managing applications in a distributed
manner. The Resource Manager is responsible for distributing the available resources among the applications.
The Resource Manager runs as the master daemon.

The Application Master communicates with the Resource Manager on one side and with the Node
Managers on the other. It negotiates resources with the Resource Manager and works with the Node
Managers to execute and monitor the tasks. Each Application Master negotiates resource containers from
the scheduler (which is part of the Resource Manager) and monitors their progress.

The Resource Manager has a scheduler, which is responsible for allocating resources to the various running
applications. The scheduler performs allocations subject to constraints (for example, user limits and
queue capacities). Scheduling is based on the resource requirements of the applications.
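
To make the idea of allocation under constraints concrete, here is a minimal Python sketch. It is not how the real YARN schedulers are implemented: the `Queue` and `Request` classes, the queue names, and the first-come-first-served policy are all assumptions made for illustration, and only memory is modelled.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical scheduler queue with a memory cap in MB."""
    name: str
    capacity_mb: int
    used_mb: int = 0


@dataclass
class Request:
    """Hypothetical container request from one application."""
    app_id: str
    queue: str
    memory_mb: int


def schedule(requests, queues, cluster_free_mb):
    """Grant requests in arrival order while queue and cluster limits allow."""
    granted, rejected = [], []
    for req in requests:
        q = queues[req.queue]
        fits_queue = q.used_mb + req.memory_mb <= q.capacity_mb
        fits_cluster = req.memory_mb <= cluster_free_mb
        if fits_queue and fits_cluster:
            q.used_mb += req.memory_mb
            cluster_free_mb -= req.memory_mb
            granted.append(req.app_id)
        else:
            rejected.append(req.app_id)
    return granted, rejected, cluster_free_mb


queues = {"prod": Queue("prod", 4096), "dev": Queue("dev", 2048)}
requests = [
    Request("app-1", "prod", 2048),
    Request("app-2", "dev", 1024),
    Request("app-3", "dev", 2048),  # would exceed the dev queue capacity
    Request("app-4", "prod", 2048),
]
granted, rejected, free_mb = schedule(requests, queues, cluster_free_mb=8192)
print(granted, rejected, free_mb)
```

In this toy run, app-3 is turned down because of its queue limit even though the cluster as a whole still has free memory, which is the kind of constraint the scheduler enforces.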

The Node Manager is responsible for launching containers for applications. It monitors their resource
usage (CPU, memory, disk, and network) and reports that usage back to the Resource Manager.
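
The Node Manager's role can be sketched in a few lines of Python. This is a toy model, not the real YARN class: `NodeManagerSketch`, its method names, and the report format are hypothetical, and only memory and virtual cores are tracked (disk and network are left out to keep it short).

```python
class NodeManagerSketch:
    """Toy stand-in for a Node Manager (hypothetical, not the YARN API)."""

    def __init__(self, node_id, total_memory_mb, total_vcores):
        self.node_id = node_id
        self.total_memory_mb = total_memory_mb
        self.total_vcores = total_vcores
        self.containers = {}  # container_id -> (memory_mb, vcores)

    def _used(self):
        """Sum the memory and vcores held by running containers."""
        mem = sum(m for m, _ in self.containers.values())
        cores = sum(c for _, c in self.containers.values())
        return mem, cores

    def launch_container(self, container_id, memory_mb, vcores):
        """Launch a container only if the node still has room for it."""
        used_mem, used_cores = self._used()
        if (used_mem + memory_mb > self.total_memory_mb
                or used_cores + vcores > self.total_vcores):
            return False
        self.containers[container_id] = (memory_mb, vcores)
        return True

    def usage_report(self):
        """Build the usage report that would go back to the Resource Manager."""
        used_mem, used_cores = self._used()
        return {
            "node": self.node_id,
            "used_memory_mb": used_mem,
            "used_vcores": used_cores,
            "containers": len(self.containers),
        }


nm = NodeManagerSketch("node-1", total_memory_mb=8192, total_vcores=4)
results = [
    nm.launch_container("c1", 2048, 1),
    nm.launch_container("c2", 4096, 2),
    nm.launch_container("c3", 4096, 1),  # does not fit in the remaining memory
]
report = nm.usage_report()
print(results, report)
```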

Application Workflow in Hadoop YARN

The following steps are involved in the application workflow of Apache Hadoop YARN:

1. Client submits an application
2. Resource Manager allocates a container in which it starts the Application Master
3. Application Master registers with Resource Manager (regarding the job to be carried out)
4. Application Master asks Resource Manager for containers
5. Application Master notifies the Node Managers to launch the containers
6. Application code is executed in the containers
7. Client contacts Resource Manager/Application Master to monitor the application’s status
8. Application Master unregisters with Resource Manager
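
The eight steps above can be traced with a small Python sketch that only records who talks to whom. The function name, the log format, and the round-robin placement of containers on nodes are assumptions for illustration; "AM" stands for the per-application master process, and a real cluster exchanges these messages over RPC rather than in one function.

```python
def run_application(app_id, containers_needed, node_ids):
    """Walk through the workflow steps and return them as a log of messages."""
    log = []
    log.append(f"client: submit {app_id}")                                # step 1
    log.append(f"RM: start AM for {app_id}")                              # step 2
    log.append(f"AM[{app_id}]: register with RM")                         # step 3
    log.append(f"AM[{app_id}]: request {containers_needed} containers")   # step 4

    # Assumption: containers are spread over the nodes round-robin.
    allocations = [
        (f"container-{i}", node_ids[i % len(node_ids)])
        for i in range(containers_needed)
    ]
    for cid, node in allocations:                                         # step 5
        log.append(f"AM[{app_id}]: ask NM on {node} to launch {cid}")
    for cid, node in allocations:                                         # step 6
        log.append(f"NM[{node}]: run application code in {cid}")

    log.append(f"client: ask RM for status of {app_id}")                  # step 7
    log.append(f"AM[{app_id}]: unregister from RM")                       # step 8
    return log


log = run_application("app-1", containers_needed=3, node_ids=["dn1", "dn2"])
for entry in log:
    print(entry)
```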

How our WordCount command worked

[Figure: cluster layout; surviving label: SNN]

To understand the cluster operations in a realistic way, assume we are working on a cluster of 7 nodes:
1 NameNode (NN), 1 Resource Manager (RM), 4 DataNodes (DN), and 1 Secondary NameNode (SNN).

1. We created a directory in HDFS (to save a file).
2. We created some data in a file (say the file is named MyData and is 234 MB, as in the figure above).
3. We put the file in HDFS. Say with a replication factor of 1 (that is, no extra copies) and the default 128 MB block size, it is stored as 2 blocks distributed across 2 Data Nodes.
4. To run the WordCount application, we ran a command (like hadoop jar wc.jar WordCount).
5. The request goes to the RM, which then assigns an Application Master for this MR job.
6. The MR Application Master asks the RM for resources; after getting the allocation, it notifies the NMs concerned.
7. The NMs launch containers for the job, monitor them, and report to the RM.
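
Two parts of this walkthrough can be checked with a short Python sketch: why a 234 MB file ends up in 2 blocks, and what the WordCount job computes. The 128 MB block size is an assumption (it is the default in Hadoop 2 and later, but clusters can override it), and the map and reduce functions are a plain in-memory imitation of the MapReduce job, not the code inside wc.jar.

```python
from collections import defaultdict

BLOCK_SIZE_MB = 128  # assumption: default HDFS block size in Hadoop 2 and later


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would be stored as."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(pairs):
    """Sum the counts for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


blocks = split_into_blocks(234)
print(blocks)  # [128, 106]

lines = ["hello yarn", "hello hadoop yarn"]  # made-up sample input
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(pairs)
print(counts)
```

On the real cluster, each block would typically be processed by its own map task running in a container, and the reduce step would combine the partial counts.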

Commands
start-yarn.sh # Start the YARN daemons
start-all.sh # Start all daemons, including YARN
stop-all.sh # Stop all daemons (services), including YARN
start-dfs.sh # Start the HDFS daemons
jps # List all running daemons (services), including YARN
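
After starting the daemons, the jps listing can be checked by script. The Python sketch below parses jps-style text (one "pid name" pair per line) and reports which expected daemons are absent. The sample text and its process IDs are made up for illustration and are not real output; the expected set assumes a single machine running all five daemons.

```python
EXPECTED_DAEMONS = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # "<pid> <name>"
            running.add(parts[1])
    return sorted(expected - running)


# Made-up sample: HDFS daemons are up, YARN has not been started yet.
sample_output = """\
2101 NameNode
2245 DataNode
2460 SecondaryNameNode
2890 Jps
"""
missing = missing_daemons(sample_output)
print(missing)  # ['NodeManager', 'ResourceManager']
```

If the YARN daemons are listed as missing, running start-yarn.sh would be the next step.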

Link
https://www.youtube.com/watch?v=2poZMI7it74
