7-Lesson YARN
YARN
Class will start with a refresher of the previous class through Q&A.
Today's topics:
What is YARN
Why YARN
Basic components of YARN
YARN Workflow
Illustrate WordCount
Commands
What is YARN
YARN (Yet Another Resource Negotiator) is the resource management layer of an enterprise Hadoop
setup. Apart from resource management, YARN also does job scheduling.
Why YARN
In the earlier version, Hadoop 1, MapReduce (MapReduce-1) performed both the processing and the
resource management functions. It had a single master, the Job Tracker, which allocated resources,
performed scheduling, and monitored the processing of jobs.
But the Job Tracker became overloaded when handling a large number of tasks across a large cluster.
The fundamental idea behind YARN is to split up the responsibilities of the Job Tracker, so that
resources can be managed more efficiently.
The Resource Manager and the Node Managers form the basis for managing applications in a distributed
manner. The Resource Manager is responsible for distributing the available resources among the
applications. It runs as the master daemon, while a Node Manager runs on each worker node.
The Application Master communicates with the Resource Manager on one side and with the Node
Managers on the other. It negotiates resources with the Resource Manager and works with the Node
Managers to execute and monitor tasks. Each Application Master negotiates resource containers from
the scheduler (which is part of the Resource Manager) and monitors their progress.
The Resource Manager has a scheduler, which is responsible for allocating resources to the various
running applications. The scheduler performs allocations subject to constraints (for example, user
limits and queue capacities). Scheduling is based on the resource requirements of the applications.
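The idea of allocating under constraints can be illustrated with a toy model. The sketch below is not YARN's scheduler; it assumes a single resource (memory), first-come-first-served ordering, and a fixed share of the cluster per queue. The names `Request`, `schedule`, and `queue_share`, as well as the queue names and sizes, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    app_id: str      # hypothetical application name
    queue: str       # queue the application was submitted to
    memory_mb: int   # memory requested for one container


def schedule(requests, cluster_memory_mb, queue_share):
    """Grant requests in arrival order while each queue stays within its share.

    queue_share maps a queue name to the fraction of cluster memory it may use.
    Returns (granted, pending) lists of application ids.
    """
    used = {queue: 0 for queue in queue_share}
    granted, pending = [], []
    for req in requests:
        limit = int(cluster_memory_mb * queue_share[req.queue])
        if used[req.queue] + req.memory_mb <= limit:
            used[req.queue] += req.memory_mb
            granted.append(req.app_id)
        else:
            pending.append(req.app_id)  # waits until capacity frees up
    return granted, pending


# Made-up workload: two queues sharing an 8 GB cluster.
requests = [
    Request("app-1", "prod", 4096),
    Request("app-2", "dev", 2048),
    Request("app-3", "dev", 2048),
    Request("app-4", "prod", 4096),
]
granted, pending = schedule(
    requests,
    cluster_memory_mb=8192,
    queue_share={"prod": 0.75, "dev": 0.25},
)
print("granted:", granted)  # granted: ['app-1', 'app-2']
print("pending:", pending)  # pending: ['app-3', 'app-4']
```

Here `app-3` and `app-4` stay pending because granting them would push their queues past the configured share, even though the request itself is valid.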
The Node Manager is responsible for launching containers for applications. It monitors their
resource usage (CPU, memory, disk, and network) and reports that usage back to the Resource Manager.
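The interaction between the three components can be sketched as a small simulation. This is a simplified model, not Hadoop's API: the classes and methods below are hypothetical, only memory is tracked, and the Application Master is shown as a plain object although in a real cluster it runs inside a container itself.

```python
class NodeManager:
    def __init__(self, name, memory_mb):
        self.name = name
        self.memory_mb = memory_mb
        self.containers = []  # (task, memory_mb) pairs running on this node

    def free_mb(self):
        return self.memory_mb - sum(mb for _, mb in self.containers)

    def launch(self, task, memory_mb):
        # Launch a container for the given task on this node.
        self.containers.append((task, memory_mb))

    def report(self):
        # Usage report sent back to the Resource Manager (memory only here).
        return {"node": self.name, "used_mb": self.memory_mb - self.free_mb()}


class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb):
        # Pick the first node with enough free memory, or None if none has room.
        for node in self.nodes:
            if node.free_mb() >= memory_mb:
                return node
        return None


class ApplicationMaster:
    def __init__(self, rm):
        self.rm = rm

    def run(self, tasks, memory_mb):
        placement = {}
        for task in tasks:
            node = self.rm.allocate(memory_mb)  # negotiate with the Resource Manager
            if node is None:
                placement[task] = None          # no capacity; task stays pending
                continue
            node.launch(task, memory_mb)        # work with the Node Manager
            placement[task] = node.name
        return placement


# Made-up cluster: two worker nodes with 2 GB each, four 1 GB tasks.
nodes = [NodeManager("dn1", 2048), NodeManager("dn2", 2048)]
rm = ResourceManager(nodes)
placement = ApplicationMaster(rm).run(["map-0", "map-1", "map-2", "reduce-0"], 1024)
reports = [node.report() for node in nodes]
print(placement)
print(reports)
```

With these numbers the first two tasks land on `dn1` and the last two on `dn2`, after which both nodes report 2048 MB in use.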
How our WordCount command worked
Let us consider a cluster of 7 nodes (1 NameNode, 1 Resource Manager, 4 DataNodes, and 1 Secondary
NameNode), to understand the cluster operations in a realistic way.
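To make the WordCount logic itself concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps. It does not use Hadoop; the function names and the two input lines are made up, and each input line merely stands in for an input split that a map task would process inside a container on one of the DataNodes.

```python
from collections import defaultdict


def map_phase(line):
    # A map task emits (word, 1) for every word in its input.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all values by key, as happens between the map and reduce steps.
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    # A reduce task sums the counts collected for one word.
    return word, sum(counts)


lines = ["hadoop yarn hadoop", "yarn schedules hadoop jobs"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(word, values) for word, values in shuffle(pairs).items())
print(counts)  # {'hadoop': 3, 'yarn': 2, 'schedules': 1, 'jobs': 1}
```

On the cluster, the Application Master of the WordCount job would request one container per map or reduce task, and the Node Managers on the DataNodes would launch them.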
Commands
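start-yarn.sh # To start YARN (the Resource Manager and the Node Managers)
start-all.sh # To start all daemons, including YARN
stop-all.sh # To stop all daemons (services), including YARN
start-dfs.sh # To start HDFS (the NameNode, the DataNodes, and the Secondary NameNode)
jps # To list the running Java processes, which include the Hadoop and YARN daemons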
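After starting the daemons, the `jps` listing can be checked for the expected process names. The helper below is a small sketch, assuming that each `jps` line has the form `<pid> <name>`; the function names, the `EXPECTED` set, and the sample listing (including its process ids) are made up for illustration.

```python
import shutil
import subprocess

# Daemon names as they appear in a jps listing.
EXPECTED = {"NameNode", "SecondaryNameNode", "DataNode", "ResourceManager", "NodeManager"}


def running_daemons(jps_output):
    """Return the process names from jps output (lines look like '<pid> <name>')."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected=EXPECTED):
    return sorted(set(expected) - running_daemons(jps_output))


# Illustrative listing with hypothetical process ids; NodeManager is absent.
sample = """4821 NameNode
5012 DataNode
5230 SecondaryNameNode
5467 ResourceManager
5903 Jps
"""
print(missing_daemons(sample))  # ['NodeManager']

# If jps is installed on this machine, check the live listing as well.
if shutil.which("jps"):
    live = subprocess.run(["jps"], capture_output=True, text=True).stdout
    print("missing on this machine:", missing_daemons(live))
```

On a single-node setup all five daemons would be expected on one machine. On the 7-node cluster described above, each machine shows only its own daemons, so the expected set should be adjusted per node.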
Link
https://www.youtube.com/watch?v=2poZMI7it74