Map Reduce Yarn: Big Data Huawei Course
MAP REDUCE
YARN
NOTICE
This document was generated from Huawei study material.
Treat the information in it as supporting material.
• Map and Reduce tasks are user-defined. Combine and Partition can also be defined by the user; Combine is optional
• Each Map task takes one split of the input file and generates its output as key-value pairs
o WordCount example:
▪ <hello, 1>
▪ <world, 1>
▪ <bye, 1>
▪ <world, 1>
• The output of each Map task is partitioned, sorted, optionally combined, and merged into a MapOutputFile (MOF). The MOFs are then copied to the Reduce tasks, where they are merged and sorted by key
• The input to each Reduce task is a set of <key, list of values> pairs
o WordCount example:
▪ <bye, 1 1 1>
▪ <hadoop, 2 2>
▪ <hello, 1 1 1>
▪ <world, 2>
• Reduce tasks take the MOFs as input and generate the final output of the MapReduce job
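The flow above can be illustrated with a minimal, single-process Python sketch of WordCount. It is not Hadoop code: the function names (map_task, combine, partition, shuffle, reduce_task, word_count) are hypothetical, the splits are plain strings, and a simple character sum stands in for a real partitioning hash.

```python
from collections import defaultdict


def map_task(split):
    """Map: emit a <word, 1> pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def combine(pairs):
    """Optional Combine: pre-aggregate the output of one Map task locally."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return sorted(partial.items())


def partition(key, num_reducers):
    """Partition: pick the Reduce task that receives a key.

    A deterministic character sum is used here as a stand-in hash (assumption).
    """
    return sum(ord(c) for c in key) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Copy each pair to its reducer, group values by key, and sort by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]


def reduce_task(grouped):
    """Reduce: sum the list of values for every key."""
    return [(key, sum(values)) for key, values in grouped]


def word_count(splits, num_reducers=2):
    """Run map, combine, shuffle, and reduce over a list of input splits."""
    map_outputs = [combine(map_task(split)) for split in splits]
    result = {}
    for grouped in shuffle(map_outputs, num_reducers):
        result.update(reduce_task(grouped))
    return result


if __name__ == "__main__":
    print(word_count(["hello world", "bye world"]))
```

Because Combine only pre-aggregates values that Reduce would sum anyway, removing the combine call changes the intermediate pairs but not the final counts, which is why the step can be optional.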
• Client
o Submits jobs to the Resource Manager
• Resource Manager
o Manages the use of resources across the cluster
o Only one Resource Manager is active in a Hadoop cluster at any time
• Node Manager
o Runs on every node in the cluster
o Launches and monitors Containers
o Reports node status to the Resource Manager periodically
• Application Master
o Negotiates resources with the Resource Manager
o Works with the Node Managers to execute and monitor tasks
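As a rough illustration of how these roles interact, the toy model below walks one job from submission to task placement. It is a sketch under simplifying assumptions and not the YARN API: the class and method names are hypothetical, a Container is reduced to one "slot", and allocation is a simple first-fit over the nodes.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_slots: int  # hypothetical unit: one slot hosts one Container
    containers: list = field(default_factory=list)

    def launch(self, task):
        """Launch a Container for a task on this node."""
        self.free_slots -= 1
        self.containers.append(task)

    def status(self):
        """Status report that would be sent periodically to the Resource Manager."""
        return {"node": self.name, "free_slots": self.free_slots,
                "running": list(self.containers)}


class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, count):
        """Grant up to `count` Containers (first-fit) and return their host nodes."""
        granted = []
        for node in self.nodes:
            available = node.free_slots
            while available > 0 and len(granted) < count:
                granted.append(node)
                available -= 1
        return granted

    def submit(self, job_name, tasks):
        """Entry point used by the client: start an Application Master for the job."""
        am_node = self.allocate(1)[0]
        am_node.launch(f"{job_name}-AM")
        return ApplicationMaster(job_name, tasks, self)


class ApplicationMaster:
    def __init__(self, job_name, tasks, rm):
        self.job_name, self.tasks, self.rm = job_name, tasks, rm

    def run(self):
        """Negotiate Containers with the Resource Manager, then start tasks on the nodes."""
        placement = {}
        for task, node in zip(self.tasks, self.rm.allocate(len(self.tasks))):
            node.launch(task)
            placement[task] = node.name
        return placement


if __name__ == "__main__":
    nodes = [NodeManager("node1", 2), NodeManager("node2", 2)]
    rm = ResourceManager(nodes)
    am = rm.submit("wordcount", ["map-0", "map-1", "reduce-0"])
    print(am.run())
    print([node.status() for node in nodes])
```

The point of the split is that the Resource Manager only hands out resources, while each job's Application Master decides what runs in them.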
4. YARN Features
• Resource Manager high availability (HA) uses an Active/Standby architecture
• Failover between the active and standby states can be triggered manually or automatically
o Manually: an administrator runs a command to switch one of the Resource Manager nodes to the active state
o Automatically: when the active Resource Manager goes down or becomes unresponsive, another Resource Manager is elected through ZooKeeper to become the active node, and services are then re-established
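The two failover modes can be sketched with a toy Active/Standby model. This is only an illustration: the classes are hypothetical, the elect method merely stands in for the ZooKeeper-based election (no ZooKeeper is involved), and health is a flag set by hand.

```python
class ResourceManagerNode:
    def __init__(self, name):
        self.name = name
        self.state = "standby"
        self.healthy = True


class HACluster:
    """Toy Active/Standby group of Resource Manager nodes."""

    def __init__(self, names):
        self.nodes = [ResourceManagerNode(name) for name in names]
        self.elect()

    @property
    def active(self):
        """Return the node currently in the active state, or None."""
        return next((n for n in self.nodes if n.state == "active"), None)

    def elect(self):
        """Automatic failover: keep a healthy active node, else promote a healthy standby."""
        current = self.active
        if current is not None and current.healthy:
            return current
        if current is not None:
            current.state = "standby"
        for node in self.nodes:
            if node.healthy:
                node.state = "active"
                return node
        return None

    def manual_failover(self, target_name):
        """Manual failover: an administrator switches the chosen node to active."""
        target = next(n for n in self.nodes if n.name == target_name)
        if not target.healthy:
            raise RuntimeError(f"{target_name} is not healthy")
        for node in self.nodes:
            node.state = "standby"
        target.state = "active"
        return target


if __name__ == "__main__":
    cluster = HACluster(["rm1", "rm2"])
    cluster.nodes[0].healthy = False   # the active node becomes unresponsive
    print(cluster.elect().name)        # automatic election of a new active node
    cluster.nodes[0].healthy = True
    print(cluster.manual_failover("rm1").name)
```

In both modes the invariant is the same: at most one node is in the active state at any time.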
• If an Application Master goes down, its Containers are closed, including those in which tasks are running
• The Resource Manager then starts a new Application Master on another compute node
• Resource usage limits by queue
o Because of resource sharing, the resources used by a queue may exceed its configured capacity
o The maximum resource usage of a queue can be capped by a parameter, which is set to 100 by default
▪ When the maximum capacity of a queue is set to 100, tasks running in that queue can use all the cluster resources, if available
• Resource usage limits by user
o Each user has a minimum resource guarantee
▪ Tasks of several different users can run in a queue at the same time
▪ The resources available to each user lie between a minimum value and a maximum value
▪ The maximum value is determined by the number of running tasks
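The arithmetic behind these limits can be sketched as follows. This is an illustrative model, not scheduler code: the function and parameter names are hypothetical, values are treated as percentages, and the per-user share is approximated by splitting the queue evenly among active users (an assumption; the notes above tie the maximum to the number of running tasks).

```python
def queue_limit(cluster_resources, capacity_pct, max_capacity_pct=100):
    """Return (guaranteed, ceiling) resources for a queue.

    capacity_pct is the queue's configured share; max_capacity_pct caps how far
    the queue may grow by borrowing idle resources (100 = the whole cluster).
    """
    guaranteed = cluster_resources * capacity_pct / 100
    ceiling = cluster_resources * max_capacity_pct / 100
    return guaranteed, ceiling


def user_limit(queue_resources, active_users, min_user_pct):
    """Return the resources one user may use in a queue.

    Assumption: the queue is split evenly among active users, but a user's
    share never drops below the minimum guarantee (min_user_pct of the queue).
    """
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    equal_share = queue_resources / active_users
    minimum = queue_resources * min_user_pct / 100
    return max(equal_share, minimum)


if __name__ == "__main__":
    print(queue_limit(100, 40))       # default maximum capacity of 100
    print(queue_limit(100, 40, 60))   # queue capped at 60% of the cluster
    for users in (1, 2, 5):
        print(users, user_limit(100, users, 25))
```

Under these assumptions a single user can take the whole queue, two users are limited to half each, and with a 25 percent minimum the per-user share stops shrinking once four users are active.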