
MAP REDUCE ASSIGNMENT

Pre-requisites : A clear understanding of the wordcount program in map reduce (working
demo)

Outline : A map reduce program executes in the following order

MAP phase :
1) The mapper code / mapper class is executed
2) Partition and in-memory sort

REDUCE phase :
1) Shuffle and merge sort (duplicate keys are merged while maintaining the mapper sort order)
2) The reducer code / reducer class is executed
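
The order above can be traced with a small in-memory simulation. This is only a sketch in Python, not Hadoop code. The function names (mapper, partition, map_phase, reduce_phase, run_job) are hypothetical, the crc32-based partition function is a stand-in for whatever hash the framework actually uses, and a single map task is assumed, so the merge step has only one sorted run to read.

```python
from collections import defaultdict
from itertools import groupby
import zlib


def mapper(line):
    """MAP step 1: emit (word, 1) for every whitespace-separated token."""
    for word in line.split():
        yield (word, 1)


def partition(key, num_reducers):
    """Pick the reducer for a key (assumed stand-in hash, not Hadoop's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_phase(lines, num_reducers):
    """MAP steps 1 and 2: run the mapper, partition its output, sort each partition by key."""
    partitions = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            partitions[partition(key, num_reducers)].append((key, value))
    return {p: sorted(pairs, key=lambda kv: kv[0]) for p, pairs in partitions.items()}


def reducer(key, values):
    """REDUCE step 2: sum the counts collected for one key."""
    yield (key, sum(values))


def reduce_phase(sorted_pairs):
    """REDUCE steps 1 and 2: group values of duplicate keys in sorted order, then reduce."""
    output = []
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        output.extend(reducer(key, [value for _, value in group]))
    return output


def run_job(lines, num_reducers=1):
    """Run both phases and return {partition number: reducer output}."""
    map_output = map_phase(lines, num_reducers)
    return {p: reduce_phase(pairs) for p, pairs in sorted(map_output.items())}


print(run_job(["sun is red", "red sun"]))
# {0: [('is', 1), ('red', 2), ('sun', 2)]}
```

With one reducer every key lands in partition 0, so the whole result comes back as a single sorted list.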

Note: If the user does not explicitly define a reducer class, the map reduce framework
executes a default built-in reducer, which simply lets the mapper key-value pairs pass
through it.
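
To make the pass-through behaviour concrete, here is a minimal Python sketch (not Hadoop code) that feeds the same sorted key-value pairs to a pass-through reducer and to a summing reducer. The names identity_reducer, summing_reducer and run_reducer are hypothetical and exist only for this illustration.

```python
from itertools import groupby


def identity_reducer(key, values):
    """Pass-through reducer: one output record per incoming value, nothing is combined."""
    for value in values:
        yield (key, value)


def summing_reducer(key, values):
    """User-defined word-count reducer: one output record per key."""
    yield (key, sum(values))


def run_reducer(sorted_pairs, reducer):
    """Group sorted (key, value) pairs by key and hand each group to the reducer."""
    output = []
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        output.extend(reducer(key, [value for _, value in group]))
    return output


pairs = sorted([("sun", 1), ("is", 1), ("red", 1), ("red", 1), ("sun", 1)])

print(run_reducer(pairs, identity_reducer))
# [('is', 1), ('red', 1), ('red', 1), ('sun', 1), ('sun', 1)]
print(run_reducer(pairs, summing_reducer))
# [('is', 1), ('red', 2), ('sun', 2)]
```

In this sketch the pass-through output is still sorted by key, because sorting happens before the reducer runs, but repeated words appear once per occurrence instead of being summed.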

The zipped folder contains 4 jar files

1) wc.jar (Mapper and reducer implemented by the programmer/user)
2) wc_0reducer.jar (Map-only job; no reducer is implemented or allowed to run)
3) wc_defaultreducer.jar (User does not define a reducer class)
4) wc_2reducer.jar (A map reduce job with 2 reducers configured by the user)
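
As a rough guide to what the reducer count changes, the Python sketch below contrasts a map-only run with runs using one and two reducers. It is an illustration under assumptions, not the code inside the jar files: map_only, with_reducers and partition are hypothetical names, and the crc32 hash is a stand-in, so the split of keys between the two parts will not necessarily match what Hadoop produces.

```python
import zlib


def partition(key, num_reducers):
    """Assign a key to a reducer (assumed stand-in hash, not Hadoop's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_only(words):
    """Zero reducers: mapper output is kept as emitted, neither sorted nor summed."""
    return [(word, 1) for word in words]


def with_reducers(words, num_reducers):
    """N reducers: one output part per reducer, each sorted by key with counts summed."""
    buckets = [dict() for _ in range(num_reducers)]
    for word in words:
        bucket = buckets[partition(word, num_reducers)]
        bucket[word] = bucket.get(word, 0) + 1
    return [sorted(bucket.items()) for bucket in buckets]


words = "sun is red red sun".split()

print(map_only(words))          # pairs in input order, duplicates kept
print(with_reducers(words, 1))  # a single sorted part
print(with_reducers(words, 2))  # two parts; every key appears in exactly one of them
```

The point to look for when comparing the real outputs is the same: how many output parts there are, whether each part is sorted, and whether duplicate keys have been combined.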

Input data on which the above map reduce jobs need to be run (copy-paste it into a
notepad file and use the same file in the lab or local VM)

Note: Once you copy the below contents into a notepad file, move the file from your
local system to the lab using WinSCP. The jar files need to be downloaded to your
local VM or copied to the lab using WinSCP. Finally, move the input file from the
local file system to your HDFS.

I love apples but I do not like the green apples , I just like the red ones . sun is red and
apples remind me of sun . I like cold weather but the sun is out today and hope its not
very hot . I like to eat apples on a hot day .

Question
Execute each of the jar files (map reduce jobs) individually on the above input, directing
the output of each job to its own directory. Observe and compare the outputs in each case
and infer the inner working of map reduce. Justify your conclusions with appropriate
evidence (screenshots of the output file, or the contents of the output file copy-pasted
into the assignment submission).

Commands for execution

Note : Ensure you execute the below command from the same directory where you have
copied the jar files

hadoop jar nameofthejarfile.jar WordCount input_dir/input_filename output_dir_name

Note : The input_dir is the name of the input directory where you choose to keep your
input file and input_filename is the name of your input file
