MAP phase:
1) The mapper code (mapper class) is executed
2) Partition and in-memory sort
REDUCE phase:
1) Shuffle and merge sort (merge the duplicate keys while maintaining the mapper sort order)
2) The reducer code (reducer class) is executed
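The four steps above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions, not the code inside the assignment's jar files: the word-count mapper and reducer, the byte-sum partitioner, and all function names are hypothetical.

```python
from itertools import groupby

def mapper(line):
    # Hypothetical mapper: emit (word, 1) for every whitespace-separated token.
    return [(word, 1) for word in line.split()]

def partition(key, num_reducers):
    # Hypothetical partitioner: a deterministic stand-in for hashing the key.
    return sum(key.encode("utf-8")) % num_reducers

def map_phase(lines, num_reducers):
    # MAP step 1: run the mapper on each input line.
    # MAP step 2: assign each pair to a partition, then sort each partition by key.
    partitions = [[] for _ in range(num_reducers)]
    for line in lines:
        for key, value in mapper(line):
            partitions[partition(key, num_reducers)].append((key, value))
    return [sorted(pairs) for pairs in partitions]

def reducer(key, values):
    # Hypothetical reducer: sum the values collected for one key.
    return (key, sum(values))

def reduce_phase(sorted_pairs):
    # REDUCE step 1: group duplicate keys, keeping the sorted key order.
    # REDUCE step 2: run the reducer once per distinct key.
    output = []
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        output.append(reducer(key, [value for _, value in group]))
    return output

def run_job(lines, num_reducers=1):
    # Each partition is handled by its own (simulated) reducer.
    results = []
    for pairs in map_phase(lines, num_reducers):
        results.extend(reduce_phase(pairs))
    return results

if __name__ == "__main__":
    for key, total in run_job(["sun is red and apples remind me of sun"]):
        print(key, total)
```

With a single reducer the output is sorted by key and each key appears once. With more reducers, each key still lands in exactly one partition, so the totals are the same but are spread across several output lists.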
Note: If the user does not explicitly define a reducer class, the MapReduce framework
executes a default built-in reducer, which simply passes the mapper key-value pairs
through unchanged.
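The pass-through behaviour described in the note can be illustrated with a minimal sketch. The function names here are hypothetical and the code only imitates the idea in plain Python: the pairs still go through the sort and grouping step, but the reducer emits every value as it received it, so duplicate keys are not combined.

```python
from itertools import groupby

def identity_reducer(key, values):
    # Pass-through reducer: emit each (key, value) pair unchanged.
    for value in values:
        yield (key, value)

def run_with_identity_reducer(mapper_pairs):
    # The framework still sorts by key and groups duplicates before reducing.
    sorted_pairs = sorted(mapper_pairs)
    output = []
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        output.extend(identity_reducer(key, (value for _, value in group)))
    return output

if __name__ == "__main__":
    pairs = [("sun", 1), ("apples", 1), ("sun", 1)]
    for key, value in run_with_identity_reducer(pairs):
        print(key, value)
```

In this sketch the output contains one line per mapper pair, sorted by key, rather than one aggregated line per distinct key.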
Input data on which the above MapReduce jobs need to be run (copy-paste this into a
Notepad file and use the same file in the lab or on your local VM)
Note: Once you have copied the contents below into a Notepad file, move the file from
your local system to the lab using WinSCP. The jar files need to be downloaded to
your local VM or copied to the lab using WinSCP. Then move the input file
from the local file system to HDFS.
I love apples but I do not like the green apples , I just like the red ones . sun is red and
apples remind me of sun . I like cold weather but the sun is out today and hope its not
very hot . I like to eat apples on a hot day .
Question
Execute each of the jar files (MapReduce jobs) individually on the above input, directing the
output of each job to an appropriate directory. Observe and compare the outputs in each case
and infer the inner workings of MapReduce. Justify your conclusions with appropriate
evidence (screenshots of the output file, or the contents of the output file copy-pasted
into your assignment submission).
Commands for execution
Note: Execute the command below from the same directory where you have
copied the jar files.
Note: input_dir is the name of the input directory where you choose to keep your
input file, and input_filename is the name of your input file.