Assignment6DBS Spring2015
Assignment: 6
Introduction to Hadoop and MapReduce
Deadline: 9:00 PM, 13th April, 2015
For this assignment, you will run a Hadoop virtual machine on your system and write
code for the following problems.
Coding Language: Python
Virtual Machine Setup:
Downloading the VM
1. Download it from
http://content.udacitydata.com/courses/ud617/ClouderaUdacityTrainingVM4.1.1.c.zip
Warning: the zipped file size is 1.7 GB. If you are on a Windows machine, you will
likely need to use WinRAR to open this .zip file because other methods fail to open
the unzipped file (which exceeds the maximum specified 4 GB for a .zip file).
2. The MD5 sum file can be found here:
http://content.udacitydata.com/courses/ud617/ClouderaUdacityTrainingVM4.1.1.c.zip.md5
3. Unzip it. Warning: the unzipped size is 4.2 GB.
4. MD5 hashes for files:
8a610c151d4b1ebdce11542d13dd2a53  ClouderaTrainingVM4.1.1.c.log
6b44c965c1c6062554bf4cc12d11e87e  ClouderaTrainingVM4.1.1.c.plist
46dedeba3e0affd8311431d7e370705e  ClouderaTrainingVM4.1.1.c.vmdk
d41d8cd98f00b204e9800998ecf8427e  ClouderaTrainingVM4.1.1.c.vmsd
096956c1cbabeaa652ca63a2d5e14612  ClouderaTrainingVM4.1.1.c.vmx
c9f8a375e82ef1e9d96097850e237df9  ClouderaTrainingVM4.1.1.c.vmxf
0d7c8becb5a515068e81bb303c794e4f  nvram
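The hashes listed above can be checked after unzipping. The following is a minimal sketch, assuming the unzipped files sit together in one directory; the function names md5_of_file and find_mismatches are illustrative and not part of the assignment. Note that d41d8cd98f00b204e9800998ecf8427e is the MD5 of an empty file, so the .vmsd file is expected to be empty.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks
    so that the multi-gigabyte .vmdk file does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(directory, expected):
    """Return the file names in `expected` (name -> hex digest) that are
    missing from `directory` or whose MD5 differs from the expected value."""
    bad = []
    for name, digest in sorted(expected.items()):
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or md5_of_file(path) != digest.lower():
            bad.append(name)
    return bad
```

Calling find_mismatches with the unzip directory and a dictionary built from the list above should return an empty list when every file is intact.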
b. Choose a name, use Type: Linux:
c. Press Next
d. Select memory size for the VM.
e. Press Next
f. Select "Use an existing virtual hard drive file", click the button to browse to the
directory where you unzipped the provided VM image, and press Create.
g. Start the VM!
Using VMWare
1. Download and install from
https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/6_0
2. Create the Virtual Machine:
a. Click on "Open a Virtual Machine" and, when prompted, navigate to the folder where
you unzipped the VM, choose the file, and click Open.
b. Select the machine and click "Play virtual machine".
Dataset Download:
The dataset for the problems is a dataset on airports, which can be downloaded from
the course page under the resources tab.
Problem 1:
Write a Mapper and Reducer to get the number of airports by:
1. Country
2. Type
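A counting job of this kind is commonly written for Hadoop Streaming, where the mapper emits one tab-separated "key, 1" pair per record and the reducer sums the values of consecutive identical keys. The sketch below illustrates that pattern under stated assumptions: the input is comma-separated, and the column index and header text (COUNTRY_COLUMN, HEADER_VALUE) are hypothetical placeholders that must be adjusted to the actual airports dataset. Both functions appear in one block only for compactness; in a submission they would live in separate Mapper.py and Reducer.py files.

```python
import csv

# ASSUMPTION: placeholder column position and header text; check the real
# dataset and change these before use.
COUNTRY_COLUMN = 8
HEADER_VALUE = "iso_country"


def map_airports(lines, column=COUNTRY_COLUMN, header_value=HEADER_VALUE):
    """Yield 'key<TAB>1' for every data row, skipping the header row,
    rows that are too short, and rows with an empty key."""
    for row in csv.reader(lines):
        if len(row) <= column:
            continue
        key = row[column].strip()
        if not key or key == header_value:
            continue
        yield "%s\t1" % key


def reduce_counts(lines):
    """Sum the values of consecutive identical keys. The input must be
    sorted by key, which Hadoop guarantees between map and reduce."""
    current_key, total = None, 0
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                yield "%s\t%d" % (current_key, total)
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        yield "%s\t%d" % (current_key, total)
```

In each script, the function would be fed sys.stdin and every yielded string printed on its own line. Counting by Type follows the same pattern with a different column.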
Problem 2:
Write a Mapper and Reducer to find the
1. Country
2. Region
having the highest number of airports.
NOTE: For both problems and for each part, write separate Mappers and Reducers,
and don't mix the problems.
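For Problem 2, one possible approach is to reuse a mapper that emits "key, 1" pairs and let the reducer keep only the key with the largest total. The sketch below assumes the job runs with a single reducer, so that one process sees every key; with several reducers each would report only its local maximum and a second pass would be needed. The name reduce_max is illustrative, and ties are resolved in favour of the key that comes first in sort order.

```python
def reduce_max(lines):
    """Return (key, total) for the key with the largest summed value, or
    None for empty input. Input lines are 'key<TAB>value', sorted by key."""
    best_key, best_total = None, -1
    current_key, total = None, 0
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            # A new key starts: compare the finished key against the best.
            if current_key is not None and total > best_total:
                best_key, best_total = current_key, total
            current_key, total = key, 0
        total += int(value)
    # Account for the last key in the stream.
    if current_key is not None and total > best_total:
        best_key, best_total = current_key, total
    if best_key is None:
        return None
    return best_key, best_total
```

In Reducer.py the result would be printed once, after all of sys.stdin has been consumed.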
Resources:
1. Unit 2 and Unit 3 from this online course (~12 hours). Units 1 and 4
are not needed. https://www.udacity.com/course/ud617
2. Chapter 6 should suffice, which is also free to download.
http://go.cloudera.com/udacitylesson2
Deliverables / Upload Format:
RollNo/ProblemNumber
Mapper.py
Reducer.py
You can upload the code from the virtual machine itself.