
CSE441: DATABASE SYSTEMS

Assignment: 6
Introduction to Hadoop and MapReduce
Deadline: 9:00 PM, 13th April, 2015
For this assignment, you will run a Hadoop virtual machine on your system and write
code for the following problems.

Coding Language: Python

Virtual Machine Setup:
Downloading the VM
1. Download it from
http://content.udacitydata.com/courses/ud617/ClouderaUdacityTrainingVM4.1.1.c.zip
Warning: the zipped file size is 1.7 GB. If you are on a Windows machine, you will
likely need to use WinRAR to open this .zip file, because other methods fail to open
the unzipped file (which exceeds the 4 GB maximum specified for a .zip file).
2. The MD5 sum file can be found here:
http://content.udacitydata.com/courses/ud617/ClouderaUdacityTrainingVM4.1.1.c.zip.md5
3. Unzip it. Warning: the unzipped size is 4.2 GB.
4. MD5 hashes for the files:

8a610c151d4b1ebdce11542d13dd2a53  ClouderaTrainingVM4.1.1.c.log
6b44c965c1c6062554bf4cc12d11e87e  ClouderaTrainingVM4.1.1.c.plist
46dedeba3e0affd8311431d7e370705e  ClouderaTrainingVM4.1.1.c.vmdk
d41d8cd98f00b204e9800998ecf8427e  ClouderaTrainingVM4.1.1.c.vmsd
096956c1cbabeaa652ca63a2d5e14612  ClouderaTrainingVM4.1.1.c.vmx
c9f8a375e82ef1e9d96097850e237df9  ClouderaTrainingVM4.1.1.c.vmxf
0d7c8becb5a515068e81bb303c794e4f  nvram
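
The unzipped files can be checked against the hashes listed above with a short script. What follows is a minimal sketch, not part of the assignment: the function names `md5_of_file`, `parse_checksums` and `verify` are hypothetical, and the parser assumes each listing line starts with a 32-character MD5 hash followed by the file name.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Map file name -> expected hash; the first 32 characters are the MD5."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if len(line) > 32:
            expected[line[32:].strip()] = line[:32].lower()
    return expected


def verify(directory, expected):
    """Return {file name: True/False}; a missing file counts as a mismatch."""
    results = {}
    for name, digest in expected.items():
        path = os.path.join(directory, name)
        results[name] = os.path.isfile(path) and md5_of_file(path) == digest
    return results
```

Calling `verify(folder, parse_checksums(listing))`, where `folder` is the directory holding the unzipped VM and `listing` is the hash list as a string, returns one True/False entry per file. Reading in chunks matters here because the disk image is several gigabytes.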

Using Oracle VirtualBox


1. Download and install VirtualBox from https://www.virtualbox.org/wiki/Downloads
2. Create a new virtual machine:
a. Create a new virtual machine by pressing the New button:

b. Choose a name and use Type: Linux:

c. Press Next
d. Select the memory size for the VM.

e. Press Next
f. Select Use an existing virtual hard drive file, click the button to browse to the
directory where you unzipped the provided VM image, and press Create.

g. Start the VM!

Using VMware
1. Download and install from
https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/6_0
2. Create the virtual machine:
a. Click on Open a Virtual Machine and, when prompted, navigate to the folder where you
unzipped the VM, choose the file and click Open.

b. Select the machine and click Play virtual machine

Dataset Download:
The dataset for the problems is a dataset on airports, which can be downloaded from the
courses page under the resources tab.

Problem 1:
Write a Mapper and Reducer to get the number of airports by:
1. Country
2. Type
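
The general shape of a counting job can be sketched as below. This is only an illustration under stated assumptions, not a reference solution: it assumes the dataset is comma-separated text and that the field of interest sits at a column index you pass in (`key_column` is a hypothetical parameter; check the real file for the actual layout and for a header row). Both functions are shown in one block for brevity; for submission they would live in separate Mapper.py and Reducer.py files.

```python
import csv
import sys
from itertools import groupby


def mapper(lines, key_column):
    """Yield 'key<TAB>1' for each record; key_column is an assumed column index."""
    for row in csv.reader(lines):
        # Skip short or blank rows rather than failing on malformed input.
        if len(row) > key_column and row[key_column].strip():
            yield "%s\t1" % row[key_column].strip()


def reducer(lines):
    """Sum the counts per key; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "%s\t%d" % (key, sum(int(count) for _, count in group))


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
    # Hypothetical command-line wiring: "map <column index>" or "reduce".
    if sys.argv[1] == "map":
        output = mapper(sys.stdin, int(sys.argv[2]))
    else:
        output = reducer(sys.stdin)
    for out_line in output:
        sys.stdout.write(out_line + "\n")
```

The reducer relies on its input being grouped by key, which the sort between the map and reduce phases provides. When testing locally without Hadoop, placing a `sort` step between the mapper and the reducer gives the same grouping.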

Problem 2:
Write a Mapper and Reducer to find the
1. Country
2. Region
having the highest number of airports.
NOTE: For both problems, and for each part, write separate Mappers and Reducers
and don't mix the problems.
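
Finding the key with the highest count differs from plain counting only in the reduce step. The sketch below is one possible approach, not a reference solution: it assumes the mapper emits tab-separated `key<TAB>1` lines, that the input reaches the reducer sorted by key, and that a single reducer sees every key (with several reducers, each would only report its own local maximum). The name `max_reducer` is hypothetical.

```python
from itertools import groupby


def max_reducer(lines):
    """Return (key, count) for the key with the largest total count.

    Ties keep the first key encountered; empty input returns (None, 0).
    """
    best_key, best_count = None, 0
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        if total > best_count:
            best_key, best_count = key, total
    return best_key, best_count
```

Only the running best pair is kept in memory, so the reducer never has to hold the counts for all keys at once.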

Resources:
1. Unit 2 and Unit 3 from this online course (~12 hours). Units 1 and 4
are not needed. https://www.udacity.com/course/ud617
2. Chapter 6 should suffice; it is also free to download.
http://go.cloudera.com/udacitylesson2

Deliverables / Upload Format:
RollNo/ProblemNumber
Mapper.py
Reducer.py
You can upload the code from the virtual machine itself.
