
Alexandria University, Faculty of Engineering
Computer and Systems Engineering
CS432: Distributed Systems, Spring 2017
Assigned: Feb 19, 2017
Due: Feb 22, 2017

Lab Assignment 1: MapReduce / Hadoop

Notes
You can work on this assignment in teams of two.

Objectives
• Understand the MapReduce programming model.

• Set up Hadoop on a single node and on a cluster of nodes.

Overview
You are required to install Hadoop as both a single-node cluster and a multi-node
cluster. Next, you will practice running a few HDFS commands and executing Hadoop
jobs. You can use the following command to download Hadoop to your machine:
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
You can extract the downloaded file using:
tar -xzvf hadoop-2.7.3.tar.gz
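
After extracting, a minimal local setup might look like the sketch below. The paths are assumptions; adjust them to your machine:

# move the extracted directory to a convenient location (path is an assumption)
sudo mv hadoop-2.7.3 /usr/local/hadoop
# point Hadoop at your Java installation; adjust JAVA_HOME for your system
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# verify the installation by printing the Hadoop version
hadoop version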

Setting up Hadoop
• You will need to download the latest stable version of Hadoop (2.7.3) from this link:
http://hadoop.apache.org/releases.html.

• Set up the downloaded Hadoop version on your machine. These are the steps you will
need to follow: https://goo.gl/8KVyGJ. You can do this step before the lab time.

• During the lab, you will need to set up Hadoop on a cluster of machines using the
following steps: https://goo.gl/KLyzFU. You can use one of the following options to
set up the Hadoop cluster: (1) AWS EC2 instances; (2) lab machines; or (3) your
laptops. A sketch for starting and checking the daemons appears after the resources
note below.

Useful resources: This tutorial can help you set up Hadoop: Part I and Part II.
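
Once Hadoop is configured, a typical way to bring the cluster up and sanity-check it is sketched below. This assumes Hadoop's bin and sbin directories are on your PATH; the exact configuration steps are in the linked guides:

# format HDFS (first run only; this erases existing HDFS metadata)
hdfs namenode -format
# start the HDFS and YARN daemons
start-dfs.sh
start-yarn.sh
# list running Java processes; expect NameNode, DataNode,
# ResourceManager, and NodeManager
jps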

HDFS
• Create a directory called input in your home directory.

• Download the following text files from Project Gutenberg, in Plain Text UTF-8 format
(hint: you can use wget; see the example after this list):

– The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
– The Notebooks of Leonardo Da Vinci
– Ulysses by James Joyce
– The Art of War by Sunzi (6th cent. B.C.)
– The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle
– Encyclopaedia Britannica, 11th Edition, "Bréquigny, Louis Georges Oudard Feudrix"
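
For example, the files can be fetched with wget as sketched below. The ebook numbers and URLs are assumptions; confirm the exact plain-text links on each book's Project Gutenberg page before downloading:

cd ~/input
# ebook numbers below are assumptions; verify them on gutenberg.org
wget http://www.gutenberg.org/cache/epub/20417/pg20417.txt
wget http://www.gutenberg.org/cache/epub/5000/pg5000.txt
wget http://www.gutenberg.org/cache/epub/4300/pg4300.txt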

• Store the downloaded files in the input directory on your machine.

• Create a new file mydata.txt in the input directory. Open the file and write the
following line to it: CS432 FirstStudentID SecondStudentID. Repeat this line four
times in the file.
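
One quick way to create this file from the shell (replace the placeholder IDs with your actual student IDs):

# write the required line four times into mydata.txt
for i in 1 2 3 4; do
  echo "CS432 FirstStudentID SecondStudentID" >> ~/input/mydata.txt
done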

• Copy the input directory from your local disk to HDFS. You can use the command:
hadoop fs -copyFromLocal /home/userid/input /home/userid/input. The first path is the
source, which is on your local disk. The second path is the destination, which is on
HDFS. (Note that copyToLocal goes in the opposite direction, from HDFS to local disk.)
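
If the destination's parent directory does not yet exist on HDFS, create it first; hadoop fs -put is an equivalent alternative:

# create the parent directory on HDFS (may be needed on a fresh cluster)
hadoop fs -mkdir -p /home/userid
hadoop fs -put /home/userid/input /home/userid/input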

• Now check that the files were copied using this command:
hadoop fs -ls /home/userid/input

Running Hadoop Jobs


In this step, you will run Hadoop jobs on the data loaded into HDFS.

• You need to build the WordCount example described in this tutorial. Name the created jar file
wc.jar.
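
Following the Apache MapReduce tutorial's build steps, one way to compile and package the example is sketched below (this assumes JAVA_HOME is set and hadoop is on your PATH):

# compile WordCount.java against the Hadoop classpath
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
hadoop com.sun.tools.javac.Main WordCount.java
# package the generated classes into wc.jar
jar cf wc.jar WordCount*.class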

• You are now ready to run the jar file using:
hadoop jar wc.jar WordCount /home/userid/input /home/userid/output
(The WordCount class name is required unless the jar's manifest specifies a main class.)

• Check the output files created in the /home/userid/output directory on HDFS.

• Copy the output directory to your local disk using:


hadoop fs -get /home/userid/output /home/userid/output. You can also use
copyToLocal or getmerge (see the getmerge example below).
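
For example, getmerge concatenates all part-r-* files in the HDFS output directory into a single local file:

# merge all reducer outputs into one local text file
hadoop fs -getmerge /home/userid/output /home/userid/output.txt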

• In the output file, check that CS432 and your student IDs are each counted four
times. Check the word counts for various words that appeared in the input files.
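
For instance, on a merged local output file (output.txt from the getmerge example above), you can check the CS432 count directly; each output line is a word followed by its count:

# expect a count of 4 for CS432
grep "^CS432" /home/userid/output.txt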

• You can now update the WordCount job to filter the counted words based on a second
input file: only the words that appear in that dictionary file will be counted.
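
One way to run such a modified job is sketched below. It assumes your driver parses generic options (as the tutorial's WordCount v2.0 does), so the -files option can ship the dictionary to every task; the dictionary.txt name and the in-mapper filtering logic are assumptions you implement yourself:

# ship dictionary.txt to each task and write to a fresh output directory
hadoop jar wc.jar WordCount -files dictionary.txt /home/userid/input /home/userid/output2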

Resources
• HDFS shell commands

• MapReduce Tutorial
