
Group B

Assignment No: 2

Title of the Assignment: Big Data Analytics I


Design a distributed application using MapReduce which processes a log file of a system.

Theory:

Steps to Install Hadoop for a distributed environment:

Initially, create a folder named logfiles1 on the desktop. In that folder, store the input file
(access_log_short.csv) and the SalesMapper.java, SalesCountryReducer.java, and
SalesCountryDriver.java files.
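The contents of the three Java files are not reproduced in this manual. Judging by their names, they implement the classic per-key record counter: the mapper emits a key extracted from each CSV record together with a count of 1, and the reducer sums those counts per key. As a rough, Hadoop-free sketch of that map/reduce logic in plain Java (the class name and the choice of CSV column are illustrative assumptions, not the actual file contents):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative simulation of the mapper/reducer counting logic;
// NOT the actual contents of SalesMapper.java / SalesCountryReducer.java.
public class CountSketch {
    // "map" phase: extract a key from each record
    // (assumption: the key is the 2nd CSV column)
    static String mapRecord(String csvLine) {
        String[] fields = csvLine.split(",");
        return fields.length > 1 ? fields[1].trim() : "unknown";
    }

    // "reduce" phase: sum the 1s emitted for each key
    static Map<String, Integer> countByKey(String[] records) {
        Map<String, Integer> counts = new HashMap<>();
        for (String record : records) {
            counts.merge(mapRecord(record), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {
            "2009-01-02,Product1,1200",
            "2009-01-02,Product2,3600",
            "2009-01-03,Product1,1200"
        };
        // Prints the counts per key: Product1=2, Product2=1 (map order may vary)
        System.out.println(countByKey(sample));
    }
}
```

In the real job, Hadoop performs the grouping between the two phases (the shuffle), so the mapper and reducer run as separate classes on separate nodes.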

Step 1) Go to the Hadoop home directory and format the NameNode.

cd hadoop-2.7.3

bin/hadoop namenode -format

Step 2) Once the NameNode is formatted, go to hadoop-2.7.3/sbin directory and start all the
daemons/nodes.

cd hadoop-2.7.3/sbin

1) Start NameNode:

The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files
stored in the HDFS and tracks all the files stored across the cluster.

./hadoop-daemon.sh start namenode

2) Start DataNode:

On startup, a DataNode connects to the NameNode and then responds to requests from the
NameNode for different operations.

./hadoop-daemon.sh start datanode

3) Start ResourceManager:
The ResourceManager is the master that arbitrates all the available cluster resources and thus
helps in managing the distributed applications running on the YARN system. It manages each
NodeManager and each application’s ApplicationMaster.

./yarn-daemon.sh start resourcemanager

4) Start NodeManager:

The NodeManager in each machine framework is the agent which is responsible for managing
containers, monitoring their resource usage and reporting the same to the ResourceManager.

./yarn-daemon.sh start nodemanager

5) Start JobHistoryServer:

The JobHistoryServer is responsible for servicing all job-history-related requests from clients.

./mr-jobhistory-daemon.sh start historyserver

Step 3) To check that all the Hadoop services are up and running, run the command below.

jps

Step 4) cd

Step 5) sudo mkdir mapreduce_vijay

Step 6) sudo chmod 777 -R mapreduce_vijay/

Step 7) sudo chown -R vijay mapreduce_vijay/

Step 8) sudo cp /home/vijay/Desktop/logfiles1/* ~/mapreduce_vijay/

Step 9) cd mapreduce_vijay/

Step 10) ls
Step 11) sudo chmod +r *.*

Step 12) export CLASSPATH="/home/vijay/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.3.jar:/home/vijay/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.3.jar:/home/vijay/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar:~/mapreduce_vijay/SalesCountry/*:$HADOOP_HOME/lib/*"
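Instead of hard-coding each jar path, the classpath can also be derived from the installation itself with the `hadoop classpath` command. A sketch of this alternative, assuming $HADOOP_HOME points at the hadoop-2.7.3 directory:

```shell
# Let Hadoop report its own classpath, then append the current directory
# for the compiled SalesCountry classes (assumes $HADOOP_HOME is set).
export CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath):."
```

This avoids editing the export line whenever the Hadoop version (and hence the jar file names) changes.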

Step 13) javac -d . SalesMapper.java SalesCountryReducer.java SalesCountryDriver.java

Step 14) ls

Step 15) cd SalesCountry/

Step 16) ls (check whether the .class files were created)

Step 17) cd ..

Step 18) gedit Manifest.txt

(add the following line to it:
Main-Class: SalesCountry.SalesCountryDriver)

Step 19) jar -cfm mapreduce_vijay.jar Manifest.txt SalesCountry/*.class

Step 20) ls

Step 21) cd

Step 22) cd mapreduce_vijay/

Step 23) sudo mkdir /input200

Step 24) sudo cp access_log_short.csv /input200

Step 25) $HADOOP_HOME/bin/hdfs dfs -put /input200 /

Step 26) $HADOOP_HOME/bin/hadoop jar mapreduce_vijay.jar /input200 /output200

Step 27) hadoop fs -ls /output200

Step 28) hadoop fs -cat /output200/part-00000
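Each line of the part-00000 file written by Hadoop's default TextOutputFormat is a key and its value separated by a tab. A small plain-Java sketch of reading one such line back (the sample line here is made up for illustration):

```java
import java.util.AbstractMap;
import java.util.Map;

public class OutputLineParser {
    // Parse one "key<TAB>count" line, as written by Hadoop's TextOutputFormat
    static Map.Entry<String, Long> parse(String line) {
        int tab = line.indexOf('\t');
        String key = line.substring(0, tab);
        long count = Long.parseLong(line.substring(tab + 1).trim());
        return new AbstractMap.SimpleEntry<>(key, count);
    }

    public static void main(String[] args) {
        Map.Entry<String, Long> e = parse("Argentina\t1");  // hypothetical output line
        System.out.println(e.getKey() + " occurred " + e.getValue() + " time(s)");
    }
}
```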


Step 29) Now open the Mozilla browser and go to localhost:50070/dfshealth.html to check the
NameNode web interface.

Conclusion: In this way, we have designed a distributed application using MapReduce that
processes a system log file.

Viva Questions
1. Write down the steps to design a distributed application using MapReduce that
processes a system log file.
