
Subject: Framework et technologies Big Data

Audience: 3-IM
Lab Supervisor: Ikram Chaabane
Academic Year: 2024-2025

TP 2 – Hadoop Installation and Configuration

(single-node installation)
1. Definitions
 Hadoop is a platform for storing and processing very large amounts of data in a
distributed manner. It spreads data and tasks across a group of machines called a cluster.
 A server cluster (also called a computer cluster, compute farm, or compute cluster) is a group
of independent computers, called nodes, that work together as a single system.
 HDFS: Hadoop's distributed file system.
 MapReduce: Hadoop's programming model for distributed processing.
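For illustration, once the installation below is complete, HDFS is used through shell commands such as:

hdfs dfs -ls /

which lists the root of the distributed file system (the hdfs command becomes available once the PATH is set in step 2.5).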

2. Hadoop Installation
2.1. Download a version of Hadoop
 From Apache's official website, download a version of Hadoop
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
 Click on the link, then Save.
2.2. Extract the zipped folder
 Go to the Downloads directory and extract the zipped folder

sudo tar -zxvf hadoop-3.2.4.tar.gz
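You can check that the extraction succeeded by listing the extracted folder (a quick sanity check; the exact contents may vary slightly between versions):

ls hadoop-3.2.4

You should see, among others, the bin, sbin, etc and share directories.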

2.3. Create the hadoop folder in /usr/local


 Enter the command: sudo mkdir /usr/local/hadoop

2.4. Move the downloaded Hadoop source folder


sudo mv Téléchargements/hadoop-3.2.4 /usr/local/hadoop

Note: Téléchargements is the name of the Downloads directory on a French-locale system; use
Downloads if your system is in English. The command assumes it is run from your home directory.
2.5. Add the following environment variables to the .bashrc file
Note: .bashrc is a hidden file in your home directory (/home/<name>/.bashrc) that is read each
time you open a terminal; it is commonly used to define aliases and environment variables. Open
it with nano (or gedit) ~/.bashrc (no sudo is needed for a file in your own home directory), and
write these lines at the bottom of the file, adapting the paths if your Java or Hadoop versions differ.
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop/hadoop-3.2.4
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

To save in nano: press Ctrl+X, then O (Y on an English-locale system), then Enter.


For these variables to take effect, you need to restart the terminal or type the command
source ~/.bashrc
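To check that the variables are set, run (a quick sanity check, assuming the paths above):

echo $HADOOP_INSTALL
hadoop version

The second command should print Hadoop 3.2.4 if the PATH was updated correctly.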
2.6. Configure Hadoop on a single-node cluster (pseudo-distributed mode)
To configure Hadoop, you need to modify five files located in
/usr/local/hadoop/hadoop-<version>/etc/hadoop/, where all the configuration files are found:
hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
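You can list this directory to see the files you are about to edit:

ls $HADOOP_INSTALL/etc/hadoop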

2.6.1. Modify the first file hadoop-env.sh: the startup file for Hadoop daemons.
Daemons, in programming terms, are processes running in the background. Hadoop has
five daemons: NameNode, SecondaryNameNode, DataNode, NodeManager, and
ResourceManager.

Since Hadoop is developed in Java, we need to specify the JDK path so it can activate its
daemons. To modify the JAVA_HOME path in the hadoop-env.sh file, first type one of the
following commands to open it.
sudo nano $HADOOP_INSTALL/etc/hadoop/hadoop-env.sh
or sudo nano /usr/local/hadoop/hadoop-<version>/etc/hadoop/hadoop-env.sh
or sudo gedit /usr/local/hadoop/hadoop-<version>/etc/hadoop/hadoop-env.sh

Go to the line containing « export JAVA_HOME={…} » and replace the path with
/usr/lib/jvm/<java version>
where <java version> corresponds to the JDK name used in step 2.5 (e.g. java-11-openjdk-amd64).
Once finished, type Ctrl+X, then O, then Enter to save the changes.
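If you are unsure which JDK is installed, the following commands can help you find the path (a sketch; the output depends on your system):

ls /usr/lib/jvm
readlink -f $(which java)

The first lists the installed JVMs; the second resolves the java binary to its real location.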

2.6.2. Modify the second file core-site.xml

The core-site.xml file tells the Hadoop daemons that a NameNode is running on the
cluster by specifying its address.

Since we have only one machine in the cluster, the NameNode will be on localhost
(127.0.0.1). Port 9000 is associated with the HDFS file system.
To modify core-site.xml, type:

sudo nano $HADOOP_INSTALL/etc/hadoop/core-site.xml


At the end of the file, add the following lines:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

 Once finished, press Ctrl X, then O, then Enter to save the changes made.
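You can verify that Hadoop picks up this setting (assuming the environment variables from step 2.5 are active):

hdfs getconf -confKey fs.defaultFS

which should print hdfs://localhost:9000.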
2.6.3. Modify the third file hdfs-site.xml
The hdfs-site.xml file informs Hadoop and its HDFS system of the number of replications
(property 1) (only one in our case, since we have a single machine), the directory where the
NameNode stores its metadata and transaction history (property 2), the directory where the
DataNode stores its blocks (property 3), and the addresses of the web interfaces of the
NameNode and the SecondaryNameNode (properties 4 and 5).

Modify this file by typing:


sudo nano $HADOOP_INSTALL/etc/hadoop/hdfs-site.xml

 At the end of the file, add the following lines:


<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/…/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/…/data</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>localhost:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>localhost:50090</value>
    </property>
</configuration>
 Replace the … with your username.

 Once finished, press Ctrl X, then O, then Enter to save the changes made.
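The two directories referenced above must exist and be writable by the user running Hadoop. A minimal sketch to create them up front ($USER expands to your username, the value substituted for …):

mkdir -p /home/$USER/name /home/$USER/data

Note also that dfs.name.dir and dfs.data.dir are deprecated names in Hadoop 3 (the current ones are dfs.namenode.name.dir and dfs.datanode.data.dir); both forms work, but a deprecation warning may appear in the logs.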

2.6.4. Modify the fourth file mapred-site.xml
The mapred-site.xml file mainly tells MapReduce that it will run as a YARN
application (separating resource management from job management).
Open the mapred-site.xml file to make modifications:
sudo nano $HADOOP_INSTALL/etc/hadoop/mapred-site.xml

 In the <configuration> tag, add the following lines:


<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-3.2.4</value>
</property>
<property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-3.2.4</value>
</property>
<property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-3.2.4</value>
</property>

- The property mapreduce.framework.name specifies the framework that MapReduce
must use to execute tasks. It determines the environment in which MapReduce jobs
will run, and it can significantly impact how resources are managed and how tasks are
scheduled. The main possible values are yarn and local (the latter runs the whole job
in a single JVM and is generally used for development and testing).
- The property yarn.app.mapreduce.am.env defines environment variables that will be
available to the MapReduce Application Master during its execution.
- The property mapreduce.map.env defines a set of environment variables that will be
passed to the Mapper processes.
- The property mapreduce.reduce.env defines a set of environment variables that will be
passed to the Reducer processes.

 Once finished, press Ctrl X, then O, then Enter to save the changes made.

2.6.5. Modify the fifth file yarn-site.xml

The yarn-site.xml file is essential for the configuration and optimization of YARN's behavior,
particularly for:

 Configuration of YARN Resources (how resources, such as memory and CPU, are allocated
to applications running on the Hadoop cluster).

 Definition of YARN Components (where these components are located, how they
communicate, and their operational properties).
 Management of Applications (the types of applications that can be executed and the
scheduling policies to be used, including parameters for expiration times, job priorities, and
other aspects of scheduling).
 Configuration of Network Parameters (such as the ports used for communication between
the ResourceManager and the NodeManagers, etc.).
 Definition of Quotas and Limits (such as quotas for allocated resources, ensuring that no
application monopolizes the cluster's resources).

In this lab, the yarn-site.xml file tells the NodeManager to run an auxiliary service
(mapreduce_shuffle) that lets MapReduce perform its shuffle phase.
sudo nano $HADOOP_INSTALL/etc/hadoop/yarn-site.xml
 In the <configuration> tag, add the following lines:
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

 Once finished, press Ctrl + X, then O, and then Enter to save the changes made.
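To confirm the edit was saved (a quick check):

grep -A 1 aux-services $HADOOP_INSTALL/etc/hadoop/yarn-site.xml

You should see the mapreduce_shuffle value on the line that follows.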

2.7. Verify the Installation


Once you have completed the Hadoop configuration, you need to verify the installation. To do this,
you must first format the HDFS file system (only the first time) before starting Hadoop:
hdfs namenode -format
The command prints many log lines; near the end, you should see a message indicating that the
storage directory has been successfully formatted.

2.8. Check Active Services
Before starting Hadoop, check the currently active Java services using the jps command:
jps
2.9. Start the Hadoop System
start-all.sh
Note: start-all.sh is deprecated in recent Hadoop versions; running start-dfs.sh and then
start-yarn.sh is equivalent, but the combined script still works.
2.10. Check Active Services After Startup
Run jps again. If the installation was successful, the five daemons should now be listed in
addition to the Jps process itself.
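For illustration, the output should resemble the following (the process IDs will differ on your machine):

12345 NameNode
12456 DataNode
12567 SecondaryNameNode
12678 ResourceManager
12789 NodeManager
12890 Jps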

If any of these services are missing, you may have a configuration error. Start by checking the log
files located in the $HADOOP_INSTALL/logs directory. For example, if the NameNode is not started,
check the files related to the NameNode.
If all services are present, Hadoop is functional on your machine.

View the NameNode web interface


Using your web browser on the virtual machine, you can access the NameNode web interface
at http://localhost:50070/
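As a final smoke test, you can run one of the example MapReduce jobs shipped with Hadoop (a sketch; the jar path assumes the 3.2.4 layout used in this guide):

hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar pi 2 5

The job estimates π using 2 map tasks and 5 samples per map; if it completes and prints an estimate, HDFS, YARN, and MapReduce are working together. When you are done, stop-all.sh stops all the daemons.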
