REGISTRATION NO : 201081010
FINAL YEAR BTECH
BRANCH : IT
SUBJECT : BDA LAB
Aim: Compare different versions of Hadoop (Hadoop 1.x, Hadoop 2.x, and Hadoop 3.x).
Also set up a Hadoop 1.x single node cluster.
Theory :
1. HDFS (Hadoop Distributed File System):
Overview: HDFS is a distributed file system that provides a reliable and scalable storage infrastructure for Hadoop. It is designed to store and manage very large files across multiple nodes in a Hadoop cluster.
Key Features:
Scalability: HDFS can scale horizontally by adding more nodes to the cluster, accommodating the storage needs of big data applications.
2. YARN (Yet Another Resource Negotiator):
Overview: YARN is the resource management layer of Hadoop, responsible for managing and allocating resources in a Hadoop cluster. It allows multiple applications to share resources efficiently.
Key Components:
ResourceManager: Manages and allocates resources to the various applications in the cluster.
Benefits:
Efficient Resource Utilization: YARN allocates resources dynamically, ensuring that the available resources are used optimally.
Multi-Tenancy: Multiple applications can coexist on the same Hadoop cluster without interfering with each other.
3. MapReduce:
Overview: MapReduce is a programming model and processing engine for distributed computing in Hadoop. It allows large datasets to be processed in parallel across a Hadoop cluster.
Key Components:
Reducer: Aggregates and processes the intermediate key-value pairs produced by the mappers.
Workflow:
Map Phase: Input data is divided into smaller chunks, each processed by an individual mapper.
Reduce Phase: Reduce tasks process the sorted intermediate data and produce the final output.
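The Map, shuffle/sort and Reduce flow described above can be imitated locally with ordinary Unix tools on a word-count example. This is only an illustration of the data flow, not Hadoop: the function names are hypothetical, everything runs as one pipeline on one machine, and the sort step stands in for Hadoop's shuffle.

```shell
#!/usr/bin/env bash
# Local imitation of the MapReduce data flow for a word count.
# Not Hadoop: each phase is an ordinary process in a single pipeline.

# Map phase: emit one "word<TAB>1" key-value pair per whitespace-separated word.
map_phase() {
  awk 'BEGIN { OFS = "\t" } { for (i = 1; i <= NF; i++) print $i, 1 }'
}

# Shuffle/sort: bring identical keys next to each other (byte order).
shuffle_sort() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

# Reduce phase: sum the values of each run of identical keys.
reduce_phase() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR > 1 && $1 != key { print key, sum; sum = 0 }
       { key = $1; sum += $2 }
       END { if (NR > 0) print key, sum }'
}

word_count() {
  map_phase | shuffle_sort | reduce_phase
}
```

For example, `printf 'big data big cluster\n' | word_count` prints the tab-separated pairs big 2, cluster 1 and data 1, one per line. The reducer only works because the sort step guarantees that all pairs for one key arrive together, which is exactly the guarantee Hadoop's shuffle gives real reduce tasks.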
4. Hadoop Common:
Key Components:
Hadoop Distributed Shell: A framework for running distributed applications on Hadoop clusters.
Role: Hadoop Common acts as the glue that binds the different components together, providing the shared libraries and utilities that the other modules rely on for seamless integration and communication within the Hadoop ecosystem.
Comparison of different versions of Hadoop (Hadoop 1.x, Hadoop 2.x, and Hadoop 3.x)
Setup of Hadoop Single Node Cluster (the commands below use Hadoop 3.3.6) :
wget https://downloads.apache.org/hadoop/common/stable/hadoop-3.3.6.tar.gz
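The archive fetched by wget still has to be unpacked and placed at the HADOOP_INSTALL path used in the exports (/usr/local/hadoop). A minimal sketch, assuming the tarball contains a single top-level directory such as hadoop-3.3.6 and that the current user can write to the destination's parent directory (otherwise run the equivalent tar and mv commands with sudo). unpack_hadoop is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Unpack a Hadoop release tarball ($1) and move its top-level directory to $2.
unpack_hadoop() {
  local archive="$1" dest="$2" tmp top
  # Refuse to overwrite: mv would otherwise nest the new tree inside $dest.
  if [ -e "$dest" ]; then
    echo "already exists: $dest" >&2
    return 1
  fi
  tmp="$(mktemp -d)" || return 1
  tar -xzf "$archive" -C "$tmp" || return 1
  # The release archive unpacks to one directory, e.g. hadoop-3.3.6
  top="$(ls "$tmp")"
  [ -n "$top" ] && [ -d "$tmp/$top" ] || return 1
  mv "$tmp/$top" "$dest" && rmdir "$tmp"
}
```

Usage: `unpack_hadoop hadoop-3.3.6.tar.gz /usr/local/hadoop`.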
Add the following environment variables (for example, to ~/.bashrc):
export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
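Once these exports are loaded (for example with source ~/.bashrc), it is worth confirming that each variable points where Hadoop's scripts expect. A small sketch: check_hadoop_env is a hypothetical helper, and it only checks that a few files shipped with the JDK and the Hadoop tarball exist.

```shell
#!/usr/bin/env bash
# Report any expected file that is missing under JAVA_HOME or HADOOP_INSTALL.
# Returns 0 when everything exists, 1 otherwise.
check_hadoop_env() {
  local missing=0 path
  for path in \
      "${JAVA_HOME:-}/bin/java" \
      "${HADOOP_INSTALL:-}/bin/hdfs" \
      "${HADOOP_INSTALL:-}/sbin/start-dfs.sh"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A non-zero status with a "missing:" line usually means JAVA_HOME or HADOOP_INSTALL names the wrong directory.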
6. To change the port on which SSH runs (port 2222 in this case), set the line Port 2222 in
/etc/ssh/sshd_config
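After editing sshd_config, the SSH service has to be restarted for the new port to take effect. Hadoop's start and stop scripts reach localhost over SSH and by default connect on port 22, so they also need to be told about port 2222. One way, assuming the stock etc/hadoop/hadoop-env.sh under HADOOP_INSTALL, is to set HADOOP_SSH_OPTS there; setting it overrides the SSH options the scripts would otherwise use by default.

```shell
# $HADOOP_INSTALL/etc/hadoop/hadoop-env.sh
# Extra options passed to ssh by Hadoop's start/stop scripts (port from step 6).
export HADOOP_SSH_OPTS="-p 2222"
```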
7. For passwordless SSH authentication, do the following:
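A minimal sketch of the usual passwordless SSH setup for localhost, following the commands in the Apache Hadoop single-node setup guide: generate a key pair with an empty passphrase and authorize it for the same user. setup_passwordless_ssh is a hypothetical wrapper; its directory argument defaults to ~/.ssh, and it creates and authorizes the key only once.

```shell
#!/usr/bin/env bash
# Create an RSA key with an empty passphrase (if none exists yet) and add its
# public half to authorized_keys so that "ssh localhost" needs no password.
setup_passwordless_ssh() {
  local dir="${1:-$HOME/.ssh}"
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  if [ ! -f "$dir/id_rsa" ]; then
    ssh-keygen -q -t rsa -N '' -f "$dir/id_rsa" || return 1
  fi
  # Append the public key only if it is not already authorized.
  touch "$dir/authorized_keys"
  if ! grep -qxF "$(cat "$dir/id_rsa.pub")" "$dir/authorized_keys"; then
    cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
  fi
  chmod 0600 "$dir/authorized_keys"
}
```

After running setup_passwordless_ssh, `ssh -p 2222 localhost` (port from step 6) should log in without prompting for a password.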
b. core-site.xml :
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000/</value>
</property>
</configuration>
c. yarn-site.xml :
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
d. mapred-site.xml :
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
e. hdfs-site.xml :
<configuration>
<!-- Single node: keep only one copy of each HDFS block -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
/usr/local/hadoop/sbin/start-dfs.sh
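The start-dfs.sh command brings up only the HDFS daemons (NameNode, DataNode, SecondaryNameNode). The ResourceManager UI on port 8088 also needs the YARN daemons started by start-yarn.sh, and on a fresh install the NameNode must be formatted once before HDFS is started. A sketch of that sequence, assuming HADOOP_INSTALL from the exports; start_single_node and its --format flag are hypothetical, and jps is the JDK tool that lists running Java processes.

```shell
#!/usr/bin/env bash
# Start HDFS and YARN for the single node cluster.
# Pass --format on the very first run only: formatting initialises the
# NameNode metadata, and reformatting later erases the HDFS namespace.
start_single_node() {
  local hadoop_home="${HADOOP_INSTALL:-/usr/local/hadoop}"
  if [ "${1:-}" = "--format" ]; then
    "$hadoop_home/bin/hdfs" namenode -format -nonInteractive || return 1
  fi
  "$hadoop_home/sbin/start-dfs.sh" || return 1    # NameNode, DataNode, SecondaryNameNode
  "$hadoop_home/sbin/start-yarn.sh" || return 1   # ResourceManager, NodeManager
  # List the running Java daemons as a quick check, if jps is on the PATH.
  if command -v jps >/dev/null 2>&1; then
    jps
  fi
}
```

Usage: `start_single_node --format` on the first run, plain `start_single_node` afterwards.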
13. In a browser, go to http://localhost:8088 to view the ResourceManager web UI and
http://localhost:9870 for the HDFS NameNode web interface (http://localhost:8042 serves the NodeManager web UI).
14. To stop all Hadoop daemons at once, run the stop-all.sh script in sbin.
Conclusion :
In this assignment, we compared the different versions of Hadoop (Hadoop 1.x, Hadoop 2.x, and Hadoop 3.x) and also set up a Hadoop 1.x single node cluster.