or install it from the Ubuntu repositories:
# sudo apt-get install openjdk-8-jdk
-----------------------------------------------------------------------------------
4. Download hadoop-3.3.1 from the Apache website, or fetch it directly:
# wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
Extract the archive:
# tar -xvf hadoop-3.3.1.tar.gz
-----------------------------------------------------------------------------------
5. Configure the JAVA_HOME variable in hadoop-env.sh
# gedit hadoop-env.sh
Reload the shell environment so the variables take effect:
# source ~/.bashrc
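The line to edit in hadoop-env.sh points JAVA_HOME at the JDK install directory. With the openjdk-8-jdk package installed above, the path is typically the one below, but verify it on your machine (e.g. with `readlink -f $(which java)`):

```shell
# In hadoop-3.3.1/etc/hadoop/hadoop-env.sh -- this path assumes the Ubuntu
# openjdk-8-jdk package; adjust it if your JDK lives elsewhere.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```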
-----------------------------------------------------------------------------------
8. Generate an ssh key for localhost
# ssh-keygen -t rsa
Press "Enter" at every prompt to accept the defaults (empty passphrase).
Check passwordless login to localhost; type "yes" at the host-authenticity
prompt, since we generated the key ourselves:
# ssh localhost
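Passwordless login only works once the public key is listed in authorized_keys; the `ssh localhost` check above assumes this. A minimal sketch of the standard sequence, run as the Hadoop user (the `-N ""` flag is equivalent to pressing Enter at the passphrase prompts):

```shell
# Create ~/.ssh if needed and generate an RSA key pair with an empty passphrase;
# the guard skips generation when a key already exists.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" -q
# Authorize the key for login on this node; later, copy it to each slave too.
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```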
-----------------------------------------------------------------------------------
9. a) Change the hostname on the master
# sudo gedit /etc/hostname (on master node)
give the name as master
b) Map the hostnames to IP addresses in /etc/hosts
# sudo gedit /etc/hosts
Repeat a) and b) on each slave machine. Here we take 2 slave machines, so on
every machine configure /etc/hosts with "master", "slave1" & "slave2" and their
respective IP addresses.
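For example, /etc/hosts on every node might end up with entries like these (the IP addresses below are placeholders, not from the original guide; substitute the real addresses of your machines):

```
192.168.0.10 master
192.168.0.11 slave1
192.168.0.12 slave2
```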
-----------------------------------------------------------------------------------
Add the workers' hostnames to the workers file (on master node); type the full
hadoop path:
# sudo nano hadoop-3.3.1/etc/hadoop/workers
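With the two slaves named in step 9, the workers file would simply list one hostname per line:

```
slave1
slave2
```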
1) core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
2) mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
3) hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///{path of namenode folder}</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///{path of datanode folder}</value>
</property>
</configuration>
4) yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
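The dfs.namenode.name.dir and dfs.datanode.data.dir values in hdfs-site.xml must point at directories that exist and are writable by the Hadoop user. A minimal sketch, using a hypothetical base path (substitute your own, and use the same paths in the XML above):

```shell
# Hypothetical base directory for HDFS storage -- substitute your own path
# and reference it in dfs.namenode.name.dir / dfs.datanode.data.dir.
base="$HOME/hadoopdata"
# NameNode metadata directory and DataNode block directory
mkdir -p "$base/hdfs/namenode" "$base/hdfs/datanode"
```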
-----------------------------------------------------------------------------------
12. Copy the configuration files from master to all slaves (-r copies the
whole directory):
# scp -r {path of hadoop-3.3.1/etc/hadoop} slave1:{path of hadoop-3.3.1/etc/hadoop on slave1}
# scp -r {path of hadoop-3.3.1/etc/hadoop} slave2:{path of hadoop-3.3.1/etc/hadoop on slave2}
-----------------------------------------------------------------------------------
13. Source the environment variables
# source /etc/environment
-----------------------------------------------------------------------------------
14. Format the NameNode (on master)
# hdfs namenode -format
When asked whether to re-format the filesystem, type "yes".
Check the cluster in the YARN web UI:
http://master:8088
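Formatting the NameNode does not start the cluster; the daemons must be running before the web UI at master:8088 responds. A sketch of the usual sequence, run on the master from hadoop-3.3.1/sbin (assuming the ssh and workers setup from the earlier steps):

```
# start-dfs.sh     (starts the NameNode on master and DataNodes on the workers)
# start-yarn.sh    (starts the ResourceManager on master and NodeManagers on the workers)
# jps              (should now list NameNode and ResourceManager on the master)
```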