
===========================================================

Hadoop multi-node cluster configuration steps (without creating a dedicated hadoop user)


===========================================================

1. # sudo apt-get update


-----------------------------------------------------------------------------------------------
2. # sudo apt-get upgrade
-----------------------------------------------------------------------------------------------
3. Check java version
# java -version

If Java is not installed, install it with one of the following:


# sudo apt install default-jdk default-jre -y

or
# sudo apt-get install openjdk-8-jdk
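
If you are unsure where Java was installed, find the path (needed for JAVA_HOME in steps 5 and 7):

# readlink -f /usr/bin/java
Strip the trailing /bin/java (and /jre, if present) from the printed path; the remainder, typically /usr/lib/jvm/java-8-openjdk-amd64, is your JAVA_HOME.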
-----------------------------------------------------------------------------------------------
4. Download hadoop-3.3.1 from the Apache website, or fetch it directly:
# wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz

Extract hadoop-3.3.1
# tar -xvf hadoop-3.3.1.tar.gz
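
Optionally, verify the download against the .sha512 checksum published alongside it on the Apache mirror, and note the extracted folder's location, since that path becomes HADOOP_HOME in step 7:

# sha512sum hadoop-3.3.1.tar.gz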
-----------------------------------------------------------------------------------------------
5. Configure the JAVA_HOME variable in hadoop-env.sh
# gedit hadoop-3.3.1/etc/hadoop/hadoop-env.sh

Add this line to the file

export JAVA_HOME={java folder path under /usr/lib/jvm/}
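
For example, with the openjdk-8 package from step 3, the path on Ubuntu is typically:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64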


-----------------------------------------------------------------------------------------------
6. Install ssh and pdsh
# sudo apt-get install ssh
# sudo apt-get install pdsh
-----------------------------------------------------------------------------------------------
7. Update the environment variables in the .bashrc file
# gedit ~/.bashrc

Add these lines at the end of the file

export JAVA_HOME={java folder path under /usr/lib/jvm/}


export PDSH_RCMD_TYPE=ssh
export HADOOP_HOME={path where hadoop file extracted}
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
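
As a concrete illustration (hypothetical paths; adjust both to your own machine), the two placeholder lines might become:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/home/ubuntu/hadoop-3.3.1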

# source ~/.bashrc
-----------------------------------------------------------------------------------------------
8. Generate an ssh key for localhost
# ssh-keygen -t rsa
Press "Enter" at every prompt to accept the default file location and an empty passphrase

# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# chmod 640 ~/.ssh/authorized_keys

Check the localhost connection
Type "yes" at the host-authenticity prompt; the key we just generated allows a password-less login
# ssh localhost
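
Equivalently, the key can be generated non-interactively; this one-liner has the same effect as pressing "Enter" at every prompt:

# ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa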
-----------------------------------------------------------------------------------------------
9. a) Change the hostname on the master
# sudo gedit /etc/hostname (on master node)
Set the name to master

b) Update the hosts table


# sudo gedit /etc/hosts (on master node)
{ip address of master} master
{ip address of slave1} slave1
{ip address of slave2} slave2

Repeat a) and b) on each slave machine. Here we use 2 slave machines, so on every
machine /etc/hosts is configured with "master", "slave1" and "slave2" and their
respective ip addresses.
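
For illustration, with hypothetical LAN addresses, the /etc/hosts entries on every machine could read:

192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2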
-----------------------------------------------------------------------------------------------
Add the worker names in the workers file (on master node)
Type the full path to the workers file
# sudo nano hadoop-3.3.1/etc/hadoop/workers

Now add the two slave names to the file; start-all.sh uses this list to decide where to start the DataNode and NodeManager daemons


slave1
slave2
-----------------------------------------------------------------------------------------------
10. Copy the ssh key to all machines (only on master node)
Do not run ssh-keygen on the slaves; the master's public key is distributed instead
# ssh-copy-id username@master
# ssh-copy-id username@slave1
# ssh-copy-id username@slave2
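
Afterwards, each login should succeed without a password prompt; a quick check:

# ssh username@slave1 hostname
This should print slave1 and return without asking for a password.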
-----------------------------------------------------------------------------------------------
11. Move to hadoop-3.3.1/etc/hadoop to configure the various services (only on master node)
# cd hadoop-3.3.1/etc/hadoop
1) core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

2) mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

3) hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///{path of namenode folder}</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///{path of datanode folder}</value>
  </property>
</configuration>

4) yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>
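
It is common to create the two dfs.* directories referenced in hdfs-site.xml up front; a minimal sketch with hypothetical paths under the home directory:

# mkdir -p ~/hadoopdata/hdfs/namenode
# mkdir -p ~/hadoopdata/hdfs/datanode

With this layout the two values become file:///home/{username}/hadoopdata/hdfs/namenode and file:///home/{username}/hadoopdata/hdfs/datanode; on the slaves only the datanode folder is used.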
-----------------------------------------------------------------------------------------------
12. Copy the configuration files from the master to all slaves (-r is required because a directory is copied)
# scp -r {path of hadoop-3.3.1/etc/hadoop}/* slave1:{path of hadoop-3.3.1/etc/hadoop on slave1}/
# scp -r {path of hadoop-3.3.1/etc/hadoop}/* slave2:{path of hadoop-3.3.1/etc/hadoop on slave2}/
-----------------------------------------------------------------------------------------------
13. Reload the environment variables set in step 7
# source ~/.bashrc
-----------------------------------------------------------------------------------------------
14. Format the namenode (on master node)
# hdfs namenode -format
Answer "yes" if asked to confirm re-formatting the filesystem

Start all the Hadoop services (on master node)


# start-all.sh

Once the services have started, run "jps" to list the running components


# jps
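
If everything came up cleanly, jps typically reports something like this (process ids omitted):

On master: NameNode, SecondaryNameNode, ResourceManager, Jps
On each slave: DataNode, NodeManager, Jps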
-----------------------------------------------------------------------------------------------
15. Check that the datanodes are running using the NameNode web UI (plain http, not https)
http://master:9870

Check the cluster using the ResourceManager web UI
http://master:8088
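
The same information is available from the command line on the master:

# hdfs dfsadmin -report
This lists each live datanode with its capacity and usage.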
