
Hadoop Installation for a Single-Node Cluster

Note: This document describes the steps for installing Hadoop 2.7.7 on Ubuntu 14.04 (64-bit).

Steps:
1. Install Java
2. Disable IPv6
3. Add a dedicated Hadoop user
4. Install SSH
5. Give hduser sudo permission
6. Set up SSH keys
7. Install Hadoop
8. Configure Hadoop
• .bashrc
• hadoop-env.sh
• core-site.xml
• mapred-site.xml
• hdfs-site.xml
9. Restart Ubuntu
10. Format the Hadoop file system
11. Start Hadoop
12. Test whether Hadoop is running
13. Stop Hadoop

Detailed Step-wise Installation Procedure:


Step 1: Install Java

sudo add-apt-repository ppa:openjdk-r/ppa

sudo apt-get update

sudo apt-get install openjdk-8-jdk

update-java-alternatives -l

sudo update-alternatives --config java

sudo update-alternatives --config javac
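
To confirm that Java 8 is now the active runtime, check the version:

java -version

The output should report an OpenJDK 1.8 build.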

Step 2: Disable IPv6


sudo apt-get install vim

sudo vim /etc/sysctl.conf

Add the following lines to the end of the file:

# disable ipv6
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1

cat /proc/sys/net/ipv6/conf/all/disable_ipv6 (returns 0 while IPv6 is still enabled; it should return 1 once the new settings are applied)
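
The settings in /etc/sysctl.conf take effect on reboot; to apply them immediately, reload the file:

sudo sysctl -p

After that, the cat command above should return 1.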

Step 3: Add a dedicated Hadoop user and group

sudo addgroup hadoop

sudo adduser --ingroup hadoop hduser
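
To verify that the user and group were created:

id hduser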

Step 4: Install SSH

sudo apt-get install ssh

which ssh

which sshd

Step 5: Give hduser sudo permission

sudo adduser hduser sudo
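
To confirm the change, the user's group list should now include both hadoop and sudo:

groups hduser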

Step 6: Set up SSH keys

su hduser

ssh-keygen -t rsa -P ""

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

ssh localhost
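
The first connection will ask you to confirm the host fingerprint; answer yes. Because the key was generated with an empty passphrase, the login should succeed without a password prompt. Type exit to leave the nested SSH session:

exit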

Step 7: Install Hadoop

su hduser

wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar xvzf hadoop-2.7.7.tar.gz

cd hadoop-2.7.7

sudo mkdir /usr/local/hadoop

sudo mv * /usr/local/hadoop

cd ..
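
At this point the distribution should be in place; a quick listing should show the usual top-level directories (bin, sbin, etc, share, and so on):

ls /usr/local/hadoop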

Step 8: Set up the configuration files

• .bashrc

vim ~/.bashrc

Add the following block to the end of the file:

# Hadoop variables start
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
# Hadoop variables end

Then reload it in the current shell:

source ~/.bashrc
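
To verify that the new variables are in effect, the hadoop command should now resolve from the PATH:

hadoop version

It should report Hadoop 2.7.7.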

• hadoop-env.sh

vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Update the existing JAVA_HOME line to:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
• core-site.xml

sudo mkdir -p /app/hadoop/tmp

sudo chown hduser:hadoop /app/hadoop/tmp

vim /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following properties between the existing <configuration> and </configuration> tags:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

• mapred-site.xml

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

vim /usr/local/hadoop/etc/hadoop/mapred-site.xml

Add the following property between the existing <configuration> and </configuration> tags:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map and
  reduce task.</description>
</property>

• hdfs-site.xml

sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode

sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode

sudo chown -R hduser:hadoop /usr/local/hadoop_store

vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Add the following properties between the existing <configuration> and </configuration> tags:

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of
  replications can be specified when the file is created. The default
  is used if replication is not specified at create time.</description>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>

Step 9: Restart Ubuntu

Step 10: Format the Hadoop file system

su hduser

hadoop namenode -format

(The hadoop namenode command is deprecated in Hadoop 2.x; hdfs namenode -format is the preferred equivalent.)

Step 11: Start Hadoop

cd /usr/local/hadoop/sbin

sudo chown -R hduser:hadoop /usr/local/hadoop/

start-all.sh

(start-all.sh is deprecated in Hadoop 2.x but still works; it simply runs start-dfs.sh followed by start-yarn.sh.)

Step 12: Test whether Hadoop is running

jps
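
If all the daemons came up correctly, jps should list the five Hadoop processes alongside itself (PIDs will vary):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps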

netstat -plten | grep java

Open the NameNode web UI in a browser:

http://localhost:50070
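
The YARN ResourceManager web UI should likewise be reachable on its default port:

http://localhost:8088

As a further smoke test, these standard HDFS commands should run without errors:

hdfs dfs -mkdir -p /user/hduser

hdfs dfs -ls /

Optionally, you can run the pi example from the examples jar that ships inside the 2.7.7 tarball (the path below assumes the layout produced by the steps above):

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 2 5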

Step 13: Stop Hadoop

stop-all.sh

jps
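
After shutdown, jps should list only the Jps process itself; all Hadoop daemons should be gone.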
