
Hadoop Installation with Multiple Data Nodes

Go to the link below and download the Ubuntu 12.04 VMware image. Extract/unzip the
downloaded file into a folder.

http://www.traffictool.net/vmware/ubuntu1204t.html

Open VMware Player, click "Open a Virtual Machine" and browse to the folder where you
extracted the Ubuntu image. Select the .vmx file and click OK.

The virtual machine now appears in VMware Player, as shown in the screen below.


Double-click the Ubuntu entry in VMware Player. You will see the login screen shown in the
image below.

Username : user
Password : password

Open a terminal and update the package repository:

Command:
sudo apt-get update

Once the update is complete, install OpenJDK 6:

Command:
sudo apt-get install openjdk-6-jdk

After the installation finishes, verify that Java is installed on your system by running the
command below:

Command:
java -version

Install openssh-server:
Command:
sudo apt-get install openssh-server
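
Optionally, you can confirm that the SSH server is running after the installation (the service is named ssh on Ubuntu 12.04):

Command:
sudo service ssh status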
Download and extract Hadoop:
Command:
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0.tar.gz
Note: Adjust the file name according to the Hadoop version you are installing.

Command:
tar -xvf hadoop-1.2.0.tar.gz

Create an alias (soft link) for the Hadoop folder as shown below, so that the folder
hadoop-1.2.0 can be referred to simply as hadoop.

Command:
ln -s hadoop-1.2.0 hadoop
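
Optionally, verify the link; the output should show hadoop pointing to hadoop-1.2.0:

Command:
ls -ld hadoop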
Configuration
Add JAVA_HOME to the hadoop-env.sh file:

Command:
sudo gedit hadoop/conf/hadoop-env.sh
Uncomment the JAVA_HOME export line shown in the file and set it to the path of your JDK:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
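
If you are not sure of the exact JDK path on your system, you can list the available JVM directories first (on the 32-bit Ubuntu 12.04 image used here it is typically java-6-openjdk-i386):

Command:
ls /usr/lib/jvm/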
Now create another Ubuntu VM instance and start it in VMware Player.

[Important: Repeat all of the above steps on the second Ubuntu image.]

Get the IP addresses of both VMs:

Command:

ifconfig

Edit the /etc/hosts file on both VMs:

Command:

sudo gedit /etc/hosts

Add both IP addresses to the /etc/hosts file on both instances, as shown below. Each line
contains the IP address, machine-name.domain.com and machine-name, separated by tabs.

Example:

192.168.1.1 nn1.cluster.com nn1


192.168.1.2 dn1.cluster.com dn1
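
To verify that the host entries work, ping each machine by hostname from the other VM (using the example hostnames above):

Command:
ping -c 1 nn1.cluster.com

Command:
ping -c 1 dn1.cluster.com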
Select one VM to act as the master and note its IP address.
Here, VM 1 with IP address 192.168.75.166 has been chosen as the master.

[Note: core-site.xml, mapred-site.xml and hdfs-site.xml are the same for both VMs; you need
to set all three files on both VMs. In each file, the <property> blocks go inside the
<configuration> ... </configuration> element.]

Edit core-site.xml:

Command:

sudo gedit hadoop/conf/core-site.xml

<property>
<name>fs.default.name</name>
<value>hdfs://nn1.cluster.com:8020</value>
</property>


Edit hdfs-site.xml:

Command:
gedit hadoop/conf/hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Edit mapred-site.xml:

Command:

sudo gedit hadoop/conf/mapred-site.xml

<property>
<name>mapred.job.tracker</name>
<value>nn1.cluster.com:8021</value>
</property>
Now there is a slight difference in the masters and slaves files between the two VMs.

On the master node, the masters file contains only the master node's address, and the
slaves file contains the addresses of both VMs.

On the slave node, the masters file is blank and the slaves file contains only the slave
VM's address. See the example below.
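
For example, using the hostnames from the /etc/hosts entries above (nn1 as the master and dn1 as the slave), the two files could look like this; the exact entries depend on the hostnames or IP addresses you used:

On the master node (nn1):

hadoop/conf/masters:
nn1.cluster.com

hadoop/conf/slaves:
nn1.cluster.com
dn1.cluster.com

On the slave node (dn1):

hadoop/conf/masters:
(leave the file empty)

hadoop/conf/slaves:
dn1.cluster.com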
Create an SSH key:

Command:
ssh-keygen -t rsa -P ""

Note: If this command does not work, check whether the hyphens were copied correctly. If
not, delete them and re-type the - characters manually.

Create a password-less SSH login by copying the key to both machines:

Command:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@nn1.cluster.com

Command:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@dn1.cluster.com
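
To verify the password-less login, connect to each machine with ssh; you should not be prompted for a password (type exit to come back):

Command:
ssh user@nn1.cluster.com

Command:
ssh user@dn1.cluster.com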
Run the commands below on the master node, from inside the hadoop folder.

Format the name node:

Command:
bin/hadoop namenode -format

Start the name node, data nodes and secondary name node


Command:
bin/start-dfs.sh

Start the job tracker and task trackers


Command:
bin/start-mapred.sh

To check whether Hadoop started correctly, list all the running Java processes:

Command:
jps
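
On the master node (which here also runs a data node and task tracker), the jps output should include entries like the following, each preceded by a process ID; the slave node shows only DataNode and TaskTracker:

NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker
Jps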
Open a browser and go to http://localhost:50070/dfshealth.jsp to see the current live
nodes.

If you are able to see two live nodes, you are done.
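
You can also check the JobTracker web UI on the master node; with the default Hadoop 1.x settings it is available at the address below:

http://localhost:50030/jobtracker.jsp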


If you want to avoid going to the hadoop/bin folder again and again, add the Hadoop
location to your PATH.

Add the below lines in your .bashrc file.

The .bashrc file exists in your home folder (/home/user/). It is a hidden file and can be
seen using the -a option of the ls command:
ls -ltra
sudo gedit /home/user/.bashrc

Go to the end of the file by pressing Ctrl+End.

Add following lines at the end.

export HADOOP_HOME=/home/user/hadoop
PATH=$PATH:$HADOOP_HOME/bin
export PATH

Save and exit.


Source the .bashrc file; otherwise the new changes will not be applied in the current terminal session.
Command:
source .bashrc
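
To confirm that the PATH change works, run a Hadoop command from your home folder without the bin/ prefix:

Command:
hadoop version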
