This document consists of six parts that illustrate the entire Hadoop cluster
setup:
o Overview of Cluster setup
o Creating First VM
o Cloning First VM to get other Cluster machines
o Installation of Cloudera Manager Server (CMS)
o Installation of Cloudera Distribution of Hadoop (CDH)
o Uninstalling CMS and CDH
Part1: Overview of Cluster Setup
The overall approach is simple. We create a virtual machine and configure it with the parameters
and settings required to act as a cluster node. This reference virtual machine is then cloned as
many times as there will be nodes in the Hadoop cluster. Only a limited set of changes is then
needed to make each node operational (only the hostname and IP address need to be
defined).
©2015 algorithmica.co.in
In this article, I create a 4-node cluster. The first node, which runs most of the cluster
services, requires more memory (8GB) than the other 3 nodes (2GB each). Overall we allocate
14GB of memory (8GB + 3 x 2GB), so ensure that the host machine has sufficient memory;
otherwise the virtual machines will run slowly.
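The memory budget can be recomputed for a different cluster size with a small helper. This is only a sketch: the function name total_cluster_memory_gb is illustrative and not part of any tool.

```shell
#!/bin/sh
# Sketch: total guest memory for one master plus N workers, in GB.
# Arguments: master memory (GB), worker memory (GB), number of workers.
total_cluster_memory_gb() {
    echo $(( $1 + $2 * $3 ))
}

# Values from this guide: one 8GB master and three 2GB workers.
total_cluster_memory_gb 8 2 3   # prints 14
```

The host machine needs this much memory free for the guests, in addition to what the host operating system itself uses.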
Install GuestAdditions
Guest Additions can be installed in a virtual machine after the operating system has been installed in it.
They consist of device drivers and other applications that improve the performance and
usability of the virtual machine.
o Reboot the machine for the changes to take effect
Install GuestAdditions
o Double-click the GuestAdditions icon on the desktop, which opens the autorun
window.
o Click on Open Autorun Prompt.
o groupadd hadoop
o useradd -G hadoop algo (add a new user algo to the hadoop group)
o passwd algo
o id algo (check the details of the created user algo)
o visudo (add algo to the sudoers)
Add the following line to the opened file and save:
Network Configuration
Make changes in the following files to set up the network configuration that allows all
cluster nodes to communicate.
o chkconfig iptables off (disable the firewall at boot)
o chkconfig network on (start the network service automatically at boot)
o service network restart (bring the network service up immediately)
Define all the hosts in the /etc/hosts file to simplify access in case you do not have a
DNS setup where they can be defined. Add more hosts if you want to have more nodes
in your cluster.
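As a sketch, the entries could be generated rather than typed by hand. It assumes the hadoop[n] hostnames and 192.168.1.10[n] addresses used elsewhere in this guide; the function name hosts_entries is illustrative.

```shell
#!/bin/sh
# Sketch: print /etc/hosts entries for nodes hadoop1..hadoopN.
# Assumes the 192.168.1.10[n] addressing scheme, which only yields
# valid addresses for n from 1 to 9.
hosts_entries() {
    count="$1"
    n=1
    while [ "$n" -le "$count" ]; do
        # One line per node: IP address, a tab, then the hostname.
        printf '192.168.1.10%s\thadoop%s\n' "$n" "$n"
        n=$((n + 1))
    done
}
```

Running `hosts_entries 4` prints the four lines for review; once they look right, they can be appended as root, for example with `hosts_entries 4 >> /etc/hosts`.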
Setup SSH
To simplify access between hosts (passwordless authentication), set up SSH keys and
modify the SSH configuration file /etc/ssh/ssh_config by uncommenting the following line
and changing its value to no; this suppresses the confirmation question when connecting
to a host with SSH:
StrictHostKeyChecking no
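The key setup itself is not spelled out here. One possible way to do it with the standard OpenSSH tools ssh-keygen and ssh-copy-id is sketched below. The helper names run and setup_passwordless_ssh are hypothetical, the user algo and the hadoop[n] hostnames are taken from this guide, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command (DRY_RUN=1, the default) or run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch: create a key pair for the current user (if none exists) and
# copy the public key to each listed host for the given remote user.
setup_passwordless_ssh() {
    user="$1"; shift
    # -N "" requests an empty passphrase; it shows as a blank in dry-run output.
    [ -f "$HOME/.ssh/id_rsa" ] || run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    for host in "$@"; do
        # ssh-copy-id asks for the remote password once per host.
        run ssh-copy-id "$user@$host"
    done
}
```

For example, `setup_passwordless_ssh algo hadoop2 hadoop3 hadoop4` lists the commands, and `DRY_RUN=0 setup_passwordless_ssh algo hadoop2 hadoop3 hadoop4` executes them.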
You should now update all the packages (optional) and reboot the virtual machine:
We will now create the remaining server nodes that will be members of the cluster. In
VirtualBox, clone the master server, using the ‘Linked Clone’ option and name the nodes
hadoop2, hadoop3 and hadoop4.
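The same cloning can also be scripted with VirtualBox's VBoxManage command line instead of the GUI. The following is a sketch under assumptions: the master VM is assumed to be registered in VirtualBox under the name hadoop1, the snapshot name base is hypothetical, and the helpers run and clone_nodes are illustrative. By default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print the command (DRY_RUN=1, the default) or run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch: take a snapshot of the master VM, then create one linked clone
# per node name. A linked clone is based on a snapshot of the source VM.
clone_nodes() {
    master="$1"; snapshot="$2"; shift 2
    run VBoxManage snapshot "$master" take "$snapshot"
    for node in "$@"; do
        run VBoxManage clonevm "$master" --snapshot "$snapshot" \
            --options link --name "$node" --register
    done
}
```

For example, `clone_nodes hadoop1 base hadoop2 hadoop3 hadoop4` lists the commands for the three additional nodes; setting DRY_RUN=0 executes them.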
VM Customization
Modify the hostname of the server by changing the following line in the file:
/etc/sysconfig/network
HOSTNAME=hadoop[n]
Modify the fixed IP address of the server by changing the following line in the file:
/etc/sysconfig/network-scripts/ifcfg-eth0
IPADDR=192.168.1.10[n]
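Both edits can be applied per clone with a small script. This is a sketch: the helper names set_key and customize_node are hypothetical, and the file paths are parameters so the function can be tried on copies of the files before touching the real ones.

```shell
#!/bin/sh
# set_key FILE KEY VALUE: replace the KEY=... line in FILE, or append it
# if no such line exists.
set_key() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$file"; then
        # Write through a temporary copy so the original file keeps its
        # ownership and permissions.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# customize_node N [NETWORK_FILE] [IFCFG_FILE]: set the hostname to
# hadoopN and the IP address to 192.168.1.10N.
customize_node() {
    n="$1"
    network_file="${2:-/etc/sysconfig/network}"
    ifcfg_file="${3:-/etc/sysconfig/network-scripts/ifcfg-eth0}"
    set_key "$network_file" HOSTNAME "hadoop${n}"
    set_key "$ifcfg_file" IPADDR "192.168.1.10${n}"
}
```

On the second node, for example, running `customize_node 2` as root sets HOSTNAME=hadoop2 and IPADDR=192.168.1.102 in the default files.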
Restart the networking services and reboot the server so that the above changes
take effect:
At this stage we have four running virtual machines with CentOS correctly configured.
Part4: Install Cloudera Manager on one of the machines
The Cloudera Manager Free Edition installation program does the following tasks automatically:
a) Installs the Oracle JDK if it's not already installed
b) Installs the Cloudera Manager Server
c) Installs and configures an embedded PostgreSQL database
On the first server (hadoop1), download the CM4 installer binary from the following link:
wget http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin
or
curl -O http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin
After this step, Cloudera Manager is installed on the master server. Install CDH using the
Cloudera Manager web application, which is accessible at http://hadoop1:7180/, where hadoop1 is the
name of the master server on which we installed Cloudera Manager.
After you have installed the Cloudera Manager Server and run it for the first time, you can use
the Cloudera Manager wizard to automatically do the following on the cluster hosts:
Using SSH, discover the cluster hosts you specify via IP address ranges or hostnames
Configure the package repositories for Cloudera Manager, CDH and the Oracle JDK
Install the Cloudera Manager Agent and CDH (including Hue) on the cluster hosts
Install the Oracle JDK if it's not already installed on the cluster hosts
Determine mapping of services to host
Suggest a Hadoop configuration and start the Hadoop services
a) The default ID and password for logging into the Cloudera Manager web application are admin and admin, respectively.
d) Add your hostnames as defined in the /etc/hosts file of all servers.
e) This checks which hosts are available. Select all the available hosts by ticking the check box
next to each one.
f) Click on CDH4, the latest version of Hadoop available in Cloudera Manager.
Note: If you have a local repository set up, select the custom repository option for CDH and
Cloudera Manager and provide the URL of the local repository (like http://hadoop1/cloudera-repo).
g) Here you can either upload the id_rsa.pub file, generated with the ssh-keygen command on the base
server and shared on all the nodes, or use the same username and password on all the
nodes. Here I have chosen the same user and password for all the nodes, but it is recommended to
go with the key file.
h) Click on the Start Installation button; it will start the installation process on the selected nodes.
i) This process takes 10 to 15 minutes, depending on your internet bandwidth.
j) After this process has completed successfully, click on the Continue button; this runs the host
inspector to check the hosts for correctness. After the inspector has finished, click on Continue.
k) At this stage select CDH4 as shown; it will ask which services to install. I have selected all
services.
l) Click on Continue; it will configure all services on the nodes.
m) The screenshot below shows Cloudera Manager installing services on the nodes.
n) At this point we are done with the installation and configuration of Cloudera Hadoop.
o) Log in to the console with the default ID and password, i.e. admin and admin respectively; you can
change the ID and password as required. The screenshot below shows the various Hadoop
services running on the cluster nodes and managed by Cloudera Manager.
Part6: Uninstalling CMS & Cloudera Hadoop (CDHx.x)
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-8-2/Cloudera-Manager-Installation-Guide/cmig_uninstall_CM.html