
Installation of CDH

This document consists of six parts that together illustrate the entire Hadoop cluster
setup:
o Overview of Cluster setup
o Creating First VM
o Cloning First VM to get other Cluster machines
o Installation of Cloudera Manager Server (CMS)
o Installation of Cloudera Distribution of Hadoop (CDH)
o Uninstalling CMS and CDH

Part 1: Overview of Cluster Setup

High-level diagram of the VirtualBox VM cluster running Hadoop nodes

The overall approach is simple. We create a virtual machine and configure it with the
parameters and settings required to act as a cluster node. This reference virtual machine is then
cloned as many times as there will be nodes in the Hadoop cluster. Only a limited set of changes
is then needed to make each clone operational: the hostname and the IP address must be
defined.

©2015 algorithmica.co.in
In this article, I create a 4-node cluster. The first node, which will run most of the cluster
services, requires more memory (8GB) than the other 3 nodes (2GB each). Overall we will allocate
14GB of memory, so ensure that the host machine has enough memory to spare; otherwise the host
and the VMs will run slowly.
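The memory budget above can be checked with a few lines of shell. This is only a sketch: the variable names are illustrative, and the optional host check assumes a Linux host that exposes /proc/meminfo (it is skipped elsewhere).

```shell
#!/bin/sh
# Memory budget for the 4-node cluster described above (values in GB).
master_gb=8       # hadoop1 runs most of the cluster services
worker_gb=2       # each of the remaining nodes
worker_count=3

total_gb=$((master_gb + worker_gb * worker_count))
echo "Total VM memory to allocate: ${total_gb} GB"

# Optional: compare with the physical memory of a Linux host.
if [ -r /proc/meminfo ]; then
    host_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
    echo "Host memory: ${host_gb} GB"
fi
```

If the host figure is close to or below the total, consider reducing the number of nodes or the memory per node.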

Part 2: Create the First VM with the following configuration

VM creation & startup

• Create the reference virtual machine as explained in the document “4.a-virtualbox, vm”.
• Change the “Boot Order” to boot from the hard disk (HDD).
• Make the following changes under Settings -> Network to allow access to the VM from the outside world:
o Set “Attached to” to Bridged Adapter
o Set Advanced -> Promiscuous Mode to Allow All
• Click Start to boot the VM from disk.

Install Guest Additions

Guest Additions can be installed in a virtual machine after the operating system has been installed.
They consist of device drivers and other applications that optimize the performance and
usability of the virtual machine.

The Guest Additions provide the following features:

• Mouse pointer integration
• Time synchronization
• Shared folders
• Seamless windows
• Shared clipboard

Follow these steps to install the Guest Additions:

• Update the kernel and install the software required to build kernel modules.
o $> yum update kernel (to make sure we have an up-to-date kernel)
o $> yum install gcc
o $> yum install kernel-devel
• Mount the Guest Additions image
o Click on Devices -> Insert Guest Additions CD Image (the image is mounted
automatically)

o Reboot the machine so that the changes take effect
• Install the Guest Additions
o Double-click the Guest Additions icon on the desktop, which opens the autorun
window.
o Click Open Autorun Prompt.

Setup Users & Groups

You need to be root (superuser) to create users and groups.

o groupadd hadoop
o useradd -G hadoop algo (adds a new user algo to the hadoop group)
o passwd algo
o id algo (to check the details of the newly created user algo)
o visudo (to add algo to the sudoers)
• Add the following line to the opened file and save:

algo ALL=(ALL) ALL

o Reboot the machine and log in as the algo user

Network Configuration

Edit the following files to set up the network configuration that will allow all
cluster nodes to communicate.

• Add the following to /etc/sysconfig/network

o NETWORKING=yes
o HOSTNAME=hadoop1
o GATEWAY=192.168.1.1 (IP of the router)

• Add the following to /etc/sysconfig/network-scripts/ifcfg-eth0

o DEVICE=eth0
o ONBOOT=yes
o BOOTPROTO=static
o IPADDR=192.168.1.101
o NETMASK=255.255.255.0
o ARPCHECK=no (to avoid the repeated IP address conflict check)
o DNS1=192.168.1.1 (IP of the router)
o DNS2=8.8.8.8 (IP of Google's public DNS server)
o NM_CONTROLLED="yes"

• Disable SELinux by editing the file /etc/selinux/config

o SELINUX=disabled (change the value from enforcing to disabled)

• Initialize the network by restarting the network services:

o chkconfig iptables off (disable the firewall at boot)
o chkconfig network on (start the network service automatically at boot)
o service network restart (bring the network service up immediately)

Setup Cluster Hosts

Define all the hosts in the /etc/hosts file to simplify access between nodes in case you do not
have a DNS server where they can be defined. Add more hosts if you want more nodes
in your cluster.

192.168.1.101 hadoop1.example.com hadoop1


192.168.1.102 hadoop2.example.com hadoop2
192.168.1.103 hadoop3.example.com hadoop3
192.168.1.104 hadoop4.example.com hadoop4
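For larger clusters the entries can be generated rather than typed. The following is a minimal sketch, assuming the same 192.168.1.10[n] addressing and example.com domain as above; the function name gen_hosts is hypothetical.

```shell
#!/bin/sh
# Print /etc/hosts entries for nodes hadoop1..hadoopN.
# Assumes node n has the address 192.168.1.(100 + n), as in this article.
gen_hosts() {
    count=$1
    n=1
    while [ "$n" -le "$count" ]; do
        echo "192.168.1.$((100 + n)) hadoop${n}.example.com hadoop${n}"
        n=$((n + 1))
    done
}

# Print the four entries used in this article.
gen_hosts 4
```

To append the generated lines to the real file, the output could be piped as root, for example with `gen_hosts 4 | sudo tee -a /etc/hosts`.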

Setup SSH

• Install SSH if it is not already installed:

$> yum install openssh-server openssh-clients

$> chkconfig sshd on

$> service sshd start

• Set up passwordless authentication

To simplify access between hosts (passwordless authentication), set up SSH keys and
define them as already authorized:

$> ssh-keygen (press Enter three times to accept the defaults)
$> cd ~/.ssh
$> cp id_rsa.pub authorized_keys

• Perform SSH configuration

Modify the SSH configuration file /etc/ssh/ssh_config by uncommenting the following line
and changing its value to no; this suppresses the host key confirmation prompt when connecting
to a host over SSH:

StrictHostKeyChecking no
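The same edit can be scripted instead of done by hand. The sketch below is a filter that reads an ssh_config on standard input and writes the modified version to standard output; the function name is hypothetical, and the sample input assumes the stock commented-out line `#   StrictHostKeyChecking ask`.

```shell
#!/bin/sh
# Filter: uncomment the StrictHostKeyChecking line and set its value to "no".
# Other lines pass through unchanged.
set_strict_host_key_checking_no() {
    sed 's/^[[:space:]]*#\{0,1\}[[:space:]]*StrictHostKeyChecking[[:space:]].*$/    StrictHostKeyChecking no/'
}

# Demonstration on a two-line sample instead of the real file.
printf '%s\n' 'Host *' '#   StrictHostKeyChecking ask' | set_strict_host_key_checking_no
```

To apply it to the real file, keep a backup copy of /etc/ssh/ssh_config first, feed the backup through the filter, and write the result back as root.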

Shutdown the machine

You should now update all the packages (optional) and shut down the virtual machine:

$> yum update (brings all packages up to date)

$> init 0 (shutdown)

Part 3: Clone VM & Customization of Cloned VMs


Cloning VM

We will now create the remaining server nodes that will be members of the cluster. In
VirtualBox, clone the master server, using the ‘Linked Clone’ option and name the nodes
hadoop2, hadoop3 and hadoop4.

VM Customization

For every node, perform the following operations:

• Modify the hostname of the server by changing the following line in the file
/etc/sysconfig/network:

HOSTNAME=hadoop[n]

where [n] = 2..4 (up to the number of nodes)

• Modify the fixed IP address of the server by changing the following line in the file
/etc/sysconfig/network-scripts/ifcfg-eth0:

IPADDR=192.168.1.10[n]

where [n] = 2..4 (up to the number of nodes)

• Restart the networking services and reboot the server so that the above changes
take effect:

$> service network restart
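The two per-node edits can also be scripted. This is a sketch only: customize_node is a hypothetical helper, and the demonstration runs against sample copies of the two files in a scratch directory rather than the live files under /etc.

```shell
#!/bin/sh
# Rewrite HOSTNAME and IPADDR for clone number n (2..4).
# The optional second argument is a root directory holding copies of the files;
# when it is empty the live files under / are edited (requires root).
customize_node() {
    n=$1
    root=${2:-}
    net="$root/etc/sysconfig/network"
    ifcfg="$root/etc/sysconfig/network-scripts/ifcfg-eth0"

    sed "s/^HOSTNAME=.*/HOSTNAME=hadoop${n}/" "$net" > "$net.new" && mv "$net.new" "$net"
    sed "s/^IPADDR=.*/IPADDR=192.168.1.$((100 + n))/" "$ifcfg" > "$ifcfg.new" && mv "$ifcfg.new" "$ifcfg"
}

# Dry run: build sample copies of the reference VM's files and customize node 3.
scratch=$(mktemp -d)
mkdir -p "$scratch/etc/sysconfig/network-scripts"
printf 'NETWORKING=yes\nHOSTNAME=hadoop1\nGATEWAY=192.168.1.1\n' > "$scratch/etc/sysconfig/network"
printf 'DEVICE=eth0\nIPADDR=192.168.1.101\nNETMASK=255.255.255.0\n' > "$scratch/etc/sysconfig/network-scripts/ifcfg-eth0"

customize_node 3 "$scratch"
result=$(grep -h -e '^HOSTNAME=' -e '^IPADDR=' "$scratch/etc/sysconfig/network" "$scratch/etc/sysconfig/network-scripts/ifcfg-eth0")
echo "$result"
rm -rf "$scratch"
```

On a real clone, running `customize_node 3` as root (with no second argument) would edit the files in place; the network restart and reboot described above are still needed afterwards.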

At this stage we have four running virtual machines with CentOS correctly configured.

Part 4: Install Cloudera Manager on one of the machines

The Cloudera Manager Free Edition installation program does the following tasks automatically:
a) Installs the Oracle JDK if it's not already installed
b) Installs the Cloudera Manager Server
c) Installs and configures an embedded PostgreSQL database

• Download the CM4 bin installer from the following link on the first server (hadoop1):
wget http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin

or
curl -O http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin

• Install the Cloudera Manager Server

o cd <directory of bin file>
o chmod 775 cloudera-manager-installer.bin
o sudo ./cloudera-manager-installer.bin
(To use local repositories, run cloudera-manager-installer.bin with the
--skip_repo_package=1 option.)

After this step, Cloudera Manager is installed on the master server. Install CDH using the
Cloudera Manager web application, which is accessible at http://hadoop1:7180/ (hadoop1 is the
name of the master server where we installed Cloudera Manager).

Part 5: Install Cloudera Hadoop (CDHx.x)

After you have installed the Cloudera Manager Server, the first time you run it you can use
the Cloudera Manager wizard to automatically do the following on the cluster hosts:
• Using SSH, discover the cluster hosts you specify via IP address ranges or hostnames
• Configure the package repositories for Cloudera Manager, CDH and the Oracle JDK
• Install the Cloudera Manager Agent and CDH (including Hue) on the cluster hosts
• Install the Oracle JDK if it's not already installed on the cluster hosts
• Determine the mapping of services to hosts
• Suggest a Hadoop configuration and start the Hadoop services

a) Log in to the Cloudera Manager web application. The default ID and password are admin and admin, respectively.

b) Select the free edition installation.

c) Click the Continue button.

d) Add the hostnames of all servers as defined in the /etc/hosts file.

e) The wizard checks which hosts are available. Select all the available hosts by ticking the check box
next to each one.

f) Select CDH4, the latest version of Hadoop offered by Cloudera Manager.
Note: If you have a local repository set up, select the custom repository option for CDH and
Cloudera Manager and provide the URL of the local repository (like http://hadoop1/cloudera-repo).

g) Here you can either upload the id_rsa.pub file that was generated with the ssh-keygen command on the base
server and is shared by all the nodes, or use the same username and password on all the
nodes. Here I have chosen the same user and password for all the nodes, but it is recommended to
go with the key file.

h) Click the Start Installation button to start the installation process on the selected nodes.

i) This process takes 10 to 15 minutes, depending on your internet bandwidth.

j) After this process has completed successfully, click the Continue button. This runs the host
inspector, which checks the hosts for correctness. After the inspector has finished, click Continue.

k) At this stage, select CDH4 as shown. The wizard then asks which services to install; I have selected all
services.

l) Click Continue; the wizard configures all services on the nodes.

m) The screenshot below shows Cloudera Manager installing services on the nodes.

n) At this point we are done with the installation and configuration of Cloudera Hadoop.

o) Log in to the console with the default ID and password (admin and admin, respectively); you can
change the ID and password as required. The screenshot below shows the various Hadoop services
running on the cluster nodes and managed by Cloudera Manager.

Part 6: Uninstalling CMS & Cloudera Hadoop (CDHx.x)

The following link explains the uninstallation process in detail:

http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-8-2/Cloudera-Manager-Installation-Guide/cmig_uninstall_CM.html
