
CREATING HBASE CLUSTER AND REPLICATION ON AWS
1 Setting up Amazon EC2 Instances
Create two clusters in the same region, each with 3 nodes, and give every node a
volume of at least 8 GB.

1.1 Launch Instance


Log in to Amazon Web Services, click My Account, and navigate to the Amazon
EC2 console.

1.2 Select AMI


Select the Ubuntu 12.04 (Precise) Server 64-bit AMI.

1.3 Select Instance Type


Select the `Instance Type` as `m3.medium`.

1.4 Configure Number of Instances


Provide the instance details, shutdown behavior, and availability zone.

1.5 Add Storage


Use the default storage options.

1.6 Instance Description


Provide instance name and description

1.7 Define a Security Group


It is very important to configure the EC2 firewall correctly. On the Configure
Firewall page choose Create a new Security Group, and authorize all the ports
listed below:

1.8 Review and Launch Instance.


Check the instance details and click Launch.

1.9 Launch Instance and Create Key Pair


Amazon EC2 uses public-key cryptography to encrypt and decrypt login information.
Public-key cryptography uses a public key to encrypt a piece of data, such as a
password; the recipient then uses the private key to decrypt it. The public
and private keys are known as a key pair.

1.10 Define a Security Group


Create a new security group, and modify the security group with security rules.
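
The security rules can also be added from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the function name `print_ingress_commands`, the group ID, and the CIDR range are hypothetical placeholders. It only prints the `aws ec2 authorize-security-group-ingress` commands so they can be reviewed before anything is sent to AWS.

```shell
# Print one `aws ec2 authorize-security-group-ingress` command per TCP port.
# Arguments: security group ID, source CIDR, then any number of ports.
# The group ID and CIDR are placeholders supplied by the caller.
print_ingress_commands() (
    group_id=$1
    cidr=$2
    shift 2
    for port in "$@"; do
        echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port $port --cidr $cidr"
    done
)
```

For example, `print_ingress_commands sg-0123456789abcdef0 203.0.113.10/32 22 7180 2181` prints rules for SSH (22), the Cloudera Manager console (7180), and ZooKeeper (2181). These are only the ports that appear elsewhere in this guide, not a complete list for a CDH cluster. Pipe the output to `sh` to apply the rules.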

1.11 Launching Instances


Once you click Launch Instance, 6 instances should launch in the pending
state.

Once they reach the running state, rename the instances as below.


NameNode
Standby1
Standby2
Master
Slave1
Slave2
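
Renaming can also be scripted. This is a sketch only, assuming the AWS CLI is available; the function name `print_tag_commands` and the instance IDs passed to it are hypothetical. It pairs each instance ID with one of the six names above and prints the matching `aws ec2 create-tags` command without contacting AWS.

```shell
#!/bin/sh
# Node names used in this guide, in the order listed above.
NODE_NAMES="NameNode Standby1 Standby2 Master Slave1 Slave2"

# Print one `aws ec2 create-tags` command per instance ID given as an
# argument, assigning the node names in order. Nothing is sent to AWS.
print_tag_commands() (
    for name in $NODE_NAMES; do
        # Stop when there are no instance IDs left.
        [ "$#" -gt 0 ] || break
        echo "aws ec2 create-tags --resources $1 --tags Key=Name,Value=$name"
        shift
    done
)
```

Call it with the six instance IDs from the EC2 console, review the printed commands, and pipe them to `sh` to apply the names.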

2 Setting up client access to Amazon Instances


Create a new key pair, name it Clusterkey, and download the key pair
(.pem) file to your local machine. Click Launch Instance.

2.1 Generating Private Key


Launch the PuTTYgen client and import the key pair file Clusterkey.pem, which was
created during the launch instance step.
Navigate to Conversions and select Import Key.

Click Generate.

Save Private Key


Now save the private key by clicking Save Private Key, click Yes, and
leave the passphrase empty.

2.2 Connect to Amazon Instance


Launch the PuTTY client and load the .ppk file.

Repeat this for the slave nodes.

2.3 Setup WinSCP access to EC2 instances:


WinSCP is a handy utility for securely transferring files from your Windows
machine to Amazon EC2.
For User name, enter the default user name for your AMI. For Amazon Ubuntu AMIs,
the user name is ubuntu.
For Private key, enter the path to your private key, or click the browse button to
select the file.
Click Login to connect, and click Yes to add the host fingerprint to the host cache.

Select the Clusterkey.pem file and drag it to the right pane.

Repeat this for the slave nodes.

3 Setup Password-less SSH on Servers


The master server remotely starts services on the slave nodes, which requires
password-less access to the slave servers. The AWS Ubuntu server comes with the
OpenSSH server pre-installed.
The public part of the key loaded into the agent must be put on the target system in
~/.ssh/authorized_keys. The AWS server creation process has already taken care of
this.

Now we need to add the AWS EC2 key pair identity Clusterkey.pem to the SSH profile.
To do that, we will use the following SSH utilities:

ssh-agent is a background program that holds SSH private keys in memory, so a
key's passphrase needs to be entered only once.

ssh-add adds a private key to the list maintained by ssh-agent, prompting for
the key's passphrase if it has one. Once a key is added to ssh-agent,
you will not be asked to provide it when using SSH or SCP to connect to
hosts that have your public key.
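
The guide names these two utilities without showing them in use, so here is one possible way to combine them. It is a sketch under assumptions: the helper name `load_cluster_key` is hypothetical, and the default file name is the Clusterkey.pem key pair created earlier.

```shell
# Start ssh-agent for the current shell and load the EC2 key into it.
# Argument: path to the private key (defaults to Clusterkey.pem).
load_cluster_key() {
    key_file=${1:-Clusterkey.pem}
    if [ ! -r "$key_file" ]; then
        echo "key file not readable: $key_file" >&2
        return 1
    fi
    # `ssh-agent -s` prints Bourne-shell commands that export SSH_AUTH_SOCK
    # and SSH_AGENT_PID; eval applies them to this shell.
    eval "$(ssh-agent -s)" >/dev/null || return 1
    ssh-add "$key_file"
}
```

After `load_cluster_key ~/Clusterkey.pem`, running `ssh-add -l` should list the loaded key, and later `ssh` or `scp` calls from the same shell no longer need the `-i` option. The function is deliberately not run in a subshell, because the agent variables must stay set in the calling shell.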

The Amazon EC2 instance has already taken care of authorized_keys on the master
server. Execute the following commands to allow password-less SSH access to the
slave servers.

Steps:

In a command line shell, change directories to the location of the private key file
that you created when you launched the instance.

Use the chmod command to make sure your private key file isn't publicly
viewable. For example, if the name of your private key file is Clusterkey.pem, you
would use the following command:
chmod 400 Clusterkey.pem
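
If the key is copied to several machines, the permission step is easy to forget. The helper below is a small sketch that applies the mode and then confirms it; the name `secure_key` is hypothetical, and it assumes GNU `stat` (the `-c` option), as found on Ubuntu.

```shell
# Restrict a private key to owner read-only and confirm the resulting mode.
# Argument: path to the key file. Relies on GNU stat (`stat -c`).
secure_key() (
    key_file=$1
    chmod 400 "$key_file" || exit 1
    mode=$(stat -c '%a' "$key_file")
    if [ "$mode" != "400" ]; then
        echo "unexpected mode $mode on $key_file" >&2
        exit 1
    fi
    echo "$key_file mode $mode"
)
```

ssh refuses to use a private key that other users can read, so checking the mode up front avoids a confusing failure later.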

Use the ssh command to connect to the instance. You'll specify the private key
(.pem) file and username@public_dns_name. For Ubuntu, the default user name is
ubuntu. For RHEL5, the user name is often root but might be ec2-user. For SUSE
Linux, the user name is root. Otherwise, check with your AMI provider.

ssh -i Clusterkey.pem ubuntu@ec2-54-241-10-95.compute-1.amazonaws.com


You'll see a response like the following.
The authenticity of host 'ec2-198-51-100-1.compute-1.amazonaws.com (10.254.142.33)' can't be established.
RSA key fingerprint is 1f:51:ae:28:bf:89:e9:d8:1f:25:5d:37:2d:7d:b8:ca:9f:f5:f1:6f.
Are you sure you want to continue connecting (yes/no)?

(Optional) If you've launched a public AMI, verify that the fingerprint in the security alert
matches the fingerprint that you previously obtained for the instance. If these fingerprints
don't match, someone might be attempting a "man-in-the-middle" attack. If they match,
continue to the next step.
Enter yes.
You'll see a response like the following.
Warning: Permanently added 'ec2-54-241-10-95.compute-1.amazonaws.com' (RSA) to the list of known hosts.
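
To confirm that password-less login works for every node, a loop such as the following can help. It is a sketch, not part of the original procedure: the function name `check_passwordless_ssh` and the `DRY_RUN` switch are hypothetical, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
# Try a non-interactive SSH login to each host and report the result.
# Arguments: remote user name, then one or more host names.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# Set DRY_RUN=1 to print the ssh commands without connecting.
check_passwordless_ssh() (
    user=$1
    shift
    failed=0
    for host in "$@"; do
        cmd="ssh -o BatchMode=yes -o ConnectTimeout=5 $user@$host true"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        elif $cmd; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    exit "$failed"
)
```

Because BatchMode also suppresses the host key confirmation prompt, each host should already be in known_hosts (that is, you have connected to it once by hand) before the check is run.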

4 Download the Cloudera Manager 4.5 installer and execute it on the remote instance:
$ wget http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin
$ chmod +x cloudera-manager-installer.bin
$ sudo ./cloudera-manager-installer.bin

Click Yes.

Note down the URL http://localhost:7180/; it is used to open the Cloudera Manager
console in a browser.
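
The console may not answer immediately after the installer finishes. As a rough sketch, assuming curl is installed on the instance, the hypothetical helper `wait_for_url` below polls the address until it responds or a retry limit is reached.

```shell
# Poll a URL until it answers or the attempts run out.
# Arguments: URL, number of attempts (default 30), seconds between
# attempts (default 10).
wait_for_url() (
    url=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s hides progress output; -f makes HTTP error responses fail.
        if curl -sf -o /dev/null "$url"; then
            echo "$url is up"
            exit 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$url did not respond after $attempts attempts" >&2
    exit 1
)
```

For example, `wait_for_url http://localhost:7180/` waits up to about five minutes with the defaults.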

4.2 Installing a CDH Cluster with Cloud Express Wizard


Log in to the Cloudera Manager console with the default credentials:
Default username: admin
Default password: admin
After logging in, Cloudera Manager will detect that it runs on EC2 and greet
you with the welcome screen of the new wizard. There is a warning that
the instances started by this installer are instance store-based, which implies that
stopping or terminating these instances results in losing all data stored on them.
Remember to back up important data from the cluster before terminating the
instances!

Select Cloudera Enterprise Trial and click Next.

Click Launch the classic wizard.

Click Continue.

Enter the internal IPs of each node in the clusters.

Select the package, version, and release.

Log in as the ubuntu user, click Browse to upload the .pem file, and click Continue.

Installation progress starts here.

If there are no configuration issues, the installation will complete successfully.

Click Continue.

Choose whichever CDH services are required, and click Inspect Assignments.

Assign the appropriate services and their roles to the required hosts.

Click Test Connection.

Click Continue.

Cluster services start here.

Check the health status and configuration issues; the cluster should show good health.

The recommended minimum Java heap size is 1 GB.

HBase Replication:

Step1:
Enable replication in Cloudera Manager.

Restart HBase.

Step2:
Add the following code to HBase's configuration file (hbase-site.xml) to enable
replication on the master cluster:
hadoop@master1$ vi $HBASE_HOME/conf/hbase-site.xml
<property>
<name>hbase.replication</name>
<value>true</value>
</property>
Sync the change to all the servers, including the client nodes in the cluster, and
restart HBase.
Repeat this on the slave cluster.
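
Syncing the file by hand to every server is error-prone, so a loop can help. The following is a minimal sketch, assuming HBASE_HOME is the same path on every node and that password-less SSH from section 3 is working; the function name `sync_hbase_site` and the `DRY_RUN` switch are hypothetical.

```shell
# Copy hbase-site.xml to the same path on each listed host.
# Arguments: remote user name, then one or more host names.
# Assumes HBASE_HOME is identical on every node.
# Set DRY_RUN=1 to print the scp commands instead of running them.
sync_hbase_site() (
    conf="${HBASE_HOME:?HBASE_HOME is not set}/conf/hbase-site.xml"
    user=$1
    shift
    for host in "$@"; do
        cmd="scp $conf $user@$host:$conf"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || exit 1
        fi
    done
)
```

Running it once with `DRY_RUN=1` shows exactly which files would be overwritten before any copy takes place.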
Step3:
Create a table whose column family has replication scope 1:
hbase(main):010:0> create 'emp', { NAME => 'Details', REPLICATION_SCOPE => 1 }
0 row(s) in 1.1070 seconds
=> Hbase::Table - emp
hbase(main):011:0> disable 'emp'
0 row(s) in 1.2170 seconds
If you are using an existing table instead, disable it as shown above, alter its
column family to support replication (substitute the table's own column family
name for 'cf1'), and enable it again:
hbase(main):012:0> alter 'emp', NAME => 'cf1', REPLICATION_SCOPE => '1'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.5200 seconds
hbase(main):013:0> enable 'emp'
0 row(s) in 1.1860 seconds
Execute steps 2 to 3 on the peer (slave) cluster as well. This includes enabling
replication, restarting HBase, and creating an identical copy of the table.
Step4:
Start replication on the master cluster and put some test data into the table:
hbase(main):014:0> start_replication
0 row(s) in 0.1210 seconds
hbase(main):016:0> put 'emp', 'row1', 'Details:name','devaraj'
0 row(s) in 0.0180 seconds
hbase(main):017:0> put 'emp','row1','Details:Eid','1009'
0 row(s) in 0.0130 seconds
hbase(main):019:0> put 'emp','row1','Details:mobile','90000101011'
0 row(s) in 0.0140 seconds
hbase(main):021:0> put 'emp','row1','Details:Year','2013'
0 row(s) in 0.0110 seconds
hbase(main):022:0> put 'emp','row2','Details:Name','Prabu'
Step5:
To check whether the peers are enabled:
hbase(main):001:0> list_peers
PEER_ID CLUSTER_KEY STATE
1 ip-10-202-169-141.us-west-1.compute.internal:2181:/hbase ENABLED
2 ip-10-190-147-97.us-west-1.compute.internal:2181:/hbase ENABLED
3 ip-10-249-0-249.us-west-1.compute.internal:2181:/hbase ENABLED

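Peers are added with add_peer, which takes a peer ID and the peer cluster's
ZooKeeper quorum, client port, and znode parent:
hbase(main):002:0> add_peer '2', 'ip-10-190-147-97.us-west-1.compute.internal:2181:/hbase'
0 row(s) in 0.0290 seconds
hbase(main):003:0> add_peer '3', 'ip-10-249-0-249.us-west-1.compute.internal:2181:/hbase'
0 row(s) in 0.0700 seconds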
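
The cluster key is easy to mistype, as it joins three parts with colons. The sketch below builds the add_peer command from those parts; the helper name `add_peer_command` is hypothetical, and the defaults of port 2181 and znode /hbase are taken from the list_peers output above.

```shell
# Build an HBase shell add_peer command from its parts. The cluster key has
# the form <zookeeper quorum>:<client port>:<znode parent>.
# Arguments: peer ID, quorum host, port (default 2181), znode (default /hbase).
add_peer_command() (
    peer_id=$1
    quorum=$2
    port=${3:-2181}
    znode=${4:-/hbase}
    echo "add_peer '$peer_id', '$quorum:$port:$znode'"
)
```

The printed line can be pasted into the HBase shell, or piped to it, for example `add_peer_command 2 ip-10-190-147-97.us-west-1.compute.internal | hbase shell`, on installations where the shell accepts commands on standard input.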

Step6:
Connect to HBase Shell on the peer cluster and do a scan on the table to see if the
data has been replicated:
$HBASE_HOME/bin/hbase shell

hbase> scan 'emp'
ROW      COLUMN+CELL
 row1    column=Details:name, timestamp=1401702464224, value=Devaraj
 row1    column=Details:Eid, timestamp=1401703326645, value=1010

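To compare the data on the two clusters, run the verifyrep job from the master
cluster, passing the peer ID and the table name:
$HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.92.1.jar verifyrep 1 emp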

Step7:
Stop the replication on the master cluster by running the following command:
hbase> stop_replication

Step8:
Remove the replication peer from the master cluster by using the following
command:
hbase> remove_peer '1'