
How to make the best use of Live Sessions

• Please log in on time

• Please check your network connection and audio before the class for a smooth session

• All participants are muted by default. You will be unmuted when requested or as needed

• Please use the “Questions” panel in your webinar tool to interact with the instructor at any point during the class

• Ask and answer questions to make your learning interactive

• Please keep the support phone numbers handy (US: 1855 818 0063 (toll free), India: +91 90191 17772) and raise tickets from the LMS in case of any issues with the tool

• Logging off and rejoining will most often resolve tool-related issues

Copyright © 2017, edureka and/or its affiliates. All rights reserved.


Big Data & Hadoop Certification Training



Course Outline

• Understanding Big Data and Hadoop
• Hadoop Architecture & HDFS
• Hadoop MapReduce Framework
• Advance MapReduce
• Pig
• Hive
• Advance Hive and HBase
• Advance HBase
• Processing Distributed Data with Apache Spark
• Apache Oozie and Hadoop Project
• Kafka Producer
• Kafka Consumer
• Kafka Operation and Performance Tuning
• Kafka Cluster Architectures & Administering Kafka
• Kafka Monitoring & Stream Processing
• Integration of Kafka with Hadoop & Storm
• Integration of Kafka with Spark & Flume
• Kafka Project


Module 8: Advance HBase



Objectives
At the end of this module, you will be able to:

• Understand HBase Attributes

• Understand Data Model and Physical Storage in HBase

• Execute basic commands on HBase shell

• Analyze Data Loading Techniques in HBase

• Implement HBase API

• Understand ZooKeeper Data Model and its Services

• Analyze the Relationship Between HBase and ZooKeeper

• Perform Advance HBase Actions



Simplifying HBase: It is Actually a Simple Map

{
"zzzzz" : "woot",
"xyz" : "hello",
"aaaab" : "world",
"1" : "x",
"aaaaa" : "y"
}



Simplifying HBase: The Map is Sorted

Unsorted:

{
  "zzzzz" : "woot",
  "xyz" : "hello",
  "aaaab" : "world",
  "1" : "x",
  "aaaaa" : "y"
}

Sorted (keys in lexicographic order):

{
  "1" : "x",
  "aaaaa" : "y",
  "aaaab" : "world",
  "xyz" : "hello",
  "zzzzz" : "woot"
}
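For intuition, the ordering above can be reproduced with a plain Python dict. This is a conceptual sketch only; HBase itself sorts row keys as raw bytes, which matches lexicographic order for ASCII keys like these:

```python
# A plain dict standing in for an HBase table's row keys (conceptual only).
unsorted_map = {
    "zzzzz": "woot",
    "xyz": "hello",
    "aaaab": "world",
    "1": "x",
    "aaaaa": "y",
}

# HBase keeps row keys in lexicographic (byte) order; sorted() on ASCII
# strings yields the same ordering.
sorted_keys = sorted(unsorted_map)
print(sorted_keys)  # ['1', 'aaaaa', 'aaaab', 'xyz', 'zzzzz']
```

Note that "1" sorts before "aaaaa" because the digit byte precedes letter bytes; this is why numeric row keys in HBase are usually zero-padded or stored as fixed-width binary.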



Simplifying HBase: Data Co-ordinates

{
  "emp1" : {                        ← Row key
    "personal" : {                  ← Column family
      "name" : {                    ← Column qualifiers
        1329088321289 : "tejas"
      },
      "age" : {
        1329088321289 : "44"
      },
      "password" : {                ← Versions
        1329088818321 : "abc123",
        1329088321289 : "shannu123"
      }
    }
  }
}
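Reading a cell is then just a walk down this nested map. A minimal Python sketch of the (row, family, qualifier, timestamp) → value lookup — the names follow the example above; this is illustrative, not the HBase client API:

```python
# Nested-dict model of the slide's table (illustrative only).
table = {
    "emp1": {
        "personal": {
            "name": {1329088321289: "tejas"},
            "age": {1329088321289: "44"},
            "password": {
                1329088818321: "abc123",
                1329088321289: "shannu123",
            },
        }
    }
}

def get_cell(row, family, qualifier, timestamp):
    """Walk row -> family -> qualifier -> timestamp to reach one cell."""
    return table[row][family][qualifier][timestamp]

print(get_cell("emp1", "personal", "password", 1329088818321))  # abc123
```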



Simplifying HBase: Multi-dimensional Columns

{
  "aaaaa" : {
    "A" : {
      "foo" : "y",          ← "foo" and "bar" are columns of column family "A"
      "bar" : "d"
    },
    "B" : {
      "" : "w"
    }
  },
  "aaaab" : {
    "A" : {
      "foo" : "world",      ← "foo" and "bar" are also called Qualifiers or Labels
      "bar" : "domination"
    },
    "B" : {
      "" : "ocean"
    }
  }
}



Simplifying HBase: Time Stamps as Versions

{
  "aaaaa" : {
    "A" : {
      "foo" : {
        15 : "y",
        4 : "m"
      },
      "bar" : {
        15 : "d"
      }
    },
    "B" : {
      "" : {
        6 : "w",
        3 : "o",
        1 : "w"
      }
    }
  }
}

• 15 and 4 are example timestamps; "y" and "m" are the values at those timestamps
• row/column of "aaaaa"/"A:foo" will return "y" (the most recent version)
• row/column/timestamp of "aaaaa"/"A:foo"/4 will return "m"
• row/column/timestamp of "aaaaa"/"A:foo"/2 will return a null (no version exists at or before timestamp 2)
• (Table, RowKey, Family, Column, Timestamp) → Value
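The timestamped-read rule in the bullets above — return the value at the most recent timestamp at or before the one requested — can be sketched in a few lines of Python. This is a conceptual model, not the HBase client API:

```python
def get_versioned(cell, timestamp=None):
    """cell maps timestamp -> value; return the newest value whose
    timestamp is <= the requested one (or the newest overall)."""
    eligible = [ts for ts in cell if timestamp is None or ts <= timestamp]
    if not eligible:
        return None  # HBase returns no cell; modeled here as None
    return cell[max(eligible)]

foo = {15: "y", 4: "m"}
print(get_versioned(foo))      # 'y'  (latest version wins)
print(get_versioned(foo, 4))   # 'm'
print(get_versioned(foo, 2))   # None (no version at or before 2)
```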



Data Model

• A table consists of rows (R1, R2, R3, …), sorted by row key
• The table schema only defines its column families
• A family consists of any number of columns
• A column consists of any number of versions
• Columns exist only when inserted
• Columns in a family are sorted and stored together


Data Model

Row key        | Personal_data              | Demographic
Persons ID     | Name         | Address     | Birth Date   | Gender
---------------|--------------|-------------|--------------|-------
1              | H. Houdini   | Budapest    | 1926-10-31   | M
2              | D. Copper    |             | 1956-09-16   | M
3              | Merlin       |             | 1136-12-03   | F
4              | …            | …           | …            | M
500,000,000    | F. Cadillac  | Nevada      | 1964-01-07   | M


Physical Storage

Cells are stored grouped by column family; a row only stores the columns that were actually inserted:

Row 1 (1):
  Family1 (Personal Data):  Col(Name) – Houdini, Col(Address) – Budapest
  Family2 (Demographic):    Col(Birth date) – 1926-10-31, Col(Gender) – M
Row 2 (2):
  Family1 (Personal Data):  Col(Name) – D. Copper
  Family2 (Demographic):    Col(Birth date) – val3, Col(Gender) – val4
Row 3 (3):
  …


What Does it Look Like?

What it means:

Row Key
• Unique for each row
• Identifies each row

Column Family : Column Qualifier
• A smaller number of families gives faster access
• Families are fixed; column qualifiers are not

Values
• Multiple versions of values are maintained
• A scan shows only the most recent version


HBase Shell

hbase(main):003:0> create 'test', 'cf'
0 row(s) in 1.2200 seconds
hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0560 seconds
hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0370 seconds
hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0450 seconds


HBase Shell (Contd.)
hbase(main):007:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1288380727188, value=value1
row2 column=cf:b, timestamp=1288380738440, value=value2
row3 column=cf:c, timestamp=1288380747365, value=value3
3 row(s) in 0.0590 seconds



HBase API

get(row)

put(row, Map<column, value>)

scan(key range, filter)

increment(row, columns)

checkAndPut, delete, etc.
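As a conceptual sketch of these operations, here is a toy in-memory table in Python. The class and method names are illustrative only; the real API is the HBase Java client (or shell) shown elsewhere in this module:

```python
class ToyHBaseTable:
    """In-memory stand-in for an HBase table: row -> {column: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row, columns):
        # Merge the given column -> value map into the row
        self.rows.setdefault(row, {}).update(columns)

    def get(self, row):
        # Return all columns of one row (empty dict if absent)
        return self.rows.get(row, {})

    def scan(self, start, stop):
        # Rows come back in key order, over the half-open range [start, stop)
        return {r: self.rows[r] for r in sorted(self.rows) if start <= r < stop}

    def increment(self, row, column, amount=1):
        # Atomic counter in real HBase; plain addition in this toy model
        cols = self.rows.setdefault(row, {})
        cols[column] = cols.get(column, 0) + amount
        return cols[column]

t = ToyHBaseTable()
t.put("row1", {"cf:a": "value1"})
t.put("row2", {"cf:b": "value2"})
print(t.get("row1"))           # {'cf:a': 'value1'}
print(t.scan("row1", "row3"))  # rows 'row1' and 'row2', in key order
```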



HBase Client API

Package: org.apache.hadoop.hbase.client

Class: HTable

It is recommended that you create HTable instances only once, usually one per thread, and reuse that instance for the rest of the lifetime of your client application.



Data Loading Techniques Used in HBase

• Using HBase Shell
• Using Client API
• Using Pig
• Using Sqoop



Loading Data into HBase Using Sqoop

MySQL → Sqoop → HBase

Sqoop can be used to directly import data from an RDBMS into HBase.

Example:

sqoop import
  --connect jdbc:mysql://<ip address>/<database name>
  --username <username_for_mysql_user> --password <password>
  --table <mysql_table_name>
  --hbase-table <hbase_target_table_name>
  --column-family <column_family_name>
  --hbase-row-key <row_key_column>
  --hbase-create-table



HBase Java Client Interfaces

Configuration       Where to find the cluster and tunable settings. Similar to a JDBC connection string
HBaseAdmin          Helps to manage administrative tasks
HTableDescriptor    Has the details of the table
HTable              Is a handle on a single table. Is used to issue Put, Get, and Scan commands to the table


Why Use ZooKeeper?

• To solve complex distributed algorithms
• To avoid race conditions and deadlocks
• To apply reusable code libraries in common use cases
• To avoid management complexities
• Easy-to-use programming model



The Target Market

Target market for ZooKeeper: multi-host, multi-process C and Java based systems



ZooKeeper Data Model

• Hierarchical namespace (like a file system)
• Each znode has data and children
• Data is read and written in its entirety

Example znode tree:

/
├── services
│   └── YaView
│       └── servers
│           ├── stupidname
│           └── morestupidity
├── locks
│   └── read-1
├── apps
└── users


ZooKeeper Service

ZooKeeper Service

Leader

Server Server Server Server Server

Client Client Client Client Client Client Client Client



Wishes

1 Wait Free

2 Simple, Robust, Good Performance

3 Tuned for Read Dominant Workloads

4 Familiar Models and Interfaces

5 Need to be able to Wait Efficiently



ZooKeeper and HBase
• Master Failover

• Region Servers and Master discovery via ZooKeeper

o HBase clients connect to ZooKeeper to find configuration data

o Region Servers and Master failure detection



HBase and ZooKeeper as of now!

Znodes HBase keeps under / in ZooKeeper: shutdown, root-region-server, rs, master

▪ master
  – If more than one master, they fight
▪ root-region-server
  – This znode holds the location of the server hosting the root of all tables in HBase
▪ rs
  – A directory in which there is a znode per HBase region server
  – Region servers register themselves with ZooKeeper when they come online

On region server failure (detected via ephemeral znodes and notification via ZooKeeper), the master splits the edits out per region.



Release 3.3.0: What’s in it for HBase?

Key features:

• Allow configuration of session timeout min/max bounds
• Improved logging information to detect issues
• Improved debugging tools
• Queue implementation available
• Improved documentation
• Improved performance and robustness



Advance HBase Demo



Assignment
Practice “Advance HBase Code” present in the LMS



Pre-work
http://www.edureka.co/blog/apache-spark-vs-hadoop-mapreduce

http://www.edureka.co/blog/5-things-one-must-know-about-spark/



Agenda for Next Class
• What is Apache Spark?
• Spark Ecosystem
• Spark Components
• History of Spark and Spark Versions/Releases
• Spark a Polyglot
• What is Scala?
• Why Scala?
• Spark Context
• RDD



