Week 1 – Introduction to HDFS
Week 2 – Setting Up Hadoop Cluster
Week 3 – Map-Reduce Basics, Types and Formats
Week 4 – PIG
Week 5 – HIVE
Week 6 – HBASE
Week 7 – ZOOKEEPER
Week 8 – SQOOP
What are we going to learn today?
• Huge data volumes
• Fast random access
• Structured data
• Variable schema
• Need for compression
• Need for distribution (sharding)
How a Traditional RDBMS Would Solve It

Users(Id, Name, Sex, Age)
Follower(User_id, Follower_id, Type)

Contd.

Users(Id, Name, Sex, Age)
Connections(User_id, Connection_id, Type)
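The relational design above can be sketched quickly with sqlite3. This is a minimal toy: the table and column names follow the slide, while the sample rows and the "fan" relationship type are made up for illustration.

```python
import sqlite3

# In-memory sketch of the relational follower design from the slide.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Users (Id INTEGER PRIMARY KEY, Name TEXT, Sex TEXT, Age INTEGER)")
cur.execute("CREATE TABLE Follower (User_id INTEGER, Follower_id INTEGER, Type TEXT)")

cur.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)",
                [(1, "Dhawan", "M", 30), (2, "Sana", "F", 28)])
cur.execute("INSERT INTO Follower VALUES (?, ?, ?)", (1, 2, "fan"))

# Listing a user's followers needs a join -- cheap here, but costly once
# Users holds hundreds of millions of rows spread across shards.
cur.execute("""SELECT u.Name FROM Follower f
               JOIN Users u ON u.Id = f.Follower_id
               WHERE f.User_id = 1""")
followers = [row[0] for row in cur.fetchall()]
```

The join is exactly what becomes painful at scale, which motivates the denormalized, sharded design on the next slides.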
Characteristics Of Probable Solution
• Distributed database
• Sorted data
• Sparse data store
• Automatic sharding
History of HBase
• Facebook monitored their usage and figured out what they really needed.
• What they needed was a system that could handle two types of data patterns:
  – A short set of temporal data that tends to be volatile
  – An ever-growing set of data that rarely gets accessed
Reference: http://wiki.apache.org/hadoop/Hbase/PoweredBy
Data Model

Versions of Data

Row key        Name     Address      Birth date   Gender
2              Dhawan                1956-09-16   M
3              Sana     whitefield   1189-12-03   F
…              …        …            …            …
500,000,000    vineet   delhi        1964-01-07   M
Physical Storage

Row 1 (1):
  Col1 (Name)       -> H. Houdini
  Col1 (Address)    -> Budapest
  Col3 (Birth date) -> 1926-10-31
  Col3 (Gender)     -> M

Row 2 (2):
  Col3 (Birth date) -> val3
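The sparse, versioned layout above can be sketched as a nested map: each row stores only the cells it actually has, as family:qualifier -> {timestamp: value}. This is an illustration of the semantics, not HBase's actual file format, and the second Address version is made up.

```python
from collections import defaultdict

# Toy sketch: row key -> column -> {timestamp: value}. Absent cells
# simply are not stored, which is what makes the layout sparse.
store = defaultdict(lambda: defaultdict(dict))

def put(row, column, value, ts):
    store[row][column][ts] = value

def get(row, column):
    # Like a default HBase get: return only the most recent version.
    versions = store[row][column]
    return versions[max(versions)] if versions else None

put("1", "Col1:Name", "H. Houdini", ts=1)
put("1", "Col1:Address", "Budapest", ts=1)
put("1", "Col1:Address", "New York", ts=2)   # newer version (illustrative)
put("1", "Col3:BirthDate", "1926-10-31", ts=1)

latest = get("1", "Col1:Address")    # newest version wins
missing = get("2", "Col1:Address")   # absent cell: nothing was ever stored
```

Older versions stay in the map until something (a compaction, below) discards them, while reads by default see only the newest one.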
What It Means

Row key:
• Unique for each row
• Identifies each row

Column family:
• A small number of families gives faster access
• Families are fixed; column qualifiers are not

Values:
• Various versions of each value are maintained
• A scan shows only the most recent version
Three Major Components
Data Distribution

Logical view – all rows in a table are stored sorted by row key
(A1, A2, A22, A3, …, K4, …, 090, …, Z30, Z55), and the sorted key
space is divided into regions:

  Region 1: Null -> A3
  Region 2: A3   -> F34
  Region 3: F34  -> K80
  Region 4: K80  -> 095
  Region 5: 095  -> Null

Each region is assigned to one of the region servers.
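Because region boundaries partition a sorted key space, locating the region for a row key is just a binary search over the sorted start keys. A minimal sketch, using the slide's first few alphabetic boundaries (an empty string stands in for the open-ended Null boundary):

```python
import bisect

# Region start keys in sorted order; region i covers [starts[i], starts[i+1]).
starts = ["", "A3", "F34", "K80"]

def find_region(row_key):
    # The last region whose start key is <= row_key holds the key.
    return bisect.bisect_right(starts, row_key) - 1

r_a22 = find_region("A22")   # sorts before "A3" -> first region
r_f34 = find_region("F34")   # a boundary key belongs to the region it starts
r_z55 = find_region("Z55")   # past the last start key -> last region
```

Note that keys sort lexicographically, not numerically, which is why "A22" lands before "A3" in the slide's logical view.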
HBase Components

• Zookeeper – coordination service; holds znodes such as /hbase/region1, /hbase/region2, …
• Master – assigns regions to the RegionServers
• RegionServers – serve the regions; each region keeps an in-memory memstore

Each row in the ROOT and META tables is approximately 1 KB in size. At the default region size of 256 MB, a single catalog region can therefore describe about 256 MB / 1 KB = 262,144 regions.
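Those two numbers determine how much data the catalog hierarchy can address; a quick back-of-the-envelope check, assuming the old two-level ROOT -> META catalog (the design HBase used before 0.96):

```python
KB = 1024
MB = 1024 * KB

row_size = 1 * KB        # one catalog row describes one region
region_size = 256 * MB   # default region size from the slide

# Regions a single catalog region can describe:
regions_per_catalog_region = region_size // row_size   # 2**18 = 262,144

# With ROOT pointing at META regions, and META pointing at user regions,
# the totals multiply:
total_regions = regions_per_catalog_region ** 2        # 2**36 regions
total_bytes = total_regions * region_size              # 2**64 bytes
```

So even the two-level catalog can address on the order of 2^64 bytes of user data, far more than any practical cluster.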
Compactions

Row key          Timestamp   Column "contents:"   Column "anchor:"
com.apache.www   t12         "<html>…"
                 t11         "<html>…"
                 t10                              anchor:apache.com = "APACHE"
com.cnn.www      t9                               anchor:cnnsi.com = "CNN"
                 t8                               anchor:my.look.ca = "CNN.com"
                 t6          "<html>…"
                 t5          "<html>…"
                 t3          "<html>…"

HStore1
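A major compaction rewrites several store files into one sorted file and can drop versions beyond the configured maximum. A toy sketch of that merge; the cell tuples and the three-version limit are illustrative, not HBase's HFile format:

```python
from collections import defaultdict
from itertools import chain

def compact(store_files, max_versions=3):
    """Merge store files, keeping the newest max_versions per cell.

    Each store file is a list of (row, column, timestamp, value) cells.
    Sketch of the idea only, not HBase's real merge.
    """
    cells = defaultdict(list)
    for row, col, ts, val in chain.from_iterable(store_files):
        cells[(row, col)].append((ts, val))

    merged = []
    for (row, col), versions in sorted(cells.items()):
        # Newest first; anything past the version limit is discarded.
        for ts, val in sorted(versions, reverse=True)[:max_versions]:
            merged.append((row, col, ts, val))
    return merged

hstore1 = [("com.cnn.www", "contents:", 6, "<html>…"),
           ("com.cnn.www", "contents:", 5, "<html>…"),
           ("com.cnn.www", "contents:", 3, "<html>…")]
hstore2 = [("com.cnn.www", "contents:", 12, "<html>…")]

result = compact([hstore1, hstore2], max_versions=3)
```

After the merge only the three newest "contents:" versions (t12, t6, t5) survive; t3 has aged out.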
Region Splits

Row key        Timestamp   Column "contents:"   Column "anchor:"
com.cnn.www    t12         "<html>…"
               t9                               anchor:cnnsi.com = "CNN"
               t8                               anchor:my.look.ca = "CNN.com"
               t6          "<html>…"
               t5          "<html>…"
               t3          "<html>…"

HStore1
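When a region grows past the configured maximum size, it is divided into two daughter regions at (roughly) its middle row key. A minimal sketch with made-up keys:

```python
def split_region(start, end, row_keys):
    """Split a region's sorted row keys at the middle key.

    Returns two (start, end) daughter ranges; the first daughter's end
    key equals the second daughter's start key. Sketch only.
    """
    mid = row_keys[len(row_keys) // 2]
    return (start, mid), (mid, end)

keys = sorted(["com.cnn.www", "com.apache.www", "com.bbc.www", "com.mtv.www"])
left, right = split_region(None, None, keys)   # None = open-ended boundary
```

Both daughters stay on the same server initially; the Master may later reassign them for balance.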
HBase Client API
Scanner and Filters
Search

Row key          Timestamp   Column "anchor:"
com.apache.www   t12
                 t11
                 t10         anchor:apache.com = "APACHE"
com.cnn.www      t9          anchor:cnnsi.com = "CNN"
                 t8          anchor:my.look.ca = "CNN.com"
                 t6
                 t5
                 t3
HBase API
• get(row)
• put(row, Map<column, value>)
• scan(key range, filter)
• increment(row, columns)
• checkAndPut, delete, etc.
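The semantics of these operations can be sketched against a toy in-memory store. The method names mirror the slide; this illustrates the behavior (sorted scans, atomic-style increment), not the real Java client:

```python
class ToyHBase:
    """Minimal sketch of the HBase operation semantics listed above."""

    def __init__(self):
        self.rows = {}   # row key -> {column: value}

    def put(self, row, columns):
        self.rows.setdefault(row, {}).update(columns)

    def get(self, row):
        return self.rows.get(row, {})

    def scan(self, start=None, stop=None, row_filter=None):
        # Rows come back in sorted key order, like an HBase scanner.
        for key in sorted(self.rows):
            if start is not None and key < start:
                continue
            if stop is not None and key >= stop:
                break
            if row_filter is None or row_filter(key, self.rows[key]):
                yield key, self.rows[key]

    def increment(self, row, column, amount=1):
        cols = self.rows.setdefault(row, {})
        cols[column] = cols.get(column, 0) + amount
        return cols[column]

t = ToyHBase()
t.put("com.cnn.www", {"anchor:cnnsi.com": "CNN"})
t.put("com.apache.www", {"anchor:apache.com": "APACHE"})
hits = t.increment("com.cnn.www", "counter:hits")
keys = [k for k, _ in t.scan(start="com.b")]   # sorted, range-limited scan
```

Note that scan takes a key range plus a filter, exactly the shape listed in the API bullet above.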
HBase Shell
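The same operations are available interactively through the HBase shell. A short session sketch; the table and column names here are illustrative:

```
$ hbase shell
hbase> create 'users', 'cf1'
hbase> put 'users', 'row1', 'cf1:name', 'Dhawan'
hbase> get 'users', 'row1'
hbase> scan 'users'
hbase> disable 'users'
hbase> drop 'users'
```

Note that a table must be disabled before it can be dropped.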