Presenter: Alisha Arora

What is HBase?
The Hadoop database: a distributed, scalable, big data store
An open-source, versioned, non-relational database
Random, realtime read/write access to your Big Data
Hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware
Difference Between HBase and Hadoop/HDFS?
HDFS is a distributed file system that is well suited
for the storage of large files.
HBase is built on top of HDFS and provides fast record lookups (and updates) for large tables.

Linear and modular scalability.
Strictly consistent reads and writes.
Automatic and configurable sharding of tables.
Automatic failover support between RegionServers.
Easy to use Java API for client access.
Block cache and Bloom Filters for realtime queries.

HBase Overview
HBase Data Model

[Figure: an HBase table, sorted on row key, with a column family "Info". Each cell has multiple versions, each represented by a timestamp (ts1 = 1, ts2 = 2).]

Identify your data (cell value) in the HBase table by:

[1] rowkey, [2] column family, [3] column qualifier, [4] timestamp/version

Data Model Terminology

Table
An HBase table consists of multiple rows.
Row
A row consists of a row key and one or more columns.
Rows are sorted alphabetically by the row key as they are stored.
Column
A column consists of a column family and a column qualifier, which are delimited by a : (colon) character.


Column Family
Column families physically colocate a set of columns and their values, often for performance reasons.
Each column family has a set of storage properties, such as whether its values should be cached in memory, how its data is compressed or its row keys are encoded, and others.
Each row in a table has the same column families, though a given row might not store anything in a given column family.
Physically, all column family members are stored together on the filesystem.
Column Qualifier
A column qualifier is added to a column family to provide the index for a given piece of data.
Cell
A cell is a combination of row, column family, and column qualifier, and contains a value and a timestamp, which represents the value's version.
Timestamp
A timestamp is written alongside each value, and is the identifier for a given version of a value.
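
The terminology above can be pictured as a nested, sorted map: row key, then column (family:qualifier), then timestamp, then value. The following is a minimal Python sketch of that idea only. `ToyTable` and its methods are hypothetical names for illustration and are not the HBase client API.

```python
class ToyTable:
    """Toy model of HBase cell coordinates (not the real client API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> {timestamp: value}
        self.rows = {}

    def put(self, row, family, qualifier, value, ts):
        column = f"{family}:{qualifier}"
        self.rows.setdefault(row, {}).setdefault(column, {})[ts] = value

    def get(self, row, family, qualifier, ts=None):
        versions = self.rows.get(row, {}).get(f"{family}:{qualifier}", {})
        if not versions:
            return None
        # Without an explicit timestamp, return the newest version.
        chosen = max(versions) if ts is None else ts
        return versions.get(chosen)

    def scan(self):
        # Rows are returned sorted by row key.
        return sorted(self.rows)


table = ToyTable()
table.put("row2", "info", "role", "engineer", ts=1)
table.put("row2", "info", "role", "manager", ts=2)
table.put("row1", "info", "name", "alice", ts=1)

print(table.get("row2", "info", "role"))        # newest version
print(table.get("row2", "info", "role", ts=1))  # an older version
print(table.scan())
```

A value is only fully identified once all four coordinates are given; leaving out the timestamp falls back to the newest version.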

Data Model: Conceptual


Data Model: Physical View

HBase Run Modes

Standalone HBase
This is the default mode.
It uses the local filesystem instead of HDFS.
It runs all HBase daemons and a local ZooKeeper in the same JVM.
Pseudo-distributed
All daemons run on a single node.
Used for testing and prototyping on HBase.
Runs against the local filesystem or HDFS.
Fully distributed
The daemons are spread across all nodes in the cluster.
Multiple instances of HBase daemons run on multiple servers in the cluster.
Runs on HDFS only.
Distributed mode is configured by editing files in the HBase conf directory.

HBase Distributed Mode
























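[Figure: client lookup in distributed mode. Example request: find Sumeet's role with HBase.]
Client contacts ZooKeeper, a separate cluster of ZK nodes, to retrieve the RegionServer hosting the -ROOT- region.
The -ROOT- region maps a row to its .META. region.
Client queries .META. to find the server that has the row key (.META. maps a row to its table region).
Table T1 is split into three regions R1, R2, R3; the figure labels one RegionServer with T1R2, T1R3.
Each region is served by a RegionServer collocated with the DataNode.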
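
The last step of that lookup, finding which region (and therefore which RegionServer) holds a row key, amounts to a search over sorted region start keys. Here is a minimal Python sketch of that search. The split points, region names and hostnames are made up for illustration, and this is not how the HBase client is actually implemented.

```python
import bisect

# Hypothetical catalog entries: (region start key, region name, server).
# The first region starts at the empty key so it covers the lowest row keys.
META = [
    ("", "T1R1", "rs1.example.com"),
    ("h", "T1R2", "rs2.example.com"),
    ("p", "T1R3", "rs3.example.com"),
]
START_KEYS = [start for start, _, _ in META]


def locate_region(row_key):
    """Return (region, server) for the region whose key range holds row_key."""
    # The owning region is the last one whose start key is <= row_key.
    index = bisect.bisect_right(START_KEYS, row_key) - 1
    _, region, server = META[index]
    return region, server


print(locate_region("sumeet"))
print(locate_region("alice"))
```

Because regions cover contiguous, sorted key ranges, one binary search is enough; the client can then talk to that RegionServer directly.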

Master Server

The Master server is responsible for monitoring all RegionServer instances in the
cluster, and is the interface for all metadata changes.
Startup Behavior
If run in a multi-Master environment, all Masters compete to run the cluster.
If the active Master loses its lease in ZooKeeper (or the Master shuts down), then
the remaining Masters jostle to take over the Master role.
Runtime Impact
Because the client talks directly to the RegionServers, the cluster can still function in a "steady state".
However, the Master controls critical functions such as RegionServer failover and completing region splits. So while the cluster can still run for a short time without the Master, the Master should be restarted as soon as possible.
The Master runs several background threads:
LoadBalancer: Periodically, and when there are no regions in transition, a load balancer will run and move regions around to balance the cluster's load.
CatalogJanitor: Periodically checks and cleans up the hbase:meta table.

HBase Overview

HBase High-level



Region Server Splitting

Write requests accumulate in an in-memory storage system called the memstore.
Memstore Flush: Once the memstore fills, its contents are written to disk as additional store files.
Compaction: As store files accumulate, the RegionServer will compact them into fewer, larger files.
After each flush or compaction finishes, the amount of data stored in the region has changed.
The RegionServer consults the region split policy to determine if the region has grown too large or should be split for another policy-specific reason.
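
The write, flush, compact and split-check cycle described above can be sketched as a toy simulation. This is an illustration under simplifying assumptions: sizes are plain numbers, the three thresholds are made-up parameters, and the split policy is a simple size check. Real HBase split policies are configurable and more involved.

```python
class ToyRegion:
    """Toy region: sizes are plain numbers, thresholds are made-up parameters."""

    def __init__(self, memstore_limit, max_store_files, split_size):
        self.memstore_limit = memstore_limit
        self.max_store_files = max_store_files
        self.split_size = split_size
        self.memstore = 0
        self.store_files = []
        self.split_requested = False

    def write(self, size):
        # Writes accumulate in memory until the memstore fills.
        self.memstore += size
        if self.memstore >= self.memstore_limit:
            self.flush()

    def flush(self):
        # A flush writes the memstore out as one additional store file.
        self.store_files.append(self.memstore)
        self.memstore = 0
        if len(self.store_files) > self.max_store_files:
            self.compact()
        self.check_split()

    def compact(self):
        # Rewrite all store files as a single larger one.
        self.store_files = [sum(self.store_files)]

    def check_split(self):
        # Simplified size-based split policy.
        if sum(self.store_files) > self.split_size:
            self.split_requested = True


region = ToyRegion(memstore_limit=4, max_store_files=3, split_size=20)
for _ in range(20):
    region.write(1)
files_after_20 = list(region.store_files)
split_after_20 = region.split_requested
for _ in range(5):
    region.write(1)

print(files_after_20, split_after_20)
print(region.store_files, region.split_requested)
```

The split check runs after every flush, so the region only asks to be split once enough flushed data has piled up.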

Write Ahead Log (WAL)

The WAL records all changes to data in HBase to file-based storage.
WAL ensures that the changes to the data can be replayed.
If writing to the WAL fails, the entire operation to modify the data fails.
Usually, there is only one instance of a WAL per RegionServer.
The RegionServer records Puts and Deletes to it, before recording them
to the MemStore for the affected Store.
With a single WAL per RegionServer, the RegionServer must write to the
WAL serially. This causes the WAL to be a performance bottleneck.
MultiWAL allows a RegionServer to write multiple WAL streams in parallel
which increases total throughput during writes.
This parallelization is done by partitioning incoming edits by their Region.
Thus, the current implementation will not help with increasing the
throughput to a single Region.
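
As a rough illustration of the ordering rule (log first, MemStore second) and of partitioning edits by region across several WALs, here is a minimal Python sketch. `ToyRegionServer`, the `wal_available` flag and the CRC-based partitioning are assumptions made for this example and do not reflect how HBase implements its WAL providers.

```python
import zlib


class ToyRegionServer:
    """Toy write path: append to a WAL first, then update the MemStore."""

    def __init__(self, wal_count=1):
        self.wals = [[] for _ in range(wal_count)]
        self.memstores = {}        # region -> {row: value}
        self.wal_available = True  # flip to False to simulate a WAL failure

    def _wal_for(self, region):
        # All edits for one region go to the same WAL (partition by region).
        return self.wals[zlib.crc32(region.encode("utf-8")) % len(self.wals)]

    def put(self, region, row, value):
        if not self.wal_available:
            # If the WAL write fails, the whole operation fails.
            raise IOError("WAL append failed, edit rejected")
        self._wal_for(region).append((region, row, value))
        self.memstores.setdefault(region, {})[row] = value

    def replay(self):
        # Rebuild MemStore contents from the logs alone.
        recovered = {}
        for wal in self.wals:
            for region, row, value in wal:
                recovered.setdefault(region, {})[row] = value
        return recovered


server = ToyRegionServer(wal_count=2)
server.put("R1", "row-a", "v1")
server.put("R2", "row-b", "v2")
server.put("R1", "row-a", "v3")
print(server.replay() == server.memstores)
```

Since a region's edits always land in a single log, adding WALs spreads load across regions but does nothing for a single busy region.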

Regions: Basic Element of Availability and Distribution

Considerations for Number of Regions
HBase is designed to run with a small (20-200) number of relatively large (5-20 GB) regions per server.
Why should I keep my Region count low?
Usually right around 100 regions per RegionServer has yielded the best results.
Memory Usage:
MSLAB requires 2 MB per MemStore (that's 2 MB per family per region).
1000 regions that have 2 families each is 3.9 GB of heap used, and it's not even storing data yet.
Too Many Flushes:
If you fill all the regions at roughly the same rate, global memory usage forces tiny flushes when you have too many regions, which in turn generates compactions.
The same data ends up being rewritten tens of times.
Assigning Regions:
The Master will take a lot of time assigning them and moving them around in batches.
It's heavy on ZK usage.
No. of MapReduce Jobs:
MapReduce jobs usually have one mapper per HBase region. 1000 regions will generate far too many tasks.
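
The 3.9 GB figure above follows directly from the 2 MB per MemStore number. A small Python check of the arithmetic, assuming one MemStore per column family per region (the function name is made up for this example):

```python
MB_PER_MEMSTORE = 2  # fixed per-MemStore overhead quoted above


def memstore_overhead_gb(regions, families_per_region):
    """Heap consumed before any data is stored, in GB (1 GB = 1024 MB)."""
    memstores = regions * families_per_region
    return memstores * MB_PER_MEMSTORE / 1024


print(f"{memstore_overhead_gb(1000, 2):.1f} GB")  # 3.9 GB
print(f"{memstore_overhead_gb(100, 2):.2f} GB")   # 0.39 GB
```

At around 100 regions per RegionServer the same overhead drops to roughly a tenth of that.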

MemStore
The MemStore holds in-memory modifications to the Store.
When a flush is requested, the current MemStore is moved to a snapshot and is cleared.
HBase continues to serve edits from the new MemStore and backing snapshot until the flusher reports that the flush succeeded.
At this point, the snapshot is discarded. Note that when the flush happens, MemStores that belong to the same region will all be flushed.
MemStore Flush
A flush is triggered:
When a MemStore reaches the size specified by hbase.hregion.memstore.flush.size.
When the overall MemStore usage reaches the value specified by the global MemStore upper limit.
When the number of WAL log entries in a given RegionServer's WAL reaches the value specified in hbase.regionserver.max.logs. The flush order is based on time.


Compaction
Reduces the number of StoreFiles in a Store in order to increase performance on read operations.
Resource-intensive to perform.

Minor compactions
Select a small number of small, adjacent StoreFiles and rewrite them as a single StoreFile.
Do not drop (filter out) deletes or expired versions, because of potential side effects.
The end result of a minor compaction is fewer, larger StoreFiles for a given Store.
Major Compaction
The end result of a major compaction is a single StoreFile per Store.
Major compactions also process delete markers and max versions.
Major Compactions Can Impact Query Results
In theory, major compactions improve performance.
On a highly loaded system, however, major compactions can adversely affect performance.
Default: run once in a 7-day period. This is sometimes inappropriate for systems in production.
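
The difference between the two compaction types can be sketched in a few lines of Python. This is a simplified model for illustration: a StoreFile is a sorted list of `(row, timestamp, kind, value)` tuples, there are no column qualifiers, and a delete marker is assumed to mask every put on that row at or below its timestamp. The function names are hypothetical.

```python
import heapq


def minor_compact(store_files, start, count):
    """Merge `count` adjacent files beginning at `start`; keep every cell."""
    selected = store_files[start:start + count]
    merged = list(heapq.merge(*selected))  # inputs are already sorted
    return store_files[:start] + [merged] + store_files[start + count:]


def major_compact(store_files, max_versions):
    """Merge everything into one file; drop tombstones, masked puts, extra versions."""
    cells = list(heapq.merge(*store_files))
    deleted_upto = {}  # row -> newest delete timestamp seen
    for row, ts, kind, _ in cells:
        if kind == "delete":
            deleted_upto[row] = max(ts, deleted_upto.get(row, ts))
    kept, versions_seen = [], {}
    # Visit each row's cells newest first so the newest versions are kept.
    for row, ts, kind, value in sorted(cells, key=lambda c: (c[0], -c[1])):
        if kind == "delete" or ts <= deleted_upto.get(row, -1):
            continue
        versions_seen[row] = versions_seen.get(row, 0) + 1
        if versions_seen[row] <= max_versions:
            kept.append((row, ts, kind, value))
    return [sorted(kept)]


files = [
    [("a", 1, "put", "v1"), ("b", 1, "put", "x1")],
    [("a", 2, "put", "v2")],
    [("a", 3, "put", "v3"), ("b", 2, "delete", "")],
    [("c", 1, "put", "z1")],
]
minor = minor_compact(files, 0, 2)
major = major_compact(files, max_versions=2)
print(len(minor), len(major))  # 3 1
print(major[0])
```

After the minor compaction the delete marker for row "b" is still on disk; only the major compaction removes it together with the put it masks and the oldest version of row "a".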

HBase Operations
put(<ROW>, Map<KEY,VALUE>)
See the HTable class for further details on operations.
No queries
No secondary indexes

Namespace Management
A namespace can be created, removed or altered.
Namespace membership is determined during table creation by specifying a fully-qualified table name of the form <table namespace>:<table qualifier>.
Predefined namespaces
There are two predefined special namespaces:
hbase - system namespace, used to contain HBase internal tables
default - tables with no explicitly specified namespace will automatically fall into this namespace
#namespace=foo and table qualifier=bar
create 'foo:bar', 'fam'
#namespace=default and table qualifier=bar
create 'bar', 'fam'

Current Limitations
Deletes mask Puts
Deletes mask Puts, even Puts that happened after the delete was entered.
Deletes are handled by creating new markers called tombstones. These tombstones, along with the dead values, are cleaned up on major compactions.
Example: delete everything with timestamp <= T, then put with a timestamp <= T. The put is masked by the tombstone even though it happened after the delete.
These issues should not be a problem if you use always-increasing versions for new puts to a row.
Major compactions change query results
Create three cell versions at t1, t2 and t3, with a maximum-versions setting of 2. When getting all versions, only the values at t2 and t3 will be returned. But if you delete the version at t2 or t3, the one at t1 will appear again. Once a major compaction has run, this is no longer the case.
Column Metadata
There is no store of column metadata outside of the internal KeyValue instances for a ColumnFamily.
The only way to get a complete set of columns that exist for a ColumnFamily is to process all the rows.
Joins
HBase does not support joins natively. Either:
Denormalize the data upon writing to HBase, or
Have lookup tables and do the join between HBase tables in your application.
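
The two version-related limitations on this slide (deletes masking later puts, and major compactions changing what a read returns) can be reproduced with a toy model of a single cell. This is a sketch for intuition only: `ToyCell` and its methods are hypothetical, and real HBase tracks tombstones per row, family and column with more delete types than shown here.

```python
class ToyCell:
    """One cell with versions, tombstones and a max-versions setting."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.puts = {}                    # timestamp -> value
        self.version_tombstones = set()   # deleted individual versions
        self.delete_upto = None           # "delete everything <= T" marker

    def put(self, ts, value):
        self.puts[ts] = value

    def delete_version(self, ts):
        self.version_tombstones.add(ts)

    def delete_all_upto(self, ts):
        self.delete_upto = ts if self.delete_upto is None else max(ts, self.delete_upto)

    def _masked(self, ts):
        if ts in self.version_tombstones:
            return True
        return self.delete_upto is not None and ts <= self.delete_upto

    def get_versions(self):
        # Newest first, skipping masked versions, capped at max_versions.
        live = [ts for ts in sorted(self.puts, reverse=True) if not self._masked(ts)]
        return live[: self.max_versions]

    def major_compact(self):
        # Physically drop dead and excess versions, then the tombstones.
        survivors = self.get_versions()
        self.puts = {ts: self.puts[ts] for ts in survivors}
        self.version_tombstones.clear()
        self.delete_upto = None


# Without a major compaction, deleting t3 makes t1 reappear.
before = ToyCell(max_versions=2)
for ts in (1, 2, 3):
    before.put(ts, f"v{ts}")
before.delete_version(3)

# With a major compaction first, t1 is already gone.
after = ToyCell(max_versions=2)
for ts in (1, 2, 3):
    after.put(ts, f"v{ts}")
after.major_compact()
after.delete_version(3)

# A put at or below the tombstone's timestamp is masked until compaction.
masked = ToyCell(max_versions=1)
masked.delete_all_upto(10)
masked.put(5, "late")
masked_read = masked.get_versions()
masked.major_compact()
masked.put(5, "again")

print(before.get_versions(), after.get_versions())
print(masked_read, masked.get_versions())
```

The same sequence of client calls gives different answers depending on whether a major compaction happened in between, which is exactly the surprise the slide warns about.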

Table Schema Rules of Thumb

Aim for regions sized between 10 and 50 GB.
Aim for cells no larger than 10 MB. For larger values, consider storing the data in HDFS and a pointer to the data in HBase.
Around 100 regions is a good number for a table with 1 or 2 column families.
Keep column family names as short as possible. The column family names are stored for every value.
With a small number of active regions and many older regions that receive no new writes, resource consumption is driven by the active regions only.
If only one column family is busy with writes, only that column family accumulates memory. Be aware of write patterns when allocating resources.

Column Families
HBase currently does not do well with anything above two or three column families.
Flushing and compactions are done on a per-region basis: if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed even though the amount of data they carry is small.
When many column families exist, the flushing and compaction interaction can make for a lot of needless I/O. Try to make do with one column family if you can in your schemas. Only introduce a second and third column family in the case where data access is usually column scoped.
If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread across many, many regions (and RegionServers). This makes mass scans for ColumnFamilyA less efficient.

Rowkey Design
Hotspotting
Hotspotting occurs when a large amount of client traffic is directed at one node, or only a few nodes, of a cluster.
This can also have adverse effects on other regions hosted by the same region server, as that host is unable to service the requested load.
Salting
Add a randomly-assigned prefix to the row key.
This increases throughput on writes, but has a cost during reads.
Hashing
Use a one-way hash instead of a random prefix to allow for predictability during reads.
A deterministic hash allows the client to reconstruct the complete rowkey and retrieve that row as normal.
Reversing the Key
Reverse a fixed-width or numeric row key so that the part that changes the most often (the least significant digit) is first.
This effectively randomizes row keys, but sacrifices row ordering properties.
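
The three techniques can be compared side by side in a short Python sketch. The two-digit bucket prefix, the use of MD5 and the helper names are choices made for this example and are not prescribed by HBase.

```python
import hashlib
import random


def salted_key(row_key, buckets, rng=random):
    """Salting: a random bucket prefix spreads writes across regions."""
    return f"{rng.randrange(buckets):02d}-{row_key}"


def salted_candidates(row_key, buckets):
    """The read-side cost of salting: every bucket may hold the row."""
    return [f"{bucket:02d}-{row_key}" for bucket in range(buckets)]


def hashed_key(row_key, buckets):
    """Hashing: the prefix is derived from the key, so reads can rebuild it."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return f"{digest[0] % buckets:02d}-{row_key}"


def reversed_key(row_key):
    """Reversing: the fastest-changing digit becomes the leading character."""
    return row_key[::-1]


print(salted_candidates("user42", 4))
print(hashed_key("user42", 4) == hashed_key("user42", 4))  # True
print(reversed_key("20240101"))  # 10104202
```

With salting a point read has to try every bucket, whereas the hashed prefix can be recomputed from the key and fetched with a single lookup.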

Monotonically Increasing Row Keys/Timeseries Data
All clients pound one of the table's regions (and thus a single node), then move on to the next region.
Solution: Randomize the input records
Avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
[metric_type][event_timestamp] : if there are dozens or hundreds (or more) of different metric types, writes are spread across different regions of the table.
Minimize row and column sizes
If your rows and column names are large, especially compared to the size of the cell value, then you may run up against some interesting scenarios.
The indices kept on StoreFiles can grow large because the cell coordinates are large; compression will also make for larger indices.
Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys could be repeated several billion times in your data.
Shorter Column Families
Preferably, one character (e.g. "d" for data/default).
Shorter Attributes
Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer
shorter attribute names (e.g., "via") to store in HBase.
Rowkey Length
Keep them as short as is reasonable such that they can still be useful for required data access (e.g.
Get vs. Scan).
Expect tradeoffs when designing rowkeys.

RowKeys and Splits

Immutability of Rowkeys
Rowkeys cannot be changed.
The only way they can be "changed" in a table is if the row is
deleted and then re-inserted
Relationship Between RowKeys and Region Splits
Example: hex characters as the lead position of the key (e.g., "0000000000000000" to "ffffffffffffffff"), with the table pre-split using Bytes.split().
All the data is going to pile up in the first 2 regions and the last region, thus creating a "lumpy" (and possibly "hot") region problem.
'0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will never appear in this keyspace because the only values are [0-9] and [a-f].
A custom definition of splits is required.
Pre-splitting tables is generally a best practice, provided the splits are chosen so that all the regions are accessible in the keyspace.
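
The lumpy-region effect is easy to reproduce. The Python sketch below approximates an even byte-range split by looking only at the lead byte (this is not the actual Bytes.split() algorithm) and compares it with split points taken from the hex alphabet itself. The choice of 8 regions and the helper names are assumptions for this example.

```python
import bisect

HEX_DIGITS = "0123456789abcdef"


def naive_byte_splits(first, last, regions):
    """Evenly spaced lead-byte split points between two byte values."""
    return [bytes([first + (last - first) * i // regions]) for i in range(1, regions)]


def hex_splits(regions):
    """Custom split points drawn only from the valid hex lead characters."""
    return [HEX_DIGITS[len(HEX_DIGITS) * i // regions].encode() for i in range(1, regions)]


def region_of(key, splits):
    """Index of the region that a key falls into, judged by its lead byte."""
    return bisect.bisect_right(splits, key[:1])


keys = [(digit * 16).encode() for digit in HEX_DIGITS]

naive = naive_byte_splits(ord("0"), ord("f"), 8)
custom = hex_splits(8)

print(sorted({region_of(key, naive) for key in keys}))   # [0, 1, 7]
print(sorted({region_of(key, custom) for key in keys}))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With the naive byte split only the first two regions and the last one ever receive data, which matches the pile-up described above; split points chosen from [0-9a-f] use all eight regions.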

Version, DataTypes and TTL

Maximum & Minimum Number of Versions
Supported Datatypes
Anything that can be converted to an array of bytes can be stored as a value.
Time To Live (TTL)
ColumnFamilies can set a TTL length in seconds.
HBase will automatically delete rows once the expiration time is reached.
Store files which contain only expired rows are deleted on compaction.
Recent versions of HBase also support setting time to live on a per cell basis.
Cell TTLs are expressed in units of milliseconds instead of seconds.
A cell TTL cannot extend the effective lifetime of a cell beyond a ColumnFamily level TTL setting.
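
The interaction between the two TTL settings, including the unit mismatch, can be written as a small calculation. This is a sketch of the rule as stated above, not HBase code, and the function and parameter names are made up.

```python
def effective_expiry_ms(write_ts_ms, family_ttl_s=None, cell_ttl_ms=None):
    """Expiry time in ms, or None when no TTL applies.

    The ColumnFamily TTL is given in seconds and the cell TTL in
    milliseconds; a cell TTL can shorten a cell's lifetime but never
    extend it past the ColumnFamily TTL.
    """
    limits = []
    if family_ttl_s is not None:
        limits.append(family_ttl_s * 1000)  # seconds -> milliseconds
    if cell_ttl_ms is not None:
        limits.append(cell_ttl_ms)
    if not limits:
        return None
    return write_ts_ms + min(limits)


print(effective_expiry_ms(0, family_ttl_s=60, cell_ttl_ms=120_000))  # 60000
print(effective_expiry_ms(0, family_ttl_s=60, cell_ttl_ms=5_000))    # 5000
```

In the first call the longer cell TTL is capped by the 60-second family TTL; in the second the shorter cell TTL wins.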

Is HBase Right for You?

Lots of data - hundreds of millions or billions of rows.
Mutable Data
No need for the extra features that an RDBMS provides (e.g., typed columns, secondary indexes, transactions, advanced query languages, etc.)
Enough hardware
Uses a variable schema where each row is slightly different.
Sparse Schema.
Most of the columns are NULL in each row, e.g. a web map.

Stores data in collections with different types.

For example: some meta data, message data or binary data is keyed on the same

Needs key-based access to data when storing or retrieving, e.g. a cookie or profile store.
Sorted scans, e.g. metrics.
High Write Rate

HBase Don'ts
Don't try to use as a MySQL replacement
Don't use it when you ONLY do large batch processing (raw
HDFS usually best)
May lose some data locality if major compaction has not recently run

Time series data

Data skew as all new data will go to the same RegionServer