HBase, Hadoop World NYC

Ryan Rawson, StumbleUpon.com / su.pr
Jonathan Gray, Streamy.com

A presentation in 2 parts

Part 1

About Me
• Ryan Rawson
• Senior Software Developer @ StumbleUpon
• HBase committer, core contributor
• Uses HBase in production
• Behind features of our su.pr
• More later

Adventures with MySQL
• Scaling MySQL is hard; Oracle is expensive (and hard)
• Machine cost goes up faster than speed
• Turn off all relational features to scale (!!)
• Turn off secondary (!) indexes too!

MySQL problems cont.
• Tables can be a problem at sizes as low as 500 GB
• Hard to read data quickly at these sizes
• The future doesn’t look so bright as we contemplate 10x sizes
• The MySQL master becomes a problem...

Limitations of masters
• What if your write speed is greater than a single machine?
• All slaves must have the same write capacity as the master (can’t cheap out on slaves)
• Single point of failure, no easy failover
• Can (sort of) solve this with sharding...

Sharding problems
• Requires either a hashing function or a mapping table to determine the shard
• Data access code becomes complex
• What if shard sizes become too large...
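The routing logic behind a hash-sharded MySQL tier can be sketched in a few lines. This is an illustrative toy, not anyone's production code; the shard count, hostnames, and function names are all hypothetical:

```python
# Toy sketch of hash-based shard routing. Every data access must first
# compute the shard, which is exactly the complexity the slide warns about.
import hashlib

NUM_SHARDS = 8  # fixed up front; changing it forces a large data migration

def shard_for(key: str) -> int:
    """Stable hash of the row key picks the shard."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def dsn_for(key: str) -> str:
    """Hypothetical per-shard MySQL DSN; every query routes through this."""
    return f"mysql://shard{shard_for(key)}.db.example.com/app"

# The resharding pain: growing from 8 to 9 shards moves almost every key.
moved = sum(
    1
    for i in range(10_000)
    if shard_for(str(i))
    != int(hashlib.md5(str(i).encode()).hexdigest(), 16) % (NUM_SHARDS + 1)
)
```

With modulo hashing, roughly 8 out of 9 keys land on a different shard after adding one shard, which is why "shard sizes become too large" is such an unpleasant problem to fix in place.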


What about schema changes?
• MySQL is not your friend here
• Only gets harder with more data

HBase to the rescue
• Clustered, commodity(ish) hardware
• Mostly schema-less
• Dynamic distribution
• Spreads writes out over the cluster

What is HBase?
• HBase is an open-source distributed database, inspired by Google’s Bigtable
• Part of the Hadoop ecosystem
• Layers on HDFS for storage
• Native connections to MapReduce

HBase storage model
• Column-oriented database
• Column name is arbitrary data; can have a large, variable number of columns per row
• Rows stored in sorted order
• Can random read and write
• Table is split into roughly equal-sized “regions”
• Each region is a contiguous range of keys, from [start, end)
• Regions split as they grow, thus dynamically adjusting to your data set
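Because rows are sorted and each region owns a contiguous [start, end) range, locating a row is just a binary search over region start keys. A toy model of that lookup (illustrative only; the start keys and region names are made up, and the real lookup lives in HBase's META table):

```python
# Toy model: regions as contiguous key ranges over a sorted keyspace.
import bisect

# Region start keys in sorted order; the first region starts at the empty key.
region_starts = ["", "f", "m", "t"]
region_names = ["region-1", "region-2", "region-3", "region-4"]

def region_for_row(row_key: str) -> str:
    """Binary-search start keys for the region whose [start, end) holds the row."""
    idx = bisect.bisect_right(region_starts, row_key) - 1
    return region_names[idx]
```

For example, "apple" lands in the first region and "zebra" in the last; a split simply inserts a new start key into the sorted list, which is what lets the layout adjust dynamically to the data.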

Server architecture
• Similar to HDFS:
• Master = Namenode (ish)
• Regionserver = Datanode (ish)
• Often run these alongside each other

Server Architecture 2
• But not quite the same: HBase stores its state in HDFS
• HDFS provides robust data storage across machines, insulating against failure
• Master and Regionserver are fairly stateless and machine independent

Region assignment
• Each region from every table is assigned to a Regionserver
• The master is responsible for assignment and noticing if (when!) regionservers go down
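The master's bookkeeping can be sketched as a simple assignment table: every region maps to one live regionserver, and a dead server's regions get redistributed to the survivors. This is a toy illustration of the idea, not the actual HMaster logic:

```python
# Toy sketch of region assignment and failover handling by the master.
regions = [f"region-{i}" for i in range(8)]
servers = {"rs1": set(), "rs2": set(), "rs3": set()}

def assign(region, servers):
    """Hand a region to the least-loaded live regionserver."""
    target = min(servers, key=lambda s: len(servers[s]))
    servers[target].add(region)

for r in regions:
    assign(r, servers)

def on_regionserver_death(dead, servers):
    """When a regionserver is noticed to be down, drop it from the
    live set and reassign every region it held."""
    for r in servers.pop(dead):
        assign(r, servers)

on_regionserver_death("rs2", servers)
```

Because the data itself lives in HDFS, reassignment is just this kind of metadata move; no data copy is needed before the new server can serve the region.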

Master Duties
• When machines fail, move regions from affected machines to others
• When regions split, move regions to balance cluster load
• Could move regions to respond to load
• Can run multiple backup masters

What Master does NOT do

• Does not handle any write requests (not a DB master!)
• Does not handle location-finding requests
• Not involved in the read/write path!
• Generally does very little most of the time

Distributed coordination
• To manage master election and server availability we use ZooKeeper
• Set up as a cluster, it provides distributed coordination primitives
• An excellent tool for building cluster management systems

Scaling HBase
• Add more machines to scale
• The base model (Bigtable) scales to petabytes across thousands of machines
• No inherent reason why HBase can’t do the same

What to store in HBase?

• Maybe not your raw log data...
• ...but the results of processing it with Hadoop!
• By storing the refined version in HBase, you can keep up with huge data demands and serve it to your website

HBase & Hadoop
• Provides a real-time, structured storage layer that integrates with your existing Hadoop clusters
• Provides “out of the box” hookups to MapReduce
• Uses the same loved (or hated) management model as Hadoop

HBase @ StumbleUpon

• Started investigating the field in Jan

StumbleUpon & HBase

• Looked at the 3 top (at the time) contenders:
• Cassandra
• Hypertable
• HBase

Cassandra didn’t work for us, and we didn’t like the data model. Hypertable was fast, but we had concerns about community and project viability (no major users beyond Zvents). HBase was local and had a good community.

StumbleUpon & HBase
• Picked HBase:
• Community
• Features
• Map-reduce, Cascading, etc.
• Now highly involved and invested

su.pr marketing
• “Su.pr is the only URL shortener
that also helps your content get discovered! Every Su.pr URL exposes your content to StumbleUpon's nearly 8 million users!”

su.pr tech features
• Real-time stats: done directly in HBase
• In-depth stats: use Cascading, MapReduce, and put results in HBase

su.pr web access
• Using the Thrift gateway, PHP code accesses HBase

• No additional caching other than
what HBase provides

Large data storage
• Over 9 billion rows and 1300 GB of data

• Can map reduce a 700GB table in
~ 20 min

• That is about 6 million rows/sec
• Scales to 2x that speed on 2x the machines

Micro read benches
• Single reads are 1-10 ms, depending on disk seeks and caching
• Scans can return hundreds of rows in dozens of ms
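Scans can return hundreds of rows in the time a handful of point reads would take because rows are stored in key order: a scan is one seek to the start key plus a sequential read. A toy in-memory model of that access pattern (illustrative only; real HBase scans merge HFiles and the MemStore):

```python
# Toy model of a sorted store: scan = one seek + sequential read.
import bisect

# One thousand rows, already in sorted key order, as a region keeps them.
sorted_rows = [f"user{i:04d}" for i in range(1000)]

def scan(start_row, stop_row, limit=None):
    """Return rows in [start_row, stop_row), in key order."""
    lo = bisect.bisect_left(sorted_rows, start_row)  # the single "seek"
    hi = bisect.bisect_left(sorted_rows, stop_row)
    rows = sorted_rows[lo:hi]                         # the sequential read
    return rows[:limit] if limit is not None else rows
```

For example, `scan("user0010", "user0020")` returns ten contiguous rows; every row after the first costs only sequential I/O, not another disk seek.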

Serial read speeds
• A small table
• A bigger table
• (removed printlns from the code)

Deployment considerations
• ZooKeeper requires IO to complete its writes
• Consider hosting it on dedicated machines
• Namenode and HBase master can co-exist

What to put on your nodes
• Regionserver requires 2-4 cores and 3 GB+ of RAM
• Can’t run HDFS, HBase, maps, and reduces on a 2-core system
• On my 8-core systems I run: datanode, regionserver, 2 maps, 2 reduces

Garbage collection
• GC tuning becomes important
• Quick tip: use CMS, use -Xmx4000m
• Interested in G1 (if it ever stops crashing)

Batch and interactive
• These may not be compatible
• Latency goes up with heavy batch jobs
• May need to use 2 clusters to ensure a responsive website

Part 2

HBase @ Streamy
• History of Data
• RDBMS Issues
• HBase to the Rescue
• Streamy Today and Tomorrow
• Future of HBase

About Me
• Co-Founder and CTO of Streamy.com
• HBase Committer
• Migrated Streamy from RDBMS to HBase and Hadoop in June 2008

History of Data
The Prototype

• Streamy 1.0 built on PostgreSQL
‣ All of the bells and whistles
• Powered by a single low-spec node
‣ 8 core / 8 GB / 2 TB / $4k
Functionally powerful, woefully slow

History of Data
The Alpha
• Streamy 1.5 built on optimized PostgreSQL
‣ Remove bells and whistles, add partitioning
• Powered by a high-powered master node
‣ 16 core / 64 GB / 15x146 GB 15k RPM / $40k
Less powerful, still slow... Insanely expensive

History of Data
The Beta

• Streamy 2.0 built entirely on HBase
‣ Custom caches, query engines, and API
• Powered by 10 low-spec nodes
‣ 4 core / 4 GB / 1 TB / $10k for the entire cluster
Less functional but fast, scalable, and cheap

RDBMS Issues
• Poor disk usage patterns
• Black-box query engine
• Write speed degrades with table size
• Transactions/MVCC unnecessary
• Expensive

The Read Problem
• View the 30 newest unread stories from blogs
‣ Not RDBMS friendly, no early-out
‣ PL/Python heap-merge hack
‣ We knew what to do, but the DB didn’t listen
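The heap-merge idea behind that query can be sketched in plain Python: each followed blog yields a time-sorted stream, and a lazy merge with an early-out stops as soon as 30 unread stories are found, instead of sorting everything. The data and function names here are illustrative, not Streamy's schema:

```python
# Sketch of the "30 newest unread stories" heap-merge with early-out.
import heapq
from itertools import islice

# Per-blog story lists, newest first (timestamp descending, story id).
blog_feeds = [
    [(1090, "b1-s1"), (1050, "b1-s2"), (900, "b1-s3")],
    [(1080, "b2-s1"), (1020, "b2-s2")],
    [(1095, "b3-s1"), (700, "b3-s2")],
]

def newest_unread(feeds, read_ids, n=30):
    """Lazily merge pre-sorted feeds newest-first, skip read stories,
    and stop as soon as n unread stories have been produced."""
    merged = heapq.merge(*feeds, reverse=True)            # newest-first overall
    unread = (s for s in merged if s[1] not in read_ids)  # filter lazily
    return list(islice(unread, n))                        # the early-out
```

Because `heapq.merge` and the generator are lazy, work stops after the nth hit; an RDBMS ORDER BY over a join typically cannot early-out this way, which is the slide's complaint.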

The Write Problem
• Rapidly growing items table
‣ Crawl index from 1k to 100k
‣ Indexes, static content, dynamic content
‣ Solutions are imperfect

RDBMS Conclusions
• Enormous functionality and flexibility
‣ But you throw it out the door at scale
• A stripped-down RDBMS is still not fast enough
• Turned the entire team into DBAs
• Gets in the way of domain-specific optimizations

What We Wanted
• Transparent partitioning • Transparent distribution • Fast random writes • Good data locality • Fast random reads

What We Got
• Transparent partitioning: Regions
• Transparent distribution: RegionServers
• Fast random writes: MemStore
• Good data locality: Column Families
• Fast random reads: HBase 0.20

What Else We Got
• Versioning: Column Versions
• Fast sequential reads
• Transparent replication, no SPOF, high availability: HDFS
• MapReduce: Input/Output Formats


HBase @ Streamy

HBase @ Streamy

• All data stored in HBase
• Additional caching of hot data
• Query and indexing engines
• MapReduce crawling and analytics
• Zookeeper/Katta/Lucene

HBase @ Streamy

• Thumbnail media server
• Slave replication for Backup/DR
• More Cascading
• Better Katta integration
• Realtime MapReduce

HBase on a Budget
• HBase works on cheap nodes
‣ But you need a cluster (5+ nodes)
‣ $10k cluster has 10X the capacity of a $40k node
• Multiple instances on a single cluster
• 24/7 clusters + bandwidth != EC2

Lessons Learned
• A layer of abstraction helps tremendously
‣ Internal Streamy Data API
‣ Storage of serialized types
• Schema design is about reads, not writes
• What’s good for HBase is good for ...
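"Schema design is about reads" usually comes down to row-key design: since rows are served in sorted order, encoding a reversed timestamp into the key makes the newest entries sort first, so the common "latest N" read is a short forward scan. A sketch of that well-known convention (an illustrative pattern, not Streamy's actual schema):

```python
# Reversed-timestamp row keys: newest rows sort lexically first.
MAX_LONG = 2**63 - 1  # Java Long.MAX_VALUE, the usual ceiling for this trick

def row_key(user_id: str, timestamp: int) -> str:
    """Compose user id + reversed, zero-padded timestamp so that for a
    given user, newer events sort before older ones."""
    return f"{user_id}:{MAX_LONG - timestamp:019d}"

# Keys written at t=100, 300, 200 come back newest-first in sorted order.
keys = sorted(row_key("u1", ts) for ts in [100, 300, 200])
```

The zero-padding matters: without a fixed width, lexical and numeric order diverge. With this key shape, "30 newest for user u1" is just a scan starting at `"u1:"` with a limit of 30.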

What’s Next for HBase

• Inter-cluster / Inter-DC replication
‣ Slave and Multi-Master
• Master rewrite, more Zookeeper
• Batch operations, HDFS uploader
• No more data loss
‣ Need HDFS appends

HBase Information
• Home Page: http://hbase.org
• Wiki
• Twitter: http://twitter.com/hbase
• Freenode IRC: #hbase
• Mailing List: hbase-user@hadoop.apache.org