
HBase at Hadoop World NYC

Ryan Rawson, StumbleUpon.com, su.pr
Jonathan Gray, Streamy.com
A presentation in 2 parts

Part 1
About Me

• Ryan Rawson
• Senior Software Developer @ StumbleUpon
• HBase committer, core contributor
StumbleUpon

• Uses HBase in production
• Behind features of our su.pr service
• More later
Adventures with MySQL

• Scaling MySQL is hard; Oracle is expensive (and hard)
• Machine cost goes up faster than speed
• Turn off all relational features to scale
• Turn off secondary (!) indexes too! (!!)
MySQL problems cont.

• Tables can be a problem at sizes as low as 500GB
• Hard to read data quickly at these sizes
• Future doesn’t look so bright as we contemplate 10x sizes
• MySQL master becomes a problem...

Limitations of masters

• What if your write speed is greater than a single machine’s?
• All slaves must have the same write capacity as the master (can’t cheap out on slaves)
• Single point of failure, no easy failover
• Can (sort of) solve this with sharding...


Sharding
Sharding problems

• Requires either a hashing function or a mapping table to determine the shard
• Data access code becomes complex
• What if shard sizes become too large...
Resharding!
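The hashing approach above can be sketched in a few lines of Python (a toy illustration, not StumbleUpon's code); it also shows why resharding hurts: changing the shard count remaps most keys.

```python
import hashlib

def shard_for(user_id: str, num_shards: int) -> int:
    # Stable hash so every client maps a given key to the same shard.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

keys = [f"user{i}" for i in range(10_000)]
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))
# Growing from 4 to 5 shards relocates the large majority of keys,
# which is exactly the resharding pain: data must physically move.
print(moved / len(keys))
```

With plain modulo hashing, roughly 4 out of 5 keys land on a different shard after growing from 4 to 5 shards, so nearly the whole data set has to be migrated.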
What about schema changes?

• What about schema changes or migrations?
• MySQL is not your friend here
• Only gets harder with more data
HBase to the rescue

• Clustered, commodity(ish) hardware
• Mostly schema-less
• Dynamic distribution
• Spreads writes out over the cluster
What is HBase?

• HBase is an open-source distributed database, inspired by Google’s Bigtable
• Part of the Hadoop ecosystem
• Layers on HDFS for storage
• Native connections to MapReduce
HBase storage model

• Column-oriented database
• Column name is arbitrary data; rows can have a large, variable number of columns
• Rows stored in sorted order
• Supports random reads and writes
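The model above can be pictured as a sorted map of rows, each holding an arbitrary set of columns. A toy Python sketch (not the HBase API; real HBase additionally keys cells by column family and timestamp):

```python
import bisect

table = {}  # row key -> {column name: value}; column names are arbitrary data

def put(row, col, val):
    table.setdefault(row, {})[col] = val

def scan(start, stop):
    # Rows are kept in sorted key order, so range scans are natural.
    keys = sorted(table)
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return [(k, table[k]) for k in keys[lo:hi]]

put("row-b", "stats:clicks", 7)
put("row-a", "content:title", "hello")
put("row-c", "stats:views", 42)
print(scan("row-a", "row-c"))   # rows in ["row-a", "row-c"), sorted
```

Sorted row order is what makes both single-row random reads and contiguous range scans cheap, which the next slides build on.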
Tables

• Table is split into roughly equal sized “regions”
• Each region is a contiguous range of keys, [start, end)
• Regions split as they grow, thus dynamically adjusting to your data set
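Locating the region for a row key is then just a search over the sorted region start keys. A minimal sketch (the start keys and server names are made up for illustration; this is not the HBase client library):

```python
import bisect

# Regions sorted by start key; region i covers [starts[i], starts[i+1]).
starts  = ["", "g", "p"]          # "" = first region begins at the start of the keyspace
servers = ["rs1", "rs2", "rs3"]   # regionserver hosting each region

def region_for(row_key: str) -> str:
    # Rightmost region whose start key is <= row_key.
    i = bisect.bisect_right(starts, row_key) - 1
    return servers[i]

print(region_for("apple"), region_for("golf"), region_for("zebra"))
```

When a region splits, one entry in this mapping becomes two; nothing else changes, which is why distribution can adjust dynamically.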
Server architecture

• Similar to HDFS:
‣ Master = Namenode (ish)
‣ Regionserver = Datanode (ish)
• Often run these alongside each other!
Server Architecture 2

• But not quite the same: HBase stores state in HDFS
• HDFS provides robust data storage across machines, insulating against failure
• Master and Regionserver are fairly stateless and machine independent
Region assignment

• Each region from every table is assigned to a Regionserver
• The master is responsible for assignment and noticing if (when!) regionservers go down
Master Duties

• When machines fail, move regions from affected machines to others
• When regions split, move regions to balance the cluster
• Could move regions to respond to load
• Can run multiple backup masters
What Master does NOT do

• Does not handle any write requests (not a DB master!)
• Does not handle location finding requests
• Not involved in the read/write path!
• Generally does very little most of the time
Distributed coordination

• To manage master election and server availability we use ZooKeeper
• Set up as a cluster, provides distributed coordination primitives
• An excellent tool for building cluster management systems
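The master-election primitive can be sketched with a toy stand-in for ZooKeeper (plain Python, not the ZooKeeper API): candidates register ephemeral sequential nodes, the lowest sequence number leads, and when a node's session dies its entry vanishes, promoting the next candidate.

```python
class ToyCoordinator:
    """Toy model of ZooKeeper's ephemeral sequential znodes for leader election."""
    def __init__(self):
        self.seq = 0
        self.nodes = {}                # candidate name -> sequence number

    def register(self, name):
        # Analogous to creating an ephemeral sequential node under /master
        self.seq += 1
        self.nodes[name] = self.seq

    def leader(self):
        # The candidate holding the lowest sequence number is the leader
        return min(self.nodes, key=self.nodes.get)

    def fail(self, name):
        # Ephemeral nodes disappear when their session dies,
        # which automatically promotes the next candidate
        del self.nodes[name]

zk = ToyCoordinator()
zk.register("master-a")
zk.register("master-b")
print(zk.leader())   # master-a
zk.fail("master-a")
print(zk.leader())   # master-b takes over with no manual failover
```

This is the mechanism behind "can run multiple backup masters": the backups simply sit behind the active master in the election queue.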
Scaling HBase

• Add more machines to scale
• Base model (Bigtable) scales past 1000TB
• No inherent reason why HBase couldn’t
What to store in HBase?

• Maybe not your raw log data...
• ... but the results of processing it with Hadoop!
• By storing the refined version in HBase, can keep up with huge data demands and serve it to your website
HBase & Hadoop

• Provides a real-time, structured storage layer that integrates with your existing Hadoop clusters
• Provides “out of the box” hookups to MapReduce
• Uses the same loved (or hated) management model as Hadoop
HBase @ StumbleUpon

StumbleUpon & HBase

• Started investigating the field in Jan ’09
• Looked at the 3 top (at the time) choices:
‣ Cassandra: didn’t work for us, didn’t like the data model
‣ Hypertable: fast, but concerns about community and project viability (no major users beyond Zvents)
‣ HBase: local and a good community
StumbleUpon & HBase

• Picked HBase:
‣ Community
‣ Features
‣ MapReduce, Cascading, etc.
• Now highly involved and invested
su.pr marketing

• “Su.pr is the only URL shortener that also helps your content get discovered! Every Su.pr URL exposes your content to StumbleUpon's nearly 8 million users!”
su.pr tech features

• Real-time stats
‣ Done directly in HBase
• In-depth stats
‣ Use Cascading, MapReduce and put results in HBase
su.pr web access

• Using the Thrift gateway, PHP code accesses HBase
• No additional caching other than what HBase provides
Large data storage

• Over 9 billion rows and 1300 GB in HBase
• Can map reduce a 700GB table in ~20 min
• That is about 6 million rows/sec
• Scales to 2x that speed on 2x the hardware
Micro read benches

• Single reads are 1-10ms depending on disk seeks and caching
• Scans can return hundreds of rows in dozens of ms
Serial read speeds

• A small table
• A bigger table
• (removed printlns from the code)
Deployment considerations

• ZooKeeper requires IO to complete ops
• Consider hosting it on dedicated machines
• Namenode and HBase master can co-exist
What to put on your nodes

• Regionserver requires 2-4 cores and 3GB+
• Can’t run HDFS, HBase, maps and reduces on a 2-core system
• On my 8-core systems I run a datanode, regionserver, 2 maps, 2 reduces
Garbage collection

• GC tuning becomes important
• Quick tip: use CMS, use -Xmx4000m
• Interested in G1 (if it ever stops crashing)
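Applied in hbase-env.sh, the quick tip above might look like the following sketch (HBASE_OPTS is the standard hook in HBase's startup scripts; the CMS flag is the Sun JVM spelling of that era):

```shell
# hbase-env.sh: 4 GB heap and the CMS (concurrent mark-sweep) collector,
# which keeps GC pauses short at some cost in throughput.
export HBASE_OPTS="-Xmx4000m -XX:+UseConcMarkSweepGC"
```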
Batch and interactive

• These may not be compatible
• Latency goes up with heavy batch load
• May need to use 2 clusters to ensure a responsive website
Part 2

HBase @ Streamy

• History of Data
• RDBMS Issues
• HBase to the Rescue
• Streamy Today and Tomorrow
• Future of HBase
About Me

• Co-Founder and CTO of Streamy.com
• HBase Committer
• Migrated Streamy from RDBMS to HBase and Hadoop in June 2008
History of Data
The Prototype

• Streamy 1.0 built on PostgreSQL
‣ All of the bells and whistles
• Powered by single low-spec node
‣ 8 core / 8 GB / 2TB / $4k

Functionally powerful, woefully slow
History of Data
The Alpha

• Streamy 1.5 built on optimized PostgreSQL
‣ Remove bells and whistles, add partitioning
• Powered by high-powered master node
‣ 16 core / 64 GB / 15x146GB 15k RPM / $40k

Less powerful, still slow... Insanely expensive

History of Data
The Beta

• Streamy 2.0 built entirely on HBase
‣ Custom caches, query engines, and API
• Powered by 10 low-spec nodes
‣ 4 core / 4GB / 1TB / $10k for entire cluster

Less functional but fast, scalable, and cheap

RDBMS Issues

• Poor disk usage patterns
• Black box query engine
• Write speed degrades with table size
• Transactions/MVCC are unnecessary overhead
• Expensive
The Read Problem

• View the 30 newest unread stories from blogs
‣ Not RDBMS friendly, no early-out
‣ PL/Python heap-merge hack helped
‣ We knew what to do but the DB didn’t listen
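The heap-merge idea can be sketched in Python (illustrative data, not Streamy's code): each feed is already sorted newest-first, so a lazy k-way merge can early-out after the first n results instead of sorting everything.

```python
import heapq
from itertools import islice

# Each feed is a list of (timestamp, story_id), newest first.
blog_a = [(95, "a3"), (60, "a2"), (10, "a1")]
blog_b = [(90, "b2"), (80, "b1")]
blog_c = [(99, "c1")]

def newest(feeds, n):
    # heapq.merge is lazy: it pulls items on demand, so islice stops
    # the merge after n stories -- the early-out the RDBMS wouldn't do.
    merged = heapq.merge(*feeds, key=lambda s: -s[0])
    return [story for _, story in islice(merged, n)]

print(newest([blog_a, blog_b, blog_c], 3))   # ['c1', 'a3', 'b2']
```

The work done is proportional to n and the number of feeds, not to the total number of stories, which is the property the query planner kept ignoring.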
The Write Problem
• Rapidly growing items table
‣ Crawl index from 1k to 100k
feeds
‣ Indexes, static content, dynamic
statistics
‣ Solutions are imperfect
RDBMS Conclusions

• Enormous functionality and flexibility
‣ But you throw it out the door at scale
• Stripped down RDBMS still not attractive
• Turned entire team into DBAs
• Gets in the way of domain-specific optimizations
What We Wanted

• Transparent partitioning
• Transparent distribution
• Fast random writes
• Good data locality
• Fast random reads
What We Got

• Transparent partitioning: Regions
• Transparent distribution: RegionServers
• Fast random writes: MemStore
• Good data locality: Column Families
• Fast random reads: HBase 0.20
What Else We Got

• Transparent replication: HDFS
• High availability: no SPOF
• MapReduce: Input/OutputFormats
• Versioning: Column Versions
• Fast sequential reads: Scanners
HBase @ Streamy Today

• All data stored in HBase
• Additional caching of hot data
• Query and indexing engines
• MapReduce crawling and analytics
• ZooKeeper/Katta/Lucene
HBase @ Streamy Tomorrow

• Thumbnail media server
• Slave replication for Backup/DR
• More Cascading
• Better Katta integration
• Realtime MapReduce
HBase on a Budget

• HBase works on cheap nodes
‣ But you need a cluster (5+ nodes)
‣ $10k cluster has 10X the capacity of a $40k node
• Multiple instances on a single cluster
• 24/7 clusters + bandwidth != EC2
Lessons Learned

• Layer of abstraction helps tremendously
‣ Internal Streamy Data API
‣ Storage of serialized types
• Schema design is about reads, not writes
• What’s good for HBase is good for Streamy
What’s Next for HBase

• Inter-cluster / Inter-DC replication
‣ Slave and Multi-Master
• Master rewrite, more ZooKeeper
• Batch operations, HDFS uploader
• No more data loss
‣ Needs HDFS appends
HBase Information

• Home page: http://hbase.org
• Wiki: http://wiki.apache.org/hadoop/Hbase
• Twitter: http://twitter.com/hbase
• IRC: #hbase on Freenode
• Mailing list: hbase-user@hadoop.apache.org
