● Table
● Regions, defined by row [start key, end key)
– Store, 1 per family
● 1+ Store Files (HFile format on HDFS)
● (table, rowkey, family, column, timestamp) = value
● Everything is byte[]
● Rows are ordered sequentially by key
● Special tables: -ROOT-, .META.
– Tell clients where to find user data
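The points above (everything is byte[], rows are sorted by key, and each region covers a [start key, end key) range) can be illustrated with a small toy model. This is only a sketch and not the HBase client API: the class ToyRegionLookup, its methods, and the region names are hypothetical, and it assumes keys compare as unsigned bytes in lexicographic order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of region lookup; hypothetical, not the HBase API.
final class ToyRegionLookup {
    // Row keys are plain byte[]; order them bytewise (unsigned, lexicographic).
    private static final Comparator<byte[]> BYTE_ORDER = Arrays::compareUnsigned;

    // Region start key -> region name. A region's end key is simply the
    // start key of the next region, which gives the [start, end) ranges.
    private final TreeMap<byte[], String> regionsByStartKey = new TreeMap<>(BYTE_ORDER);

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // An empty start key marks the first region of the table.
    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(bytes(startKey), regionName);
    }

    // The owning region is the one with the greatest start key <= row key.
    String regionFor(String rowKey) {
        Map.Entry<byte[], String> entry = regionsByStartKey.floorEntry(bytes(rowKey));
        return entry == null ? null : entry.getValue();
    }
}
```

With regions starting at "" and "ch12", a row key beginning "ch1143475" falls in the first region and one beginning "ch1261585" falls in the second, because the start key is inclusive and the end key is exclusive.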
HBase Architecture
Courtesy of Lars George, from http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
What is HBase?
Data Access
ex. FeedItem-by_actor_member (indexes FeedItem)
● Row Key: 0002851766-9223370783553935005-rowkey
– info: actor_member = 2851766, item_type = new_rsvp, pub_date =
– __idx__: row = ch1143475-ts9223370783553935005-rsvp-54704795
● Row Key: 0004679998-9223370783650851832-rowkey
– info: actor_member = 4679998, item_type = new_discussion, pub_date =
– __idx__: row = ch1261585-ts9223370783650851832-disc-7369603
FeedItem
● Row Key: ch1143475-ts9223370783553935005-rsvp-54704795
– info: actor_member = 2851766, item_type = new_rsvp, pub_date =
– content: comment = “See you there”
● Row Key: ch1261585-ts9223370783650851832-disc-7369603
– info: actor_member = 4679998, item_type = new_discussion, pub_date =
– content: title = “Next month”, body = “...”
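The index row keys in this example appear to follow the pattern of a zero-padded member ID, then an inverted timestamp, then the original row key. A minimal sketch of building such a key follows. It assumes the inverted timestamp is Long.MAX_VALUE minus the publication time in epoch milliseconds, which is consistent with the sample keys but is not stated in the slides. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper; the key layout is inferred from the sample rows.
final class IndexKeySketch {
    // Builds "<10-digit member id>-<inverted timestamp>-<item row key>".
    static String indexRowKey(int memberId, long pubDateMillis, String itemRowKey) {
        // Inverting the timestamp makes newer items sort first, since
        // HBase orders rows ascending by key (assumed inversion scheme).
        long inverted = Long.MAX_VALUE - pubDateMillis;
        // Zero-padding keeps member IDs in numeric order under byte ordering.
        return String.format(Locale.ROOT, "%010d-%d-%s", memberId, inverted, itemRowKey);
    }
}
```

Under this layout, one member's index entries are contiguous and ordered newest first, so a scan starting at that member's key prefix returns their most recent feed items.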
Interacting with HBase
Meetup.Beeno
...
@HEntity(name="FeedItem")  // maps this class to the FeedItem table
public class FeedItem implements Externalizable {
    ...
    @HRowKey  // this property holds the HBase row key
    public String getId() { return this.id; }
    public void setId(String id) { this.id = id; }

    // Stored as info:actor_member and indexed; index entries are ordered by
    // inverted info:pub_date and also carry info:item_type
    @HProperty(family="info", name="actor_member",
               indexes = { @HIndex(date_col="info:pub_date", date_invert=true,
                                   extra_cols={"info:item_type"}) } )
    public Integer getMemberId() { return this.memberId; }
    public void setMemberId(Integer id) { this.memberId = id; }
Interacting with HBase
Services
● Performance testing
● Product targets 3 of our highest-traffic pages; simulating load is hard
● Started with load scripts
● Moved to testing with live traffic
– Use AJAX calls to simulate requests
– Selectively enabled for X% of traffic
● Launched data collection/write traffic first
– Allowed tweaking configuration before impacting user experience
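One way to enable a feature for X% of traffic, as described above, is to bucket each request by a stable identifier. The slides do not say how the sampling was actually done, so the sketch below is an assumption: the TrafficSampler class is hypothetical, and it buckets by member ID so that a given member consistently sees the same behavior.

```java
// Hypothetical percentage-based sampler; not taken from the slides.
final class TrafficSampler {
    private final int percent;

    TrafficSampler(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 0..100");
        }
        this.percent = percent;
    }

    // Deterministic per member: the same ID always lands in the same bucket.
    boolean isEnabledFor(long memberId) {
        // floorMod keeps the bucket in 0..99 even for negative IDs.
        return Math.floorMod(memberId, 100L) < percent;
    }
}
```

Raising the percentage only adds members to the enabled set, since a member whose bucket is below the old threshold is also below the new one.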
HBase @ Meetup
Issues along the way