
Riak at Voxer

About Voxer

Why Riak?

SQL Breakup

CouchDB

Other Options

Riak

Riak User
So far, so good

Message Storage

> Timeline Indexing

> No SQL means No SQL


never fear, Riak Search is here
except it was too slow and really buggy
and brought down our cluster (many times)
2i is coming
need an indexing scheme that will get us to 2i

> the big question


"what's new since I was last here"
holdover from Couch days
Gmail does it
an index per person with every message in it

> let's use Redis


sorted sets
let's shard it x 32
let's mirror it
entire timeline in memory, what could go wrong?
managing 64 redis-server instances ended up being harder than we thought
mirrors began drifting out of sync
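A minimal sketch of how a user's timeline might be routed to one of 32 sorted-set shards and then queried for "what's new". The slides do not say how shards were chosen, so the CRC32 hash, the `timeline:<user>` key naming, and the in-memory `FakeShard` class (a simplified stand-in for a redis-server holding sorted sets) are all assumptions for illustration.

```python
import zlib
from bisect import insort

NUM_SHARDS = 32  # from the slide: "shard it x 32"


def shard_for(user_id: str) -> int:
    """Map a user id to a shard number with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


class FakeShard:
    """In-memory stand-in for one shard; each key holds (score, member) pairs kept sorted."""

    def __init__(self):
        self.sets = {}  # key -> sorted list of (score, member)

    def add(self, key, score, member):
        entries = self.sets.setdefault(key, [])
        if (score, member) not in entries:
            insort(entries, (score, member))

    def newer_than(self, key, min_score):
        # Members whose score is strictly greater than min_score, oldest first.
        return [m for s, m in self.sets.get(key, []) if s > min_score]


shards = [FakeShard() for _ in range(NUM_SHARDS)]


def add_to_timeline(user_id, message_id, timestamp):
    shards[shard_for(user_id)].add("timeline:" + user_id, timestamp, message_id)


def new_since(user_id, last_seen):
    return shards[shard_for(user_id)].newer_than("timeline:" + user_id, last_seen)
```

Every read and write for a user lands on the same shard, which is what makes the scheme simple. Mirroring doubles the instance count (32 shards x 2 = 64), and each mirror pair then has to be kept in sync separately.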

> I got religion when we got users


I am now a devout Scaleist
I worship at the altar of N

> shard or get off the pot


cut and run?
where's 2i?
bitcask limitations
screw it, we are building our own

> alas an index in Riak


/riak/timelines/<user>|head
pageable index with a known head key
JSON + CRLF
"mostly" append operations
archives and de-dupes itself
objects are bounded in size b/c of pageable structure
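A rough sketch of what a pageable index with a known head key could look like. It follows the shape described on the slide (JSON lines joined by CRLF, a `<user>|head` key, bounded pages) and the `{"next": {"id": ..., "key": ...}}` header seen in the thread_index sample. The page size, the archive key naming `<user>|<n>`, the de-dupe rule, and the dict standing in for a Riak bucket are assumptions, not Voxer's actual implementation.

```python
import json

CRLF = "\r\n"
MAX_ENTRIES = 3  # tiny page size for illustration; the real bound is not given

store = {}  # stand-in for a Riak bucket: key -> page body (str)


def read_page(key):
    """Return (header, entries) for a page, or an empty page if the key is missing."""
    body = store.get(key)
    if body is None:
        return {"next": None}, []
    lines = body.split(CRLF)
    return json.loads(lines[0]), [json.loads(line) for line in lines[1:] if line]


def write_page(key, header, entries):
    store[key] = CRLF.join([json.dumps(header)] + [json.dumps(e) for e in entries])


def append(user, message_id, timestamp):
    """Append to the head page; archive the head first if it is full."""
    head_key = user + "|head"
    header, entries = read_page(head_key)
    if any(e[0] == message_id for e in entries):
        return  # de-dupe (only within the head page in this sketch)
    if len(entries) >= MAX_ENTRIES:
        # Copy the full head to an archive page, then start a fresh head
        # whose header points at that archive.
        next_id = header["next"]["id"] + 1 if header["next"] else 1
        archive_key = "%s|%d" % (user, next_id)
        write_page(archive_key, header, entries)
        header, entries = {"next": {"id": next_id, "key": archive_key}}, []
    entries.append([message_id, timestamp])
    write_page(head_key, header, entries)


def walk(user):
    """Yield entries newest first, following next pointers from the head page."""
    key = user + "|head"
    while key:
        header, entries = read_page(key)
        for entry in reversed(entries):
            yield entry
        key = header["next"]["key"] if header["next"] else None
```

Because readers always start from the known head key and follow `next` pointers, no page ever needs to grow past the bound, and most writes touch only the head object.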

> mo data mo problems


these indexes are getting big
marking read and deleting is slow
too much work/message
min 2 writes per recipient
TCP incast
kind of cheating by using SSDs
we need a big win

> thread indexing


relied on and enhanced the pageable indices
reduced writes to max of 1/message
simplified reads and deletes
finally a win even on spinning disk
phased rollout as we got perilously close to being out of space @ SoftLayer

> siblings
multiple writers to the same object
use our own consistent hash to order writes
too many siblings make Riak very sad
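One way to read "use our own consistent hash to order writes": route every write for a given object key to the same writer process, so updates to that object are serialized and concurrent siblings are avoided. The sketch below is a generic consistent hash ring under that assumption. The `HashRing` class, the MD5 hash, the virtual node count, and the writer names are hypothetical and not taken from the slides.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent hash ring mapping object keys to writer nodes."""

    def __init__(self, nodes, vnodes=64):
        # Place several virtual points per node on the ring to even out load.
        self._ring = sorted(
            (self._hash("%s#%d" % (node, i)), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def owner(self, key):
        """Return the node owning the first ring point clockwise from the key's hash."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["writer-1", "writer-2", "writer-3"])
# All writes for this index object go through one writer, in order.
target = ring.owner("timelines/some_user|head")
```

When a writer is removed, only the keys it owned move to other writers. Keys owned by the remaining writers keep their owner, which limits the window in which two processes might write the same object.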

> threads by user


{
  "1328242142121_7828422941_b13bf918": {
    "deleted": 1336748754.287775,
    "left_chat": 1336748754.2394028
  },
  "1339805785991_3032758220_b478996c": {
    "deleted": 1339954795.5870001
  },
  "1339954375330_0772805937_bcb4657": {
    "deleted": 0
  }
  ...
}

> thread_index
{"next":{"id":1,"key":"1328242142121_7828422941_b13bf918|1"}}
["1331160443337_0838173159_56a5fb6",1331160447.088]
["1331160457136_0341775134_56a5fb6",1331160462.951]
["1331160604897_0522536365_56a5fb6",1331160612.348]
["1331160868521_0453734508_bcb4657",1331160873.694]
["1331173364385_0887618434_bcb4657",1331173368.456]
["1331188383514_3850959348_5ac4719c",1331188387.788]
["1331241908354_0011491758_56a5fb6",1331241969.03]
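A small sketch of reading a thread_index page like the sample above: the first line is a JSON header with a `next` pointer, and each following line is a JSON array of a message id and a number. The CRLF line separator follows the "JSON + CRLF" note on the earlier slide. Treating the second array element as a Unix timestamp in seconds is an assumption based on how the values look, and the function names are hypothetical.

```python
import json


def parse_thread_index(body):
    """Split a page body into (header, entries); entries are (message_id, timestamp)."""
    lines = [line for line in body.split("\r\n") if line.strip()]
    header = json.loads(lines[0])
    entries = [tuple(json.loads(line)) for line in lines[1:]]
    return header, entries


def messages_since(entries, last_seen):
    """Return message ids whose timestamp is strictly newer than last_seen."""
    return [message_id for message_id, ts in entries if ts > last_seen]
```

If the entries needed for a query run off the end of a page, the `next` key in the header names the page to fetch next.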

> answering the big question


fetch threads by user
mget to Redis for last modified on all threads
fan out requests to each modified thread_indexes
build a comprehensive list of message ids
none of the message data actually lives in this cluster, this is all index
stream results from riak media
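The steps above can be sketched end to end with in-memory stand-ins. Plain dicts replace the threads-by-user object, the Redis last-modified lookup, and the thread_index objects, and a thread pool plays the role of the fan-out. The function name, the parameters, and the rule that a non-zero `deleted` value hides a thread are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def whats_new(user, last_seen, threads_by_user, last_modified, thread_indexes):
    """Return ids of messages newer than last_seen across the user's threads."""
    # 1. Fetch threads by user (assumption: a non-zero "deleted" hides the thread).
    threads = [
        thread_id
        for thread_id, meta in threads_by_user[user].items()
        if not meta.get("deleted")
    ]

    # 2. One batched lookup of last-modified times (stand-in for a Redis MGET).
    modified = [t for t in threads if last_modified.get(t, 0) > last_seen]

    # 3. Fan out one request per modified thread index.
    def fetch(thread_id):
        return [mid for mid, ts in thread_indexes.get(thread_id, []) if ts > last_seen]

    with ThreadPoolExecutor(max_workers=8) as pool:
        batches = list(pool.map(fetch, modified))

    # 4. Build one flat list of message ids; message bodies live elsewhere.
    return [mid for batch in batches for mid in batch]
```

The cheap last-modified check in step 2 is what keeps the fan-out small: only threads that changed since the last visit cost an index read.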

Media Storage

Gripes

Reads are Expensive

No Fancy Features

Backpressure

Love

Actually Works

Great Support

Upward Trajectory
