You are on page 1of 1

TITAN

A HIGHLY SCALABLE, DISTRIBUTED GRAPH DATABASE


HTTP://THINKAURELIUS.COM

GRAPH GRAPH DATA STRUCTURE


SCHEMA name:horatius
name:rome
id:1 tim name:lars
name:string e:1 id:3 tim
e:4 id:4
id:long fol
low fol
time:long s low

follows
s
USER
tim
e

tweets
m
:2

time:8
e:3
ea
foll

tim
ow

str

stream

time:8
e:1

s
s

low
tim

fol
time:long
time:long
stream

tweets
text:string :1 0
time:long it me name:spurius
id:2
weets text:your soldiers are weak...
t
time:8
TWEET
text:together we will defeat...
time:10

CAP THEOREM TECHNIQUES FOR SCALE THE AURELIUS OLAP STORY


g.makeType()
Partitionability .name("follows")
Data
.primaryKey(time) OLTP OLAP
Management
.makeEdgeLabel()

12345678 12345683
Edge follows
Compression
12345678 9 12345683 24 bytes

12345678 9 +5 7 bytes
TITAN
Ava
y c
n

Stores a massive-scale
FAUNUS FULGORA
e

il

Vertex-Centric horatius.query()
a
t

property graph
sis

Indices Generates a large-scale Stores compressed


ili

.direction(OUT)
n

ty

single-relational graph large-scale graph in memory


o

.labels("follows")
C

.has("time",1,GREATER_THAN)
Load into RAM
on a single-machine
TINKERPOP INTEGRATION
titan$ bin/gremlin.sh
Map/Reduce
\,,,/
(o o)
Blueprints -----oOOo-(_)-oOOo-----
Rexster
A Graph API gremlin> g = TitanFactory.open(...)
A Graph Server
==>titangraph[cassandra:66.66.66.66]
gremlin> g.v(1).out('follows').name To stats/algorithms package
Gremlin Frames Update graph with derived edges
==>rome
A Graph Traversal ==>spurius An Object-to-Graph
Language gremlin> Mapper Update element properties with algorithm results

A SOCIAL STRESS TEST The follows graph was generated from Twitter 2009. ARCHITECTURE COST
Kwak, H., Lee, C., Park, H.,
horatius: @lars: your siege on @rome will end now. 41,700,000 user vertices Moon, S., "What is Twitter, a 6 Amazon EC2 m1.large machines
1,470,000,000 follows edges Social Network or a News Titan/Cassandra cluster
lars: @horatius: unlikely as your soldiers are weak and few. Media?," WWW2010. $0.32 per hour x 6 machines x 24 hours
horatius: @spurius: let us assemble our remaining men on the bridge. Tweet vertex text data is randomly selected character = $46.08 per day
sequences from Homer's The Odyssey.
spurius: together we will defeat the 14 Amazon EC2 m1.small machines
#etruscan invaders. cc/ @rome Read/write servers
rome: history will look kindly on the $0.08 per hour x 14 machines x 24 hours
great roman empire. = $26.88 per day
horatius: RT @rome history will look
Cost for running this architecture: $72.96 per day
kindly on the great roman empire.

110 million transactions per day


(99.99% success rate)

$72.96 / 110 = $0.66 per million transactions

Average CPU Load (%)

14 machines x 50 threads x ~15 users simulated per thread 50 concurrent threads


= ~10,500 concurrent users while(true) {
Average disk reads (bytes) 1. create a user (~130ms)
2. follow 10 users (~270ms)
3. compose a tweet (~140ms)
4. recommend followers (~500ms)
5. read a user stream (~100ms)
}
1 of the 5 operations is randomly
chosen using a biased coin toss.

1. Name uniqueness check


2. Add a vertex with properties
3. Add vertex to index
Average disk writes (bytes)
1. Get 20 user vertices
2. Add 10 follows edges

1. Add tweet vertex with properties


2. Add stream edge to all followers

1. For a user, randomly sample followers


2. Find those followers' 50 most recent followers
3. Sort the count distribution and return top 3

1. Get the 10 most recent stream edges


2. Get the tweet vertex of those edges

You might also like