
Record every Referral for Flickr

Realtime

While reading from the DB


By
Dathan Vance Pattishall
Who am I?
• Been working with MySQL since 1999
• Scaled many companies with MySQL
• Such as FriendFinder / Friendster / Flickr
• Now doing the same for RockYou
• Words I like: Federation, Partitioning,
Shards, RAID-10, Scale, Speed
(Throughput)
Flickr Stats Backend Look
Flickr Stats is a feature that reports on every
referrer an object receives, and stores that
information for as long as you’re a pro
member
Requirements
• Scale better than the number of page
views.
• Store the data forever
• Associate time with the data
• Allow for change
• Keep it cheap
• Oh and downtime is not an option
Spread the Data around
• Federate – not in the scope of this talk
(RAID-10 for the data layer)
• All referrers are owned by the page owner
• Spread data out by that
• But Federate in a different direction
• Add a new column to the global account
lookup
What does the data look like?
Column            Type           Notes
Path_query        VARCHAR(255)   PK
Domain            VARCHAR(50)
Owner             BIGINT
When              DATE
Object-ID         BIGINT
Object-Type       TINYINT
Counts and stuff  various INTs   maybe some secondary keys
That didn’t work
• Strings as primary keys were not good.
• Every secondary index was a new page
referencing the larger-than-ideal primary
key.
• Inserts slowed once the table grew larger
than memory
• Not enough I/O to handle double the load
Start over
• Converted the URL into a 64-bit ID
• CONV(SUBSTR(MD5(Url),1,16),16,10)
• ==
• 64-bit number
• For added protection it's unique per
owner
• Reduced primary key size to 8 bytes +
owner, object, object-type
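The hashing trick above can be sketched in Python (note that MySQL's SUBSTR is 1-indexed, so the substring must start at position 1; the function name here is illustrative):

```python
import hashlib

def url_to_id(url: str) -> int:
    # Take the first 16 hex digits of the URL's MD5 and read them in
    # base 16 -- the same result as MySQL's
    # CONV(SUBSTR(MD5(url), 1, 16), 16, 10): an unsigned 64-bit number.
    return int(hashlib.md5(url.encode()).hexdigest()[:16], 16)

uid = url_to_id("http://www.flickr.com/photos/example/")
assert 0 <= uid < 2**64  # fits in a BIGINT UNSIGNED primary key
```

This turns a 255-byte string key into a fixed 8-byte integer; collisions only matter within one owner since the owner id is part of the primary key.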
Split data up by Week
• Since there's a time requirement and a
chance to drop data, let's drop it fast and move on
• 53 Tables that have 7 columns to denote
day
• 1 Year Table to cook data down to a year
view
• 1 Totals table to reference the single copy
of the string
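A minimal sketch of the weekly routing, assuming hypothetical table and column names (the real schema isn't shown in the slides): the ISO week number picks one of the 53 weekly tables and the weekday picks the day column, so expiring old data is just truncating last year's week table.

```python
import datetime

def route(day: datetime.date) -> tuple[str, str]:
    # ISO calendar: week is 1..53, weekday is 1..7 (Mon..Sun).
    year, week, weekday = day.isocalendar()
    return f"stats_week_{week:02d}", f"day{weekday}"

# A hit on Wednesday 2008-04-16 goes to week table 16, column day3.
assert route(datetime.date(2008, 4, 16)) == ("stats_week_16", "day3")
```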
InnoDB & Strings
• Indexing a string takes a lot of space
• Indexing a large string takes even more
space
• Each index gets its own 16KB pages.
• Fragmentation across pages was hurting
the app – chewing up I/O
• That's a lot of disk space chewed up per
day
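Rough arithmetic behind those bullets, as a back-of-the-envelope sketch (entry sizes are simplified; real InnoDB pages carry headers, per-record overhead, and fill-factor slack):

```python
# InnoDB secondary index entries carry a full copy of the primary key,
# and index pages are 16 KB. Compare a VARCHAR(255) PK to the 8-byte
# hashed id next to one small secondary column.
PAGE = 16 * 1024
entry_string_pk = 255 + 8    # secondary key column + string PK copy
entry_int_pk = 8 + 8         # secondary key column + 8-byte PK copy

per_page_string = PAGE // entry_string_pk   # ~62 entries per page
per_page_int = PAGE // entry_int_pk         # 1024 entries per page
assert per_page_int // per_page_string >= 16  # ~16x fewer pages to touch
```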
InnoDB & High Concurrency of
String Writes
• Requirement: 300 ms for total DB access
FOR ALL apps
• Writes slow down at high concurrency
when the datafile(s) are greater than the
buffer pool size
• 10 ms to 20 seconds sometimes for the
full transaction
• Noticed replication was keeping up – a
single thread applying writes serially
Solution: buffer the writes
• A Java daemon buffers up to 4000
messages (transactions) and applies them
serially with one thread
• It does not go down
• It does not use much memory or CPU
• It was written by a Zen Java Master
• Even during peak, messages do not
exceed 200 outstanding transactions
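The daemon itself was Java; this is a minimal Python sketch of the same pattern (names and the in-memory apply_txn stand-in are illustrative): many producers enqueue, one thread drains and applies serially, so the database only ever sees a single writer.

```python
import queue
import threading

applied = []                      # stand-in for the database

def apply_txn(txn):
    applied.append(txn)           # the real daemon writes to MySQL here

buf = queue.Queue(maxsize=4000)   # the slides' 4000-message buffer cap

def writer():
    while True:
        txn = buf.get()
        if txn is None:           # shutdown sentinel
            break
        apply_txn(txn)            # one thread => strictly serial writes

t = threading.Thread(target=writer)
t.start()
for i in range(100):              # many producers would feed this queue
    buf.put(("referrer_hit", i))
buf.put(None)
t.join()
assert len(applied) == 100        # everything applied, in order
```

The bounded queue also gives natural backpressure: producers block once 4000 transactions are outstanding instead of overwhelming the database.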
Reduce the use of big strings
• Store 1 copy of the referrer and reference
it via the BIGINT id
• Since you store a pro user's data forever,
keep condensed formats
• Keep only the data that is displayed
Cook data down
• Created a state machine
• 99% of URLs reduced by 30%
• Could probably have used zlib to do
better, in hindsight.
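The zlib hunch is easy to check in Python; whether it beats the 30% state machine depends on the URL, since zlib adds roughly 11 bytes of header and checksum overhead and very short URLs can even grow:

```python
import zlib

url = b"http://www.google.com/search?q=flickr+stats&hl=en&start=10"
packed = zlib.compress(url, 9)
assert zlib.decompress(packed) == url  # lossless round trip
savings = 1 - len(packed) / len(url)   # compare against the ~30%
```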
Keep even smaller amounts of data
• Use MyISAM for non-pro users and keep
only X weeks of data. MyISAM keeps the
data at 1/6 the size of InnoDB in my case
• Migrate the data when they activate the
product.
• But don't do it more than once – because
that would suck.
Distributed Locks
• Use GET_LOCK & RELEASE_LOCK on
the same server the user operates
on.
• Sick
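A hedged sketch of the pattern (the lock name and helper are illustrative; conn is assumed to be a DB-API connection to the shard that owns the user). MySQL's GET_LOCK returns 1 on success and 0 on timeout, and the lock is per-server, which is why all of an owner's writers must hit the same shard:

```python
from contextlib import contextmanager

@contextmanager
def owner_lock(conn, owner_id: int, timeout: int = 5):
    # One named advisory lock per owner, taken on the owner's own
    # shard. GET_LOCK only serializes connections to the same server,
    # so federating by owner makes it behave like a distributed lock.
    name = f"stats_owner_{owner_id}"
    cur = conn.cursor()
    cur.execute("SELECT GET_LOCK(%s, %s)", (name, timeout))
    if cur.fetchone()[0] != 1:
        raise RuntimeError(f"could not lock {name}")
    try:
        yield
    finally:
        cur.execute("SELECT RELEASE_LOCK(%s)", (name,))
```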
What you have is
• A system that records every referrer from
every page on Flickr
• Keeps track of time for each referrer,
associated with a page owner
• Done in a transaction across 3 tables
• Highly redundant
• Backed up regularly (buffered writes on
MyISAM tables too, baby!)
• Crazy throughput (1 federated application on
dedicated hardware)
• On 6 servers x2 for redundancy == 12
Questions? Yes?