Marc Kwiatkowski, memcache tech lead, QCon 2010
How big is Facebook?
400 million active users
More than 60 million status updates posted each day
More than 3 billion photos uploaded to the site each month
More than 5 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) shared each week
Average user has 130 friends on the site
50 billion friend graph edges
Average user clicks the Like button on 9 pieces of content each month
Infrastructure
▪ Thousands of servers in several data centers in two regions
▪ Web servers
▪ DB servers
▪ Memcache servers
▪ Other services
The scale of memcache @ facebook
▪ Memcache ops/s
▪ over 400M gets/sec
▪ over 28M sets/sec
▪ over 2T cached items
▪ over 200 Tbytes
▪ Network I/O
▪ peak rx 530 Mpkts/s, 60 GB/s
▪ peak tx 500 Mpkts/s, 120 GB/s
A typical memcache server's P.O.V.
▪ Network I/O
▪ rx 90 Kpkts/s, 9.7 MB/s
▪ tx 94 Kpkts/s, 19 MB/s
▪ Memcache ops
▪ 80K gets/s
▪ 2K sets/s
▪ 200M items
Evolution of facebook's architecture
Scaling Facebook: Interconnected data
▪ [animation: Bob, then Brian, then Felicia join the graph]
Memcache Rules of the Game
▪ GET object from memcache
▪ on miss, query database and SET object to memcache
▪ Update database row and DELETE object in memcache
▪ No derived objects in memcache
▪ Every memcache object maps to persisted data in database
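The rules above amount to the classic cache-aside pattern. A minimal sketch in Python, using dicts as stand-ins for memcache and the database (the `CacheAside` class and its method names are illustrative, not Facebook's API):

```python
class CacheAside:
    def __init__(self, cache, db):
        self.cache = cache  # dict-like stand-in for memcache
        self.db = db        # dict-like stand-in for the database

    def read(self, key):
        value = self.cache.get(key)
        if value is None:               # miss: query database...
            value = self.db[key]
            self.cache[key] = value     # ...and SET object to memcache
        return value

    def write(self, key, value):
        self.db[key] = value            # update database row...
        self.cache.pop(key, None)       # ...and DELETE (not update) the cached object
```

Deleting rather than re-setting on write is what keeps every cached object a pure copy of persisted data, with no derived objects in the cache.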
Scaling memcache .
Phatty Phatty Multiget
▪ [video]
Phatty Phatty Multiget (notes)
▪ PHP runtime is single-threaded and synchronous
▪ To get good performance for data-parallel operations, like retrieving info for all friends, it's necessary to dispatch memcache get requests in parallel
▪ Initially we just used polling I/O in PHP
▪ Later we switched to true asynchronous I/O in a PHP C extension
▪ In both cases the result was reduced latency through parallelism
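The dispatch-in-parallel idea can be sketched with a thread pool standing in for the PHP polling/async I/O described above (`fetch_one` and `multiget` are hypothetical names; real memcache clients also batch keys into a single wire request):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_one(key):
    # stand-in for a blocking memcache get over the network
    return f"value-of-{key}"

def multiget(keys, max_workers=8):
    # dispatch all gets in parallel: total latency is roughly the
    # slowest single get, not the sum of all gets
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(keys, pool.map(fetch_one, keys)))
```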
Pools and Threads
▪ [animation: PHP client connects via client sockets cs:12345-12347 to server ports sp:12345-12347]
Pools and Threads (notes)
▪ Privacy objects are small but have poor hit rates
▪ User-profiles are large but have good hit rates
▪ We achieve better overall caching by segregating different classes of objects into different pools of memcache servers
▪ Memcache was originally a classic single-threaded unix daemon
▪ This meant we needed to run 4 instances with 1/4 the RAM on each memcache server
▪ 4X the number of connections to each box
▪ 4X the meta-data overhead
▪ We needed a multi-threaded service
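Pool segregation can be sketched as a routing table from object class to a dedicated server pool, using the two classes named above (pool contents and the simple modulo key-to-server mapping are illustrative assumptions):

```python
# hypothetical pools, one per object class
POOLS = {
    "privacy": ["mc-privacy-01", "mc-privacy-02"],       # small objects, poor hit rate
    "user-profile": ["mc-profile-01", "mc-profile-02"],  # large objects, good hit rate
}

def server_for(obj_class, key):
    # route the key within its class's pool so object classes
    # never compete for the same servers' RAM
    pool = POOLS[obj_class]
    return pool[hash(key) % len(pool)]
```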
Connections and Congestion
▪ [animation]
Connections and Congestion (notes)
▪ As we added web-servers the connections to each memcache box grew
▪ Each webserver ran 50-100 PHP processes
▪ Each memcache box has 100K+ TCP connections
▪ UDP could reduce the number of connections
▪ As we added users and features, the number of keys per multiget increased
▪ Popular people and groups
▪ Platform and FBML
▪ We began to see incast congestion on our ToR switches
▪ UDP allowed us to do congestion detection and admission control
Serialization and Compression
▪ We noticed our short profiles weren't so short
▪ 1K PHP serialized object
▪ fb-serialization
▪ based on thrift wire format
▪ 3X faster
▪ 30% smaller
▪ gzcompress serialized strings
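The payoff of compressing serialized strings can be illustrated with Python's `pickle` and `zlib` standing in for PHP `serialize` and `gzcompress` (the sample profile object is made up):

```python
import pickle
import zlib

# a made-up "short profile" with the kind of repetitive data that compresses well
profile = {"name": "Bob", "bio": "x" * 900, "friends": list(range(50))}

raw = pickle.dumps(profile)      # serialize, as PHP serialize() would
packed = zlib.compress(raw)      # then compress, as gzcompress() would

# the compressed form is smaller on the wire and in cache RAM
assert len(packed) < len(raw)
```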
Multiple Datacenters
▪ [diagram: SC Web + Memcache Proxy, SF Web + Memcache Proxy, SC Memcache, SF Memcache, SC MySQL]
Multiple Datacenters (notes)
▪ In the early days we had two data-centers
▪ The one we were about to turn off
▪ The one we were about to turn on
▪ Eventually we outgrew a single data-center
▪ Still only one master database tier
▪ Rules of the game require that after an update we need to broadcast deletes to all tiers
▪ The mcproxy era begins
Multiple Regions
▪ [diagram: West Coast (SC Web + Memcache Proxy, SF Web + Memcache Proxy, SC Memcache, SF Memcache, SC MySQL) and East Coast (VA Web + Memcache Proxy, VA Memcache, VA MySQL), linked by MySQL replication]
Multiple Regions (notes)
▪ Latency to east coast and European users was/is terrible
▪ So we deployed a slave DB tier in Ashburn VA
▪ Slave DB syncs with master via MySQL binlog
▪ This introduces a race condition
▪ mcproxy to the rescue again
▪ Added memcache delete pragma to MySQL update and insert ops
▪ Added thread to slave mysqld to dispatch deletes in east coast via mcproxy
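The binlog-borne delete idea might be sketched like this, with a plain list standing in for the replication stream; all names and the tuple format are invented for illustration, not MySQL's binlog format or mcproxy's actual protocol:

```python
def master_update(binlog, db, key, value):
    # master region: apply the write, then append both the row change
    # and a delete "pragma" to the replication stream
    db[key] = value
    binlog.append(("update", key, value))
    binlog.append(("mc-delete", key))        # invalidate this key in every region

def slave_apply(binlog, db, cache):
    # slave region: replay the stream; row changes hit the slave DB,
    # delete pragmas are dispatched to the local memcache (via mcproxy)
    for op in binlog:
        if op[0] == "update":
            db[op[1]] = op[2]
        elif op[0] == "mc-delete":
            cache.pop(op[1], None)
```

Because the delete rides the same ordered stream as the row change, the slave region's cache cannot be invalidated before its database has the new row, which closes the race described above.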
Replicated Keys
▪ [animation: three memcache servers; each PHP client first gets key, then its own alias key#0, key#1, key#3]
Replicated Keys (notes)
▪ Viral groups and applications cause hot keys
▪ More gets than a single memcache server can process
▪ (Remember the rules of the game!)
▪ That means more queries than a single DB server can process
▪ That means that group or application is effectively down
▪ Creating key aliases allows us to add server capacity
▪ Hot keys are published to all web-servers
▪ Each web-server picks an alias for gets
▪ get key:xxx => get key:xxx#N
▪ Each web-server deletes all aliases
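The alias scheme can be sketched as follows, keeping the `key#N` convention from the slide; `REPLICAS` and the random alias choice are illustrative assumptions:

```python
import random

REPLICAS = 3  # number of aliases published for a hot key (assumed)

def get_alias(key):
    # each web-server spreads its reads across the aliases, so the
    # hot key's load lands on several memcache servers instead of one
    return f"{key}#{random.randrange(REPLICAS)}"

def delete_aliases(cache, key):
    # on update, every alias must be invalidated or readers
    # could see a stale replica
    for n in range(REPLICAS):
        cache.pop(f"{key}#{n}", None)
```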
Memcache Rules of the Game
▪ New Rule
▪ If a key is hot, pick an alias and fetch that for reads
▪ Delete all aliases on updates
Mirrored Pools
▪ [diagram: Specialized Replica 1 (Shard 1, Shard 2), Specialized Replica 2 (Shard 1, Shard 2), General pool with wide fanout (Shard 1, Shard 2, Shard 3 ... Shard n)]
Mirrored Pools (notes)
▪ As our memcache tier grows the ratio of keys/packet decreases
▪ 100 keys / 1 server = 1 packet
▪ 100 keys / 100 servers = 100 packets
▪ More network traffic
▪ More memcache server kernel interrupts per request
▪ Confirmed Info: critical account meta-data
▪ Have you confirmed your account?
▪ Are you a minor?
▪ Pulled from large user-profile objects
Hot Misses ▪ [animation] .
Hot Misses (notes)
▪ Remember the rules of the game
▪ on miss: query, and set
▪ on update: delete
▪ When the object is very, very popular, that query rate can kill a database server
▪ We need flow control!
Memcache Rules of the Game
▪ For hot keys, on miss grab a mutex before issuing db query
▪ memcache-add a per-object mutex
▪ key:xxx => key:xxx#mutex
▪ If add succeeds, do the query
▪ If add fails (because mutex already exists), back off and try again
▪ After set, delete mutex
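This mutex rule can be sketched with memcache's `add` semantics (add succeeds only if the key is absent) emulated on a dict; the function names, retry count, and back-off interval are illustrative:

```python
import time

def add(cache, key, value):
    # memcache-style "add": fails if the key already exists
    if key in cache:
        return False
    cache[key] = value
    return True

def get_with_mutex(cache, db, key, retries=10, backoff=0.01):
    for _ in range(retries):
        value = cache.get(key)
        if value is not None:
            return value
        if add(cache, key + "#mutex", 1):      # we won the right to query the DB
            value = db[key]
            cache[key] = value                 # set the object...
            cache.pop(key + "#mutex", None)    # ...then delete the mutex
            return value
        time.sleep(backoff)                    # someone else is querying: back off, retry
    raise TimeoutError(f"gave up waiting for {key}")
```

Only one process per object issues the database query; everyone else retries and hits the freshly set cache entry, which is the flow control the previous slide asked for.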
Hot Deletes ▪ [hot groups graphics] .
Hot Deletes (notes)
▪ We're not out of the woods yet
▪ Cache mutex doesn't work for frequently updated objects
▪ like membership lists and walls for viral groups and applications
▪ Each process that acquires a mutex finds that the object has been deleted again
▪ ...and again... and again
Rules of the Game: Caching Intent ▪ Each memcache server is in the perfect position to detect and mitigate contention ▪ ▪ ▪ ▪ ▪ Record misses Record deletes Serve stale data Serve lease-ids Don’t allow updates without a valid lease id .
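A toy version of the lease idea in the bullets above: the cache issues a lease id on each miss and rejects a set whose lease was invalidated by an intervening delete. The class and its interface are invented for illustration; the real server-side mechanism also records miss and delete rates and can serve stale data:

```python
import itertools

class LeasingCache:
    def __init__(self):
        self.data = {}
        self.leases = {}               # key -> outstanding lease id
        self._ids = itertools.count(1)

    def get(self, key):
        if key in self.data:
            return self.data[key], None
        # miss: hand out a lease-id; the caller may fill the cache with it
        lease = self.leases.setdefault(key, next(self._ids))
        return None, lease

    def set(self, key, value, lease):
        if self.leases.get(key) != lease:
            return False               # lease invalidated by a delete: drop the stale set
        del self.leases[key]
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)
        self.leases.pop(key, None)     # invalidate any outstanding lease
```

Because a delete invalidates the lease, a web-server holding pre-delete data can no longer clobber the cache, which breaks the hot-delete cycle on the previous slides.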
Next Steps .
Shaping Memcache Traffic ▪ mcproxy as router ▪ admission control ▪ tunneling inter-datacenter traffic .
Cache Hierarchies ▪ ▪ Warming up Cold Clusters Proxies for Cacheless Clusters .
Big Low Latency Clusters
▪ Bigger Clusters are Better
▪ Low Latency is Better
▪ L2.5
▪ UDP Proxy
▪ Facebook Architecture
Worse Is Better
▪ Richard Gabriel's famous essay contrasted
▪ ITS and Unix
▪ LISP and C
▪ MIT and New Jersey
Why Memcache Works
▪ Uniform, low latency with partial results is a better user experience
▪ memcache provides a few robust primitives
▪ key-to-server mapping
▪ parallel I/O
▪ flow-control
▪ traffic shaping
▪ that allow ad hoc solutions to a wide range of scaling issues
(c) 2010 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0