
The Dirty Work

Scaling Out Websites With Your Own Two Hands


CWRU, October 27, 2012
Fred Hatfull


Who Am I?
- CWRU alumnus
- Software developer at Yelp
- Infrastructure engineer: availability, performance, productivity


Overview
- What's Yelp?
- Scaling the Backend
- Accelerating Content Delivery
- Monitoring Performance


What's Yelp?
Help consumers find great local businesses. Help business owners find more customers.


Numbers current as of 2012Q2.


What's Yelp?


We're hiring! yelp.com/careers


What's Yelp?
Five sites:
- www.yelp.com
- biz.yelp.com
- api.yelp.com
- m.yelp.com
- admin.yelp.com


- www - consumer-facing website
- biz - business owners' website for managing ads, biz page, etc
- api - public and private APIs (mobile apps, too)
- m - mobile site (web browsers on mobile devices)
- admin - administrative tools


What's Yelp?
Numerous open-source projects:
- mrjob
- firefly
- testify
- tron
- many more: github.com/Yelp


- mrjob - Python Map/Reduce framework
- firefly - time-series statistics graphing
- testify - a Python test framework
- tron - distributed cron


In the Beginning...

Like most websites, it all started with a single server...


You have probably set up a website like this....


In the Beginning


Apache, Python, mySQL, Linux.


In the Beginning
[Diagram: a single server at 16.32.64.128.]


No load balancer, no internal DNS, no web framework. Just us, mod_python, and mySQL.


Up and Running
[Diagram: web1 and web2 in front of db1, at 16.32.64.128.]
* Names changed to protect the innocent



Traffic starts picking up. One box doesn't cut it any more... time to scale out horizontally. Adding webs is the low-hanging fruit.


Up and Running
[Diagram: web1 through web6 in front of db1.]
* Names changed to protect the innocent



OK, we begin to hit the limits of horizontal scaling. The webapp can always benefit from having more machines (plus HAProxy).


Up and Running
[Diagram: web1 through web6 in front of db1, which is now marked overloaded.]
* Names changed to protect the innocent



But the database is now under very heavy load!


Scaling the Database


Options:
- Find a faster data store?
- Use separate databases?
- Sharding?
- Replication?


There are a few classic options for scaling up mySQL. We could switch datastores... maybe mySQL is just slow? How about Oracle? MSSQL? Etc. We could introduce an entirely new database machine with its own mySQL instance, basically a clone of the current one, where neither DB knows anything about the other. We could shard the database across multiple machines, each responsible for a certain set of keys. Or we could just replicate the current database to accommodate more traffic, and hope writes don't get overwhelming.


OK. It's 2004... NoSQL isn't really around, and our data is pretty relational anyway. mySQL is looking like the fastest store that meets our requirements.


We could just set up an entirely new database machine and run the new database in parallel. But we'd have to make two read queries instead of one, figure out where to send writes, and keeping schemas in sync is a nightmare. Clearly not scalable.


We are actually doing a form of sharding here, but it's not quite the conventional master-master sharding that usually comes to mind. Master-master would work, but it's a huge pain to get right and requires a lot of effort to make sure keys go to, and are retrieved from, the shards where they belong. Our read-heavy traffic patterns make us an ideal candidate for replication, which massively increases read capacity while reducing engineering overhead.


Database Replication
[Diagram: webs send reads/writes to db1.]


Prior to replication: just one database handles all reads/writes, and all webs talk to this db.


Database Replication
[Diagram: webs send reads/writes to db1 and reads to readdb1; db1 replicates writes to readdb1.]

Simple two-database replication scheme. All write traffic hits the master, db1. Read traffic is split between the master and a read-only database. When a write comes through the master, the master informs the slave of the write so that the slave replays the action taken on the master.


Database Replication
[Diagram: db2 added, replicating writes from db1 alongside readdb1; webs still read/write db1 and read from readdb1.]

It's good practice to keep another master early in the replication stream that can be promoted to write master if the write master fails.


Replication can't always happen immediately, depending on the load on the slave db and the master, how far apart they are, and network congestion.


Life used to be so easy... :(



Database Replication
Strong Consistency vs. Eventual Consistency


A shift in thinking. Life is easy in strongly-consistent systems: although scaling can be challenging, you get a guarantee that data is always up-to-date. Eventual consistency is a big change and has lots of nasty corner cases ready to bite.


Database Replication
Replication has lots of cons:
- Expensive/poorly formed queries have a multiplicative effect
- Replication delay can lead to inconsistent views
- Figuring out when to hit master vs. slave
- etc...


While horizontally-scalable read capacity is a big win, there are lots of new things to think about.


Replication: Gross Queries


[Diagram: webs send queries through the master into the replication stream toward the slaves; mostly 1-2ms queries, with one 4500ms query incoming.]

A normal replication stream. Lots of nice, small queries floating by. The master has no problem.


Replication: Gross Queries


[Diagram: the 4500ms query executing on the master while the 1-2ms queries queue up behind it in the replication stream.]

Small writes get replicated fine, but the big, nasty table scan suddenly locks a whole bunch of rows and waits seconds for mySQL to figure out which rows should come back. This delays all the writes in the replication stream and causes an increase in replication delay, exacerbating the inconsistency.


Replication: Gross Queries


[Diagram: the 4500ms query replays on each slave, stalling every slave for 4500ms.]

In the case of a big INSERT/UPDATE/DELETE, that query also needs to replicate to the slaves, causing big delays on all of the slaves.


Replication: Consistency
[Diagram: a user, a web server, a db master, and a db replica.]


Here's an example of where replication can introduce inconsistency. Our user wants to know about the restaurant Happy Dog.


As part of the request, the web server handling it asks a DB replica for information about Happy Dog.


The replica comes back with the requested information...


and it's returned to the user. Fine. Everything here is as it used to be.


Now our user wants to write a review about Happy Dog.


Since we have to write data, our web connects to the master database and issues the writes for the new review.


That's the end of our user's web request. After she POSTs the review she gets redirected back to Happy Dog's page. Like before, her web hits a replica instead of the master because it only needs to do reads. However, her request gets through the web and to the replica before her review makes it to the replica in the replication stream...


As a result, our user sees the stale information ("where's my review??"), even though she just contributed content! Now the user thinks her content has disappeared.

Replication: Consistency
Writes: always the master.
Reads: can use either the master or a slave:
- the majority can use a slave
- sometimes you want to hit the master for consistency


Here's how DB access is split up based on the kind of activity. Writes (INSERTs/UPDATEs/DELETEs) always hit the master, since nothing else will accept writes. Reads (SELECTs) can use either the master or a slave, and usually only need a slave. It's up to the application to figure out if it needs to hit the master, and that can be tricky.
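In application code, this split boils down to a tiny routing policy. A minimal sketch (the host names echo the diagrams; the function and its parameters are invented for illustration, not Yelp's actual code):

```python
import random

MASTER = "db1"
REPLICAS = ["readdb1", "readdb2"]

def db_host(is_write, needs_consistency=False):
    """Writes, and reads that must see the latest data, go to the
    master; everything else goes to a randomly chosen replica."""
    if is_write or needs_consistency:
        return MASTER
    return random.choice(REPLICAS)
```

The tricky part is deciding when `needs_consistency` should be true, which is exactly what the dirty session cookie below addresses.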


Replication: Consistency
When does consistency matter?
- Consistency only matters when it's expected (example: users writing reviews)
- If the user doesn't know information is out of date... is it really out of date?


The dirty secret is: consistency doesn't really always matter.


Replication: Consistency
Asking for the master:
- after writes, hit the master until replication catches up
- webs/load-balancers can remember user state, but that's expensive and brittle
- instead, teach clients to ask for the master: the dirty session cookie


Always hit the master for writes. After a write, hand the user a cookie for X seconds. While the user has the cookie, hit the master.


Replication: Dirty Session


After writes, issue a dirty session cookie:
- the cookie contains a timestamp in the future
- webs check for the cookie
- if the current time is before the timestamp in the cookie: redirect reads to the master db
- else: remove the cookie, continue via a replica
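That logic is only a few lines. A sketch of both halves, with an invented 5-second window standing in for whatever expiry the real system would tune to observed replication delay:

```python
import time

DIRTY_WINDOW = 5  # seconds to keep reading from the master after a write

def issue_dirty_cookie(now=None):
    """After a write, return a cookie value: a timestamp in the future."""
    now = time.time() if now is None else now
    return str(now + DIRTY_WINDOW)

def db_for_request(cookie_value, now=None):
    """Decide which database tier a read should use, and whether the
    cookie should be cleared. Returns (tier, clear_cookie)."""
    now = time.time() if now is None else now
    if cookie_value is not None and now < float(cookie_value):
        return "master", False  # still inside the dirty window
    # no cookie, or the window has passed: read a replica,
    # and clear any stale cookie the client is still carrying
    return "replica", cookie_value is not None
```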


Caches


Caches
- Fastest thing since sliced bread
- Often seen as a drop-in performance enhancement
- Can be hard to get right
- Present hidden availability implications


Take advantage of precomputed/pre-retrieved results in memory. While caches seem like a drop-in speed upgrade, they can be surprisingly hard to get right.


Caches: Types

[Diagram: load balancer in front of the webs, which talk to the dbs.]


Several different types of caches in several different places...


HTTP caches (e.g. varnish, squid) sit at the load balancer and cache full HTTP responses. Great for static sites or dynamic sites with content that changes infrequently.


In-memory caches on the webs: memoization of results from functions, db queries, etc. Typically per-node (not shared between webs, for example).


Memcache! Frequently used to store computed results for faster lookup and load reduction. Used to cache anything from raw DB rows to larger query results (joins) to gzip'd blobs to serialized data structures.


Caches: Advantages
- Primary cache in most places: memcache
- Takes advantage of fast in-memory key/value lookups
- Good for expensive operations:
  - complex DB queries: 100s of ms
  - network roundtrip to memcache: 2-3ms
- Misses are cheap: only the network roundtrip cost
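The usual way to get these wins is the cache-aside pattern: check memcache first, fall back to the database on a miss, and fill the cache on the way out. A minimal sketch; the `cache` and `query_db` arguments are hypothetical stand-ins for a real memcache client and DB layer, not Yelp's actual interfaces:

```python
def get_business(biz_id, cache, query_db, ttl=300):
    """Cache-aside lookup: try the cache, fall back to the DB on a miss."""
    key = "biz:%s" % biz_id
    row = cache.get(key)
    if row is None:             # miss: only costs the roundtrip
        row = query_db(biz_id)  # expensive: possibly 100s of ms
        cache.set(key, row, ttl)  # populate for the next reader
    return row
```

The TTL bounds how stale a cached row can get; explicit invalidation on writes tightens that further at the cost of more bookkeeping.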


Caches: Pitfalls
Cache libraries can make it easy to cache weird things:
- database models
- memcache connections (!)
- ??? - anything else that can be serialized (via pickle etc)
Causes problems when object definitions change.
Cannot enumerate cache contents easily to fix polluted caches.


Especially in dynamic languages, it can be easy to say "oh, just serialize it and cache it!" However, you'll often end up caching things like SQLAlchemy models (which hold connections to your database!), the connection you are using to access memcache, and more. If any of those object definitions change (or you change or remove code that pickle/json expects to have when deserializing), you may end up with a polluted cache containing entries you can't decode. Memcache also doesn't let you enumerate cache entries, so programmatically invalidating certain subsets of keys is hard if not impossible.


Caches: Pitfalls
Makes exceeding failover capacity really easy:
- If the memcache cluster goes down, what happens?
- How do you handle the additional web and DB load?
Solutions:
- Build in additional capacity
- Be able to isolate and turn off expensive features
- Have an emergency maintenance mode

Memcache helps to reduce load, but it's another point of failure. A memcache outage can cause increased load proportional to whatever it was offloading for you, which can easily cause cascading failures if not handled correctly.


Datacenters

Geographic distribution helps you mitigate the speed of light.
Replication problems expand to more systems:
- memcache
- code deployments
- offline batch processing
- database slaves see non-trivial replication delay


It's like database replication for your whole system. Out-of-sync caches can be problematic, and database replication becomes a super-non-trivial problem.

Solutions:
- Each datacenter gets a write master
- Still only One True Master; the other write masters replicate from it
- Read/reporting slaves replicate from their local write master
- Replicate cache inserts and invalidations
- Take advantage of the existing mySQL replication stream


Provide utilities for monitoring replication delay for services using the replication stream



Front-End: Principles
- Reduce HTTP round-trips
- Reduce download sizes
- Don't do things browsers don't like


Lots of front-end performance tips/tricks/hacks. Most of them are based on these guidelines.


CDNs
Content Delivery Networks:
- Maintain copies of your assets
- Probably serve your assets faster than you do
- Examples: Akamai, Cloudfront (Amazon Web Services), Cotendo

Like a big, giant, mega-cache.


CDNs: Why?
- Huge networks of globally distributed edge nodes (e.g. Akamai at > 100,000 [1])
- Easy to set up and drop in: a transparent layer, just change hostnames to the CDN
- Much lower bandwidth and equipment costs: an asset gets uploaded to the CDN once (ish)


[1] http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-webservers/


Subdomain Sharding
RFC 2616 (HTTP 1.1): A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.


http://www.ietf.org/rfc/rfc2616.txt

Parallel connections by browser:
- Firefox 4.x: 6
- Firefox 3.6.x: 6
- Internet Explorer: 2-6
- Chrome: 6
- Opera 11.x: 8
- Safari 5.x: 6

http://stackoverflow.com/questions/5751515/official-references-for-default-values-of-concurrent-http-1-1-connections-per-se

Distribute asset traffic across sharded subdomains.
Before: media.yelp.com -> 16.32.64.128
After: media[1-4].yelp.com -> media.yelp.com -> 16.32.64.128
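The shard an asset lands on should be a deterministic function of the asset, so every page render picks the same hostname and the browser cache stays warm. A sketch under that assumption (the CRC32 hash choice and helper name are illustrative, not Yelp's actual scheme):

```python
import zlib

SHARDS = 4

def shard_hostname(asset_path):
    """Deterministically map an asset path to one of the sharded
    subdomains, so repeated renders always pick the same host."""
    h = zlib.crc32(asset_path.encode("utf-8")) & 0xffffffff
    return "media%d.yelp.com" % (h % SHARDS + 1)
```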


Cache Me If You Can


Reduce round trips by caching assets:
- Use HTTP Cache-Control headers
- Use very long times, e.g. 10 years
- Version assets via URL path or query params
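Versioning is what makes the 10-year lifetime safe: any change to the file yields a new URL, so stale copies can never be served. A sketch of building such a URL (the path layout, digest length, and helper name are illustrative assumptions):

```python
import hashlib

def versioned_url(filename, contents, version=1):
    """Build an asset URL embedding a global version and a hash of the
    asset's bytes, so the URL changes whenever the file changes."""
    digest = hashlib.md5(contents).hexdigest()[:10]
    return "/assets/js/%d/%s/%s" % (version, digest, filename)
```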


There are alternatives, e.g. ETag and If-Modified-Since. These require HTTP round-trips to check, though, so even though you don't end up re-downloading the asset you still end up with more TCP connections.



GET /assets/js/1/32dce72546/main.js

HTTP/1.1 200 OK
Cache-Control: max-age=315360000
Content-Encoding: gzip, deflate
Content-Length: 8437
...




In that URL, the 1/32dce72546 path segment is a global version plus a hash of the asset, and max-age=315360000 is 10 years in seconds.



Cookieless Domains
GET / HTTP/1.1
Host: www.yelp.com
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.52 Safari/537.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: yuv=moFlBQmJAAfti607vjGInzN7FFk8_DSjAfHQ_lW4YaGwZEJqQqZmcEyNyeNTQam7Rqm2q6EOieOhxmRTZiNuMNmm_G7pet1m; __qca=...; __gads=...; bse=...; fd=0; searchPrefs=...; fbm_97534753161=...; s=...; hl=en_US; recentlocations=...; location=...; __utma=...; __utmb=...; __utmc=...; __utmz=...; fbsr_97534753161=... (over 2KB of cookies)

This is a big request...


2128 bytes = 2k!


Could be as big as the asset itself!

- Only assign cookies to domains which need them
- Put static assets and other cookie-less content elsewhere: *.yelp.com vs. *.yelp-cdn.com


Previous request becomes 399 bytes of overhead.


Assorted Front-End Tips


- Use gzip
- Always include styles in the <head>
- Always put scripts at the end of <body>
- Avoid inline styles
- Avoid manipulating the DOM
- Try not to trigger repaints
- Load non-critical content via AJAX (if applicable)

Lots more... the web is abundant with tips. These are some of the ones we use.


Monitoring
- Mission-critical
- Needs to be simple, easy to understand, durable
- Strategies vary widely based on application requirements
- Drop-in products only get you so far
- Exposes: what is broken and when; what works and how well it works

Monitoring: Our Strategy


- Nagios - alerts
- Firefly - performance and availability analysis
- Tertiary tools: Smokeping, Pharos, Ganglia, etc...

Monitoring: Alerts
Nagios:
- Off-the-shelf solution
- Flexible custom reporting
- Well understood for monitoring systems, e.g. load, memory/disk usage, hardware failures, etc
- Needs well-known states (CRITICAL/OK)


Monitoring: Performance
Firefly:
- Graphical front-end to time-series data
- Extensible data ingestion API
- Open-source: github.com/Yelp/firefly
Statmonster:
- Code name for data collection and preprocessing
- Upstream of Firefly
- Turns log lines into stats, analyzes, and stores them

Monitoring: Logging
Brief aside: logging.
- Incredibly important for application development
- Often the only source of information when everything blows up
- Useful for pulling data out of the webapp on the fly
- Huge volumes of log data require special infrastructure


Monitoring: Scribe
Distributed log aggregation system:
- Composed of leaf and aggregator nodes
- Leaves collect log lines
- Aggregators aggregate incoming lines based on channel
- Each line is associated with a channel, e.g. (yelp_timings, "Homepage render took 32.9ms")
- Eventually consistent
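The leaf-to-aggregator handoff can be pictured as forwarding each (channel, message) pair to whichever aggregators own that channel. A toy sketch of that routing, with invented class and function names (Scribe itself is a C++ service; this only mirrors the shape of the flow):

```python
from collections import defaultdict

class Aggregator:
    """Toy aggregator: collects lines for the channels it owns."""
    def __init__(self, channels):
        self.channels = set(channels)
        self.lines = defaultdict(list)

    def append(self, channel, message):
        self.lines[channel].append(message)

def route(line, aggregators):
    """Leaf-side routing: hand a (channel, message) pair to every
    aggregator responsible for that channel."""
    channel, message = line
    for agg in aggregators:
        if channel in agg.channels:
            agg.append(channel, message)
```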

Note: eventual consistency != strong consistency. Lines may be delayed/out-of-order


Monitoring: Scribe
[Diagram: webs emit (channel, message) pairs to leaves; leaves forward each line to the aggregators subscribed to its channel (e.g. [channel1], [channel2, channel3]), and local consumers can read from the leaves directly.]


Monitoring: Performance
[Diagram: scribe delivers log lines (json etc) to log digestion and stat generation, e.g. (performance.home, 1.2); stats are windowed, additional statistics computed, and RRD data chunks written.]



Monitoring: Performance
{
    "time_start": 10,
    "time_dispatch": 12,
    "time_end": 44,
    "checkpoints": {
        "user_details": 22,
        "review_collection": 37,
        "template_render": 43
    }
}

def digest(e):
    time_start = e["time_start"]
    checkpoints = e["checkpoints"]
    total_time = e["time_end"] - time_start
    compute_time = e["time_end"] - e["time_dispatch"]
    reviews_time = checkpoints["review_collection"] - time_start
    emit(["performance", "total_time"], total_time)
    emit(["performance", "compute_time"], compute_time)
    emit(["checkpoint_times", "reviews"], reviews_time)

(["performance", "total_time"], 34, 10)
(["performance", "compute_time"], 32, 10)
(["checkpoint_times", "reviews"], 27, 10)


The trailing 10 in the emitted stats is the timestamp.



Monitoring: Performance
(["performance", "total_time"], 34, 10)

["performance", "total_time"] (10s buffer):
(10, 23) (11, 29) (10, 35) (11, 28) (12, 39) (8, 32) (8, 40) (9, 36) (9, 33)


After inserting (10, 34), the ["performance", "total_time"] (10s buffer):
(10, 23) (11, 29) (10, 35) (10, 34) (11, 28) (12, 39) (8, 32) (8, 40) (9, 36) (9, 33)

(10, 23) (11, 29) (10, 35) (10, 34) (11, 28) (12, 39) (8, 32) (8, 40) (9, 36) (9, 33)

-> stats: 50th, 75th, 95th, 99th percentiles, mean, count, etc...




Questions?


We're Hiring!
- Full-time and interns
- Front-end and back-end engineering
- http://www.yelp.com/careers

Thanks!


