
Continuous Deployment at Etsy: A Tale of Two Approaches

Ross Snyder ross@etsy.com @beamrider9

March 9, 2013

A quick primer on Etsy

Etsy is:
The global marketplace we make together.

Etsy is:
The premier destination for handmade goods, vintage items, and craft supplies.

simplertimestoys

lacklusterco

norwesterseaglass

Etsy quick facts (as of March 2013):
22+ million members
800,000+ active shops
18+ million items currently for sale
20 cents to list an item, 3.5% transaction fee
400+ employees (majority in Brooklyn)

Since opening its doors in June 2005, Etsy has grown virtually non-stop.
[Chart: Gross Merchandise Sales ($MM) by year, 2005 to 2012; vertical axis from $0 to $1,000]

A nice problem to have: Our site is so successful, how can we move fast enough to keep up with demand?

CONTINUOUS DEPLOYMENT

Etsy:
The Early Years
(2005 - 2008)

Etsy: The Early Years

1. Spend significant time writing code

Etsy: The Early Years


1. Weeks writing code

2. Painful source control merge

Etsy: The Early Years

1. Weeks writing code
2. Painful merge

3. Hand off to someone else to deploy

Etsy: The Early Years

1. Weeks writing code
2. Painful merge
3. Hand off to deployers

4. Deploy, site goes down

Etsy: The Early Years

1. Weeks writing code
2. Painful merge
3. Hand off to deployers
4. Deploy, site down

5. Roll back deploy

Etsy: The Early Years

1. Weeks writing code
2. Painful merge
3. Hand off to deployers
4. Deploy, site down
5. Roll back deploy

6. Spend hours (days?) fixing bugs

Etsy: The Early Years

1. Weeks writing code
2. Painful merge
3. Hand off to deployers
4. Deploy, site down
5. Roll back deploy
6. Fix bugs

7. Go back to step 2

Etsy: The Early Years

WATERFALL!

Etsy: The Early Years


Pros: Early Etsy engineers used this release cycle to bootstrap the marketplace from nothing. Forever grateful.

Etsy: The Early Years

Cons:
Large changesets
Infrequent deploys
Weak confidence in deploy success
Significant time spent deploying
Low ability to experiment/iterate/react
Developer stress/unhappiness

Etsy: The Early Years


By late 2008, Etsy is still a startup, but has the deploy process of a much bulkier company.

Popularity is on the verge of outpacing capacity.

Etsy:
Today

Etsy: Today

1. Small changesets, deployed frequently

Etsy: Today
1. Small changesets

2. Engineers deploy the site

Etsy: Today
And not just engineers, but also:
Designers
Product Folks
Upper Management
Board Members
Dogs

Etsy: Today
1. Small changesets
2. Engineers deploy

3. Deploys are fast and near-effortless

Etsy: Today
1. Small changesets
2. Engineers deploy
3. Deploys are fast

4. Most changes behind config flags (safer deploys)
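To illustrate what "behind config flags" can look like in code, here is a minimal sketch. The flag names, the dictionary layout, and the `is_enabled` helper are all hypothetical and are not taken from Etsy's codebase; the point is only that new code can ship to production switched off and be turned on later by a config-only change.

```python
# Hypothetical flag configuration; names and structure are illustrative only.
CONFIG_FLAGS = {
    "new_search_ui": {"enabled": False},   # code is deployed but stays dark
    "faster_checkout": {"enabled": True},  # code is deployed and live
}


def is_enabled(flag_name, flags=CONFIG_FLAGS):
    """Return True only if the flag exists and is switched on."""
    return bool(flags.get(flag_name, {}).get("enabled", False))


def render_search_page(flags=CONFIG_FLAGS):
    """Pick a code path based on the flag; unknown flags fall back to the old path."""
    if is_enabled("new_search_ui", flags):
        return "new search UI"
    return "old search UI"
```

Flipping `enabled` to `True` in the config changes behavior without touching the code, which is one way a deploy can be made safer: the risky part becomes a small, easily reversed config change.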

Etsy: Today
1. Small changesets
2. Engineers deploy
3. Deploys are fast
4. Changes behind flags

5. Graphs/metrics to assess deploy
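The deck lists StatsD among Etsy's open source projects. As a rough sketch of how application code could feed such graphs, the snippet below builds metrics in the StatsD plain-text line format and sends them over UDP. The metric names, host, and helper functions are assumptions made for illustration; the deck does not show how Etsy instruments its deploys.

```python
import socket


def format_counter(name, value=1):
    """Build a StatsD counter line, e.g. 'deploys.production:1|c'."""
    return f"{name}:{value}|c"


def format_timer(name, millis):
    """Build a StatsD timing line, e.g. 'deploys.duration:840|ms'."""
    return f"{name}:{millis}|ms"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; 8125 is the conventional StatsD port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


# Example usage (hypothetical metric names):
#   send_metric(format_counter("deploys.production"))
#   send_metric(format_timer("deploys.duration", 840))
```

Because the send is a single UDP datagram with no reply expected, instrumenting a code path this way adds very little overhead, which makes it cheap to measure many things.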

Etsy: Today
1. Small changesets
2. Engineers deploy
3. Deploys are fast
4. Changes behind flags
5. Copious graphs/metrics

6. If issues, fix immediately & roll forward

Etsy: Today
This isn't license to break stuff, quickly.

Engineer-driven QA and solid unit testing are integral parts of the process.

Etsy: Today
1. Small changesets
2. Engineers deploy
3. Deploys are fast
4. Changes behind flags
5. Copious graphs/metrics
6. Fix fast & roll forward

7. Repeat 25+ times per day, every day

Then:
1. Weeks writing code
2. Painful merge
3. Hand off to deployers
4. Deploy, site down
5. Roll back deploy
6. Fix bugs, go to step 2

Now:
1. Small changesets
2. Engineers deploy
3. Deploys are fast
4. Changes behind flags
5. Copious graphs/metrics
6. Fix fast & roll forward

Etsy Deploy Stats: 2012


Deployed to production 6,419 times
On average, 535/month, 25/day
Additional 3,851 config-only deploys
196 different people deployed to prod
Nov/Dec 2012: deployed 752 times
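A quick arithmetic check of the averages above. The monthly figure follows directly from the yearly total. The daily figure does not match a calendar-day average, so one reading (an assumption, not stated in the deck) is that it counts only the days on which deploys normally happen, such as working days.

```python
DEPLOYS_2012 = 6419  # production deploys in 2012, from the slide

per_month = DEPLOYS_2012 / 12            # about 535, matching the slide
per_calendar_day = DEPLOYS_2012 / 366    # 2012 was a leap year; about 17.5
implied_deploy_days = DEPLOYS_2012 / 25  # days needed to average 25/day; about 257

print(f"per month: {per_month:.0f}")
print(f"per calendar day: {per_calendar_day:.1f}")
print(f"deploy days implied by 25/day: {implied_deploy_days:.0f}")
```

Roughly 257 deploy days in a year is in the neighborhood of a typical count of working days, which is why the working-day reading seems plausible.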

Why does it work?

Continuous Deployment Math

N = # of deploys
P = probability of site degradation
S = average severity of degradation
T = time to detect/resolve

Expected Downtime = N * P * S * T

Continuous Deployment Math

N = # of deploys
P = prob. of degradation
S = avg. severity of degradation
T = time to detect/resolve

Before: N = 1, P = 0.5, S = 0.7, T = 100, so E.D. = 35

Now: N = 250, P = 0.1, S = 0.05, T = 5, so E.D. = 6.25

(all numbers completely arbitrary)
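The arithmetic on this slide can be reproduced in a few lines. The function name `expected_downtime` is just a label chosen here; the inputs are the slide's own (arbitrary) numbers.

```python
def expected_downtime(n, p, s, t):
    """Expected Downtime = N * P * S * T, as defined on the slide."""
    return n * p * s * t


before = expected_downtime(n=1, p=0.5, s=0.7, t=100)
now = expected_downtime(n=250, p=0.1, s=0.05, t=5)

print(f"before: {before:.2f}")  # before: 35.00
print(f"now: {now:.2f}")        # now: 6.25
```

Even with 250 times as many deploys, the product is smaller because P, S, and T each shrink: with these illustrative inputs the expected downtime drops by a factor of 5.6.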

Big Takeaway
Etsy circa 2013 (400+ employees) acts, in some ways, more like a startup than Etsy circa 2008 (40+ employees).

Continuous Deployment makes possible: Continuous Experimentation

http://etsy.me/continuous-experimentation

Continuous Experimentation
1. Small changes
2. Run experiment (A/B test)
3. Analyze data
4. Re-examine assumptions

Repeat continuously in pursuit of larger goals.
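As a sketch of step 2, one common way to run an A/B test is to assign each user to a variant deterministically by hashing the user ID together with the experiment name. This is a generic technique, not necessarily how Etsy does it; the function names, the experiment name, and the 50% split below are assumptions.

```python
import hashlib


def assign_variant(user_id, experiment, treatment_percent=50):
    """Deterministically map a user to 'treatment' or 'control'.

    The same (user_id, experiment) pair always lands in the same bucket,
    so a user sees a consistent experience across requests.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_percent else "control"


def conversion_rate(conversions, visitors):
    """Fraction of visitors who converted; 0.0 when there were no visitors."""
    return conversions / visitors if visitors else 0.0


# Example usage (hypothetical experiment name):
#   variant = assign_variant(user_id=12345, experiment="new_search_ui")
```

Including the experiment name in the hash keeps bucket assignments independent between experiments, so the same users are not always in the treatment group.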

Heard since 2010: "Neat experiment, but this will never scale."

As of 2013, Etsy has 100+ engineers and is still going strong.

Some Etsy Customizations


Deploying is a first-class feature. Inability to deploy is a P1 incident (same as site down).

Some Etsy Customizations


We continuously deploy not just the main Etsy website, but as much as possible:
Internal admin site
API
Big data
Search
Blog
Deployinator itself

Some Etsy Customizations


In the rare case we can't continuously deploy, we create alternative tools:
Database schema changes
PCI-DSS environment (credit cards)

We do continuously deploy as much of our payment processing as is safe & legal (98%).

Some Etsy Customizations


Keeping deploys fast is paramount and worth the investment in manpower & hardware.

Some Etsy Customizations


Continuous deployment is all about moving forward, sometimes at the expense of the past. Our solution: engineering-wide bug rotation, one day a month, every engineer participates.

Fun Fact:
Continuous Deployment is a fantastic recruitment tool for attracting engineers who like to move fast and get stuff done.

Learn more: http://codeascraft.etsy.com/
Etsy open source (Deployinator, StatsD): http://etsy.github.com/
Join the fun: http://www.etsy.com/careers