
Keeping the Site Up Under Extreme Traffic

Who am I? Kevin Diamond

CTO of HauteLook
Oversee all custom applications built in-house and out
Major focus on customer experience based applications
Utilize Agile SCRUM and Lean Kanban SDLC
Manage single data center cage
Private cloud environment
Open source technology stack

Who is HauteLook?

Private sale, members-only limited-time sale events
Premium fashion and lifestyle brands at exclusive prices of 50-75% off
Over 8 million active members
Acquired by Nordstrom in 2011
Increased sales by over 60% in 2011 and on pace to do the same in 2012
Over 20 new sale events begin each morning at 8am PST

Why do we know about extreme traffic?

Every morning at 8am when our new sale events go live, we get the Black Friday equivalent spike in traffic

And on special event days (really big brands) we can have spikes that are 3x higher!

So how do you plan for spikes in traffic?

Measure everything!
Free tools like Ganglia and Cacti
  Cacti is great for measuring single services and servers
  Ganglia is great for measuring clusters

Graphs galore!

The right people make all the difference

Experienced Systems Administrators (Infrastructure Gurus) will save you money in the long run
Even to properly identify your bottlenecks and ensure your monitoring is right, you need the right people
They then need to evaluate how to solve your bottlenecks
  What solutions exist and what they cost
  Run the RFP process for things you need to buy
  Know the open source tools that you can get for free
Implement your solutions
And again monitor the results and work with vendors to tweak until perfect

Build to scale horizontally

The only way to scale quickly and cheaply is horizontally
Build and use smaller applications
Virtualize your environment into small VMs
Load balance across many VMs
Ensure your load balancer can quickly add and remove nodes
Also ensure your load balancer can detect when a VM is operational
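
The last two points can be sketched as a toy round-robin pool. This is a minimal illustration, not how any particular load balancer works: the `Pool` class, the node names, and the `health_check` callable (standing in for something like an HTTP probe) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pool:
    """Round-robin pool that only routes to nodes passing a health check."""
    health_check: Callable[[str], bool]   # hypothetical probe, e.g. an HTTP ping
    nodes: List[str] = field(default_factory=list)
    _turn: int = 0

    def add(self, node: str) -> None:
        """Put a new VM into rotation."""
        if node not in self.nodes:
            self.nodes.append(node)

    def remove(self, node: str) -> None:
        """Take a VM out of rotation."""
        if node in self.nodes:
            self.nodes.remove(node)

    def pick(self) -> str:
        """Return the next healthy node, skipping any that fail the probe."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._turn % len(self.nodes)]
            self._turn += 1
            if self.health_check(node):
                return node
        raise RuntimeError("no healthy nodes available")


# Usage: vm2 is down, so traffic alternates between vm1 and vm3.
status = {"vm1": True, "vm2": False, "vm3": True}
pool = Pool(health_check=lambda n: status[n], nodes=["vm1", "vm2", "vm3"])
print([pool.pick() for _ in range(4)])
```

Adding and removing nodes are cheap list operations here, which is the property to look for in a real load balancer: changing the pool should not require a restart.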

Advanced scaling
For extreme traffic scaling you need a cloud
If you have a lot of server hardware available, build a private cloud
  Scale the number of running VMs up and down when reaching certain thresholds
  Have priority levels to allow certain VMs to even be shut off if more resources are needed than are free
  Requires centralized storage to move VM images around to hardware
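
A rough sketch of the priority idea for the private cloud case: when load crosses a threshold and free capacity is short, pick the lowest-priority VMs to shut off. The `VM` fields, the "higher number means more important" convention, and the `protect_at_or_above` cutoff are assumptions for illustration, not a description of any specific cloud scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    name: str
    priority: int   # higher number = more important (assumed convention)
    cores: int


def plan_scale_up(load: float, threshold: float, needed_cores: int,
                  free_cores: int, running: List[VM],
                  protect_at_or_above: int) -> List[str]:
    """Return names of low-priority VMs to shut off so new VMs can start.

    Returns an empty list when load is under the threshold or when free
    capacity already covers the request. VMs whose priority is at or
    above `protect_at_or_above` are never selected.
    """
    if load < threshold or needed_cores <= free_cores:
        return []
    candidates = [vm for vm in running if vm.priority < protect_at_or_above]
    victims = []
    for vm in sorted(candidates, key=lambda v: v.priority):
        if free_cores >= needed_cores:
            break
        victims.append(vm.name)
        free_cores += vm.cores
    if free_cores < needed_cores:
        raise RuntimeError("not enough capacity even after shutting VMs off")
    return victims


# Usage: the web tier is protected; batch and reporting VMs give way.
running = [VM("web", 10, 4), VM("batch", 1, 4), VM("reports", 2, 2)]
print(plan_scale_up(0.9, 0.8, needed_cores=6, free_cores=1,
                    running=running, protect_at_or_above=10))
```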

If you don't, scale to the public cloud


Amazon AWS, Microsoft Azure, etc.
This requires Global Load Balancing
But provides infinite growth potential
Watch out for latency

And/Or go Dynamic Site Acceleration


Akamai product to scale dynamic page caching to their edge network
Almost infinite growth potential with no latency issues

Know your threshold

Set a threshold that your system should be able to expand to handle
Keep raising the threshold as your traffic continues to grow
At HauteLook that is 3x our last PEAK
Plan for that threshold
Buy for that threshold
Test to that threshold
Load testing is a must; don't trust that all things WILL scale as planned
Identify your bottlenecks at scale
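
As a worked example of the 3x rule, the sketch below turns a last-peak figure into a target and a node count. The request rates and per-node capacity are made-up numbers for illustration only; real per-node capacity has to come from load testing.

```python
import math


def target_capacity(last_peak_rps: float, multiplier: float = 3.0) -> float:
    """Threshold to plan, buy, and test for: a multiple of the last peak."""
    return last_peak_rps * multiplier


def nodes_needed(target_rps: float, per_node_rps: float) -> int:
    """Whole nodes required to serve target_rps, rounding up."""
    return math.ceil(target_rps / per_node_rps)


# Hypothetical numbers: last peak of 2,000 requests/s, 450 requests/s per VM.
target = target_capacity(2000)
print(target, nodes_needed(target, 450))
```

Each time a new peak is recorded, rerun the calculation so the threshold keeps rising with traffic.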


Most common bottlenecks

Bandwidth
CPU
Memory
Hard Disk I/O


Bandwidth

Reserve your bandwidth for things that change
Get a CDN to offload static and cached objects
Get a burstable pipe (e.g., commit to 1GigE but on a 10GigE port)
Ensure you are billed on 95/5 (95th percentile) for that pipe
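
Why 95/5 billing suits a short daily spike: under the usual 95th-percentile scheme, the provider samples usage at regular intervals over the billing period, discards the top 5% of samples, and bills on the highest remaining one. The sketch below assumes that scheme; the function name and the sample values are illustrative, and exact sampling rules vary by provider.

```python
from typing import Sequence


def billable_mbps(samples_mbps: Sequence[float], percent: int = 95) -> float:
    """Return the usage sample at the given percentile (nearest-rank method).

    With percent=95, the top 5% of samples are ignored for billing.
    """
    if not samples_mbps:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_mbps)
    # Nearest-rank: ceil(percent/100 * n), done in integer arithmetic.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical month: 95% of intervals at 800 Mbps, 5% bursting to 9,000 Mbps.
samples = [800.0] * 95 + [9000.0] * 5
print(billable_mbps(samples))
```

If the bursts occupy more than 5% of the intervals, the burst rate becomes the billable rate, so it is worth checking how long the daily spike actually lasts.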


CPU & Memory

CPU & Memory can be solved in much the same way
You can often find huge savings just by tweaking your services, for example this lighttpd FastCGI excerpt:

    fastcgi.server = ( ".php" =>
      ( "localhost" =>
        (
          "max-procs" => 4,
          "min-procs" => 1,
          "bin-environment" => (
            "PHP_FCGI_CHILDREN" => "4",
            "PHP_FCGI_MAX_REQUESTS" => "5000"
          )
        )
      )
    )

Or switching to newer/better services


Apache vs. lighttpd vs. NGINX
mod_php vs. FastCGI PHP vs. PHP-FPM
WebSphere vs. Tomcat vs. GlassFish

Then tweak your applications
Go VM to encapsulate your application environments to run leaner
Lastly buy more physical hardware
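
To see why the FastCGI settings on this slide matter for memory, it helps to count processes. In lighttpd's spawning model each of the `max-procs` parent processes forks `PHP_FCGI_CHILDREN` workers, so the total is `max-procs * (PHP_FCGI_CHILDREN + 1)`. The sketch below applies that formula; the 30 MB per-process figure is purely an assumption, so measure your own.

```python
def php_process_count(max_procs: int, fcgi_children: int) -> int:
    """Total PHP processes: each parent plus the children it forks."""
    return max_procs * (fcgi_children + 1)


def php_memory_mb(max_procs: int, fcgi_children: int,
                  mb_per_process: float) -> float:
    """Rough resident-memory estimate; mb_per_process is an assumed average."""
    return php_process_count(max_procs, fcgi_children) * mb_per_process


# Values from the slide: max-procs = 4, PHP_FCGI_CHILDREN = 4.
print(php_process_count(4, 4))      # 20 processes
print(php_memory_mb(4, 4, 30.0))    # 600.0 MB at an assumed 30 MB each
```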


Hard disk I/O

More hard drives


RAID-0 or RAID-10
Get a SAN, which will also provide centralized storage for a private cloud
EMC or NetApp

Faster hard drives


SSD is quickly coming down in price
Either loaded in-server or in your SAN

Specialty hardware
Fusion-IO in-server flash storage
Fusion-IO ioTurbine middleware flash caching between VM and SAN
PureStorage Flash Array, all SSD SAN


Disaster Recovery

So you couldn't scale, what now?
Put up a static page: "We're sorry!"
Put up a queue system
  Static page that refreshes and slowly allows traffic through to your dynamic site
  Delivered from a CDN or the cloud
  Can be made to prioritize best customers using cookies (only if planned for in advance)
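
The queue idea can be sketched as a small waiting room that admits a fixed number of visitors per refresh cycle and lets flagged customers go first. Everything here is hypothetical: the `WaitingRoom` class, the `is_vip` flag (standing in for a cookie set in advance), and the slot counts.

```python
import heapq
from itertools import count
from typing import List, Tuple


class WaitingRoom:
    """Admits visitors in batches; VIPs first, then first come, first served."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = count()   # arrival order, used as a tie-breaker

    def join(self, visitor_id: str, is_vip: bool = False) -> None:
        """Queue a visitor; is_vip would come from a pre-planned cookie."""
        tier = 0 if is_vip else 1
        heapq.heappush(self._heap, (tier, next(self._seq), visitor_id))

    def admit(self, slots: int) -> List[str]:
        """Let up to `slots` visitors through to the dynamic site."""
        admitted = []
        while self._heap and len(admitted) < slots:
            admitted.append(heapq.heappop(self._heap)[2])
        return admitted


# Usage: "c" carries the VIP cookie and jumps ahead of earlier arrivals.
room = WaitingRoom()
for visitor, vip in [("a", False), ("b", False), ("c", True), ("d", False)]:
    room.join(visitor, is_vip=vip)
print(room.admit(2))
print(room.admit(5))
```

The number of slots per cycle is the throttle: raise it as the dynamic site recovers.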

Don't let it happen again!


Thank you!

Thanks for taking the time with me today
If you have questions, please email me at kevin@hautelook.com

