
Building Scalable, Global, and Highly Available Web Apps


Name
Title
Microsoft Corporation
Agenda

Design for High Availability


Design for High Scalability
Design for Performance
Assumptions
You know the basics
Windows Azure Web/Worker Roles
SQL Database
Windows Azure Storage
Asynchronous Programming
Windows Azure diagnostics

You have deployed a service to Windows Azure


Everything can and will (eventually) break
Availability
Why do services fail?
Increased workload
Failure
Hardware
Network
Platform Service
Transient conditions

Human
Upgrades
What do we mean by available?
Same functionality
Degraded functionality
Failsafe
Basics – what you get for free
Elasticity
Easily deploy compute resources and scale up and down

Automated Service Management


Windows Azure will (automatically) recover bad nodes

Fault Domains
Windows Azure deploys services across fault boundaries

Storage Resilience
3 copies of storage maintained
Fault Tolerance
When Windows Azure breaks, it fixes itself!
Can your service?

Codifying Operations
Upgrade Domains
Configure in ServiceDefinition.csdef
<ServiceDefinition name="RedDir" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition" upgradeDomainCount="3">
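To make the effect of upgradeDomainCount concrete, here is a minimal sketch of how a rolling upgrade walks the domains one at a time. It assumes instances are spread round-robin across upgrade domains; that placement rule and both function names are illustrative assumptions, not the platform's documented behavior.

```python
def rolling_upgrade_plan(instance_count, upgrade_domain_count=3):
    """Group instance indexes by upgrade domain, in the order domains are walked.

    Assumption: instances are placed round-robin across the domains.
    """
    domains = [[] for _ in range(upgrade_domain_count)]
    for instance in range(instance_count):
        domains[instance % upgrade_domain_count].append(instance)
    return domains


def min_instances_serving(instance_count, upgrade_domain_count=3):
    """Smallest number of instances still serving while one domain is offline."""
    plan = rolling_upgrade_plan(instance_count, upgrade_domain_count)
    return instance_count - max(len(domain) for domain in plan)
```

Under this assumption, six instances in three upgrade domains leave four instances serving at every step of the upgrade.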

Transient Datacenter Conditions


Do you have Retry Logic?
What did you mean, retry logic?
Transient conditions in the datacenter/network/service
Example:
SQL Azure Error 40501
The service is currently busy. Retry the request after 10 seconds.

Transient Fault Handling Framework


http://windowsazurecat.com/2011/02/transient-fault-handling-framework/
Retry against anything that might be external and have transient conditions*:
SQL Database
Windows Azure Storage
Service Bus
3rd Party Services
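A minimal sketch of retry logic with exponential backoff, for illustration only. It is not the Transient Fault Handling Framework's API; TransientError, retry, and every parameter name here are hypothetical, and a real client would need to map service-specific errors (such as a "service busy" reply) onto the transient category itself.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a 'service busy' reply)."""


def retry(operation, max_attempts=5, base_delay=0.5, max_delay=10.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**attempt (capped at max_delay) plus a little jitter
    between attempts. Exceptions other than TransientError propagate at once.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay * 0.1))
```

The sleep parameter is injectable so the backoff can be exercised in tests without real waiting. Only operations that are safe to repeat should be wrapped this way.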
Retry

demo
Service Specific Implementations
Does your service fail without that platform service?
Can your service use the same platform services from another data center?
Can your service temporarily do without that platform service?
Site Failover
If a site-specific dependency is out, fail over to another site
Easy: Use Traffic Manager
Hard: Code your own
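For the "code your own" route, the core decision can be sketched as a priority-ordered health check. This is a hedged illustration: first_healthy and the is_healthy probe are hypothetical names, and a real implementation also has to deal with probe timeouts, flapping, and data synchronization between sites.

```python
def first_healthy(sites, is_healthy):
    """Return the first site, in priority order, whose health probe passes."""
    for site in sites:
        try:
            if is_healthy(site):
                return site
        except Exception:
            continue  # a probe that raises counts as unhealthy
    raise RuntimeError("no healthy site available")
```

With sites listed as primary first, traffic stays on the primary until its probe fails and moves to the next site only then.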
Site Failover

demo
Upgrade Strategies: VIP Swap
[Diagram: foo.com (Production) resolves via DNS to Foo.cloudapp.net and the Load Balancer, which serves V1; V2 runs in Test (Staging) at GUID.cloudapp.net]
Upgrade Strategies: Upgrade
[Diagram: foo.com (Production) resolves via DNS to Foo1.cloudapp.net and the Load Balancer; three rows of Web and Worker instances are shown at V1 and then at V2]
Upgrade Strategies
New Service & Swap DNS

[Diagram: DNS for foo.com can point at either of two production services, Foo1.cloudapp.net (Production) or Foo2.cloudapp.net (Production)]
Scalability
What is wrong with this?

[Diagram: n Web Role instances share 1 SQL Database, which is labeled "Scale me out too"]

It is better to have 50 x 1 GB databases than 1 x 50 GB database
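Spreading data over many small databases needs a rule for deciding which database holds which record. A minimal sketch of hash-based routing follows, assuming records carry a string key such as a customer ID; shard_for and the default of 50 shards are illustrative choices, not something the deck prescribes.

```python
import hashlib


def shard_for(key: str, shard_count: int = 50) -> int:
    """Map a record key to a shard index in the range [0, shard_count)."""
    # A stable hash keeps the mapping identical across processes and restarts
    # (Python's built-in hash() is randomized per process for strings).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Plain modulo routing is simple, but changing shard_count later remaps most keys, so schemes that expect to grow often keep an explicit key-range or lookup table instead.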


What about this?
Everything needs to scale
[Diagram: Load Balancer, Web Role, Worker Role, Queue, SQL Database, Table Storage, Blob Storage]
Synchronous Design Pattern
Each thread dedicated to one outstanding request
Block on each step of “the work” done for each request, then respond & repeat

[Diagram: Client Request #1 reaches the Web App Front End; its thread blocks while "The Work" #1 waits on the Middle Tier, SQL Azure, and WA Storage. Time passes, Response #1 is returned, and only then is Client Request #2 picked up.]

This approach scales poorly


Each outstanding request is stored on a thread stack
Threads block even when there is work to be done
Adding a thread enables only one additional concurrent request
Asynchronous Design Pattern
Each thread picks up work whenever it is ready
A thread handling one request may handle another before the first one completes

[Diagram: Client Requests #1 and #2 reach the Web App Front End; each request's context is kept while "The Work" #1 and "The Work" #2 run against the Middle Tier, SQL Azure, and WA Storage, and threads return Response #1 and Response #2 as each completes.]

This approach scales well


Client requests tracked explicitly in app’s data structures
Threads never block while there is work to be done
Each thread can handle possibly many concurrent requests

But bookkeeping & synchronization can be difficult…
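The contrast between the two patterns can be sketched with Python's standard asyncio library, where a single thread interleaves many requests that are each waiting on I/O. The handler, the 0.1 second delay standing in for a SQL or storage call, and the function names are all hypothetical.

```python
import asyncio
import time


async def handle_request(request_id, io_delay=0.1):
    """Simulate one request whose work is dominated by waiting on a backend."""
    await asyncio.sleep(io_delay)  # the thread is free to serve others here
    return f"response #{request_id}"


async def serve(request_count):
    """Run all requests concurrently on one thread and collect the responses."""
    handlers = (handle_request(i) for i in range(1, request_count + 1))
    return await asyncio.gather(*handlers)


def run_demo(request_count=20):
    """Return the responses and the wall-clock seconds taken to produce them."""
    start = time.perf_counter()
    responses = asyncio.run(serve(request_count))
    return responses, time.perf_counter() - start
```

Handled one at a time with a blocking wait, 20 requests at 0.1 seconds each would take about 2 seconds; here the waits overlap, so the total stays close to a single wait. The event loop does the bookkeeping that the slide warns about.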


Performance
What’s Windows Azure Cache?
• Use spare memory on your VMs as a high-performance cache
• Distributed cache cluster co-located with existing roles, or on dedicated roles
• Named caches with a high-availability option and notifications
• Supports the Memcached protocol
Why Windows Azure Cache?
Faster
No external service calls (additional network hops)
Co-located in roles

Cheaper
No external service calls (additional cost)
Use spare memory that you already paid for

More reliable
Your service is running = cache is available
No throttling, unlike in a multi-tenant environment
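A common way to put a cache in front of a slower store is the cache-aside pattern: look in the cache first, and load and remember the value on a miss. The sketch below uses a plain in-process dictionary as a stand-in; it does not use the Windows Azure Cache client API, and CacheAside, load, and ttl_seconds are hypothetical names.

```python
import time


class CacheAside:
    """Read-through helper: serve from memory, fall back to load(key) on a miss."""

    def __init__(self, load, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load          # function that fetches from the slow store
        self._ttl = ttl_seconds    # how long a cached value stays fresh
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh hit: no call to the backing store
        value = self._load(key)    # miss or expired: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value
```

The time-to-live bounds how stale a value can get; data that changes often needs a shorter TTL or explicit invalidation on write.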
Cache

demo
Why Performance Matters
More responsive applications
Faster page load times
8 seconds vs. 3 seconds?

Higher interactivity – new type of applications


Better user experience – more $$$
Thinking Globally
Network latency
Put compute closer to user.
Put data closer to user.

Global availability
Datacenter outages.
Synchronizing data.
Network Latency
Content Delivery Network (CDN)
High-bandwidth global blob content delivery
24 locations globally (US, Europe, Asia, Australia and South America), and growing
Same experience for users no matter how far they are from the geo-location where the storage account is hosted

Blob service URL vs CDN URL:


Windows Azure Blob URL: http://images.blob.core.windows.net/
Windows Azure CDN URL: http://<id>.vo.msecnd.net/
Custom Domain Name for CDN: http://cdn.contoso.com/
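Because a CDN URL differs from the blob URL only in its host name, links can be rewritten mechanically. A minimal sketch, assuming the CDN host (for example an <id>.vo.msecnd.net name or a custom domain) is already known; to_cdn_url is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cdn_url(blob_url, cdn_host):
    """Swap the blob service host for the CDN host, keeping path and query."""
    parts = urlsplit(blob_url)
    if not parts.netloc.endswith(".blob.core.windows.net"):
        raise ValueError(f"not a blob service URL: {blob_url}")
    return urlunsplit((parts.scheme, cdn_host, parts.path, parts.query, parts.fragment))
```

For instance, to_cdn_url("http://sally.blob.core.windows.net/images/pic1.jpg", "guid01.vo.msecnd.net") yields "http://guid01.vo.msecnd.net/images/pic1.jpg".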
Windows Azure CDN
GET http://guid01.vo.msecnd.net/images/pic1.jpg

To Enable CDN:
Register for CDN via Dev Portal
Set container images to public

[Diagram: a GET for pic1.jpg reaches one of several Content Delivery Network edge locations (http://guid01.vo.msecnd.net/). The edge location obtains pic1.jpg from the Windows Azure Blob Service (http://sally.blob.core.windows.net/images/pic1.jpg) and keeps a copy for the TTL. The diagram also shows a 404 response.]
Windows Azure Traffic Manager
Direct users to the service in the closest region with the Windows Azure Traffic Manager

[Diagram: a DNS query for foo.cloudapp.net goes to Traffic Manager, which applies its policies and monitoring to choose among foo-us.cloudapp.net, foo-europe.cloudapp.net, and foo-asia.cloudapp.net, and returns a DNS response (1.2.3.4)]
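The routing decision described above can be sketched as "pick the lowest-latency endpoint among those that pass monitoring". This only illustrates the idea and is not how Traffic Manager is implemented; pick_endpoint and its inputs are hypothetical.

```python
def pick_endpoint(latency_ms, healthy):
    """Return the healthy endpoint with the lowest measured latency.

    latency_ms maps endpoint host name to latency as seen from the user's region;
    healthy is the set of endpoints currently passing monitoring.
    """
    candidates = {host: ms for host, ms in latency_ms.items() if host in healthy}
    if not candidates:
        raise RuntimeError("no healthy endpoint available")
    return min(candidates, key=candidates.get)
```

A user in Europe is normally sent to foo-europe.cloudapp.net, and to the next closest region if that endpoint stops passing its health checks.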
Traffic Manager

demo
Summary
Windows Azure gives you high availability capabilities for free
Think about scaling out
Handle transient conditions

Design for scalability


Asynchronous pattern
Scale out
Design for maximum performance & reach
Caching, CDN, Traffic Manager, etc.
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
