• Embed Doc
  • Readcast
  • Collections
  • CommentGo Back
Download
 
 John Dilley,Bruce Maggs, Jay Parikh,Harald Prokop,Ramesh Sitaraman,and Bill Weihl
 Akamai Technologies
Globally DistributedContent Delivery
 Using more than 12,000 servers in over 1,000 networks,Akamai’s distributed content delivery system fights service bottlenecks and shutdowns by delivering content from theInternet’s edge.
A
s Web sites become popular,they’re increasingly vulnerable tothe flash crowd problem, in whichrequest load overwhelms some aspect of the site’s infrastructure, such as the front-end Web server, network equipment, or bandwidth, or (in more advanced sites) theback-end transaction-processing infra-structure. The resulting overload can crasha site or cause unusually high responsetimes — both of which can translate intolost revenue or negative customer atti-tudes toward a product or brand.Our company, Akamai Technologies,evolved out of an MIT research effortaimed at solving the flash crowd problem(www.akamai.com/en/html/about/history.html). Our approach is based on the obser- vation that serving Web content from asingle location can present serious prob-lems for site scalability, reliability, andperformance. We thus devised a system toserve requests from a variable number of surrogate origin servers at the networkedge.
1
By caching content at the Internet’sedge, we reduce demand on the site’sinfrastructure and provide faster servicefor users, whose content comes from near-by servers. When we launched the Akamai systemin early 1999, it initially delivered only Web objects (images and documents). Ithas since evolved to distribute dynami-cally generated pages and even applica-tions to the network’s edge, providingcustomers with on-demand bandwidthand computing capacity. This reducescontent providers’ infrastructure require-ments, and lets them deploy or expandservices more quickly and easily. Our current system has more than 12,000servers in over 1,000 networks. Operat-ing servers in many locations posesmany technical challenges, includinghow to direct user requests to appropri-ate servers, how to handle failures, howto monitor and control the servers, andhow to update software across the sys-
50
SEPTEMBER • OCTOBER 2002
http://computer.org/internet/
1089-7801/02/$17.00©2002 IEEEIEEEINTERNETCOMPUTING
   G   l  o   b  a   l    D  e  p   l  o  y  m  e  n   t  o   f   D  a   t  a   C  e  n   t  e  r  s
 
tem. Here, we describe our system and how we’vemanaged these challenges.
Existing Approaches
Researchers have explored several approaches todelivering content in a scalable and reliable way.Local clustering can improve fault-tolerance andscalability. If the data center or the ISP providingconnectivity fails, however, the entire cluster isinaccessible to users. To solve this problem, sitescan offer mirroring (deploying clusters in a fewlocations) and multihoming (using multiple ISPsto connect to the Internet). Clustering, mirroring,and multihoming are common approaches for siteswith stringent reliability and scalability needs.These methods do not solve all connectivity prob-lems, however, and they do introduce new ones:
I
It is difficult to scale clusters to thousands of servers.
I
 With multihoming, the underlying networkprotocols — in particular the border gatewayprotocol (BGP)
2
— do not converge quickly tonew routes when connections fail.
I
Mirroring requires synchronizing the siteamong the mirrors, which can be difficult.In all three cases, excess capacity is required: Withclustering, there must be enough servers at eachlocation to handle peak loads (which can be anorder of magnitude above average loads); withmultihoming, each connection must be able tocarry all the traffic; and with mirroring, each mir-ror must be able to carry the entire load. Each of these solutions thus entails considerable cost,which could more than double a site’s initial infra-structure expense and ongoing operation costs.The Internet is a complex fabric of networks.Congestion and failures occur at many places,including
I
the “first mile” (which is partially addressed bymultihoming the origin server),
I
the backbones,
I
peering points between network serviceproviders, and
I
the “last mile” to the user.Deploying independent proxy caches throughoutthe Internet can address some of these bottlenecks.Transit ISPs and end-user organizations haveinstalled proxy caches to reduce latency and band-width requirements by serving users directly froma previously requested content cache. However, Web proxy cache hit rates tend to be low — 25 to40 percent — in part because Web sites are usingmore dynamic content. As a result, proxy cacheshave had limited success in improving Web sites’scalability, reliability, and performance. Akamai works closely with content providers todevelop features that improve service for their Websites and to deliver more content from the networkedge. For example, features such as authorization,control over content invalidation, and dynamic con-tent assembly let us deliver content that would oth-erwise be uncacheable. Although ISP caches couldinclude similar features, to be useful they wouldhave to standardize the features and their imple-mentation across most cache vendors and deploy-ments. Until such a feature is widely deployed, con-tent providers have little incentive to use it. Because Akamai controls both its network and software, wecan develop and deploy features quickly.
Akamai’s Network Infrastructure
 Akamai’s infrastructure handles flash crowds byallocating more servers to sites experiencing highload, while serving all clients from nearby servers.The system directs client requests to the nearestavailable server likely to have the requested con-tent. It determines this as follows:
I
 Nearest 
is a function of network topology anddynamic link characteristics: A server with alower round-trip time is considered nearer thanone with a higher round-trip time. Likewise, aserver with low packet loss to the client is near-er than one with high packet loss.
I
 Available
is a function of load and networkbandwidth: A server carrying too much load or a data center serving near its bandwidth capac-ity is unavailable to serve more clients.
I
Likely 
is a function of which servers carry thecontent for each customer in a data center: If all servers served all the content — by round-robin DNS, for example — then the servers’ diskand memory resources would be consumed bythe most popular set of objects.In the latter case, an Akamai site might hold adozen or more servers within any data center; thesystem distributes content to the minimum num-ber of servers at each site to maximize systemresources within the site.
Automatic Network Control
The direction of requests to content servers isreferred to as mapping. Akamai’s mapping tech-
IEEEINTERNETCOMPUTING
http://computer.org/internet/
SEPTEMBER • OCTOBER 2002
51
Content Delivery 
 
nology uses a dynamic, fault-tolerant DNS system.The mapping system resolves a hostname based onthe service requested, user location, and networkstatus; it also uses DNS for network load-balancing.The “DNS Resolution” sidebar describes the stan-dard DNS resolution process for an Akamai edgeserver name, corresponding to steps 1 and 2 in Fig-ure 1. In step 3, the client makes an HTTP requestto the edge server, which then retrieves the contentby requesting it from either another Akamai server or the content provider’s server (step 4). The server then returns the requested information to the clientand logs the request’s completion. Akamai name servers resolve host names to IPaddresses by mapping requests to a server usingsome or all of the following criteria.
I
Service requested
. The server must be able tosatisfy the request. The name server must notdirect a request for a QuickTime media streamto a server that handles only HTTP.
I
Server health
. The content server must be upand running without errors.
I
Server load
. The server must operate under acertain load threshold and thus be available for additional requests. The load measure typical-ly includes the target server’s CPU, disk, andnetwork utilization.
I
 Network condition
. The client must be able toreach the server with minimal packet loss, andthe server’s data center must have sufficientbandwidth to handle additional networkrequests.
I
Client location
. The server must be close to theclient in terms of measures such as networkround trip time.
I
Content requested
. The server must be likely tohave the content, according to Akamai’s con-sistent hashing algorithm.Internet routers use BGP messages to exchange net-work reachability information among BGP systemsand compute the best routing path among the Inter-net’s autonomous systems.
2
 Akamai agents com-municate with certain border routers as peers; themapping system uses the resulting BGP informationto determine network topology. The number of hopsbetween autonomous systems is a coarse but usefulmeasure of network distance. The mapping systemcombines this information with live network statis-tics — such as traceroute data
3
— to provide adetailed, dynamic view of network structure andquality measures for different mappings. Imple-menting this mapping system on a global scaleinvolves several challenges, as we discuss later.
Network Monitoring
Our DNS-based load balancing system continuous-ly monitors the state of services, and their serversand networks. Each of the content servers — for theHTTP, HTTPS, and streaming protocols — frequent-ly reports its load to a monitoring application,which aggregates and publishes load reports to thelocal DNS server. That DNS server then determineswhich IP addresses (two or more) to return whenresolving DNS names. If a server’s load exceeds acertain threshold, the DNS server simultaneouslyassigns some of the server’s allocated content toadditional servers. If the load exceeds another threshold, the server’s IP address is no longer avail-able to clients. The server can thus shed a fractionof its load when it is experiencing moderate to highload. The monitoring system also transmits datacenter load to the top-level DNS resolver to directtraffic away from overloaded data centers.To monitor the entire system’s health end-to-end, Akamai uses agents that simulate end-user behavior by downloading Web objects and measuring their failure rates and download times. Akamai uses thisinformation to monitor overall system performanceand to automatically detect and suspend problem-atic data centers or servers.In addition to load-balancing metrics, Akamai’smonitoring system provides centralized reportingon content service for each customer and contentserver. This information is the basis of Akamai’sreal-time customer traffic analyzer application.The information is useful for network operationaland diagnostic purposes, and provides real-time
52
SEPTEMBER • OCTOBER 2002
http://computer.org/internet/
IEEEINTERNETCOMPUTING
Global Deployment of Data Centers
Content providerTop-level DNSUser321
Internet4
Figure 1.Client HTTP content request.Once DNS resolves the edgeserver’s name (steps 1 and 2),the client request is issued to the edgeserver (step 3),which then requests content (if necessary) from thecontent provider’s server,satisfies the request,and logs its completion.
of 00

Leave a Comment

You must be to leave a comment.
Submit
Characters: ...
You must be to leave a comment.
Submit
Characters: ...