• Embed Doc
  • Readcast
  • Collections
  • CommentGo Back
Download
 
SwaminathanSivasubramanian,Guillaume Pierre,and Maarten van Steen
Vrije Universiteit
Gustavo Alonso
ETH Zurich
Analysis of Cachingand Replication Strategiesfor Web Applications
Developers often use replication and caching mechanisms to enhance Webapplication performance. The authors present a qualitative and quantitative analysisof state-of-the art replication and caching techniques used to host Webapplications. Their analysis shows that selecting the best mechanism dependsheavily on data workload and requires a careful review of the application’scharacteristics. They also propose a technique for Web practitioners to comparedifferent mechanisms’ performance on their own.
W
eb sites can be slow for manyreasons, but the most prevalentone is the dynamic generation of  Web documents. Modern Web sites suchas Amazon.com and Slashdot.org don’tdeliver static pages — they generate con-tent on the fly each time they receive arequest, customizing their pages for eachuser. Clearly, generating a Web page inresponse to every request takes more timethan simply fetching static HTML pagesfrom a server. Dynamic generation of a Web page typically requires issuing oneor more queries to a database, so accesstimes to the database can easily get out of hand when the request load is high.Industry and academia have developedseveral techniques to overcome this prob-lem. The most straightforward one is
Web page caching
, in which (fragments of) theHTML pages the application generates arecached to serve future requests.
1
Content-delivery networks (CDNs) such as Akamaido this by deploying edge servers aroundthe Internet to locally cache Web pagesand then deliver them to clients. By deliv-ering pages from edge servers located closeto the clients, CDNs reduce each request’snetwork latency. Page-caching techniqueswork well if the same cached HTML pagecan answer many requests to a particular  Web site. These techniques have proven tobe effective,
1,2
but with the growing drivetoward personalized Web content, gener-ated pages tend to be unique for each user,thereby reducing the benefits of page-caching techniques.Page caching’s limitations have trig-gered the CDN and database research com-munity to investigate new approaches for 
60 Published by the IEEE Computer Society 1089-7801/07/$25.00 © 2007 IEEE IEEE INTERNET COMPUTING
   E  n  g   i  n  e  e  r   i  n  g   t   h  e   W  e   b   T  r  a  c   k
 
scalable Web application hosting. We can classifythese approaches broadly into four techniques:application code replication,
3
cache database re-cords,
4,5
cache query results,
6
and entire databasereplication.
7,8
 Although numerous research effortshave focused on these approaches, very few workshave analyzed their pros and cons and examinedtheir performance. This article’s objective is to pres-ent an overview of various scalability techniquesand compare and analyze their features and per-formance. To do so, let’s first consider some well-known scaling techniques for Web applications.
Techniques to Scale Web Applications
Instead of caching the dynamic pages generatedby a central Web server, various techniques aim toreplicate the
means
of generating pages over mul-tiple edge servers. Despite their differences, thesetechniques often rely on the assumption that appli-cations don’t require strict transactional semanticsfor their data accesses (as, for example, bankingapplications do). They typically provide “read- your-writes” consistency, which guarantees thatwhen an application at an edge server performs anupdate, any subsequent reads from the same edgeserver will return that update’s effects (and possi-bly others). Scalable techniques that provide trans-actional semantics are beyond this article’s scope.
Edge Computing
The simplest way to generate user-specific pages isto replicate the application code at multiple edgeservers and keep the data centralized (see Figure 1a).This technique is the heart of the edge computing(EC) products at Akamai and ACDN.
3
EC lets eachedge server generate user-specific pages accordingto context, session, and information stored in thedatabase, thereby spreading the computational loadacross multiple servers. However, this data central-ization can also pose several problems. First, if theedge servers are located worldwide, each data accessincurs wide-area network (WAN) latency; second,the central database quickly becomes a performancebottleneck because it needs to serve the entire sys-tem’s database requests. These properties restrictEC’s use to Web applications that require relativelyfew database accesses to generate content.
Data Replication
The solution to EC’s database bottleneck problem isto place the data at each edge server so that gener-ating a page requires only local computation anddata access. Database replication (REPL) techniquescan help here by maintaining identical copies of thedatabase at multiple locations.
7–9
However, in Webenvironments, database replicas are typically locat-ed across a WAN, whereas most REPL techniquesassume the presence of a local-area network (LAN)between replicas. This can be a problem if a Webapplication generates many database updates. If thishappens, each update must be propagated to all theother replicas to maintain the replicated data’s con-sistency, potentially introducing a huge networktraffic and performance overhead. In our study, wedesigned a simple replication middleware solutionthat serializes all updates at an origin server andpropagates them to the edges in a lazy fashion. Theedge servers answer each read query locally.
Content-Aware Data Caching
Instead of maintaining full copies of the databaseat each edge server, content-aware caching (CAC)systems cache database query results as the appli-cation code issues them. In this case, each edgeserver maintains a partial copy of the database,and each time the application running at the edgeissues a query, the edge-server database checks if itcontains enough data locally to answer the querycorrectly. This process is called a
query contain-ment check
. If the containment check result is pos-itive, the edge server can execute the query locally;otherwise, it must be sent to the central database.In the latter case, the edge-server database insertsthe result in its local database so that it can servefuture identical requests locally. To insert cachedtuples into the edge database, the caches createinsert/update queries on the fly. Examples of CACsystems include DBCache
5
and DBProxy.
4
CAC stores query results in a storage-efficientway — for example, the queries “
select * fromitems where price < 50
” (Q1) and “
select *from items where price < 20
” (Q2) have over-lapping results. By inserting both results into thesame database, the overlapping records are storedonly once. Another interesting feature of CAC is thatonce Q1’s results are stored, Q2 can execute locally,even though that particular query hasn’t been issuedbefore. In this case, the query containment proce-dure recognizes that Q2’s results are contained inQ1’s results. CAC systems are beneficial when theapplication’s query workload has range queries or queries with multiple predicates (for example, tofind items that satisfy <
clause
1> OR <
clause
2>).
JANUARY FEBRUARY 2007 61
Strategies for Web Applications
of 00

Leave a Comment

You must be to leave a comment.
Submit
Characters: ...
You must be to leave a comment.
Submit
Characters: ...