
International Conference and Workshop on Emerging Trends in Technology (ICWET 2011) – TCET, Mumbai, India

Distributed Database Caching for Web Applications and Web Services
H K Mehta, School of Computer Science, Devi Ahilya University, Indore, India, +919425077901, mehtahk@yahoo.com
P Kanungo, Patel Group of Institutions, Ralamandal, Indore, India, +919406623744, priyeshkanungo@hotmail.com
M Chandwani, Department of Computer Engineering, Institute of Engineering and Technology, Devi Ahilya University, Indore, India, +919303227853, Chandwanim1@rediffmail.com

ABSTRACT
The demand for and usage of web applications are increasing exponentially. These applications are generally dynamic in nature: they retrieve data from databases based on users' requests. Heavily used servers receive thousands of requests per second, which overloads the database and degrades the overall performance of the web application. If the database and web server run on the same machine, the database may become the bottleneck. Hence, such applications are implemented as multitier applications, in which the web-based frontend, the business logic and the database are deployed on different servers. This, however, introduces network delay, since the web server, application server and database server are connected through a network. To avoid this network delay and to improve the performance of multitier applications, a new distributed hash-based database cache implementation is presented. Centralized control of the distributed cache is implemented for improved overall performance.

Categories and Subject Descriptors
C.2.4 [Computer-Communication Networks]: Distributed Systems – Distributed applications

General Terms
Design, Experimentation, Performance

Keywords
Database, Database Caching, DHT, Multitier Web Applications, Web Service

1. INTRODUCTION
Popular web applications have seen explosive growth in the number of requests from different users. Nowadays, many kinds of applications are implemented as web applications. These applications vary in the load they generate and in how heavily they use the database while processing requests: from simple searches that display information stored in the database, to heavy database transactions, sites with secure commercial transactions, web sites with multimedia content, and web services developed to be accessed from various web applications. These different levels of usage generate correspondingly lower to higher load on the web sites. Web services are web applications that use the hypertext transfer protocol (HTTP) for communication, similar to web sites. However, web services differ from web applications in the manner in which they are accessed by clients: web sites are implemented for human users, while web services are intended to be used by other web applications. Web services are not collections of static and dynamic web pages like web sites; instead, they host various functionalities that are accessible through HTTP client applications.

Smaller web applications are deployed on a single server running both the web server and the database server. A web application with a large number of users and requests cannot be effectively hosted on a single server, as that server may become overloaded handling the requests and assembling results from the database. To cope with this problem, the functionality of the application is divided among multiple tiers. Depending on the load on the web application, each function may be deployed on a separate server, so that specialized hardware can be used for each kind of processing. A web application divided into tiers in this way is called a multitier web application, as shown in Figure 1. The web application depicted in Figure 1 is a three-tier application. The client sends a request to access some resource of the web application. This request is received by a frontend web server, which transfers it to the application server hosting the business logic of the application. The application server interacts with the database server to fulfill the user's request. This division of request processing across multiple levels avoids overloading a single server. However, multitier web applications put extra load on network bandwidth and introduce network delay, because the frontend server, the application server holding the business logic and the database server are three different systems.

Web caching is one solution to the bandwidth and network delay problems of multitier applications. Frequently accessed data items are stored in the main memory of either the frontend web server or the application server. A data item requested by a client is first searched for in the cache; if it is found, it is returned from the cache, otherwise it is fetched from the database. Several issues are associated with cache handling: selecting the candidate data to be cached, choosing the location of the cache, and maintaining freshness and consistency of the data. A good candidate for caching is dynamic data that changes but remains stable for some period, so that caching it is worthwhile; similarly, if the access frequency is very high, data with a short lifespan may also be cached. Ideally the data should be cached as close to the users as possible, although data security must be considered when selecting the cache location. Maintaining data freshness and consistency is the most challenging task in any caching mechanism: an effective mechanism is required to identify data that has been changed or updated. This process is called cache validation.

The databases used in multitier applications are heavyweight databases and are therefore kept away from the application servers. Database caching is the process of caching data from the main database server in a lightweight database at the application server, which keeps the frequently accessed data close to the business logic. This implementation of database caching improves the throughput of the web application by avoiding network overheads. A distributed implementation of database caching can further improve performance by handling bursty user request loads. Distributed database caching can be implemented on a web server or application cluster in which several servers service users' requests. The cache can be replicated at all servers in the cluster, meaning the data is duplicated at every server, or the data can be divided among the servers of the cluster.

In this paper, a distributed hash-based database caching solution is presented that can be used with multitier web applications and web services. Section 2 discusses the problem addressed in this paper. Section 3 presents our solution. The experimentation and results are described in Section 4. After a discussion of related work in Section 5, Section 6 presents concluding remarks and future directions.

Figure 1. The multitier web application (client → Internet → web server → application server → database server)

2. PROBLEM ADDRESSED
Multitier applications divide the responsibilities of a web application into multiple layers called tiers, each handling a single task. This improves the performance of each tier and thereby the performance of the overall web application. However, the division introduces a new problem related to network delay. Database caching at some tier can reduce actual database accesses to a large extent, but it poses several challenges for a successful implementation. The most significant challenges are data invalidation, data expiration, caching granularity, selection of the data to cache, and the location at which to store the cache.

Data invalidation concerns the freshness and consistency of the data. Data expiration means purging unnecessary data from the cache, either based on aging or using a replacement algorithm. Caching granularity means providing different levels at which data can be cached, e.g. the option to cache a complete database table or only selected rows and columns of a table. Data selection involves choosing the data to be cached so that the cache achieves a high hit rate. The location of the cache matters for data privacy and security, and the chosen location affects the performance of the entire web application. The database is accessed by the business logic stored at the application server, so the database cache can be stored at the application server; if request processing does not involve complex business logic, the cache can also be placed at the web server level for the best response time.

There are several alternatives to a database cache, such as the hardware cache available with the CPU and the caching provided by database vendors. However, these alternatives have limited storage capacity, while multitier web applications require large spaces for caching.

Main memory databases (MMDB) are more scalable than hardware caches and faster than the caching provided by database vendors. An MMDB requires an efficient data structure to store the large data sets needed by multitier web applications; a hash table is used to store the data in main memory. If the data set is too large, a single hash table may become a bottleneck. A distributed main memory database (DMMDB) can be used when the data set is large and in high demand, to avoid single points of failure and bottlenecks. A DMMDB keeps the cached records at more than one node, in either replicated or divided mode, and requires an efficient request-to-server mapping mechanism for good response time.
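To make the hash-table caching idea concrete, the sketch below shows a cache-aside lookup in Java (the implementation language used later in the paper): an in-memory hash table is consulted first and the database is queried only on a miss. The class, table and column names here are illustrative assumptions, not the paper's actual implementation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.ConcurrentHashMap;

// Minimal cache-aside lookup: consult the in-memory hash table first,
// fall back to the database only on a miss (illustrative names only).
public class CacheAsideLookup {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final Connection db;   // assumed to be an open JDBC connection

    public CacheAsideLookup(Connection db) {
        this.db = db;
    }

    public String getCustomerName(int customerId) throws SQLException {
        String key = "customer:" + customerId;
        String cached = cache.get(key);            // 1. try the hash-table cache
        if (cached != null) {
            return cached;                         // cache hit: no database access
        }
        String sql = "SELECT name FROM Customer WHERE id = ?";  // assumed schema
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setInt(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;                   // not found in the database either
                }
                String name = rs.getString(1);
                cache.put(key, name);              // 2. populate the cache on a miss
                return name;
            }
        }
    }
}
```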

3. THE DISTRIBUTED HASH-BASED DATABASE CACHING AS A SOLUTION
The distributed hash-based database caching (DHBDC) presented here is a solution to the problems discussed in Section 2. The solution is implemented using Java-based technologies. It is a generic caching policy that can be configured to work with any database; in the experimentation, the MySQL database has been used. Large hash tables are used as the database caching mechanism, and the hash table is distributed across a server cluster. Both replicated and non-replicated models are implemented for data caching, and the user may select either option as a configuration parameter. Centralized caching control is implemented for efficient and better management of the cache [18].
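As a rough illustration of how a cache key could be mapped onto the server cluster in the non-replicated (partitioned) mode, the sketch below hashes the key and takes it modulo the number of cache servers. The paper does not specify the exact mapping function, so this is an assumption.

```java
import java.util.List;

// Hypothetical mapping of a cache key to one server of the cache cluster
// in the non-replicated (partitioned) configuration.
public class CacheNodeSelector {
    private final List<String> cacheServers;   // e.g. RMI URLs of the cache nodes

    public CacheNodeSelector(List<String> cacheServers) {
        this.cacheServers = cacheServers;
    }

    public String selectServer(String cacheKey) {
        // Math.abs(Integer.MIN_VALUE) is negative, so mask the sign bit instead.
        int bucket = (cacheKey.hashCode() & 0x7fffffff) % cacheServers.size();
        return cacheServers.get(bucket);
    }
}
```

In the replicated configuration, every node holds every item, so a selector like this could instead be used purely to balance load across the cache servers.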


Figure 2. The implementation architecture of DHBDC: clients 1–50 (generated with Apache JMeter) send requests to the Apache web server, which forwards them to the Apache Tomcat application server; the application server communicates over Java RMI with the database caching servers, which are backed by the MySQL Enterprise database server.

In the replicated model, the central controller manages the load distribution among the cache servers. In the non-replicated model, the cached items are divided equally among the candidate servers. The central controller manages a distributed hash table (DHT) for mapping each cached item to the server that holds it [4].

The architecture of DHBDC is depicted in Figure 2. A client request is sent to the frontend web server and then transferred to the application server, where the central cache controller is also kept. The application server uses the DHT to determine the cache node that may hold the requested data, and a cache lookup is performed at that node. On a cache hit, the request is fulfilled from the cache. On a cache miss, the actual database is used to fetch the requested data and the cache is updated with it. If the cache is full, the least recently used (LRU) algorithm is used for cache item replacement; otherwise the data is simply added to the cache [8]. Data expiration is also implemented, based on age or time-to-live (TTL) and on idle time.
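A minimal sketch of how LRU replacement combined with TTL and idle-time expiration could be realized on a single cache node is given below, using Java's access-ordered LinkedHashMap. The capacity and timing parameters are assumptions, since the paper does not state them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// One cache node: LRU eviction via an access-ordered LinkedHashMap,
// plus expiration based on age (TTL) and idle time. Parameters are assumed.
public class LruTtlCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long createdAt;
        long lastAccess;
        Entry(V value, long now) { this.value = value; this.createdAt = now; this.lastAccess = now; }
    }

    private final int capacity;
    private final long ttlMillis;
    private final long maxIdleMillis;
    private final LinkedHashMap<K, Entry<V>> map;

    public LruTtlCache(int capacity, long ttlMillis, long maxIdleMillis) {
        this.capacity = capacity;
        this.ttlMillis = ttlMillis;
        this.maxIdleMillis = maxIdleMillis;
        // accessOrder = true makes the iteration order "least recently used first".
        this.map = new LinkedHashMap<K, Entry<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > LruTtlCache.this.capacity;   // LRU replacement when full
            }
        };
    }

    public synchronized V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;                                      // cache miss
        }
        long now = System.currentTimeMillis();
        if (now - e.createdAt > ttlMillis || now - e.lastAccess > maxIdleMillis) {
            map.remove(key);                                  // expired: age or idle time
            return null;
        }
        e.lastAccess = now;
        return e.value;
    }

    public synchronized void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis()));
    }
}
```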
A data update request is fulfilled at the cache level, i.e. all updates are first performed in the cache, and a synchronization process is later executed to apply them to the database. An update buffer is maintained to keep track of update operations. The synchronization process runs periodically and is also triggered when the update buffer becomes full. In this manner, no explicit data invalidation is required in DHBDC, and data consistency and freshness are maintained. All of this is manageable because of the centralized cache management. The features supported are:

• Centralized and distributed cache implementation
• Data expiration
• Centralized cache management
• Hash-table based cache for efficient execution
• Replicated and non-replicated cache models

3.1 Algorithm of DHBDC
The algorithm of DHBDC is presented in this subsection. User requests are received by the application server, which fetches data from the database. DHBDC is implemented between the database server and the application server as follows; a code sketch of these steps is given after the algorithm.

Algorithm
1. Receive the request from the application server. If it is a search or select request, go to step 2; if it is an update request, go to step 3.
2. Perform the following:
   a. Determine the cache server holding the requested data using the DHT.
   b. Prepare the request and retrieve the data from the cache server; if the data is not available in the cache, retrieve it from the database.
   c. Send the data back to the application server.
3. Perform the following:
   a. Determine the cache server holding the requested data using the DHT.
   b. Send the update request to the cache server.
   c. The cache server updates the data locally. The final update is sent to the actual database either periodically or when the update buffer is full.
End of the Algorithm

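The following Java sketch mirrors the steps above: select requests go through the DHT-selected cache node with a database fallback, while update requests are applied at the cache node and recorded in an update buffer that is flushed to the database periodically or when full. All interfaces and names here are assumptions made for illustration, not the paper's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative dispatcher following the DHBDC algorithm. CacheNode and Database
// are assumed abstractions over the RMI cache servers and the MySQL backend.
interface CacheNode {
    String get(String key);                       // returns null on a cache miss
    void put(String key, String value);
    void update(String key, String value);
}

interface Database {
    String select(String key);
    void apply(List<String[]> bufferedUpdates);   // write buffered updates back
}

public class DhbdcDispatcher {
    private final List<CacheNode> nodes;          // cache cluster
    private final Database database;
    private final List<String[]> updateBuffer = new ArrayList<>();
    private final int bufferLimit;

    public DhbdcDispatcher(List<CacheNode> nodes, Database database, int bufferLimit) {
        this.nodes = nodes;
        this.database = database;
        this.bufferLimit = bufferLimit;
    }

    private CacheNode nodeFor(String key) {        // steps 2a / 3a: DHT-style mapping
        return nodes.get((key.hashCode() & 0x7fffffff) % nodes.size());
    }

    // Step 2: search/select request
    public String select(String key) {
        CacheNode node = nodeFor(key);
        String value = node.get(key);
        if (value == null) {                       // step 2b: miss, go to the database
            value = database.select(key);
            if (value != null) {
                node.put(key, value);              // populate the cache for next time
            }
        }
        return value;                              // step 2c: return to application server
    }

    // Step 3: update request
    public synchronized void update(String key, String value) {
        nodeFor(key).update(key, value);           // steps 3b/3c: update at the cache node
        updateBuffer.add(new String[] {key, value});
        if (updateBuffer.size() >= bufferLimit) {  // flush when the buffer is full
            flush();                               // (a periodic timer could also call this)
        }
    }

    public synchronized void flush() {
        if (!updateBuffer.isEmpty()) {
            database.apply(new ArrayList<>(updateBuffer));
            updateBuffer.clear();
        }
    }
}
```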

4. EXPERIMENTATIONS AND ANALYSIS OF RESULTS
Extensive experimentation is performed to test the validity and practicality of the proposed approach. The experiments use open source software: the MySQL database along with JEE, the Apache web server and the Apache Tomcat application server [10, 5, 2, 1]. Requests are generated using the load testing tool Apache JMeter [7]. The following two subsections describe the experimental setup and the analysis of the results obtained.

4.1 Experimental Setup
An experimental setup consisting of seven servers is created. Separate servers are used as the web server, the application server and the database server; the remaining four servers are used to build the cache cluster. The cache central controller uses remote method invocation (RMI) to interact with the servers in the cache cluster [15]. All servers run the RedHat Enterprise Linux Advanced Server 4.0 operating system. Table 1 summarizes the details of the experimental setup.
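Since the central controller talks to the cache servers over Java RMI [15], each cache server must expose a remote interface. The interface below is only a guess at what such a contract could look like; the paper does not list the actual remote methods.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote contract implemented by every server in the cache cluster
// and looked up by the central cache controller via the RMI registry.
public interface RemoteCacheServer extends Remote {
    String get(String key) throws RemoteException;
    void put(String key, String value) throws RemoteException;
    void update(String key, String value) throws RemoteException;
    void remove(String key) throws RemoteException;
}
```

The controller would obtain a stub for each node (for example with java.rmi.Naming.lookup) and invoke these methods as if they were local.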
JMeter is configured to generate requests from 50 different clients (threads), each generating 1,000 requests, for a total of 50,000 requests per test. Each request fetches the details of an order by joining three tables, namely Customer, Order and Order_Line (a sketch of this query is given after the table list below), and the search is performed over a total of 100,000 orders. Ten repetitions of the same experiment are performed, and the average processing time of all requests generated in an experiment is used in the graph. JMeter sends the requests to the web server.

The cache is implemented in two ways: as a distributed cache and as a standalone cache at the application server. In the distributed case, four servers are used in the cache cluster; in the standalone case, the cache is created at the application server only.

The web site tested in this environment is developed using Servlets and JSP interacting with the MySQL database. The static part of the web site is deployed on the Apache web server, and the dynamic part along with the business logic is deployed on the Apache Tomcat application server. The web site is an online book store deployed over the web server and the application server. Its database consists of the following five tables:

• Customer: details of the customers.
• Country: countries and their economic information.
• Order: details of the orders placed by customers.
• Order_Line: details of the books included in an order.
• Author: information about the authors of the books.
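The load-test request joins Customer, Order and Order_Line. A hedged JDBC sketch of such a query is shown below; the column names, join keys and connection details are assumptions, since the paper only names the tables.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative version of the order-details request: a three-table join over
// Customer, Order and Order_Line. Column names and join keys are assumed.
public class OrderDetailsQuery {
    public static void printOrderDetails(int orderId) throws SQLException {
        String url = "jdbc:mysql://dbserver/bookstore";   // assumed connection URL
        String sql =
            "SELECT c.name, o.order_date, ol.book_title, ol.quantity " +
            "FROM Customer c " +
            "JOIN `Order` o ON o.customer_id = c.id " +      // Order is a reserved word in MySQL
            "JOIN Order_Line ol ON ol.order_id = o.id " +
            "WHERE o.id = ?";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, orderId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s | %s | %s x%d%n",
                        rs.getString(1), rs.getDate(2), rs.getString(3), rs.getInt(4));
                }
            }
        }
    }
}
```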
Table 1. Experimental setup of seven servers

Server | Software | Hardware
Web Server | Apache web server | IBM server with Intel dual quad-core processor and 4 GB RAM
Application Server | Apache Tomcat | IBM server with Intel dual quad-core processor and 4 GB RAM
Database Server | MySQL database | IBM server with Intel dual quad-core processor and 8 GB RAM
Caching Cluster (four servers) | Java RMI | IBM server with Intel dual quad-core processor and 8 GB RAM
Client Machines | JMeter | HP desktop with Intel Core 2 Duo 2.80 GHz processor and 2 GB RAM

4.2 Analysis of Results
The graph in Figure 3 is plotted from the results of the experiments performed. It compares four cases, namely DCH, DCM, ASCH and ASCM. DCH and DCM are the values observed when the distributed cache is used and a cache hit (DCH) or a cache miss (DCM) occurs, respectively; on a cache miss, the data is retrieved from the database server. The other two cases give the results when the standalone cache is implemented at the application server: ASCH represents an application server cache hit and ASCM an application server cache miss, where again the cache-miss requests are sent to the database server.

From the results, it is observed that around 38% performance improvement is achieved when a request is a hit in the distributed cache rather than being retrieved from the database, since retrieval from the database involves an additional request to the database server over the network.
Figure 3. Results of experiments


The results are similar when the standalone cache is maintained at the application server. In the case of a cache miss, the response time in both configurations is higher, since it involves a database operation over the network. The worst result is obtained on a cache miss in the cache maintained at the application server, because of the additional responsibility of interacting with the database on top of the business logic processing at the application server.

5. RELATED WORK
There exists a large body of research on caching techniques; the focus of this paper, however, is database caching. A few research works and industrial products are available to improve the performance of web-based database applications using database caching, but most of this work is focused on specific databases.

Memcached is among the most popular object caching tools. It uses large hash tables for caching objects and stores strings and objects produced by database calls, library calls and the rendering of web pages. There is no centralized control in Memcached [9].

Oracle TimesTen In-Memory Database is another popular product providing faster access to relational databases through the same programming interface. TimesTen improves performance because all data is stored in memory and no disk-based operations are required to access it; it can optionally be configured to store checkpoint and logging data on disk [12, 20].

Paul and Fei have presented a distributed architecture for caching with centralized control (DEC3). They evaluated this architecture using three parameters, namely hit ratio, response time and traffic on the backbone link, and demonstrated that the centralized control utilizes the caching resources to improve these parameters. However, they implemented caching only for web objects such as HTML pages and images [18].

Tolia and Satyanarayanan have implemented a hash-based technique for caching database results to improve the response time and throughput of web applications. Their project, named Ganesh, has two components. The first is the Ganesh Java JDBC driver, which maintains an in-memory cache of previous query results. The second is the Ganesh proxy, which uses the original JDBC driver to connect to the database and also has a mechanism to detect similarity in query results. In this manner, Ganesh does not affect the web server, application server or database server at all [11]. Although Ganesh is hash based, it is not a database caching mechanism; it caches parts of the responses of dynamic web pages.

Larson et al. have developed a mid-tier database caching policy for the Microsoft SQL Server database called MTCache. The policy improves response time and system throughput by sharing some of the load of the database server with the cache servers. MTCache works only with Microsoft SQL Server and not with other databases, whereas DHBDC is a generic policy that can be configured for any database [13].

DBCache is another proprietary product, developed by IBM for its DB2 UDB database. It is also a middle-tier database caching policy [14] and supports data access from the cache, the database, or both in a distributed manner. TimesTen, MTCache and DBCache are proprietary products that can work only with specific databases [3].

6. CONCLUSION
This paper presents a distributed hash-based database caching mechanism for multitier web applications and web services. It is a generic cache policy that can work with any database, and centralized cache control is implemented for better overall performance. Database products have built-in cache implementations, but these built-in caches are not a scalable solution, whereas the DHBDC presented here is scalable and gives better performance. Extensive experimentation was performed to test the validity of the proposed method, and around 38% improvement is observed.

In future, it is planned to test DHBDC on the TPC-E online transaction processing workload benchmark [19], RUBBoS [17] and RUBiS [16]; all of these benchmarks are now part of the Java Middleware Open Benchmarking (JMOB) project [6]. It is also proposed to deploy DHBDC on a real online application.

7. REFERENCES
[1] Apache Tomcat Application Server, http://tomcat.apache.org/.
[2] Apache Web Server, http://www.apache.org/.
[3] Bornhövd, C., Altinel, M., Krishnamurthy, S., Mohan, C., Pirahesh, H. and Reinwald, B. 2003. DBCache: Middle-tier Database Caching for Highly Scalable e-Business Architectures. In Proceedings of the ACM SIGMOD International Conference on Management of Data (San Diego, California, USA, June 2003). SIGMOD'03, 662-662.
[4] Distributed Hash Tables, http://en.wikipedia.org/wiki/Distributed_hash_table.
[5] Java Enterprise Edition, http://java.sun.com/javaee/.
[6] Java Middleware Open Benchmarking (JMOB), http://jmob.ow2.org/jvm.html.
[7] JMeter Load Testing Tool, http://jakarta.apache.org/jmeter/.
[8] Least Recently Used, http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used.
[9] Memcached Distributed Memory Caching System, http://memcached.org/.
[10] MySQL Enterprise Server, http://www.mysql.com.
[11] Tolia, N. and Satyanarayanan, M. 2007. No-Compromise Caching of Dynamic Content from Relational Databases. In Proceedings of the 16th International World Wide Web Conference (Banff, Canada, May 2007). WWW'07, 311-320.
[12] Oracle TimesTen In-Memory Database, http://www.oracle.com/database/timesten.html.
[13] Larson, P.A., Goldstein, J. and Zhou, J. 2004. MTCache: Transparent Mid-Tier Database Caching in SQL Server. In Proceedings of the 20th International Conference on Data Engineering (Boston, Mar-Apr 2004). ICDE'04, 177-189.
[14] Luo, Q., Krishnamurthy, S., Mohan, C., Woo, H., Pirahesh, H., Lindsay, B. and Naughton, J. 2002. Middle-tier Database Caching for e-Business. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Madison, WI, Jun 2002). SIGMOD'02, 600-611.


[15] Remote Method Invocation, http://java.sun.com/javase/technologies/core/basic/rmi/index.jsp.
[16] Rice University Bidding System (RUBiS), http://rubis.ow2.org/.
[17] RUBBoS: Bulletin Board Benchmark, http://jmob.ow2.org/rubbos.html.
[18] Paul, S. and Fei, Z. 2001. Distributed Caching with Centralized Control. Computer Communications, Elsevier Press. 24, 2 (Feb 2001), 256-268.
[19] TPC-E Online Transaction Processing Benchmark, http://www.tpc.org/tpce/.
[20] TimesTen Team. 2002. Mid-Tier Caching: The TimesTen Approach. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Madison, WI, Jun 2002). SIGMOD'02, 588-593.

