
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 7, JULY 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG

Object Oriented Caching Mechanism for Efficient Computing over Web


Dr. Pushpa R. Suri and Harmunish Taneja
Abstract: One of the major inventions of the 1990s is the World Wide Web (WWW). Since its advent in 1991, the Web has evolved into a global interconnection of individual networks operated and used by both the public and private sectors. Its origin can be traced back to the time when the Internet was a tool for interconnecting laboratories engaged in government research. The WWW has been expanding exponentially, reaching every remote corner of the earth. The Web does not expand alone; the load on the typical web server rises with it. This rapid expansion brings both facilities and limitations. The major problem is growing network traffic, which leads to congestion. Congestion can be relieved if requested documents are cached. From the user's point of view, the response time for a query should stay low, as if it were independent of the increased demand for web content. Caching is the most ubiquitous mechanism in modern information computing over the web for improving application performance and scalability. Conventional web based caching mechanisms prove deficient when it comes to shared information access. Added functionality and flexibility are the major attractions of object oriented technology. An object oriented framework reduces the response time and the complexity of computation. In this paper, an object oriented caching mechanism is proposed that promises flexibility and effectiveness on the web.

Index Terms: Web Objects, Caching, Object Oriented (OO), Information Retrieval.

Dr. Pushpa R. Suri is working as Associate Professor in the Department of Computer Science and Applications at Kurukshetra University, Kurukshetra. Harmunish Taneja is a Research Scholar at Kurukshetra University, Kurukshetra, and is working as Asstt. Prof. in the Department of Information Technology, Maharishi Markandeshwar University, Mullana, Haryana, India.

1 INTRODUCTION
Caching supplies a web application with previously generated object instances stored in the cache, thereby decreasing the overhead of generating fresh object instances for each new request. Traditionally, caching has been exploited in file systems, but file system caching does not perform well for WWW traffic [1][2]. Information on the web has a different granularity: file systems deal with fixed-size blocks of data, whereas web servers read and cache entire files of variable size, which complicates memory management in web server caches. Caching is not mandatory in web servers, and the major issue in framing a cache policy is to shortlist the information that, when cached, will lead to increased cache performance. Web servers experience sequential scans of a large number of data items only in cases of communication failure or browser cache malfunction. Cached web objects are retrieved faster than those on disk, but it is impossible to cache all the content desired by users on the web. Web caching leads to faster delivery of web objects at lower bandwidth consumption and therefore helps to reduce the load on website servers. Web caching is carried out at three levels of hierarchy [1][3][4][5][6], i.e., at the browser, proxy and server levels. Browser and proxy level caching reduce the network traffic and the average delay experienced by the user. At the server end, caching reduces the time taken to fetch a web object from the file system. Primary and proxy web server caches differ significantly in how they work. Proxy servers cache a huge set of requested web information on the proxy server's disk only, while primary web servers need to cache a comparatively small set of local documents in main memory. Popular web browsers such as Internet Explorer and Netscape Navigator cache web pages very frequently. Time stamps determine the lifetime of a web page in the cache.

Traditional web caching techniques fail to support dynamic content and depend on cache tags being set properly on web objects. Hence web caches treat most objects as non-cacheable, thereby reducing cache efficiency. They also suffer from coherence problems and prove incapable of handling future network loads. OO approaches have been used in a wide variety of application areas. The selection of web objects to be cached is an important decision, as application performance is largely governed by it. This problem is formulated by incorporating the concept of association in terms of inheritance among web objects.

The remainder of the paper is organised as follows. Section 2 presents the models of caching structure and a comparative study of web cache replacement algorithms. Section 3 motivates the concept of Object Oriented (OO) cacheability decisions for enhanced computing over the web and presents the OO web cache framework and algorithm. Finally, Section 4 concludes the paper.
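As a rough illustration of the three-level hierarchy (browser, proxy, server) and of time stamps bounding the lifetime of a cached page, the lookup path could be sketched as below. This is only a sketch under stated assumptions: the names `TimedCache` and `fetch`, the fixed per-level lifetime `ttl`, and the policy of refilling the nearer levels on a hit are all hypothetical and are not taken from the paper.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TimedCache:
    """One cache level whose entries expire `ttl` seconds after storage (hypothetical helper)."""

    def __init__(self, name: str, ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.name = name
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}

    def get(self, url: str) -> Optional[bytes]:
        entry = self._store.get(url)
        if entry is None:
            return None
        body, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[url]  # time stamp too old: the entry has expired
            return None
        return body

    def put(self, url: str, body: bytes) -> None:
        self._store[url] = (body, self.clock())


def fetch(url: str, levels: List[TimedCache],
          read_from_disk: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Try each level in order (browser, proxy, server); fall back to the file system."""
    for depth, cache in enumerate(levels):
        body = cache.get(url)
        if body is not None:
            # Assumption: copy the object into the levels nearer to the user.
            for nearer in levels[:depth]:
                nearer.put(url, body)
            return body, cache.name
    body = read_from_disk(url)
    for cache in levels:
        cache.put(url, body)
    return body, "disk"
```

With a short browser lifetime and a longer proxy lifetime, a request that arrives after the browser copy has expired is answered by the proxy level without touching the file system.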

2 RELATED WORK
There are many studies on web client access characteristics [7][8], web caching algorithms [3][5][9][10][11] and web cache consistency [12]. The major limitations and advantages of representative web cache replacement algorithms are given in Table 1.1. One of the first ideas regarding caching on the web was the static cache, which stores queries identified as frequent from an analysis of a query log file. However, the performance of static caches is generally poor, mainly because the majority of queries are actually infrequent, which leads to a low number of hits [13]. Another category is the dynamic caching technique, governed primarily by replacement policies such as LRU or LFU. A hybrid caching structure called Static-Dynamic Caching (SDC) maintains a static collection that stores frequent queries and a dynamic one that is managed by replacement techniques like LRU or LFU, thereby achieving good results [14]. Most studies, however, consider caching algorithms for web proxies or web servers. A simple taxonomy of cache policies, viewed as sorting problems that vary in the sorting key used, is studied in [3]. The Independent Reference Model [15][16] is the working basis of most conventional caching techniques. It advocates the removal of the document with the smallest probability of occurrence among the documents in the cache. However, the statistical information contained in the stream of requests is not considered, and the technique fails to capture temporal locality.

TABLE 1.1. COMPARING REPRESENTATIVE WEB CACHE REPLACEMENT POLICIES

Cache category  | Representing policy | Time complexity | Advantages                                  | Disadvantages
Recently used   | LRU                 | O(1)            | Simple implementation                       | Size, frequency and download frequency missed
Frequency based | LFU                 | O(log2 N)       | Simple implementation                       | Size, age and download frequency missed
Size based      | SIZE                | O(log2 N)       | Simple implementation                       | Frequency consideration missed
Function based  | MIX                 | O(N)            | Capability to optimise performance measures | Time complexity

Probability Driven Caching (PDC) [17] computes a numeric value associated with the probability that a query will be used in the future. Caching pairs of frequent terms based on their co-occurrence in the query logs increases the hit ratio [18]. Caching posting lists rather than just results and/or terms yields higher hit rates [19].

The Markov Reference Model (MRM) [16][20] for caching applications models user requests as a stationary and ergodic Markov chain. Correlations are tracked through the single-step transition probabilities. However, its performance is limited to independent distributions and request streams with less contact. A location cache that allows search node selection based on semantic caching and machine learning methods achieves good precision results [21]. Query processing cost as a determining factor for entry into the cache has also been studied [22]. A co-clustering algorithm [23] constructs P clusters of web pages and Q clusters of queries from a large query log and defines a matrix in which each entry measures the importance of a query cluster to a document cluster. The method is used as a cluster selection strategy, achieving accurate results.

© 2011 Journal of Computing Press, NY, USA, ISSN 2151-9617 http://sites.google.com/site/journalofcomputing/

3 PROPOSED OBJECT ORIENTED CACHING MODEL
Caching techniques are exploited to increase performance in several areas of computing. OO concepts have been used not only in academia but also in practice for the last two decades [24]. This paper attempts to bring the power of OO concepts to the caching mechanism for enhanced information computing over the WWW. The OO concept is mapped, in particular, to the design decision of selecting the cacheable objects. The selection of web objects to be cached is highly relevant in the design process, as it is crucial for performance at run time, and a bad choice may nullify the effect of caching itself. A web cache is an application bridging web servers and web users that keeps track of requests for web objects such as HTML pages, images, web pages, music and other downloadable files. The major purpose of OO caching is that all the web objects derived from a super class will be cached, so that cached documents that are already hyperlinked from the inherited objects will be correlated, strongly or loosely, and prove to be relevant. Figure 1 specifies the various types of web objects. Just as not all retrieved objects are relevant, the proposed work suggests that not all relevant objects are cacheable. The proposed OO caching framework allows web users to share objects across requests and coordinates the objects' life cycle across processes. The essential components of the caching framework are the Query Handler, Cache Viewer (CV), Cacheable Object Administrator (COA) and Cache Object Replacer (COR). Figure 2 elaborates the basic configuration and working of the proposed model.

Figure 1. Types of Web Objects (figure labels: Retrieved, Relevant, Cacheable)
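One possible reading of "all the web objects derived from a super class will be cached" is that cacheability follows the class hierarchy of the web objects. The sketch below illustrates only that reading; the class names (`WebObject`, `Document`, `HtmlPage`, `Image`, `SessionPage`) and the helper `is_cacheable` are hypothetical and do not appear in the paper.

```python
from typing import Iterable, Type


class WebObject:
    """Hypothetical root class for anything retrieved from the web."""

    def __init__(self, url: str):
        self.url = url


class Document(WebObject):
    """Hypothetical super class whose descendants are treated as cacheable."""


class HtmlPage(Document):
    pass


class Image(WebObject):
    pass


class SessionPage(WebObject):
    """Per-user dynamic output, used here as an example of a non-cacheable type."""


def is_cacheable(obj: WebObject,
                 cacheable_super_classes: Iterable[Type[WebObject]]) -> bool:
    """An object is cacheable if it derives from any designated super class."""
    return isinstance(obj, tuple(cacheable_super_classes))
```

Under this reading, marking `Document` as cacheable makes every `HtmlPage` cacheable as well, while an `Image` or `SessionPage` stays out of the cache until its own super class is designated.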

Figure 2. Framework for Object Oriented Caching. (Figure labels: User 1 to User K, Local Cache 1 to Local Cache K, Query Handler, Cache Viewer, Central Web Cache, Cache Object Administrator, Cache Object Replacer, Domain 1 to Domain n.)

Query Handler: When users submit requests based on query terms, the invocation goes to the Query Handler. It extracts one or more web objects from the entered query, filters similar requests into a single request and forwards it to the Cache Viewer (CV).

Cache Viewer: The requested web objects are looked up in the central web cache. If they are found, the CV replies to the users with similar requests by sending a copy of the retrieved web objects to the users' local caches. If an object is not found, the request is forwarded to the Cacheable Object Administrator.

Cacheable Object Administrator (COA): The COA searches for the desired web objects in the various domain repositories over the web. It stores the domain-specific feature set as $(F_1, F_2, \ldots, F_m)$, where $m$ is the total number of features under consideration. A relevant web object is further handled and labelled as type cacheable by the COA, as shown in Figure 3. The COA stores an inheritance network graph corresponding to the specific feature set, reflecting the relatedness among web objects that share attributes inherited from their super classes. The retrieved web objects, each defined as a set of features, are matched for their association with the generated inheritance graph. The relationship function $Rel_i$ for the $i$th web object is defined as

$$Rel_i = F\big((f_1, f_2, \ldots, f_n), \theta\big)$$

where $(f_1, f_2, \ldots, f_n)$ are the features of the $i$th web object matched with the domain feature set, and the threshold, denoted here by $\theta$ with $\theta < m$, is the minimum number of features of web object $i$ that must match the domain feature set for it to be a member of the inheritance network graph of that domain:

$$Rel_i = \begin{cases} 1 & \text{if } \theta < n \le m \\ 0 & \text{otherwise} \end{cases}$$

Each web object is assigned a relationship function $Rel_i$ based on its association with the feature set of the specific domain. The web object is then labelled as follows: if $Rel_i = 0$, web object $i$ is not cacheable; if $Rel_i = 1$, the COA saves a copy of the web object in the cache. If another request refers to the same web object, the CV will use the copy it has in the cache.

Figure 3. Working of Cache Object Administrator. (Figure labels: feature set of domain F1, F2, F3, ..., Fm; web object collection in domain defined by feature sets O1(f1, f2, ..., fn), O2(f1, f2, ..., fk), ..., On(f1, f2, ..., fl); feature set matching; cacheable objects, yes / no; Central Web Cache; Local Cache 1 to Local Cache k.)
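The cacheability decision made by the COA can be sketched as a small function. This is a minimal illustration, assuming that features are plain strings, that "matching" means set membership in the domain feature set, and that the comparison is strict (more matched features than the threshold), as in the IF test of the algorithm. The names `rel`, `domain_features` and `theta` are hypothetical.

```python
from typing import Iterable, Set


def rel(object_features: Iterable[str], domain_features: Set[str], theta: int) -> int:
    """Return 1 if more than `theta` object features match the domain feature set, else 0."""
    matched = set(object_features) & domain_features  # features shared with the domain
    return 1 if len(matched) > theta else 0


# Hypothetical domain with m = 5 features and threshold theta = 2.
domain = {"title", "author", "abstract", "keywords", "doi"}

paper_like = ["title", "author", "doi", "price"]  # 3 features match the domain
advert_like = ["price", "title"]                  # 1 feature matches the domain

print(rel(paper_like, domain, theta=2))   # 3 > 2, so the object is cacheable
print(rel(advert_like, domain, theta=2))  # 1 <= 2, so the object is not cacheable
```

Treating features as a set means repeated features on an object are counted once; whether the paper intends that is not stated.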

Cache Object Replacer (COR): When space is inadequate, one or more web objects must be removed from the cache to free sufficient space. The COR defines the cache replacement process to guarantee enough space for incoming web objects. Each object is checked for its usage, and LRU [10][18] is used to select the objects to remove from the cache.

Figure 4: Activity flow of Object Oriented Caching. (Figure labels: User 1 to User K with Local Cache 1 to Local Cache K; Query Handler processes client request; Cache Viewer checks for availability of web object in Central Web Cache; web object found, yes / no; COA retrieves web objects from domain; generate relationship function for each retrieved web object; is $Rel_i = 1$, yes: cacheable / no: not cacheable.)

The algorithm for the activity flow of the proposed Object Oriented Caching framework is as follows:

Algorithm: Web_Object_Cache_Entry
Step 1: The users / clients request the desired web information.
Step 2: The Query Handler processes the clients' requests, filters similar requests into a single request and forwards it to the Cache Viewer (CV).
Step 3: The CV checks the Central Web Cache.
Step 3(a): IF (web object is available in cache) THEN Send (desired information to the local caches of the users with the same request); EXIT; ELSE (forward request to COA).
Step 4: The COA searches for the desired web objects in the various domain repositories.
Step 5: The COA generates the relationship function $Rel_i$ for the $i$th retrieved web object as $Rel_i = F((f_1, f_2, \ldots, f_n), \theta)$, where the web object features $(f_1, f_2, \ldots, f_n)$ are the features matched with the domain feature set $(F_1, F_2, \ldots, F_m)$, and a constant threshold, denoted here by $\theta$, is the minimum number of features of web object $i$ that must match the domain feature set for it to be a member of the inheritance network graph of that domain.
Step 6: IF $n$ (the number of features of web object $i$ matched with the domain feature set) $> \theta$ THEN $Rel_i = 1$ ELSE $Rel_i = 0$.
Step 7: The COA further handles and labels the retrieved web objects as type cacheable. IF $Rel_i = 0$ THEN the $i$th web object is not cacheable ELSE (web object is cacheable: the COA saves a copy of the web object in the Central Web Cache).
Step 8: GOTO Step 3(a).
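The activity flow, together with the LRU replacement performed by the COR, can be sketched end to end as follows. This is a simplified, single-process sketch and not the authors' implementation. Several points are assumptions made for illustration: cache capacity is counted in objects rather than bytes, a non-cacheable object is still delivered to the requesting users, replies stand in for copies sent to the users' local caches, and the names `CentralWebCache`, `web_object_cache_entry` and `fetch_from_domain` are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple


class CentralWebCache:
    """Central cache with LRU replacement, standing in for the COR."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # assumption: capacity counted in objects
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def __contains__(self, key: str) -> bool:
        return key in self._items

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, body: bytes) -> None:
        self._items[key] = body
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used object


def web_object_cache_entry(
    requests: Iterable[Tuple[str, str]],                          # (user, object key)
    cache: CentralWebCache,
    fetch_from_domain: Callable[[str], Tuple[bytes, Set[str]]],   # key -> (body, features)
    domain_features: Set[str],
    theta: int,
) -> Dict[Tuple[str, str], bytes]:
    # Step 2: the Query Handler filters similar requests into a single request.
    users_by_key: Dict[str, List[str]] = {}
    for user, key in requests:
        users_by_key.setdefault(key, []).append(user)

    replies: Dict[Tuple[str, str], bytes] = {}
    for key, users in users_by_key.items():
        body = cache.get(key)                          # Step 3: CV checks the central cache
        if body is None:
            body, features = fetch_from_domain(key)    # Step 4: COA retrieves the object
            matched = len(features & domain_features)  # Step 5: count matched features
            rel_i = 1 if matched > theta else 0        # Step 6
            if rel_i == 1:
                cache.put(key, body)                   # Step 7: save a copy if cacheable
        for user in users:
            replies[(user, key)] = body                # Step 3(a): reply to all requesters
    return replies
```

Grouping requests by object key before the cache lookup is what lets one retrieval serve every user who asked for the same object in that batch.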

4 CONCLUSION
In real-time information computing over the web, users often experience high delay. The reason is the enormous rise in web traffic resulting from the exponential growth of web usage. Caching plays an important role in the performance of any large-scale distributed system, and the web is the most eligible example, with web applications increasing round the clock. Web caching ensures non-redundant network traffic, thereby optimising application performance. Web applications such as online visa applications, admission to study courses through online counselling, flight booking and online banking suffer the most from access time delays. The two main advantages of the proposed model are the latency reduction, as the request is satisfied by the series of local web caches closer to the user, and lowered congestion. Also, as each web object is extracted from the web only once, the traffic and bandwidth usage are minimised.

REFERENCES
[1] M. Abrams, C. R. Standridge, G. Abdulla, S. Williams, and E. Fox, "Caching Proxies: Limitations and Potentials," Proc. 4th Int'l Conf. on the WWW, Boston, USA, pp. 119-133, December 1995. (Conference proceedings)
[2] E.P. Markatos, "Main Memory Caching of Web Documents," Proc. of 5th Int'l WWW Conference, Paris, France, pp. 893-905, May 1996. (Conference proceedings)
[3] S. Williams, M. Abrams, C. R. Standridge, G. Abdulla and E. A. Fox, "Removal Policies in network caches for World Wide Web documents," Proc. of SIGCOMM '96, pp. 293-305, 1996. (Conference proceedings)
[4] H. Bahn, S. H. Noh, S. L. Min, and K. Koh, "Using full reference history for efficient document replacement in web caches," Proc. 2nd USENIX Symposium on Internet Technologies and Systems '99, Boulder, Colorado, USA, October 11-14, 1999. (Conference proceedings)
[5] P. Cao and S. Irani, "Cost-Aware WWW Proxy Caching Algorithms," Proc. USENIX Symposium on Internet Technology and Systems (USITS '97), pp. 1-14, 1997. (Conference proceedings)
[6] L. Rizzo and L. Vicisano, "Replacement Policies for a Proxy Cache," IEEE/ACM Trans. on Networking, vol. 8, no. 2, pp. 158-170, April 2000. (IEEE/ACM Transactions)
[7] M. Arlitt and C. Williamson, "Web server workload characterization," Proc. 1996 ACM SIGMETRICS Int'l Conf. on Measurement and Modelling of Computer Systems, vol. 5, no. 5, pp. 631-645, May 1996. (Conference proceedings)
[8] Thomas M. Kroeger, Darrell D. E. Long, and Jeffrey C. Mogul, "Exploring the Bounds of Web Latency Reduction from Caching and Prefetching," Proc. USENIX Symposium on Internet Technology and Systems, December 1997. (Conference proceedings)
[9] A. Balamash, M. Krunz, "An Overview of Web Caching Replacement Algorithms," IEEE Communication Surveys & Tutorials, vol. 6, no. 2, pp. 44-56, 2004. (Journal Citation)
[10] K. Wong, "Web Cache Replacement Policies: A Pragmatic Approach," IEEE Network, vol. 20, no. 1, pp. 28-34, 2006. (Journal Citation)
[11] Lei Shi, Yingjie Han, Xiaoguang Ding, Lin Wei, Zhimin Gu, "An SPN based Integrated Model for Web Prefetching and Caching," J. of Computer Science and Technology, vol. 21, no. 4, pp. 482-489, 2006. (Journal Citation)
[12] Fred Douglis, Anja Feldmann, Balachander Krishnamurthy, and Jeffrey Mogul, "Rate of Change and Other Metrics: A live Study of the World Wide Web," Proc. of USENIX Symposium on Internet Technology and Systems, December 1997. (Conference proceedings)
[13] E. Markatos, "On Caching Search Engine Query Results," Computer Communications, vol. 24, no. 7, 2000. (Journal Citation)
[14] T. Fagni, R. Perego, F. Silvestri, and S. Orlando, "Boosting the Performance of Web Search Engines: Caching and Prefetching Query Results by Exploiting Historical Usage Data," ACM Transactions on Information Systems (TOIS), vol. 24, no. 1, pp. 51-78, 2006. (ACM Transactions)
[15] D. Starobinski and D. Tse, "Probabilistic Methods for Web Caching," Performance Evaluation, vol. 46, no. 2-3, pp. 125-137, 2001. (Journal Citation)
[16] O. Bahat and A.M. Makowski, "Optimal Replacement Policies for Non-uniform Cache Objects with Optional Eviction," Proc. INFOCOM 2003, San Francisco, pp. 427-437, April 2003. (Conference proceedings)
[17] R. Lempel and S. Moran, "Predictive Caching and Prefetching of Query Results in Search Engines," Proc. Int'l World Wide Web Conf. (WWW), pp. 19-28, 2003. (Conference proceedings)
[18] X. Long and T. Suel, "Three-level Caching for Efficient Query Processing in Large Web Search Engines," Proc. Int'l World Wide Web Conf. (WWW), Chiba, Japan, pp. 257-266, May 2005. (Conference proceedings)
[19] R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri, "Design Trade-offs for Search Engine Caching," ACM Transactions on the Web (TWEB), vol. 2, no. 4, pp. 1-28, 2008. (ACM Transactions)
[20] K. Psounis, A. Zhu, B. Prabhakar, R. Motwani, "Modeling Correlations in Web Traces and Implications for Designing Replacement Policies," Computer Networks, vol. 12, no. 4, pp. 379-398, 2004. (Journal Citation)
[21] F. Ferrarotti, M. Marin and M. Mendoza, "A last-resort semantic cache for Web queries," String Processing and Information Retrieval (SPIRE), Lecture Notes in Computer Science, vol. 5721, pp. 310-321, 2009. (Lecture Notes)
[22] Q. Gan and T. Suel, "Improved Techniques for Result Caching in Web Search Engines," Proc. Int'l World Wide Web Conf. (WWW), pp. 431-440, 2009. (Conference proceedings)
[23] D. Puppin, F. Silvestri, R. Perego, and R. Baeza-Yates, "Load-balancing and Caching for Collection Selection Architectures," Proc. INFOSCALE, pp. 2, May 2007. (Conference proceedings)
[24] Michael Stonebraker, Paul Brown, Dorothy Moore, "Object-Relational DBMS," Morgan Kaufmann Series in Data Management Systems, 1990. (Book style)

Dr. Pushpa R. Suri received her Ph.D. degree from Kurukshetra University, Kurukshetra. She is working as Associate Professor in the Department of Computer Science and Applications at Kurukshetra University, Kurukshetra, Haryana, India. She has many publications in International and National Journals and Conferences. Her teaching and research activities include Discrete Mathematical Structure, Data Structures, Information Computing and Database Systems.

Harmunish Taneja received his M.Phil. degree in Computer Science from Alagappa University, Tamil Nadu, India, and his Master of Computer Applications from Guru Jambeshwar University of Science and Technology, Hissar, Haryana, India. Presently he is working as Assistant Professor in the Deptt. of Information Technology, M.M. University, Mullana, Haryana, India. He is pursuing a Ph.D. (Computer Science) from Kurukshetra University, Kurukshetra. He has published 14 papers in International / National Conferences and Seminars. His teaching and research areas include Database Systems, Web Information Retrieval, and Object Oriented Information Computing.
