
Enhancing Performance of Proxy Cache Servers

using Daemon process

Sachin Chavan, Naresh Lodha, Anukriti Rautela, Kunal Gupta

SVKMS NMIMS, Mukesh Patel School of Technology Management and Engineering,


Shirpur Campus, Dhule, Maharashtra, India

sachin.chavan@nmims.edu; nareshlodha.nmims@gmail.com;
anukritirautela.nmims@gmail.com; kunalgupta.nmims@gmail.com;

Abstract. The significant increase in internet users has led to an enormous rise in network traffic, which has had a great impact on network characteristics such as reduced bandwidth, increased latency, and higher response times for users. The most common countermeasure is a proxy cache server, which is deployed to reduce web traffic and the workload on the server. A proxy cache server stores data locally, but this introduces the problem of stale data: there are many instances when the proxy cache returns the data stored in its cache rather than the latest version from the origin server. This paper presents a methodology that addresses the issue of stale data by introducing a Daemon process. The Daemon is a background process that periodically updates the data present in the cache. It does not interfere with the user's processes and updates the data when the minimum number of users are online. Understanding and identifying the main traffic characteristics is crucial for planning techniques that save bandwidth and improve access latency and response time. This paper focuses on the factors affecting the proxy cache server as well as on the Daemon process.

Keywords: Access Latency, Stale Data, Proxy Cache, Mirroring, Duplicate Suppression, Daemon Process.

1 Introduction

With fast-changing technology and people around the globe gaining access to the internet, the World Wide Web has grown significantly. The number of users has risen and the consumption of data has increased, which has notably contributed to the increase in web traffic. Web traffic leads to reduced bandwidth and increased access latency [11]. Consequently, there has lately been significant research into methods to reduce web traffic and improve web performance, so as to decrease response time and improve access latency [15]. The most common method is the use of a web proxy cache. Web proxy caches have been established to reduce network traffic and the workload on the server, and to make online communication as fast as possible [8].

Fig. 1. Architecture of Proxy Cache Server (Source [9])

A web proxy cache server is a system or software running on a computer that acts as a mediator between client and server [10] [11]. The client requests certain pages, and the web proxy forwards the request to the origin server if the requested pages are not present in the proxy's cache (local storage). The main advantage of proxy servers is that they can be deployed anywhere on the internet. Web caches divide the load on the server, thus reducing access latency and saving network bandwidth [14]. The basic function of a web cache (both browser and proxy cache) is to store the most visited web pages. If the user requests the same pages next time, they are served by the local server rather than the main server, thus reducing the response time [4]. Many times the content of the web pages stored in the server may be outdated, or may not match the new, updated content of the web server, leading to the problem of stale data. This can occur when users retrieve content locally over a span of time. To eradicate the problem of stale data, we propose a Daemon process: a background process that is not under the direct control of the user. The Daemon process runs periodically and checks the consistency of the cached data against the data at the web server. It updates information during slack time, i.e. when web traffic is minimal [13]. Any new updates or changes made in a web page can be detected by MD5 hashing, where the digest of every element is stored [6] [7]. This also helps resolve the problem of content aliasing. The paper is organized as follows: Section 2 presents related work, Section 3 the proposed methodology, Section 4 the results and discussion, Section 5 the conclusion, and Section 6 the future scope.
2 Related work
2.1 Web Traffic
Web traffic is the amount of data exchanged between users and a website. Web traffic measurements identify the most popular websites, and the most popular pages within a website. An increase in web traffic increases the response time, and hence the data is loaded at a slower rate [1].

2.2 Mirroring
Mirroring keeps several copies of the content of a website or of individual web pages on multiple servers in order to increase the accessibility of the content. This redundancy ensures high availability of the web documents [3]. A mirrored site is an exact replica of the original site; it is regularly updated and reflects the updates made on the actual site.

2.3 Caching
Caching is used to improve the response time as well as the performance of WWW applications, and it reduces system load. Most techniques cache static content such as graphic and text files [1] [9].

Static Caching: Static caching checks the log of yesterday's user requests to predict the requests for the present day. It improves cache performance by using compression techniques and also frees up cache space. A cache server's performance is judged by two factors: the byte hit ratio, which is the hit rate with respect to the total number of bytes served from the cache, and the hit ratio, which is the percentage of all accesses that are fulfilled by data in the cache [8] [10].
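The two metrics above can be computed from a request log as follows. This is a sketch under the assumption that each logged request records whether it was served from the cache and its size in bytes; the function name and log format are ours:

```python
def cache_metrics(requests):
    """Compute hit ratio and byte hit ratio from a request log.

    `requests` is a list of (was_hit, size_in_bytes) tuples."""
    hits = sum(1 for hit, _ in requests if hit)
    hit_bytes = sum(size for hit, size in requests if hit)
    total_bytes = sum(size for _, size in requests)
    hit_ratio = hits / len(requests)           # fraction of requests served from cache
    byte_hit_ratio = hit_bytes / total_bytes   # fraction of bytes served from cache
    return hit_ratio, byte_hit_ratio
```

Note that the two ratios can diverge: a cache that serves many small objects locally has a high hit ratio but may still have a low byte hit ratio if the large objects keep missing.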

Dynamic caching: Dynamic caching regularly updates the content on the local
server in order to keep up with the updated content of the dynamic websites
which helps in avoiding the problem of stale data [8] [11].

2.4 Duplicate suppression

There are multiple instances where the same content is available on the internet under different URLs, i.e. web pages that are duplicates of each other but have different names. They are unwanted and consume extra bytes over the network, which increases network cost and bandwidth usage and creates a storage problem. If a cache recognizes a duplicate of a requested page, the network cost can be reduced and the storage problem avoided [5]. Pre-fetching is a technique in which the data most likely to be requested in the future is fetched in advance, i.e. the user's future requests are predicted on the basis of past requests. The cached data is then used to satisfy the request, and hence duplicate suppression is applied [5].
2.5 Proxy Server

A proxy server acts as an intermediary between a server and an endpoint device, such as a computer, from which a client sends a request for a service. A user connects to the proxy server when requesting a file or any other internet resource available on a different server [14]. On receiving a request, the proxy server searches its local cache of previously visited pages. If the page is found, it is returned to the user; otherwise the proxy server acts as a client on behalf of the user and requests the page from the web server using its own address. If the page is found by the web server, the proxy server relates it to the original request and sends it to the user; otherwise an error message is displayed on the user's device [13].
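The lookup-then-forward flow just described can be sketched as below. This is a simplified model, not a working proxy: `fetch_from_origin` is a placeholder of ours standing in for a real HTTP request made with the proxy's own address:

```python
cache = {}  # URL -> page content previously fetched

def fetch_from_origin(url):
    """Placeholder for an HTTP request issued by the proxy itself."""
    raise ConnectionError("origin unreachable")  # simulate a failed fetch

def handle_request(url):
    """Serve from the local cache, else act as a client on the user's behalf."""
    if url in cache:
        return cache[url]                 # page found: return the cached copy
    try:
        page = fetch_from_origin(url)     # proxy requests the page itself
    except ConnectionError:
        return "Error: page could not be retrieved"  # error shown to the user
    cache[url] = page                     # store the copy for future requests
    return page
```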
A few common types of proxies are listed below:

– Shared Proxies: serve multiple users.
– Dedicated Proxies: serve just one user at a time.
– Transparent Proxies: simply forward the request without providing any anonymity, i.e. the IP address of the user is revealed. They are typically found in corporate offices.
– Anonymous Proxies: provide reasonable anonymity for most users, i.e. web servers receive the IP address of the proxy server rather than that of the client, allowing the user to access content that is blocked by a firewall.
– High Anonymity Proxies: do not identify themselves as proxy servers, i.e. they hide the fact that they are being used by clients.
– Forward Proxies: send the requests of a client onward to a web server. Users access forward proxies by surfing directly to a web proxy address or by configuring their internet settings.
– Reverse Proxies: pass requests from the internet, through a firewall, to an isolated private network. They handle all requests for resources on destination servers without requiring any action on the part of the requester.

3 Proposed Methodology

3.1 Cache Hit

A cache hit occurs when the data requested by the user is available in the local storage/cache and is up to date with respect to the origin web server, so the user receives fresh and correct data [5] [8].

3.2 Cache Miss

A cache miss occurs when the data requested by the user is not available in the local storage/cache, or when the available copy has expired or is not up to date with respect to the origin web server [5] [7].
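The hit/miss distinction, including the expiry case, can be sketched as follows. The TTL value and the names here are our assumptions for illustration; the paper does not fix an expiry policy:

```python
import time

CACHE_TTL = 3600  # seconds an entry is considered fresh (our assumption)
cache = {}        # URL -> (content, time_stored)

def lookup(url, now=None):
    """Classify a request as a cache hit or a cache miss."""
    now = time.time() if now is None else now
    entry = cache.get(url)
    if entry is None:
        return "miss"                 # data not available locally
    content, stored_at = entry
    if now - stored_at > CACHE_TTL:
        return "miss"                 # entry expired: treat it as stale
    return "hit"                      # a fresh local copy can be served
```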
3.3 Stale Data

Many sites on the web server are updated frequently, within a day, an hour, or a week. The data stored at the proxy server should be updated accordingly, but frequent updating degrades the server's performance, so the server avoids updating the data frequently. Hence it serves data that is not up to date with respect to the origin web server; such data is known as stale data [9] [11].
To address all the factors above, we propose a methodology that uses a Daemon process in the proxy cache server, which helps avoid the problems discussed above.

3.4 Algorithm

1. Start.
2. Initialize Count = 0; no page has been processed yet.
3. Check the server time against the local system:
(a) If the time is between 2 AM and 5 AM, proceed to Step 4.
(b) Else, stop.
4. Check for active users:
(a) If the number of users is less than N (for example, N = 50), proceed to Step 5.
(b) Else, wait for m minutes (for example, m = 15 minutes) and then go to Step 3.
5. Check the current page for an update:
(a) If yes:
i. Download the new content.
ii. Replace the old content with the new content.
iii. Generate a hash and store it in the table [7].
iv. Count = Count + 1; proceed to Step 6.
(b) If no:
Count = Count + 1; proceed to Step 6.
6. Check whether Count is less than the number of pages:
(a) If yes:
i. Fetch the next page.
ii. Repeat Steps 3, 4 and 5.
(b) Else, terminate the Daemon process.
7. Stop.
Fig. 2. Flow chart of daemon process
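The algorithm above can be sketched in code as follows. This is our own sketch of the described steps, not the authors' implementation; the function names, parameters, and the injectable `clock`/`sleep` hooks (which make the time checks testable) are all our assumptions:

```python
import hashlib
import time

def run_daemon(pages, fetch, active_users, n_users=50, wait_minutes=15,
               window=(2, 5), hash_table=None, clock=None, sleep=None):
    """Sketch of the Daemon process in the algorithm above.

    `pages` maps URL -> cached content; `fetch(url)` returns the origin
    server's copy; `active_users()` reports the current online-user count."""
    clock = clock or (lambda: time.localtime().tm_hour)
    sleep = sleep or time.sleep
    hash_table = {} if hash_table is None else hash_table
    count = 0                                   # Step 2: no page processed yet
    urls = list(pages)
    while count < len(urls):                    # Step 6: more pages to check
        if not (window[0] <= clock() <= window[1]):
            return count                        # Step 3: outside 2-5 AM, stop
        if active_users() >= n_users:           # Step 4: too many users online
            sleep(wait_minutes * 60)            # wait m minutes, then re-check
            continue
        url = urls[count]
        fresh = fetch(url)                      # Step 5: check for an update
        if fresh != pages[url]:
            pages[url] = fresh                  # replace old content with new
            hash_table[url] = hashlib.md5(fresh.encode()).hexdigest()
        count += 1                              # fetch the next page
    return count                                # Step 7: all pages processed
```

Passing `clock` and `sleep` as parameters lets the time window and back-off be exercised without waiting for real wall-clock time, which is also how a scheduler (e.g. cron) would be kept out of the update logic itself.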

4 Results and Discussion


Table 1 shows the response time of a search engine for some keywords. When a request is passed through the browser, it first goes to the cache memory; when the particular page is not present in the cache, the request is forwarded to the main server and the page is fetched from there. When that page is loaded in the browser, a copy of it is stored in the cache memory, so that the next time it is
Table 1. Response time of browser for some keywords

                 WEB SEARCH                  IMAGE SEARCH
KEYWORD      WEB SERVER  CACHE SERVER   WEB SERVER  CACHE SERVER
SVKM             250          140           230          200
NMIMS            140          130           300          100
RCPIT            250          120           350          150
CANNON           240          130           250          100
SAMSUNG          210          140           640          200
NOKIA            250          190           240          120
MATLAB           240          160           280          160
OPERA            250          150           120          120
SIEMENS          230          160           310          100
MICROMAX         160          140           190          110
MPSC             170          140           180          100
UPSC             210          150           150          140
IRCTC            160          140           330           90
RRB              260          120           310           70

requested by the user, the reply is given from the cache memory. Table 1 lists the response time of the search engine for some keywords of a simple web search. The first column shows the response time when the page is retrieved from the main server, and the second column shows the figures when the pages are retrieved from the locally present cache server. The table also shows the response time of the browser for an image search with the same keywords.
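The per-keyword comparison in Table 1 can be reproduced with a simple timing harness. This is a sketch under our assumptions: the paper does not state how the times were measured, and we use wall-clock timing around an arbitrary `fetch` callable:

```python
import time

def response_time_ms(fetch, url):
    """Measure the wall-clock time taken to fetch a page, in milliseconds."""
    start = time.perf_counter()
    fetch(url)
    return (time.perf_counter() - start) * 1000.0

def latency_reduction(server_ms, cache_ms):
    """Percentage reduction in access latency when serving from the cache."""
    return 100.0 * (server_ms - cache_ms) / server_ms
```

For example, the SVKM web-search row of Table 1 (250 vs. 140) corresponds to a 44 percent reduction in access latency.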

Fig. 3. Response time of browser for several Keywords (Web Search)


Figure 3 shows the response time for several keywords for clients using the proxy cache. From the figure, we observe that for some keywords the access latency is reduced by half, while for other keywords the effect on access latency is very small or negligible. The results shown in the figure are for a simple web search. For some of these keywords we obtain a 50 percent or greater reduction in access latency, but in some cases there is no effect at all. The reason for this is that the browser sends IMS (If-Modified-Since) requests to check the freshness of the data, and if a more up-to-date copy is present on the web server, that updated copy is fetched from the main server (cache miss). The results shown in Figure 4 are for

Fig. 4. Response time of browser for several Keywords (Image Search)

image search. For some of these keywords we obtain a 30-40 percent reduction in access latency, but in other cases only a 10-20 percent reduction. The reason for the smaller effect is again that the browser sends IMS requests to check the freshness of the data, and if a more up-to-date copy is present on the web server, that copy is fetched from the main server.

5 Conclusion
This paper proposes a novel methodology for implementing a proxy cache server using the suggested algorithm. The proposed methodology states that the Daemon process will improve the performance of the proxy cache server in terms of cache hits. The Daemon process ensures that most of the time, whenever a user accesses a web page, the user gets the fresh content of that page. As the Daemon process is scheduled during the server's down time, it does not impose additional load on the server. The Daemon process runs in the background and updates the data available in the proxy cache without interfering with user processes; users are not interrupted by the scheduling of the Daemon process.
6 Future Scope
Currently, this Daemon process works effectively with websites that are static in nature, i.e. where the content is not updated frequently, such as www.w3schools.com, while the system is not effective with dynamic websites in which content is updated very often, such as www.msn.com. In the case of dynamic websites, the user will still receive stale data. Hence, an alternative method needs to be proposed alongside the Daemon process to overcome this problem.

Acknowledgments
We express our special gratitude to our Director Dr. Ram Gaud (SVKMS NMIMS, Shirpur Campus). We would also like to thank our Associate Dean Dr. Nitin S. Choubey (MPSTME, Shirpur Campus) for his continuous support and guidance. We acknowledge the support of the Computer Engineering Department of Mukesh Patel School of Technology Management and Engineering, without which the completion of this project would not have been possible.

References
1. L. Breslau, Pei Cao, Li Fan, G. Phillips and S. Shenker, "Web caching and Zipf-like distributions: evidence and implications," INFOCOM '99, Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, IEEE, New York, NY, 1999, pp. 126-134, vol. 1.
2. A. Mahanti, C. Williamson, D. Eager, "Traffic analysis of a Web proxy caching hierarchy," in IEEE Network, vol. 14, no. 3, pp. 16-23, May/Jun 2000.
3. Krishna Bharat, Andrei Broder, Jeffrey Dean, Monika R. Henzinger, "A Comparison of Techniques to Find Mirrored Hosts on the WWW," in 4th ACM Conference on Digital Libraries, 1999.
4. "Proceedings of the USENIX Symposium on Internet Technologies and Systems," USENIX Assoc., Berkeley, CA, USA, 1997.
5. Terence Kelly and Jeffrey Mogul, "Aliasing on the World Wide Web: prevalence and performance implications," Proceedings of the 11th International Conference on World Wide Web, 2002, ACM, New York, NY, USA, pp. 281-292.
6. K. Jarvinen, M. Tommiska and J. Skytta, "Hardware Implementation Analysis of the MD5 Hash Algorithm," Proceedings of the 38th Annual Hawaii International Conference on System Sciences, 2005, pp. 298a-298a.
7. Takuya Asaka, Hiroyoshi Miwa, Yoshiaki Tanaka, "Hash-Based Query Caching Methods for Distributed Web Caching in Wide Area Networks," in IEICE Trans. Commun., vol. E82-B, no. 6, June 1999.
8. S. B. Patil, Sachin Chavan, Preeti Patil, Sunita R. Patil, "High Quality Design to Enhance and Improve Performance of Large Scale Web Applications," in International Journal of Computer Engineering and Technology (IJCET).
9. Suryakant B. Patil, Sachin Chavan, Preeti Patil, "High Quality Design and Methodology Aspects to Enhance Large Scale Web Services," in International Journal of Advances in Engineering and Technology, March 2012.
10. Sachin Chavan, Nitin Chavan, "Improving Access Latency of Web Browser by Using Content Aliasing in Proxy Cache Server," in International Journal of Computer Engineering and Technology.
11. Andrzej Sieminski, "The Impact of Proxy Caches on Browser Latency," in International Journal of Computer Science and Applications.
12. Balachander Krishnamurthy, Craig E. Wills, "Proxy Cache Coherency and Replacement: Towards a More Complete Picture," in 19th IEEE International Conference on Distributed Computing Systems, Austin, TX, June 1999.
13. Kartik Bommepally, Glisa T. K., Jeena J. Prakash, Sanasam Ranbir Singh and Hema A. Murthy, "Internet Activity Analysis through Proxy Log," IEEE, 2010.
14. E-Services Team, "Changing Proxy Server," Robert Gordon University, Schoolhill, Aberdeen, Scotland, 2006.
15. Chen, W., Martin, P., Hassanein, H. S., "Caching dynamic content on the Web," Canadian Conference on Electrical and Computer Engineering, 2003, vol. 2, pp. 947-950, 4-7 May 2003.
16. Sadhna Ahuja, Tao Wu and Sudhir Dixit, "On the Effects of Content Compression on Web Cache Performance," Proceedings of the International Conference on Information Technology: Computers and Communications, 2003.
