School of Computing Science, Simon Fraser University, BC, Canada
Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Abstract: Conventional web caching systems based on the client-server model often suffer from limited cache space and a single point of failure. In this paper, we present a novel peer-to-peer client web caching system, in which end-hosts collectively share their web cache contents. By aggregating these individual web caches, a huge virtual cache space is formed, and the burden on web servers can be greatly lightened. We design an efficient algorithm for managing and searching the aggregated cache. We also implement consistency control to prevent sharing stale web objects in peers' caches. Finally and most importantly, considering that end-hosts are generally not as trustworthy as servers or proxies, we employ an opinion-based sampling technique to minimize the chance of distributing forged copies from malicious nodes. We have built a prototype of the proposed system, and our experimental results demonstrate that it has fast response time with low overhead, and can effectively identify and block malicious peers.

I. INTRODUCTION

In the past decade, the web has grown at tremendous speed and its contents have become enormously rich. To reduce network traffic and user latency, web caching systems have been widely deployed [14,15]. However, existing caching systems often suffer from limited cache space and the risk of a single point of failure. There have been many proposals on cooperative caching among proxies, yet they may still suffer from problems similar to those of the traditional client-server model.

In this paper, we present a novel peer-to-peer (P2P) client web caching system, in which end-hosts in the network collectively share their web cache contents. By aggregating these individual web caches, a huge virtual cache space is formed, and the burden on the web server can be greatly lightened. Yet there are several critical issues to solve. First, how shall we manage the client caches for ease of search? Second, how do we control consistency with dynamic nodes? Third, how do we maintain a reasonable trust level in the system, especially considering that the clients are generally not as trustworthy as servers or proxies?

To this end, we design an efficient algorithm for managing and searching the aggregated cache. We also implement consistency control to prevent sharing stale web objects in peers' web caches. Finally and most importantly, we propose an opinion-based sampling technique to minimize the chance of distributing forged copies from malicious nodes. We have built a prototype of the proposed system, and our experimental results demonstrate that it has fast response time with low overhead, and can effectively identify and block malicious nodes.

The remainder of this paper is organized as follows. In Section II, we present our trustable P2P web caching system in detail. The performance evaluation of the system is presented in Section III. Finally, Section IV concludes the paper.

1 J. Liu's work is partially supported by a Canadian NSERC Discovery Grant and a SFU President's Research Grant.
2 X. Chu's work is partially supported by a Research Grant Council, Hong Kong, China, under Grant RGC/HKBU2159/04E, and a Faculty Research Grant of Hong Kong Baptist University (FRG/03-04/II-22).

II. THE P2P CLIENT WEB CACHING SYSTEM

Fig. 1 depicts a generic P2P web caching system. With a P2P network, the storage spaces of several machines are virtually combined to form a huge web cache space that serves all the peers (clients). We now detail the operations of the system, including discovering neighboring nodes, searching for desired web objects, and maintaining the trust level.
Fig. 1. Overview of the P2P web caching system
0-7803-8938-7/05/$20.00 (C) 2005 IEEE
Authorized licensed use limited to: UNIVERSITI UTARA MALAYSIA. Downloaded on August 25, 2009 at 06:31 from IEEE Xplore. Restrictions apply.

A. Neighbor Discovery

Discovering other online nodes is an important issue in a decentralized P2P network. A careless design of the discovery protocol, such as pinging a range of IP addresses and ports, would cause heavy network traffic overhead. Motivated by the JXTA Peer Discovery Protocol (PDP), we have implemented two ways of discovering peers. One is active, in which a peer requests new peer information from its existing neighbors. The other is passive, in which a peer advertises itself to other peers periodically.

There is no single dedicated bootstrap server in our system. Every peer keeps a list of cached addresses for startup. We assume that every peer in the network knows at least one other peer, i.e., there is at least one entry of cached address in the list, which could be configured manually at the very beginning. Afterwards, maintaining the number of valid entries in the list relies on the neighbor discovery protocol. A lower bound on the number of online neighbors is also configured. Once the number of neighbors falls below the lower bound, the peer must get information about new peers in the active way: it asks its neighbors for other peers' information, and a list of nodes is returned. The peer can then try to connect to the new neighbors.

In the passive method, broadcast is adopted for a peer to advertise itself. The advertisement is forwarded peer by peer, but limited by a TTL. Peers receiving the advertisement add the node to their lists of addresses. Thus, within a certain radius of the network, others may connect to the node in the future. The time interval between advertisements and the value of the TTL should be chosen carefully to avoid creating too much traffic on the network.

B. Searching

In our web caching system, every node has a 16-byte node ID generated by applying the SHA-1 hash function [8, 9] to the string of its IP address concatenated with a colon and its port number dedicated to P2P communications. In addition, each node maintains a search history of the other nodes.

Searching is initiated by a web client with a search request. Upon receiving the request, a node first passes the URL to the SHA-1 hash function to get a 16-byte GUID of the requested object. The GUID is then used to search for the object in its local cache. If the object is found, the search is finished and an OK response is sent back to the web browser with the object attached. If the local search misses, the node determines the neighbor whose node ID is closest to the GUID by comparing the two strings lexicographically. It then sends a search request to ask that neighbor for the object. If that neighbor has the object, the search is finished, and the neighbor sends the node a copy of the object. The node can keep a copy in its own cache and forward a copy to the web client.

If the neighbor does not have the object, it looks into its search history of other peers. If more than one node has asked for the object, the neighbor replies with the address of the latest node asking for the object, as asking this node has a greater chance of a search hit, with a lower chance that the target object has been replaced in its cache. The node receiving the reply can then ask that node for the object. If the object is found in that node's cache, a copy of the file object can be retrieved and the search is finished. Otherwise, the searching node should get the object from the original source immediately so as to minimize the object retrieval time. When looking into the search history, the neighbor may not be able to find a node entry for the requested object. In such a case, the neighbor replies with a miss, indicating that it can provide no information for the search. The node receiving the miss then retrieves the object from the originating server. Finally, for each search, if there are no replies from peers under any circumstances, the object should be retrieved from the originating server immediately.

Adopting the Probabilistic Search Protocol, during the above operations the neighbor records the requesting nodes in its search history. Later, if some nodes ask the same requested node for the object, the recorded node has a chance of being introduced to other nodes as holding the object. In the long run, a large request history is generated. All nodes in this history list have a chance of holding an object. Therefore each node has a responsibility for providing the objects in its cache to other peers, and for keeping the search history.

See Fig. 2 for an illustration of the operations. Node BBB this time is responsible for getting object c for Client B. It has not cached the object and thus sends a search request to Node CCC. Node CCC knows that Node AAA has previously asked for object c. It redirects Node BBB to ask Node AAA for the object.

Fig. 2. An illustration of the search operation.

One point to note is that the node whose node ID is closest to the GUID must be a trusted node. If this closest neighboring node is distrusted, the node with the ID second closest to the GUID is asked instead.

We are currently incorporating a Distributed Hash Table into our system to further improve its search efficiency.

C. Opinion-based Sampling

To increase the trustworthiness of our system, we introduce a sampling technique to prevent dishonest nodes from distributing fake web file copies to other peers. Before implementing sampling, we need a representation of the trustworthiness of a peer in the network. Therefore we apply the concept of opinion to express the subjective beliefs of a node about the others. This concept of opinion is different from the traditional probability calculation used in many existing trust models, which does not express uncertainty. An opinion consists of three elements: belief, disbelief, and uncertainty.

When a node first joins a network, it has no idea about the trustworthiness of its neighboring nodes. This state of uncertainty about a peer is represented by the uncertainty part of the opinion. A peer can start to evaluate the trustworthiness of another peer when it gets any file resources from the cache of that peer.
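To make the lookup of Section II-B concrete, the following is a minimal Python sketch rather than the prototype's code. It rests on two assumptions that the paper does not spell out: the 16-byte ID is taken to be the first 16 bytes of the 20-byte SHA-1 digest, and "closest" is taken to mean the smallest absolute difference between IDs read as integers (for fixed-length hex strings, numeric order and lexicographic order coincide). The names `sha1_id`, `closest_neighbor`, and `SearchHistory` are hypothetical.

```python
import hashlib

ID_BYTES = 16  # assumption: keep the first 16 of SHA-1's 20 digest bytes


def sha1_id(text: str) -> str:
    """Hash a string to a fixed-length lowercase hex ID (32 hex characters)."""
    return hashlib.sha1(text.encode("utf-8")).digest()[:ID_BYTES].hex()


def node_id(ip: str, port: int) -> str:
    """Node ID: hash of the IP address, a colon, and the P2P port."""
    return sha1_id(f"{ip}:{port}")


def object_guid(url: str) -> str:
    """GUID of a web object: hash of its URL."""
    return sha1_id(url)


def closest_neighbor(guid, neighbor_ids, distrusted=frozenset()):
    """Pick the trusted neighbor whose ID is numerically nearest to the GUID.

    Distrusted nodes are skipped, so the second-closest node is chosen
    when the closest one is distrusted. Returns None if nobody qualifies.
    """
    candidates = [n for n in neighbor_ids if n not in distrusted]
    if not candidates:
        return None
    target = int(guid, 16)
    # Ties are broken by the ID string itself to keep the choice deterministic.
    return min(candidates, key=lambda n: (abs(int(n, 16) - target), n))


class SearchHistory:
    """Per-node record of which peers asked for which object."""

    def __init__(self):
        self._requesters = {}  # guid -> requester addresses, oldest first

    def record(self, guid, requester):
        peers = self._requesters.setdefault(guid, [])
        if requester in peers:
            peers.remove(requester)
        peers.append(requester)  # the most recent requester goes last

    def latest_requester(self, guid, exclude=None):
        """Address of the latest peer that asked for guid, or None (a miss)."""
        for peer in reversed(self._requesters.get(guid, [])):
            if peer != exclude:
                return peer
        return None
```

In the scenario of Fig. 2, Node CCC would call `latest_requester` with the GUID of object c, excluding the asking Node BBB, and get back Node AAA; a `None` result corresponds to the miss reply, after which the searching node goes to the originating server.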
If the object returned by a peer is checked to be genuine compared with the copy in the original server, the peer gains trust from the peer issuing the request. This means that the belief part of the opinion about the corresponding peer increases. On the other hand, if a peer is found to return forged files, the disbelief about this peer will increase.

Let $\omega_B^A = (b_B^A, d_B^A, u_B^A)$ be the opinion of peer A about peer B, which consists of belief, disbelief, and uncertainty respectively. These three elements always satisfy the following condition:

$$b_B^A + d_B^A + u_B^A = 1$$

When a new peer joins the network, it is uncertain about the trustworthiness of all its neighboring peers, so its opinion about all the others starts from (0, 0, 1). Similarly, the opinion of all existing peers about the new peer starts from (0, 0, 1).

The evaluation of a peer's trustworthiness involves checking the accuracy of a returned file object. To do this, a peer needs to get a copy of the object from the original web server immediately and compare the two copies. An accuracy check has exactly two results: the file object from the peer is exactly the same as the one from the original server, which is regarded as a "positive event" (p), or the file is found to be different from that in the original server, which is regarded as a "negative event" (n). Now we can express the elements of the opinion of node A about a neighboring node B as functions of p and n:

$$b_B^A = \frac{p}{p+n+2}, \qquad d_B^A = \frac{n}{p+n+2}, \qquad u_B^A = \frac{2}{p+n+2}$$

Apart from this, we have implemented a feature that allows a peer to collect neighboring peers' opinions about the others periodically. There may be a situation in which peer A is totally uncertain about peer C's trustworthiness in the beginning, but its opinion about C is affected by the opinion of peer B about C if peer A already has some belief about peer B. The combination method we use is discounting combination, which works as follows. Suppose node A already has an opinion about node B. Then node A wants to know node C's trustworthiness and asks node B for its opinion on node C. There are two opinions here: one is A about B and the other is B about C. Let the opinion of A about B, $\omega_B^A$, be $(b_B^A, d_B^A, u_B^A)$ and the opinion of B about C, $\omega_C^B$, be $(b_C^B, d_C^B, u_C^B)$. We define $\omega_C^{AB} = (b_C^{AB}, d_C^{AB}, u_C^{AB})$ as the discounting of $\omega_C^B$ by $\omega_B^A$, calculated by the following equations:

$$b_C^{AB} = b_B^A\, b_C^B, \qquad d_C^{AB} = b_B^A\, d_C^B, \qquad u_C^{AB} = d_B^A + u_B^A + b_B^A\, u_C^B$$

Then the final opinion of peer A about peer C, $(b_C^{A'}, d_C^{A'}, u_C^{A'})$, is a combination of $(b_C^{AB}, d_C^{AB}, u_C^{AB})$ and the original opinion $(b_C^A, d_C^A, u_C^A)$ of peer A about peer C:

$$b_C^{A'} = \frac{b_C^A + b_C^{AB}}{2}, \qquad d_C^{A'} = \frac{d_C^A + d_C^{AB}}{2}, \qquad u_C^{A'} = \frac{u_C^A + u_C^{AB}}{2}$$

Combining the others' opinions and its own, a peer can get a relatively objective opinion about the other peers.

It is not wise to check the object every time, as this would degrade the performance of the system and violate the spirit of our design, which is to reduce the requests to the original server by getting cached copies from peers. Therefore we adopt a sampling scheme, which performs the checking process only occasionally. If peer A already has a relatively high belief in its opinion about a neighboring peer B, peer A reduces the probability of checking the objects B returns. However, if peer A has a relatively low belief about peer B, the probability of checking the objects B returns increases. A simple representation of this probability is $1 - b_B^A$. Every time we get a file from a peer, we generate a random number between 0 and 1. If this number is smaller than $1 - b_B^A$, we take this file as a sample and do the accuracy checking. By the above calculation, the higher the belief about a peer, the lower the chance of performing accuracy checking on the file it returns.

As mentioned above, if there exist evil nodes in the network which return fake copies of files to the others, and the fakeness of an object is detected, the corresponding disbelief about that peer will increase. If the disbelief about a peer exceeds a certain threshold, the peer is distrusted for an expiration time. During this period the opinion about this node is set to (0, 1, 0). When this distrust period expires, the opinion about this node is set back to (0, 0, 1). The threshold of distrust is set to 0.… in our implementation.

In every step of a search we must make sure that the node being requested is trusted. The file of a distrusted node will not be requested even though the ID of this distrusted node is closest to the GUID of the file. At the same time, a distrusted node will not be introduced to other nodes upon search requests. There is a chance that a neighbor node introduces a node to the searching node that it thinks to be trustworthy, but the searching node may find that it should distrust this node according to the opinion it previously had. In this case the searching node will not request this distrusted node for the object either.

D. Consistency Control

For consistency control, we let the clients periodically check back with the server to determine whether cached objects are still valid. Assuming that older files will expire after a longer time, and younger files will expire after a shorter time, each file is associated with an expiration period and an expiration timestamp. The expiration timestamp is the sum of the last modified time of the file and the expiration period. By arranging the expiration timestamps in ascending order, a job list is formed. The list is checked periodically to find out which files have expired timestamps. For a file with an expired timestamp, a request with the last modified date of the file in the "If-Modified-Since" header field is sent to the original server. This indicates that the server should only return the requested document if the document has been changed since the specified date. A state diagram for the operations is shown in Fig. 3.
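The job list of Section II-D can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: a binary heap stands in for the list sorted by expiration timestamp, `FACTOR` is a hypothetical value (the paper gives none), and the names `CacheJob`, `pop_expired`, and `reschedule` are invented for the example. The period adjustment follows the rule the section gives for revalidation: shorten the period when the server returns a new file, lengthen it otherwise.

```python
import heapq
from dataclasses import dataclass, field
from email.utils import formatdate

FACTOR = 2.0  # hypothetical: the paper does not state the factor's value


@dataclass(order=True)
class CacheJob:
    expires_at: float                            # only field used for ordering
    url: str = field(compare=False)
    last_modified: float = field(compare=False)  # seconds since the epoch
    period: float = field(compare=False)         # expiration period in seconds


def make_job(url, last_modified, period):
    """Expiration timestamp = last modified time + expiration period."""
    return CacheJob(last_modified + period, url, last_modified, period)


def pop_expired(job_list, now):
    """Remove and return every job whose timestamp has expired, earliest first."""
    expired = []
    while job_list and job_list[0].expires_at <= now:
        expired.append(heapq.heappop(job_list))
    return expired


def revalidation_headers(job):
    """Headers of the conditional request sent to the original server."""
    return {"If-Modified-Since": formatdate(job.last_modified, usegmt=True)}


def reschedule(job, new_last_modified=None):
    """Shorten the period if the server returned a new file, else lengthen it."""
    if new_last_modified is not None:  # the file changed on the server
        return make_job(job.url, new_last_modified, job.period / FACTOR)
    return make_job(job.url, job.last_modified, job.period * FACTOR)
```

A periodic task would call `pop_expired(jobs, time.time())`, send one conditional request per returned job, and push the result of `reschedule` back with `heapq.heappush`.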
If a file is returned, the expiration period should be shorter: the old file is replaced by the new one and the expiration period is divided by a factor. However, if no new file is returned, the expiration period should be longer, and it is multiplied by a factor. The expiration period for each file is estimated in this way, and the period changes dynamically.

Fig. 3. State diagram illustrating the dynamic changes of expiration period of a file in the cache.

III. PERFORMANCE EVALUATION

We have built a prototype of the proposed system. To investigate its searching efficiency and trustworthiness, we have conducted a series of tests with a simulator that synthesizes requests as well as other node activities. As shown in Fig. 4, the simulator keeps a list of node addresses and ports to which it can send requests for web objects from the web caching system. It also keeps a list of URLs. It then acts as a common web browser. Periodically, it randomly selects a URL and randomly selects a node to request the web object from. A request is issued every 500 ms, and hence the request rate in our experiment is 120 requests per minute.

Fig. 4. Experiment setup.

To find out the efficiency of searching, the hit rate is measured in the experiment. The variation of hit rate against time with different numbers of nodes is shown in Fig. 5. It can be seen that the hit rate is generally increasing and approaching 100% for each test. Moreover, with fewer nodes, the hit rate approaches 100% in a shorter time interval. We also show the time to reach 99% hit rate in Table 1. There are several causes for this. First, when the peer-to-peer network becomes larger, the node with the desired object may not be in the vicinity. Second, the number of requests reaching each node decreases when the number of nodes increases (as the number of requests generated by the simulator in one minute is fixed). Thus, the number of cached objects in each node is smaller. These both contribute to a lower hit rate.

[Figure: hit rate (0% to 100%) versus time (in minutes) for 4, 8, 12, and 16 nodes]
Fig. 5. Variation of Hit Rate against Time

Table 1. Amount of time needed to reach 99% hit rate for different numbers of nodes in the network.

Number of nodes    Time for reaching 99% hit rate (minutes)
4                  5
8                  15
12                 15
16                 >24

To investigate the trustworthiness of the sampling algorithm, we then measure the disbelief level of the dishonest node in each network. To create a dishonest node in the network, another program is implemented to forge cached files. A dishonest node is identified if its disbelief is greater than a threshold, depending on the application. The disbelief levels against time in each test are shown in Figs. 6-9. It can be seen that the dishonest node is identified in a shorter interval if the network is smaller. The causes for these trends are similar to those for searching. The number of requests reaching each node decreases when the number of nodes increases. Thus, the frequency of file retrieval from the dishonest node decreases, leading to a slower increase of disbelief. Therefore, in a smaller network, the dishonest node is identified more often.
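The disbelief levels measured in this section come out of the opinion calculus of Section II-C. As a rough illustration only, the sketch below implements the evidence mapping, the discounting combination, the averaging step, and the sampling test with probability one minus belief. It is not the prototype's code; the class `Opinion` and the function names are hypothetical, and `rng` is injectable purely so the example can be checked deterministically.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    belief: float
    disbelief: float
    uncertainty: float


def opinion_from_events(p: int, n: int) -> Opinion:
    """Map p positive and n negative accuracy checks to (b, d, u)."""
    total = p + n + 2
    return Opinion(p / total, n / total, 2 / total)


def discount(a_about_b: Opinion, b_about_c: Opinion) -> Opinion:
    """Opinion of A about C obtained through B (discounting combination)."""
    return Opinion(
        a_about_b.belief * b_about_c.belief,
        a_about_b.belief * b_about_c.disbelief,
        a_about_b.disbelief + a_about_b.uncertainty
        + a_about_b.belief * b_about_c.uncertainty,
    )


def combine(own: Opinion, discounted: Opinion) -> Opinion:
    """Final opinion: element-wise average of A's own and the discounted one."""
    return Opinion(
        (own.belief + discounted.belief) / 2,
        (own.disbelief + discounted.disbelief) / 2,
        (own.uncertainty + discounted.uncertainty) / 2,
    )


def should_sample(a_about_b: Opinion, rng=random.random) -> bool:
    """Check a returned file with probability 1 - belief."""
    return rng() < 1.0 - a_about_b.belief
```

For example, six positive and two negative checks give the opinion (0.6, 0.2, 0.2), so files from that peer are sampled with probability 0.4, while a newcomer with opinion (0, 0, 1) is always sampled. Each of the three operations keeps belief, disbelief, and uncertainty summing to 1.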
[Figures 6 to 9: panels titled "Variation of Disbelief against Time for 4 / 8 / 12 / 16 Nodes", each plotting disbelief per node versus time (in minutes)]
Fig. 6. Disbelief level against time for 4 nodes
Fig. 7. Disbelief level against time for 8 nodes
Fig. 8. Disbelief level against time for 12 nodes
Fig. 9. Disbelief level against time for 16 nodes

IV. CONCLUSION

In this paper, we have presented a trustable peer-to-peer web caching system, in which peers in the network share their web cache contents. To increase the trust level of the system, we propose to use a sampling technique to minimize the chance of distributing fake web file copies among the peers. We further introduce the concept of opinion to represent the trustworthiness of an individual peer. We have built a prototype of the proposed system, and our experimental results demonstrate that it has fast response time with low overhead, and can effectively identify and block malicious peers.

ACKNOWLEDGEMENT

The authors would like to thank C. Ng and P. Choi for their effort in building the prototype.

REFERENCES

T. Asaka, H. Miwa, and Y. Tanaka, "Distributed Web Caching using Hash-based Query Caching Method", in Proceedings of IEEE International Conference on Control Applications, pp. 1620-1625, 1999.
I. Clarke et al., "Protecting Free Expression Online with Freenet", IEEE Internet Computing, pp. 40-49, 2002.
D. Eastlake 3rd and P. Jones, "US Secure Hash Algorithm 1 (SHA1)", RFC3174, September 2001.
R. Flenner, Java P2P Unleashed, Indianapolis: Sams, 2003.
J. Gwertzman and M. Seltzer, "World-Wide Web Cache Consistency", in Proceedings of the USENIX Annual Technical Conference, San Diego, CA, pp. 141-152, January 1996.
S. Iyer, A. Rowstron, and P. Druschel, "Squirrel: A Decentralized Peer-to-peer Web Cache", in Proceedings of ACM Symposium on Principles of Distributed Computing, 2002.
X. Li, M. R. Lyu, and J. Liu, "A Trust Model Based Routing Protocol for Secure Ad Hoc Network", in Proceedings of IEEE Aerospace Conference, Big Sky, MT, March 6-13, 2004.
J. Liu and J. Xu, "Proxy Caching for Media Streaming over the Internet," IEEE Communications, Feature Topic on Proxy Support for Streaming on the Internet, August 2004.
A. Luotonen, Web Proxy Servers, Prentice Hall, Englewood Cliffs, New Jersey, 1998.
D. A. Menascé, "Scalable P2P Search", IEEE Internet Computing, vol. 7, pp. 83-87, March/April 2003.
National Institute of Standards and Technology, "Secure Hash Standard", April 17, 1995.
K. W. Ross, "Hash Routing for Collections of Shared Web Caches", IEEE Network, vol. 11, pp. 37-44, November/December 1997.
D. Wessels and K. Claffy, "Internet Cache Protocol (v2)", RFC2186, September 1997.
D. Wessels and K. Claffy, "Application of Internet Cache Protocol (v2)", RFC2187, September 1997.
J. Xu, J. Liu, B. Li, and X. Jia, "Caching and Prefetching for Web Content Distribution," IEEE Computing in Science and Engineering, Special issue on Web Engineering, August 2004.