
The 3rd International Conference on Grid and Pervasive Computing - Workshops

A P2P Collaborative RFID Data Cleaning Model


Xiaogang Peng, Zhen Ji
Software College of Shenzhen University, Shenzhen, P.R. China, 518060

Zongwei Luo, Edward C. Wong, C. J. Tan
E-Business Technology Institute, G01-G05 Technology Innovation and Incubation Building, The University of Hong Kong, Pokfulam Road, Hong Kong

patrickpeng@126.com

978-0-7695-3177-9/08 $25.00 © 2008 IEEE. DOI 10.1109/GPC.WORKSHOPS.2008.12

Abstract
RFID is emerging as one of the key technologies for modernizing logistics and supply chain management. In a typical RFID-enabled logistics and supply chain application, RFID readers detect and identify goods with RFID tags attached. Given the huge volume of goods, reading RFID data successfully becomes a crucial issue. Many algorithms and models for improving RFID reading have been proposed, yet most of them address the problem within a single reading node. In this paper, we introduce a P2P model that identifies and removes inaccurate RFID readings by using the information exchanged among related nodes along the business processing route of each RFID-tagged item. Successful deployment of this model eases the demand for highly accurate reading at each RFID reading node while reducing the total cost of the RFID network. Our simulation shows that the RFID network continues to function correctly against the business requirements while its overall performance is maintained.

1. Introduction
Radio Frequency Identification (RFID) is a technology that allows readers to detect a tagged item without line of sight or contact by using radio-frequency waves. Because of this reading flexibility, RFID has been adopted in a wide range of applications [1,2,3,4]. In a typical RFID-enabled logistics and supply chain application, RFID readers are employed to detect the RFID-tagged items passing by each detection node. Usually, a supply chain system consists of a number of detection nodes. The readings of each detection node are integrated and analyzed to generate logic for higher-level processing such as RFID management and event processing [5]; therefore, a small decrease in the successful read rate of each node adds up to a noticeable drop in overall system performance in RFID event generation. Furthermore, the number of RFID-tagged items passing these nodes is always huge [6], so a slight decrease in the overall successful read rate causes a large number of misreads on tagged items, which reduces the accuracy of further data processing such as tracing or event generation. Maintaining high successful RFID read rates in supply chain systems is therefore one of the most crucial issues in RFID research.

Unfortunately, according to related research [7,8], the observed accurate read rate of RFID data in real-world projects is about 60-70%, which is far from satisfactory. To improve the successful read rate of RFID, many data cleaning algorithms and mechanisms have been proposed [9,10,11,12]. Most of the proposed data cleaning methods focus on problems within a single node and ignore the fact that the tag readings from related detection nodes also provide useful information to assist the data cleaning process. In this paper, we treat the detection nodes as a chain that follows the movement of an RFID-tagged item through a supply chain, and we introduce a P2P model that identifies and removes inaccurate RFID readings by using the information of connected nodes along the processing route of each RFID-tagged item. Successful deployment of this model also eases the demand for highly accurate reading at each RFID reading node while reducing the total cost of the RFID network. Our simulation shows that the RFID network continues to function correctly against the business requirements while its overall performance is maintained.

2. Related Research
Compared with the current bar code, RFID excels in flexibility because no line of sight or contact is needed, and it has therefore been deployed in many different areas [2, 3, 4, 5]. For example, supply chain management benefits from RFID technology through savings in the labor cost of scanning tags, easier inventory replenishment decisions and tracing of products in the supply chain [18]. Most existing RFID-enabled systems share the same three-level architecture, as shown in Figure 1.
[Figure 1 diagram, top to bottom: Enterprise Application Software and Database or Data Warehouse; Data Processing Server; three Detection Nodes]

Figure 1. RFID System Architecture

At the first layer, detection nodes are deployed to read RFID tags at different locations according to business requirements. In most systems, multiple readers are used in each detection node to reduce false negative readings. Because of the comparatively low correct read rate, usually 60%-70%, of the raw RFID data captured by the readers, a cleaning or smoothing step [8,9] is needed. Some data cleaning mechanisms are applied at the readers [13, 15, 16], and some require a centralized back-end system [17, 18], that is, the data processing server at layer two in Figure 1, to handle the raw data. In the server, other manipulations such as compression, simple rule association and event generation [19] can also take place before the data enters the enterprise application or database at the third layer.

One of the biggest challenges of RFID data is its volume. As a real-world example, the Wal-Mart RFID trial is reported to generate seven terabytes of data every day [20]. Sending all this data to a centralized system for data cleaning requires a high-performance server as well as a high-speed network, which inevitably increases the total hardware cost. One solution is to distribute the centralized information to local nodes and then analyze it in a distributed way. In this paper, we propose a P2P model that further reduces the communication overhead by establishing data cleaning relationships between nodes to build data cleaning clusters, as discussed in Section 3.

Unlike the client-server network structure, in which centralized resources are bundled into one or several servers, every node in a P2P network participates equally in services or applications by exchanging messages with other nodes through diverse channels within the network [21]. With this structure, the processing power of each node and the bandwidth of each connection can be better utilized and integrated to accomplish real-time, heavily loaded computation tasks. By modeling the RFID data cleaning task with the P2P network concept, we design a novel P2P model over the RFID detection node network and develop an algorithm to identify and remove erroneous data generated by the readers in the nodes.

Consider the scenario in which tagged items pass by an RFID detection node. Besides a successful read, three types of error can occur: a false negative (a tag that passes the node is not read), a false positive (a tag is reported although it did not pass the node) and a redundant reading (the same tag is reported more than once). The redundant case can be removed by using time stamps to mark the entering and exiting of the tagged item.
Then, using the numbers of successful reads, false negatives and false positives, the performance of the node can be evaluated with the measures used in information retrieval [22]. Let TP (true positive) denote the number of successful reads, and let FN and FP denote the numbers of false negative and false positive cases respectively. The precision rate P of the node is P = TP/(TP+FP), and the recall rate R is R = TP/(TP+FN). In this paper, R measures the capability of each detection node to read the original tags successfully, while P evaluates how well each node avoids false positive cases. Another measure, F1 = 2PR/(P+R), assesses the overall performance of the node by combining precision and recall. In an ideal zero-error case, all three values are 100%.
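The three measures can be computed directly from the per-node counts. The following is a minimal Python sketch of the formulas above; the function name node_metrics and the example counts are hypothetical and are not taken from the paper's simulation.

```python
from __future__ import annotations


def node_metrics(tp: int, fn: int, fp: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one detection node.

    tp, fn and fp are the counts of successful reads, false negatives
    and false positives. Empty denominators yield 0.0 (an assumption
    of this sketch, not something the paper specifies).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 600 tags read, 400 missed, 100 spurious reports.
p, r, f1 = node_metrics(tp=600, fn=400, fp=100)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

Note that F1 is the harmonic mean of P and R, so it stays low whenever either of the two is low.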


3. The P2P Collaborative RFID Data Cleaning Model


In many RFID-enabled logistics or supply chain management applications, RFID detection nodes are deployed wherever the information of the tagged items needs to be captured. The nodes within a supply chain system physically form a complicated network connected by channels. This scenario can be abstracted as a graph of detection nodes whose edges represent the possible movements of tagged items. When the tag of an item is recognized by a detection node, the information is stored locally. In the normal case, the information is then sent to a centralized server for further data management. In the P2P approach, we define an RFID data exchange network (RDEN) on top of the physical detection node network layer by taking the detection nodes as vertexes and the information exchange between nodes as edges. The RDEN is modeled as an undirected graph G(V,E). The vertex set V = {v1, v2, v3, ...} is the set of RFID detection nodes in a supply chain management system. There is an edge e ∈ E connecting detection node v1 with detection node v2 if information is transferred between v1 and v2. Data management is carried out by sending and receiving short messages between vertexes in this P2P network. For data cleaning purposes, a small number of vertexes in the network are involved in forming a data cleaning cluster (DCC). Nodes in the cluster are related by the business processing logic (BPL) towards solving RFID data cleaning problems. The method we use to build the DCC from the RDEN is similar to the construction of the interconnected RFID reader collision model upon the reader collision network described in [13], which forms the network by exchanging neighboring information to update the routing neighbor table in each node.
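As an illustration of the graph model, the sketch below stores the RDEN as an undirected adjacency structure and restricts it to the members of a cluster. It is only a sketch under stated assumptions: the class name RDEN, its methods and the node labels are hypothetical, and the cluster members are passed in by hand, whereas the paper derives them from the business processing logic.

```python
from collections import defaultdict


class RDEN:
    """Undirected graph G(V, E) of detection nodes (illustrative only)."""

    def __init__(self):
        # Maps each node to the set of nodes it exchanges information with.
        self.neighbors = defaultdict(set)

    def add_exchange(self, u, v):
        """Record an undirected edge: information is exchanged between u and v."""
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)

    def cluster(self, members):
        """Return the sub-graph induced by the given cluster members."""
        members = set(members)
        return {u: self.neighbors.get(u, set()) & members for u in members}


# Hypothetical four-node network in which v1..v3 form a data cleaning cluster.
net = RDEN()
net.add_exchange("v1", "v2")
net.add_exchange("v2", "v3")
net.add_exchange("v3", "v4")
dcc = net.cluster(["v1", "v2", "v3"])
print(dcc)
```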
The fact that each tagged item can only travel along one path, in time order, through the supply chain makes it possible to find a directed data cleaning path (DDCP) within the DCC. We assume that the path can be determined before each tagged item enters the system. The assumption holds by construction in some scenarios, such as items moving along an assembly line or goods transported from a manufacturer to a distribution center and then to a certain retail store. One solution for more sophisticated cases is a divide and conquer method [23], which segments the DCC into simpler sub-networks and applies data aggregation based on the business processing logic or on path finding algorithms in graphs. The DDCP is therefore defined as a set of ordered vertexes <v1, v2, ..., vi, ..., vn>, where n is the total number of nodes and vi is the i-th node that each tagged item passes. For three consecutive nodes vi-1, vi and vi+1, vi-1 is the PREVIOUS node of vi while vi+1 is the NEXT node of vi.

When the DDCP is known, the relation of the detection nodes along the DDCP can be used in the RFID data cleaning process. As a tagged item passes a node, the occurrence of the item at this node is recorded if it is read successfully, and this information assists in detecting RFID reading errors at related nodes along the path. For example, if an item is not detected at a node yet is reported at both the previous node and the next node, a false negative is declared with high probability. By exchanging RFID reading information between the nodes, the false positive and false negative cases at a node can be detected and corrected.

As a simple example, consider the DCC with 4 nodes on the left-hand side of Figure 2: tagged items pass from node a0 through either a1 or a1' and then go to node a2. Through information exchange between nodes, items are found at both a0 and a2, so a1 and a1' are regarded logically as one node after applying the BPL, and the data of the two nodes are integrated to form a DDCP, as seen on the right-hand side of Figure 2.

Consider the scenario in which some tagged items move from the starting point a0 towards the ending point an along a DDCP found in a DCC by using business logic. Each tag is detected with a certain successful read rate ri by each detection node ai as it passes by. In each node ai, besides the table that records RFID readings in the classic form Ti<TagID, LOCATION, TIMESTAMP> [14], we introduce another table for collaborative data cleaning purposes with the following structure: TCi<TagID, PRE, CURRENT, NEXT, STATUS>. In each row of the table, TagID records the RFID of the tagged item. The binary values in PRE, CURRENT and NEXT represent the detection of the item at the previous, current and next node respectively, with 1 for detected and 0 otherwise. The STATUS column indicates the data cleaning result for this item.
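One possible in-memory shape for a TCi row, together with the PREVIOUS/NEXT lookup along a DDCP, is sketched below in Python. The names TCRow, neighbours_on_path and is_probable_false_negative are hypothetical, as are the tag and node labels; the paper only fixes the column layout TCi<TagID, PRE, CURRENT, NEXT, STATUS>.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class TCRow:
    """One row of the collaborative table TCi (field names are assumptions)."""
    tag_id: str
    pre: int = 0       # 1 if the item was detected at the previous node
    current: int = 0   # 1 if the item was detected at this node
    nxt: int = 0       # 1 if the item was detected at the next node
    status: str = ""   # data cleaning result for this item


def neighbours_on_path(
    ddcp: Sequence[str], node: str
) -> Tuple[Optional[str], Optional[str]]:
    """Return the (PREVIOUS, NEXT) nodes of `node` on an ordered DDCP."""
    i = ddcp.index(node)
    previous = ddcp[i - 1] if i > 0 else None
    following = ddcp[i + 1] if i + 1 < len(ddcp) else None
    return previous, following


def is_probable_false_negative(row: TCRow) -> bool:
    """True when the item was seen before and after this node but not here."""
    return (row.pre, row.current, row.nxt) == (1, 0, 1)


ddcp = ["a0", "a1", "a2"]
print(neighbours_on_path(ddcp, "a1"))
print(is_probable_false_negative(TCRow("tag-001", pre=1, current=0, nxt=1)))
```

End nodes of the path have only one neighbour, which the sketch represents with None.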


[Figure 2 diagram: on the left, a DCC with four nodes a0, a1, a1' and a2; data integration by BPL yields, on the right, the path DDCP(a0, a1, a2)]

Figure 2. DDCP finding in a DCC by BPL

The P2P collaborative data cleaning process can be divided into three phases: the initialization phase, the local correction phase and the peer correction phase. In the initialization phase, when a tagged item is detected, a record is inserted into TCi with the captured RFID in the TagID column; at the same time, the CURRENT column is set to 1, indicating that the item is recognized at this node. The detection information is then sent to the previous and next nodes to update the corresponding columns of the rows with the same RFID in tables TCi-1 and TCi+1.

The local correction phase follows the initialization phase. In node ai, the program checks the values in the PRE, CURRENT and NEXT columns of TCi. Based on the combination of these three values, node ai updates the STATUS column and sends the corresponding messages to ai+1 and ai-1. The only correction decision that can be made in this phase concerns the pattern 1,0,1, which represents a false negative at ai with high probability; it triggers the action of assigning 1 to CURRENT to flag the false negative at this node, and STATUS is set to C1. The STATUS updates and the actions taken for the different patterns are summarized in Table 1.

Table 1. Local correction phase of node ai.
Pre  Cur  Next  STATUS  Actions taken
0    0    1     Pause   Send -FP to node ai+1
0    1    0     Pause   None
0    1    1     Pause   Send +FN to node ai-1
1    0    0     Pause   Send +FP to node ai-1
1    0    1     C1      Change CURRENT to 1, then send +C1 to node ai-1 and -C1 to node ai+1
1    1    0     Pause   None
1    1    1     OK      None

[Figure 3 diagram: states labelled by the PRE, CURRENT, NEXT patterns 000 (OK), 001, 010, 011, 100, 101, 110 and 111 (OK); arrows labelled with received messages R(-C0), R(+C0), R(-C1), R(+C1), R(-fp) or R(+fp) and R(+fn), with CURRENT changes C(1,0) and C(0,1), and with sent messages S(-C0), S(+C0), S(-C1) and S(+C1)]

Figure 3. State transition diagram in the peer correction phase.

In the peer correction phase, correction decisions are made by considering the PRE, CURRENT, NEXT patterns as well as the messages received from the previous and next nodes. The detailed operations for each pattern can be found in the state transition diagram in Figure 3. In the diagram, the states are determined by the PRE, CURRENT, NEXT patterns. State transitions are triggered by the messages received, denoted R(message) on the outgoing arrows of the diagram. There are four kinds of messages: FN (the item with this RFID tag is predicted to be a false negative by the sender), FP (the item with this RFID tag is predicted to be a false positive by the sender), C1 (the CURRENT value of the sender for the same RFID has been changed to 1) and C0 (the CURRENT value of the sender for the same RFID has been changed to 0). A + sign before a message shows that it was sent by the next node, while a - sign indicates that it came from the previous node. Another function, denoted C(value1, value2) in the diagram, means that the CURRENT value of the current node is changed from value1 to value2 and the STATUS column is changed to C0 or C1 according to value2. The function S(message) means that the message is sent to the previous node (+ sign) or the next node (- sign). A series of actions takes place in each state transition, all of which are indicated on the arrows in Figure 3.
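The local correction phase summarized in Table 1 amounts to a lookup keyed by the PRE, CURRENT, NEXT pattern. The Python sketch below encodes that table under a few assumptions: the names LOCAL_RULES and local_correction are hypothetical, messages are represented as (destination, text) pairs, and the sign on the C1 message to the next node is taken to be "-" in line with the sign convention described above. The peer correction phase of Figure 3 is not modeled.

```python
# (PRE, CURRENT, NEXT) -> (STATUS, messages to send), following Table 1.
# "prev" stands for node a(i-1) and "next" for node a(i+1).
LOCAL_RULES = {
    (0, 0, 1): ("Pause", [("next", "-FP")]),
    (0, 1, 0): ("Pause", []),
    (0, 1, 1): ("Pause", [("prev", "+FN")]),
    (1, 0, 0): ("Pause", [("prev", "+FP")]),
    (1, 0, 1): ("C1", [("prev", "+C1"), ("next", "-C1")]),
    (1, 1, 0): ("Pause", []),
    (1, 1, 1): ("OK", []),
}


def local_correction(pre, current, nxt):
    """Apply the local correction phase to one TCi row.

    Returns the (possibly corrected) CURRENT value, the new STATUS and
    the list of messages to send to the neighbouring nodes.
    """
    status, messages = LOCAL_RULES[(pre, current, nxt)]
    if status == "C1":
        # Pattern 1,0,1: probable false negative at this node, so set CURRENT to 1.
        current = 1
    return current, status, list(messages)


print(local_correction(1, 0, 1))
print(local_correction(0, 1, 1))
```

The pattern 0,0,0 is deliberately absent from the lookup: Table 1 does not list it, since a row exists only once some node has reported the item.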

4. Simulation and Results


To evaluate the performance of the proposed model, we developed a simulation software system. In the simulation, the number of detection nodes n, the successful read rate ri of each node ai and the number of RFID tags m are parameters that must be set beforehand. The system then generates a table of m RFID codes in the form To<TagID, LOCATION, TIMESTAMP> as the original data representing the real-world tagged items. The readings of the tagged items at each detection node are simulated as a table Ti. The RFID data in Ti differs from To because of the preset successful read rate ri of each node. For simplicity, we set the successful read rate of every node to the same value r. The collaborative data cleaning process starts by initializing the TCi table in each node ai, as explained in the previous section. The sending and receiving of data changes the TCi table, and the updated information eventually changes the Ti table, as described in the previous section.

We randomly generate 1000 RFID tags to test the proposed model in DDCPs with 3, 5 and 7 nodes. For each run, the same DDCP is evaluated under four different successful read rates: 60%, 70%, 80% and 90%. The precision rate, recall rate and F1 measure of each node are calculated under each experiment setting. To give an overview of the performance of the whole model in each test case, the average precision rate, recall rate and F1 measure after applying the collaborative data cleaning algorithm are calculated and shown in Table 2.

Based on the results, the proposed collaborative data cleaning model improves the performance of each node in all test cases. The highest improvement occurs for nodes with a successful read rate of 60%, with an improvement of 29% in average precision, 17% in average recall and 23% in F1 measure. This finding shows that the proposed collaborative model is capable of removing errors at a low successful read rate and therefore eases the demand for high accuracy at each detection node, leading to a lower hardware cost for the system. Another interesting phenomenon is that when the number of nodes in a system increases, the recall rate after applying the collaborative model generally improves slightly under the same preset successful read rate (the only exception in Table 2 is the step from 5 to 7 nodes at a 90% read rate). This result contradicts the belief that more nodes in an RFID system cause more chaos in data cleaning, and it makes the model well suited to real-world implementation. The increase of 5%-8% in the measures under the high successful read rate of 90% also suggests that the model can bring further improvement even to systems that already have a data cleaning mechanism enabled.

Table 2. Average performance on cleaned data
r (%)  n  P_avg   R_avg   F1_avg
60     3  89.71%  76.30%  82.14%
60     5  89.30%  78.06%  83.04%
60     7  89.73%  78.21%  83.39%
70     3  92.88%  82.58%  87.27%
70     5  92.66%  84.92%  88.49%
70     7  92.79%  85.10%  88.69%
80     3  95.74%  88.40%  91.82%
80     5  95.23%  89.70%  92.32%
80     7  96.46%  90.12%  93.15%
90     3  98.16%  94.61%  96.33%
90     5  97.91%  95.08%  96.45%
90     7  98.40%  94.83%  96.57%
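The general shape of such an experiment can be sketched in a few lines of Python. This is not the authors' simulator and is not expected to reproduce Table 2. It is a simplified sketch that assumes every tag truly passes every node, models missed reads only (no false positives), and applies only the local 1,0,1 rule to the raw readings, with no peer correction phase. The function name simulate_recall and the seed parameter are hypothetical.

```python
import random


def simulate_recall(n_nodes, n_tags, r, seed=0):
    """Return the average recall (before, after) the local 1,0,1 correction."""
    rng = random.Random(seed)
    # reads[i][t] is 1 when node i detected tag t, which happens with probability r.
    reads = [
        [1 if rng.random() < r else 0 for _ in range(n_tags)]
        for _ in range(n_nodes)
    ]
    cleaned = [row[:] for row in reads]
    # Interior nodes only: the end nodes lack a PREVIOUS or a NEXT node.
    for i in range(1, n_nodes - 1):
        for t in range(n_tags):
            if reads[i - 1][t] == 1 and reads[i][t] == 0 and reads[i + 1][t] == 1:
                cleaned[i][t] = 1  # probable false negative, corrected locally
    total = n_nodes * n_tags
    before = sum(map(sum, reads)) / total
    after = sum(map(sum, cleaned)) / total
    return before, after


for n in (3, 5, 7):
    before, after = simulate_recall(n_nodes=n, n_tags=1000, r=0.6)
    print(n, round(before, 4), round(after, 4))
```

Because the sketch never introduces false positives, the correction can only add true detections, so recall after cleaning is never lower than before.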

5. Conclusion
In this paper, we develop a novel P2P RFID data cleaning model built on the physical layer of the detection node network in real-world applications. By collaboratively sending and receiving messages between related nodes, the model is capable of detecting and removing false positive and false negative cases automatically to meet the data cleaning goal. Compared with most existing data cleaning mechanisms, the P2P model avoids the computation overhead at the centralized server and the transfer of huge amounts of data, which reduces the total network cost. In our simulation, DDCPs with different numbers of detection nodes are tested at different preset successful read rates, and we achieve improvements in all test cases. In particular, at low read rate settings, the improvements in recall rate are more than 15%. These results provide a way to ease the demand for highly accurate reading at each RFID reading node while reducing the total cost of the RFID network. We also find that when the number of nodes in a DDCP grows, the recall rate of each node in our simulation generally goes up. This finding indicates that the proposed model scales well, which makes it well suited to real-world implementation.

References
[1] Siemens to Pilot RFID Bracelets for Health Care. http://www.infoworld.com/article/04/07/23/HNrfid implants 1.html (2004).
[2] World's Third Largest Retailer Completes Warehouse RFID Implementation. http://www.informationweek.com/story/showArticle.jhtml?articleID=57702741 (2005).
[3] Tesco Pushes on with Full-scale RFID Rollout. http://www.computing.co.uk/news/1160636 (2005).
[4] R. B. Ferguson, Logan Airport to Demonstrate Baggage, Passenger RFID Tracking, eWeek, 2006.
[5] S. Chawathe, V. Krishnamurthy, S. Ramachandran, and S. Sarma, Managing RFID data, Proceedings of the 30th VLDB Conference, 2004, pp. 1189-1195.
[6] B. S. Prabhu, Xiaoyong Su, Harish Ramamurthy, Chi-Cheng Chu, Rajit Gadh, WinRFID: A Middleware for the enablement of Radio Frequency Identification (RFID) based Applications, Invited chapter in Mobile, Wireless and Sensor Networks: Technology, Applications and Future Directions, Rajeev Shorey, Chan Mun Choon, Ooi Wei Tsang, A. Ananda (eds.), John Wiley, 2005.
[7] C. Floerkemeier and M. Lampe, Issues with RFID usage in ubiquitous computing applications, Pervasive Computing: Second International Conference, PERVASIVE, 2004.
[8] S. Jeffery, M. Garofalakis and M. Franklin, Adaptive cleaning for RFID data streams, Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), 2006, pp. 163-174.
[9] S. Jeffery, G. Alonso, M. Franklin, W. Hong, and J. Widom, Declarative support for sensor data cleaning, Pervasive, 2006.
[10] J. Waldrop, D. W. Engels and S. E. Sarma, Colorwave: a MAC for RFID reader networks, IEEE Wireless Communications and Networking Conference (WCNC), New Orleans, Louisiana, USA, 2003, pp. 1701-1704.
[11] Daniel W. Engels, The reader collision problem, AUTO-ID Center Whitepaper, http://autoid.mit.edu/whitepapers/MIT-AUTOID-WH007.PDF, 2002.
[12] D. W. Engels, S. E. Sarma, The reader collision problem, IEEE International Conference on Systems, Man and Cybernetics, Hammamet, Tunisia, 2002.
[13] Shijie Zhou, Zongwei Luo, Edward Wong, C. J. Tan, Interconnected RFID Reader Collision Model and its Application in Reader Anti-collision, IEEE RFID 2007, Texas, USA, 2007.
[14] Shan R., Orlowska M., Li X., RFID Data Management: Challenges and Opportunities, IEEE First International Conference on RFID, Grapevine, Texas, USA, 26-28, 2007, pp. 175-182.
[15] Y. Yuan, Z. Yang, Z. He, J. He, Taxonomy and survey of RFID anti-collision protocols, Computer Communications, 29 (2006), pp. 2150-2166.
[16] Su-Ryun Lee, Sung-Don Joo, Chae-Woo Lee, An enhanced dynamic framed slotted ALOHA algorithm for RFID tag identification, The Second Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous 2005), San Diego, CA, USA, 2005, pp. 166-174.
[17] M. J. Franklin, et al., Design Considerations for High Fan-In Systems: The HiFi Approach, CIDR, 2005.
[18] Laurie Sullivan, RFID Implementation Challenges Persist, All This Time Later, Information Week, Oct 2005.
[19] Richard Cocci, Yanlei Diao, and Prashant Shenoy, SPIRE: Scalable Processing of RFID Event Streams, Proceedings of the 5th RFID Academic Convocation, April 2007.
[20] B. S. Prabhu, Xiaoyong Su, Harish Ramamurthy, Chi-Cheng Chu, Rajit Gadh, WinRFID: A Middleware for the enablement of Radio Frequency Identification (RFID) based Applications, Invited chapter in Mobile, Wireless and Sensor Networks: Technology, Applications and Future Directions, Rajeev Shorey, Chan Mun Choon, Ooi Wei Tsang, A. Ananda (eds.), John Wiley, December 2005.
[21] A. Oram (ed.), Peer-to-Peer: Harnessing the Power of Disruptive Technologies, O'Reilly & Associates, March 2001.
[22] V. Raghavan, P. Bollmann, G. S. Jung, A critical investigation of recall and precision as measures of retrieval system performance, ACM Trans. Inf. Syst., 7, 1989, pp. 205-229.
[23] Ram Swaminathan, Divide-and-conquer algorithms for graph layout problems, Networks 28(2), 1996, pp. 69-85.
[24] S. Chawathe, V. Krishnamurthy, S. Ramachandran, and S. Sarma, Managing RFID data, Proceedings of the 30th VLDB Conference, 2004, pp. 1189-1195.
