Mining and Postmining - ICECT 2011 Paper

2011 3rd International Conference on Electronics Computer Technology (ICECT 2011)

Mining and Post-Mining of Time Stamped Association Rules

T. Mathu, D. Narmadha, S. Geetha
Department of Computer Science and Engineering, Karunya University, Coimbatore, India

Keywords: time stamped association rule, ontology, user-constraint template, matching operator

I. INTRODUCTION

Database mining has attracted a growing amount of attention in database communities due to its wide applicability in various fields where vast amounts of data are collected every day. One major application domain of data mining is the analysis of transactional data. It is assumed that the database system keeps information about user transactions, where each transaction is a collection of data items. An association rule captures the notion of a set of data items occurring together in transactions.

Recently, time-related data have come up in a variety of domains such as stock market analysis, medical data, environmental data, financial data and the analysis of the usage of web pages by different users at different time slots. The works in [1], [2] and [3] paid attention to time-related information that is implicitly related to transaction data, e.g., the time at which a transaction is executed, and discovered association patterns that vary over time. In [4], Jin Soung Yoo and Shashi Shekhar proposed an algorithm called SPAMINE (Similarity-Profiled temporal Association MINing mEthod), which mines similar patterns that co-occur with a particular event over time.
It defines tight upper and lower bounds of true support sequences to estimate support sequences without examining the input data set. For early pruning of candidate itemsets, the authors utilized the concept of a lower bounding distance, which is often used for indexing schemes in the time series literature, and defined a lower bounding distance that further reduces the itemset search space. However, they do not mine association rules. Since association rules help decision makers to analyze the mining results effectively, our work focuses on mining timestamped association rules.

© 2011 IEEE

The work of Agrawal and Srikant [5] discovers association rules, and association rule mining has been extensively studied in [6], [7], [8] and [9]. The use of the monotonicity property of support to prune candidate itemsets is proposed in [5]. In [4], the authors proposed a monotonicity property of the lower bounding distance, which is more effective in pruning candidate itemsets. In our approach, we use the pruning techniques suggested by Yoo and Shekhar in the process of mining timestamped association rules.

Most existing algorithms generate a huge volume of association rules, most of which are of little interest to the user. For example, in a market basket analysis, a traditional association mining algorithm [3] discovers the associations between all the items in the dataset. But if the data miner is interested only in analyzing the sale of confectionery items, he would be frustrated by the enormous number of other rules that are not useful for his analysis. So it is crucial to devise a scheme that helps the decision maker to do his analysis based on his constraints.

An ontology is a formal representation of knowledge by a set of concepts within a domain (e.g., Supermarket) and the relationships between those concepts (e.g., confectionery items in a supermarket). It is used to reason about the properties of that domain, and may be used to describe the domain. Fig. 1 shows an example of a supermarket domain with its concepts in a hierarchical structure.

Figure 1. Example of Domain Ontology

In this paper, we propose an approach to mine timestamped association rules and to select interesting association rules from the huge volume of discovered rules. This approach consists of two phases. In the first phase, we use the pruning techniques proposed in [4] to reduce the search space of candidate itemsets. In the second phase, user-interesting rules are discovered using a domain ontology which integrates user knowledge; the complexity of user knowledge can be represented more effectively using an ontology. We propose an approach called the user-constraint template, which allows the data miner to select the interesting relationships between the concepts. A matching operator is defined over the user-constraint template in order to describe the actions that the user can perform. This operator is used to select the rules that are interesting to the data miner. The mining results can improve supply chain planning and retail decision making by maximizing the visibility of items most likely to be in high demand during special time periods.

II. RELATED WORK

In the literature, several methods have been examined for mining and post-mining association rules. To our knowledge, there is no method that post-mines temporal association rules using the concept of ontology.
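The first phase of the approach outlined in the Introduction reuses the support-sequence pruning of [4]. The Python sketch below illustrates that idea under stated assumptions: the timestamped database is a hypothetical dict from timeslot to transactions, support is a per-timeslot fraction, and the bounds on the support of A ∪ B are the generic Fréchet bounds min(sA, sB) and max(0, sA + sB − 1), which may differ from the exact bounds used in [4]. Because every true support value lies between its bounds, the lower bounding distance never exceeds the true Euclidean distance, so an itemset whose lower bounding distance already exceeds the threshold can be discarded without counting its support.

```python
import math

# Hypothetical toy database: timeslot -> list of transactions (sets of items).
DB = {
    "t1": [{"bread", "jam"}, {"bread"}, {"milk"}, {"bread", "jam", "milk"}],
    "t2": [{"jam"}, {"bread", "jam"}, {"milk", "jam"}, {"bread"}],
}

def support_sequence(db, itemset):
    """Relative support of `itemset` in each timeslot (keys assumed to sort chronologically)."""
    seq = []
    for slot in sorted(db):
        txns = db[slot]
        seq.append(sum(1 for t in txns if itemset <= t) / len(txns))
    return seq

def euclidean(a, b):
    """L2 distance between two equal-length support sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bounds(seq_a, seq_b):
    """Per-slot lower/upper bounds on the support of A | B (Frechet bounds, an assumption)."""
    upper = [min(a, b) for a, b in zip(seq_a, seq_b)]
    lower = [max(0.0, a + b - 1) for a, b in zip(seq_a, seq_b)]
    return lower, upper

def lower_bounding_distance(ref, lower, upper):
    """Never exceeds the distance from `ref` to any sequence lying inside [lower, upper]."""
    total = 0.0
    for r, lo, up in zip(ref, lower, upper):
        if r > up:
            total += (r - up) ** 2
        elif r < lo:
            total += (lo - r) ** 2
    return math.sqrt(total)

def is_candidate(db, a, b, ref, threshold):
    """Keep A | B only if its support sequence is within `threshold` of the reference."""
    lo, up = bounds(support_sequence(db, a), support_sequence(db, b))
    if lower_bounding_distance(ref, lo, up) > threshold:
        return False  # pruned without counting the support of A | B
    return euclidean(ref, support_sequence(db, a | b)) <= threshold
```

For example, with the reference sequence `[0.9, 0.9]` the pair `{"bread"}`, `{"jam"}` is rejected by the bound check alone, while the reference `[0.5, 0.3]` survives pruning and also passes the exact distance test.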
Cyclic association rule mining, which detects periodically repetitive patterns of frequent itemsets over time, has been examined by Ozden [1]. Cyclic associations can be considered as itemsets that occur in every cycle with no exception. The work of Ozden [1] was extended by Ramaswamy [3] for relaxed matches. Li, Ning, Wang and Jajodia [2] explored the problem of finding frequent itemsets along with calendar-based patterns. The calendar-based patterns are defined with a calendar schema, e.g., (year, month, day). For example, (*, 12, 25) represents the set of time points each corresponding to the 25th day of December. However, real-life patterns are usually imperfect and may not demonstrate any regular periodicity. In the work of Li [12], a temporal pattern is defined with the set of time points where the user expects discovered itemsets to be frequent. In [4], temporal patterns are searched with a user-defined numeric reference sequence, considering the prevalence similarities of all possible itemsets, not only frequent itemsets. Dong and Li [13] presented the problem of mining emerging patterns, which are itemsets whose supports increase significantly from one data set to another. Liu, Hsu and Ma [14] studied the change of fundamental association rules between two time periods using support and confidence. In contrast, our paper focuses on mining timestamped association rules.

Association rule mining has been examined in the literature through many algorithms. The CLOSET algorithm was proposed in [15] as a new efficient method for mining closed itemsets. CLOSET uses a novel frequent pattern tree (FP-tree) structure, which is a compressed representation of all the transactions in the database. Another solution for reducing the number of frequent itemsets is mining maximal frequent itemsets [16]. The authors proposed the MAFIA algorithm, based on depth-first traversal and several pruning methods such as Parent Equivalence Pruning (PEP), FHUT, HUTMFI, and dynamic reordering.
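The calendar-based patterns of [2] can be illustrated with a short Python sketch. It assumes a hypothetical list of (date, itemset) transactions, uses None in place of "*", and applies a simplified precise-match reading in which the itemset must reach the support threshold on every covered day that has transactions; this is an illustration, not the exact definition used in [2].

```python
from collections import defaultdict
from datetime import date

def matches(pattern, d):
    """True if date d is covered by a (year, month, day) pattern; None stands for '*'."""
    return all(p is None or p == v for p, v in zip(pattern, (d.year, d.month, d.day)))

def holds_in_every_unit(transactions, itemset, pattern, min_support):
    """Simplified precise match: itemset is frequent on every covered day that has data."""
    by_day = defaultdict(list)
    for d, items in transactions:
        if matches(pattern, d):
            by_day[d].append(items)
    if not by_day:
        return False  # the pattern covers no recorded day
    return all(
        sum(itemset <= t for t in txns) / len(txns) >= min_support
        for txns in by_day.values()
    )

# Hypothetical transactions around Christmas.
TX = [
    (date(2009, 12, 25), {"cake", "candle"}),
    (date(2009, 12, 25), {"cake"}),
    (date(2010, 12, 25), {"cake", "candle"}),
    (date(2010, 12, 24), {"bread"}),
]
CHRISTMAS = (None, 12, 25)  # the pattern (*, 12, 25)
```

Here `{"cake"}` holds on both 25 December dates at a 0.5 threshold, whereas `{"candle"}` fails at 0.6 because it appears in only half of the 2009-12-25 transactions.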
However, the main drawback of the methods extracting maximal frequent itemsets is the loss of information. Objective interestingness measures based on statistical properties of the data have also been proposed [17], [18]. Unfortunately, being based only on data evaluation, objective measures are not sufficient to reduce the number of extracted rules and to select the interesting ones.

A very recent approach [19] uses ontologies in a preprocessing step. The data set is first preprocessed according to the constraints extracted from the ontology; then, the data mining step takes place. This approach differs from ours in two ways. First, they apply constraints in the preprocessing task, whereas we work in the post-processing task. The advantage of pruning constraints is that they exclude from the start the information that the user is not interested in, so that the Apriori algorithm can be applied to this new database. However, if the user is not sure which items he/she should prune, he/she has to create several pruning tests and, for each test, apply the Apriori algorithm, whose execution time is very high. Second, they use ontologies in order to express user knowledge, and they propose a more expressive and flexible language for user expectation representation, i.e., Rule Schemas. In contrast, our approach uses a user-constraint template which selects only the interesting rules.

III. NOTATIONS

The timestamped association rule mining task can be stated as follows. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a finite set of items, and let $D = D_1 \cup \cdots \cup D_n$, with $D_i \cap D_j = \emptyset$ for $i \neq j$, be a timestamped transaction database wherein each transaction consists of a timeslot and an itemset. Let $T = t_1, t_2, \ldots$ be the timeslots and let $S$ be a minimum_support sequence over the timeslots. A distance measure is applied to find the distance between the support sequence of each item and this user-specified threshold. Many similarity measures have been discussed in the time series database literature. The $L_p$ norm is the most popularly used distance measure in similar time sequence search [1]. We use the $L_2$ norm, called the Euclidean distance, to find the candidate items which satisfy the minimum support sequence threshold over the given timeslots.

For each candidate itemset $l$ and each nonempty subset $s$ of $l$, the rule $s \Rightarrow (l - s)$ is output if, at each time slot,

$$\frac{\mathrm{support\_count}(l)}{\mathrm{support\_count}(s)} \geq \mathrm{min\_conf\_sequence},$$

where min_conf_sequence is a sequence of confidence thresholds over the different time slots.

The above steps to generate timestamped association rules can discover a prohibitive number of association rules. For instance, when the number of attributes and the number of transactions become large, thousands of rules are extracted from a database. As the number of rules grows, it becomes difficult for the data miner to analyze the mining results, and the results cannot be used directly. Thus, it is crucial to help the decision maker with an efficient technique for reducing the number of rules. The interestingness of a rule strongly depends on interactivity with the user [22]. Existing methods do not integrate user knowledge, so many uninteresting rules are extracted. To select the interesting rules, user knowledge should be expressed in an understandable form, at several levels of the knowledge. An ontology provides an explicit representation of the concepts in a domain, where each concept is a collection of items, and the instances of a concept are ground-level items.

Figure 3. Ontology based on supermarket example

The subsumption relation between concepts expresses the is-a-superclass-of and is-a-subclass-of relations. The concept-instance relation represents the relation between concepts and their instances. There are two types of concepts: leaf-concepts and generalized concepts obtained from the subsumption relation (≤). Leaf-concepts are connected to the database in the simplest way: each leaf-concept is associated with one item in the database. Generalized concepts are the concepts that subsume other concepts in the ontology.
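The subsumption structure just described can be sketched in Python as a small is-a map in which a generalized concept resolves to database items through the concepts it subsumes, while each leaf-concept maps to exactly one item. The concept and item names loosely follow the supermarket example but are hypothetical, and `select_rules` is only an illustrative filter that keeps rules whose antecedent and consequent items fall under user-chosen concepts; the paper's user-constraint template and matching operator M(UC) may be defined differently.

```python
# Hypothetical supermarket ontology: generalized concept -> directly subsumed concepts.
ONTOLOGY = {
    "Supermarket": ["Grocery", "Confectionery"],
    "Confectionery": ["Chocolate", "Candy"],
    "Grocery": ["Bread", "Milk"],
}
# Each leaf-concept is associated with exactly one database item.
LEAF_ITEM = {"Chocolate": "chocolate", "Candy": "candy", "Bread": "bread", "Milk": "milk"}

def items_of(concept):
    """Resolve a concept to database items through the concepts it subsumes."""
    if concept in LEAF_ITEM:
        return {LEAF_ITEM[concept]}
    items = set()
    for child in ONTOLOGY.get(concept, []):
        items |= items_of(child)
    return items

def select_rules(rules, antecedent_concept, consequent_concept):
    """Keep (antecedent, consequent) rules whose items fall under the chosen concepts."""
    ante_items = items_of(antecedent_concept)
    cons_items = items_of(consequent_concept)
    return [(a, c) for a, c in rules if a <= ante_items and c <= cons_items]

# Hypothetical mined rules, as (antecedent, consequent) item sets.
RULES = [({"chocolate"}, {"candy"}), ({"bread"}, {"milk"}), ({"chocolate"}, {"milk"})]
```

With these definitions, selecting Confectionery on both sides keeps only the chocolate-to-candy rule, which mirrors the confectionery-only analysis motivated in the Introduction.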
A generalized concept is connected to the database through its subsumed concepts.

The Rule Schema formalism is based on the specification language for user knowledge introduced by Liu. The model proposed by Liu is described using elements from an item taxonomy, which allows an is-a organization of database attributes. Using an item taxonomy for the representation of user knowledge, the filtered rules are more … A user-constraint template can be represented, for example, as <grocery item: …>.

We propose a matching operator. The matching operator (M) selects the timestamped association rules that match the user-specified template. When the matching operator is applied over the user-constraint template, M(UC), the antecedent and the consequent of the timestamped association rules should match the template. Thus, by using the POTAR algorithm, only the interesting rules among the generated timestamped association rules are retained.

VI. EXPERIMENTAL EVALUATION

A. EXPERIMENT DATASET

Our experiments were performed with a timestamped transaction dataset (T1000. AT100_1100_PS0_APS db) generated by the data generation tool [11]. The synthetic data sets use the following parameters: T is the number of transactions (10,000), I is the number of items (100), S is the average size of the transactions, and a further parameter gives the number of time slots. A reference time sequence is generated by randomly choosing the support sequence of an itemset.

B. EXPERIMENT RESULTS

The association rules generated by our approach are compared with those generated by the traditional algorithm. The pruning techniques used in this paper prove to be more effective than the monotonicity property of support used there. Since the lower bounding distance and the upper and lower bound support sequences are used, the number of candidate itemsets generated is reduced considerably. This in turn reduces the execution time for the generation of timestamped association rules, as shown in Fig. 4: the execution time of the traditional association rule mining technique is higher than that of our approach.

Figure 4. Evaluation of execution time

Table I compares the number of interesting rules selected when the matching operator is not applied and when it is applied over the user-constraint template. The five different constraint templates select different sets of interesting rules. Our experimental evaluation shows that the rules selected by the matching operator are interesting to the user.

TABLE I. COMPARISON OF THE NUMBER OF RULES WITH AND WITHOUT MATCHING OPERATOR
Without Operator: 100%

VII. CONCLUSION

This paper discusses the problem of mining timestamped association rules at each time slot and post-mining them to select user-interesting rules. The discovered rules show the associations between items that have similar support during a particular event. Furthermore, the discovered rules are pruned using an ontology and a user-constraint template to help the decision maker analyze the results effectively.

ACKNOWLEDGMENT

The authors would like to thank Ms. S. Geetha, Assistant Professor, Karunya University, for her encouragement and support in this work.

REFERENCES

[1] B. Ozden, S. Ramaswamy, and A. Silberschatz, "Cyclic Association Rules," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 1998.
[2] Y. Li, P. Ning, X.S. Wang, and S. Jajodia, "Discovering Calendar-Based Temporal Association Rules," J. Data and Knowledge Eng., vol. 44, no. 2, 2003.
[3] S. Ramaswamy, S. Mahajan, and A. Silberschatz, "On the Discovery of Interesting Patterns in Association Rules," Proc. Int'l Conf. Very Large Data Bases (VLDB), 1998.
[4] J.S. Yoo and S. Shekhar, "Similarity-Profiled Temporal Association Mining," IEEE Trans. Knowledge and Data Eng., vol. 21, no. 8, Aug. 2009.
[5] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. Int'l Conf. Very Large Data Bases (VLDB), 1994.
[6] J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation," Proc. ACM SIGMOD, 2000.
[7] J. Han and Y. Fu, "Discovery of Multi-Level Association Rules from Large Databases," Proc. Int'l Conf. Very Large Data Bases (VLDB), 1995.
[8] J. Park, M. Chen, and P. Yu, "An Effective Hashing-Based Algorithm for Mining Association Rules," Proc. ACM SIGMOD, 1995.
[9] R. Srikant and R. Agrawal, "Mining Generalized Association Rules," Proc. Int'l Conf. Very Large Data Bases (VLDB), 1995.
[10] C. Marinica and F. Guillet, "Knowledge-Based Interactive Postmining of Association Rules Using Ontologies," IEEE Trans. Knowledge and Data Eng.
[11] … Lecture Notes in Computer Science, 2008.
[12] …
[13] G. Dong and J. Li, …
[14] B. Liu, W. Hsu, and Y. Ma, …
[15] …
[16] D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu, "MAFIA: A Maximal Frequent Itemset Algorithm," …
[17] …
[18] …
[19] …
[20] D. Gunopulos and G. Das, "Time Series Similarity Measures," …
[21] …
[22] …
