Conference Information
Bradford, West Yorkshire, UK, 29 June - 1 July 2010
This year, CIT comprises 15 technical tracks: Artificial Intelligence, Cloud Computing, Computer and System Architecture, Computer Graphics/Image Processing, Computer Networks, High Performance Computing, Information Security, Information Visualization, IT for Biomedicine, Management of Data and Database Systems, New Web Technology and Applications, Software Engineering, Telecommunications, Ubiquitous Computing, and Utility Computing.
The CIT-2010 main conference consists of 96 papers selected from more than 485 submissions, giving an acceptance rate of 20%. We wish to thank the authors for choosing CIT-2010 as the venue to present their research results. The final acceptance/rejection decisions were made after a rigorous review process involving 292 Program Committee members and 69 additional reviewers. We thank the Program Committee members and the additional reviewers, who contributed their valuable time and expertise to provide professional reviews under a very tight schedule.
We are also pleased to have four distinguished keynote speeches, given by Erol Gelenbe (Imperial College London, UK), Yi Pan (Georgia State University, USA), Peter Key (Microsoft Research Cambridge, UK), and Weijia Jia (City University of Hong Kong, Hong Kong). We are grateful to the keynote speakers for their support and contributions.
The coordination with the Steering Chairs, Daming Wei and Laurence T. Yang, and the General Chairs, Geyong Min and Tarek El-Ghazawi, was essential for the success of the final program. We sincerely appreciate their constant support and guidance. It was a great pleasure to work with such an excellent team. Last but not least, we would like to thank Yongwen Pan for his help with the paper submission system.
Finally, we hope that the conference fosters useful interaction between researchers and provides a stimulating forum for exchanging and developing new ideas in the exciting and rapidly changing field of computer and information technology.
Cheng Hao Jin¹, Gouchol Pok², Hi-Seok Kim³, Eun-Jong Cha⁴, Keun Ho Ryu¹
¹ Database/Bioinformatics Laboratory, Chungbuk National University, South Korea
² Department of Computer Science, Yanbian University of Science and Technology, China
³ School of Electronics & Information Engineering, Cheongju University, South Korea
⁴ Department of Mechanical Engineering, Chungbuk National University, South Korea
kimsunghoyust@gmail.com, gcpok2000@gmail.com, khs8391@cju.ac.kr, ejcha@chungbuk.ac.kr, khryu@dblab.chungbuk.ac.kr
Abstract—In some real-world applications, the predefined features are not discriminative enough to represent well the distinctiveness of different classes. Therefore, building a better-defined feature space becomes an urgent task. The main goal of feature space transformation is to map a set of features defined in one space into a new, more powerful feature space, so that classification based on the transformed data can achieve a performance gain over classification in the original space. In this paper, we introduce a feature transformation method in which the transformation is conducted using closed frequent patterns. Experiments on real-world datasets show that the transformed features obtained by combining the closed frequent patterns with the original features are superior, in terms of classification accuracy, to the approach based solely on closed frequent patterns.

Keywords—Closed Frequent Pattern, Feature Space Transformation, Classification Accuracy

I. INTRODUCTION

Classification methods such as decision trees [1, 17], Support Vector Machines [2], neural networks [3], Bayesian networks [30], k-nearest neighbor [31], case-based reasoning [32], etc., have received considerable attention in various communities. Classification is a supervised learning task that predicts the class labels of unseen instances. If the predefined features represent well the distinctiveness of different classes, the classifier can achieve high classification performance. However, in some cases the original single features cannot satisfactorily capture the intrinsic structure of each class, and thus cannot provide useful information to the classifier. Therefore, in order to improve classification performance, there is an emerging need for constructing a good feature space for complex structural data. Feature space transformation, which transforms the original single-feature space into a new, more powerful feature space, can solve this problem.

One attempt to improve classification accuracy is to apply frequent patterns in the classification task. In this research area, frequent patterns are used as features, which makes classification models more accurate and easier to understand. Frequent pattern mining (also called association mining) was first proposed by Agrawal, Imielinski and Swami [4] and has been widely studied in the past decades. A frequent pattern is generated from the given database when its frequency is no less than a user-specified minimum support count. Frequent patterns carry much more underlying semantics than single features, since a frequent pattern is a combination of single features. However, a large number of frequent patterns is usually generated at a given minimum threshold, so it is impracticable to use the whole set of frequent patterns as features, and the result becomes difficult to interpret. This led researchers to study closed patterns, from which the complete set of frequent patterns, together with their exact frequencies, can be recovered without any information loss. Moreover, the number of closed frequent patterns is much smaller than the number of frequent patterns. Thus, it is a good choice to use closed frequent patterns as features instead of the whole set of frequent patterns.

The contributions of this paper can be summarized as follows: 1) we introduce closed frequent pattern-based feature space transformation; 2) extensive experimental results on real-world datasets show that the transformed feature space obtained by combining the closed frequent patterns and single features can achieve better classification accuracy than the one built solely on closed frequent patterns.

The rest of the paper is organized as follows. In Section II, we review related work. The basic terminology and the problem definition are presented in Section III. The extensive experimental evaluations on real-world datasets are reported in Section IV, and our conclusion and future work are given in Section V.

II. RELATED WORK

In this paper, we use closed frequent patterns as features to represent the data, and a classification method is then applied on this transformed feature space. Thus, our research belongs to the pattern-based classification task. Associative classification [5, 6, 7, 13, 14, 15, 16, 20, 21, 22, 23, 24, 25, 26, 27, 28], which integrates classification and association mining, is related to our research topic. During the last decades,
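As a minimal illustration of the closure property discussed above (not the paper's implementation, and using a hypothetical toy database rather than the paper's data), closed frequent itemsets can be enumerated by brute force: a frequent itemset is closed if and only if no proper superset has the same support.

```python
from itertools import combinations

# Toy transaction database; items play the role of single features.
# These transactions are illustrative, not the paper's Fig. 1 data.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a"},
    {"b", "c"},
]

def support(itemset, db):
    """Count transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset <= t)

def closed_frequent_patterns(db, min_sup):
    """Enumerate closed frequent itemsets by brute force: a frequent
    itemset is closed iff no proper superset has the same support.
    Exponential in the number of items; for illustration only."""
    items = sorted(set().union(*db))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(set(combo), db)
            if s >= min_sup:
                frequent[frozenset(combo)] = s
    return {
        p: s for p, s in frequent.items()
        if not any(p < q and s == frequent[q] for q in frequent)
    }

closed = closed_frequent_patterns(transactions, min_sup=2)
# {"c"} is frequent (support 3) but not closed: its superset {"b","c"}
# has the same support, so {"c"} adds no information and is pruned.
print(len(closed))  # → 5
```

This shows why the closed set is smaller than the full frequent set without losing information: every frequent itemset's support can be read off from its smallest closed superset. Real miners (e.g. CLOSET or CHARM) achieve the same result far more efficiently.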
Definition 3. Feature Space Transformation
Let P = {p_1, p_2, ..., p_m} be the set of closed frequent patterns generated from the given database D at a given minimum support threshold T. The feature space transformation can then be expressed as follows: D = {(x_i, y_i)}, i = 1..n, is mapped to D' = {(z_i, y_i)}, i = 1..n, where each z_i ⊆ P is the set of generated closed frequent patterns contained in x_i.

Example: Again taking Fig. 1(a) as an example, the generated closed frequent patterns are illustrated in Fig. 2(b), and the resulting transformed feature space is shown in Fig. 3(b).

TABLE II. DISCRETIZED DATASET
Dataset        Instances (n)  Features (F)  Classes (C)
Diabetes       768            15            2
Glass          214            20            7
Heart-statlog  270            18            2
Iris           150            12            3
Waveform       5000           109           3
Wine           178            37            3
Zoo            101            34            7
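The D → D' mapping of Definition 3 can be sketched as follows. This is a minimal illustration with made-up transactions and patterns (not the paper's Fig. 1 data): each instance x_i becomes a binary vector z_i indicating which closed frequent patterns it contains.

```python
# Illustrative transactions and closed patterns (hypothetical data).
db = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
patterns = [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"a", "b"})]

def transform(db, patterns):
    """D = {x_i} -> D' = {z_i}: each z_i is a binary indicator vector
    over a fixed ordering of the closed frequent pattern set P."""
    return [[1 if p <= t else 0 for p in patterns] for t in db]

print(transform(db, patterns))
# → [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
# e.g. the first transaction {"a","b"} contains {"a"} and {"a","b"}
# but not {"b","c"}.
```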
Figure 3. (a) Original transaction database and (b) transformed feature space

IV. EXPERIMENTAL RESULTS

We use several datasets from the UCI Machine Learning Repository [10] to test the effect of closed frequent pattern-based feature space transformation on classification accuracy. In these experiments, we compare the classification results of two transformed feature spaces: one built solely from the closed frequent patterns, and the other built by combining the closed frequent patterns with the single features. We show the results over a wide range of values of T. A summary of these datasets is shown in Table I.

TABLE I. DATASETS USED IN EXPERIMENTS
Dataset        Instances  Features  Classes
Diabetes       768        8         2
Glass          214        9         7
Heart-statlog  270        13        2
Iris           150        4         3
Waveform       5000       40        3
Wine           178        13        3
Zoo            101        16        7

The state-of-the-art C4.5 [1] and SMO in Weka, and C5.0 in Clementine, are chosen as classification models. Note that classification accuracy is the primary evaluation criterion for the experiments, and 10-fold cross validation is used. The classification accuracy results for the transformed feature spaces are shown in Tables III, IV, V, VI, VII, VIII and IX. In these tables, P is the set of closed frequent patterns and F ∪ P is the union of the original single features and the closed frequent pattern features. From the different maximum values of T listed in each table, we can see that the stopping point for generating closed frequent patterns differs for each dataset. These tables also give the average classification accuracy. From these tables, we conclude that in most cases the transformed feature space from F ∪ P achieves much higher classification accuracy than the feature space from P alone.

TABLE III. CLASSIFICATION RESULTS ON DIABETES
T(%)     C4.5 P  C4.5 F∪P  SMO P   SMO F∪P  C5.0 P  C5.0 F∪P
5        77.99   77.99     76.30   76.30    79.95   79.95
10       77.86   77.86     77.08   77.08    79.30   79.30
15       76.56   76.56     74.09   74.09    78.65   78.65
20       76.17   75.78     75.39   75.00    79.30   79.56
25       75.91   75.91     75.65   74.09    77.73   79.56
30       71.22   76.04     70.70   75.13    70.96   78.65
35       71.22   75.91     67.84   76.69    70.96   78.39
40       69.40   76.04     68.49   75.78    70.31   78.39
45       69.40   76.17     68.10   75.91    70.44   78.52
50       69.40   75.91     68.62   74.74    70.31   78.39
55       69.14   75.78     68.23   77.08    70.05   78.65
60       69.14   75.78     68.23   77.08    70.05   78.65
65       69.14   75.78     68.23   77.08    70.05   78.65
70       70.05   75.78     69.14   77.08    70.05   78.65
75       67.84   75.78     67.84   76.95    67.84   78.65
Average  72.03   76.21     70.93   76.01    73.06   78.84
Single features only: C4.5 75.78, SMO 76.82, C5.0 78.65

TABLE IV. CLASSIFICATION RESULTS ON GLASS
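The F ∪ P construction and the 10-fold protocol described above can be sketched in plain Python. This is a hedged illustration of the evaluation setup, not the paper's code: the paper runs C4.5/SMO in Weka and C5.0 in Clementine, and the data below are hypothetical.

```python
def union_features(original, pattern_feats):
    """Build the F ∪ P representation: each instance's original single
    features concatenated with its binary closed-pattern indicators."""
    return [f + p for f, p in zip(original, pattern_feats)]

def kfold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross validation,
    the evaluation protocol used in the experiments."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# Hypothetical data: two original features and two pattern bits per instance.
F = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]]
P_bits = [[1, 0], [1, 1], [0, 1]]
print(union_features(F, P_bits))
# → [[5.1, 3.5, 1, 0], [4.9, 3.0, 1, 1], [6.2, 2.9, 0, 1]]
```

Any classifier can then be trained on the concatenated rows fold by fold; the per-T accuracies in Tables III-IX come from averaging the 10 held-out folds.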
T(%)     C4.5 P  C4.5 F∪P  SMO P   SMO F∪P  C5.0 P  C5.0 F∪P
15       76.64   76.64     77.10   77.10    80.37   80.37
20       72.90   76.64     77.10   78.04    77.57   80.37
25       72.43   76.17     77.57   78.50    78.04   80.37
30       72.90   76.64     77.57   78.97    78.50   80.37
35       71.03   74.77     76.17   78.50    77.57   79.44
40       69.63   74.77     71.03   79.44    75.23   79.44
45       68.69   74.77     69.63   74.30    71.96   79.44
50       68.69   72.90     70.09   75.70    71.96   79.44
55       65.42   73.83     62.62   76.17    66.82   79.44
60       64.49   74.77     63.08   73.83    66.82   79.44
65       50.47   74.77     48.60   73.83    50.00   79.44
70       50.47   74.77     48.60   75.23    50.00   79.44
75       49.07   74.77     49.07   76.17    47.20   79.44
80       49.07   74.77     45.79   74.77    47.20   79.44
85       47.20   74.30     47.20   75.70    47.20   79.44
Average  64.84   75.21     65.92   76.80    65.76   79.85
Single features only: C4.5 74.77, SMO 73.36, C5.0 79.44

TABLE V. CLASSIFICATION RESULTS ON HEART-STATLOG
T(%)     C4.5 P  C4.5 F∪P  SMO P   SMO F∪P  C5.0 P  C5.0 F∪P
5        82.96   82.96     80.37   80.37    90.00   90.00
10       85.93   85.93     80.37   80.37    90.37   90.37
15       83.33   83.33     81.48   81.48    88.89   88.89
20       81.11   81.11     81.11   81.11    88.89   88.89
25       79.63   79.63     83.33   83.33    87.78   87.78
30       83.33   83.33     78.89   78.89    88.89   88.89
35       81.85   81.85     80.74   80.74    87.41   87.41
40       80.74   80.74     83.33   82.96    87.04   87.04
45       82.96   82.96     83.70   81.48    87.04   87.04
50       82.59   82.59     83.70   82.59    86.67   86.67
55       81.11   80.37     82.22   84.07    83.70   87.04
60       72.96   81.48     72.96   83.33    74.07   87.41
65       72.96   81.48     72.96   83.33    74.07   87.41
70       70.00   81.48     70.00   84.44    70.00   87.41
75       70.00   81.48     70.00   84.44    70.00   87.41
Average  79.43   82.05     79.01   82.20    83.65   87.98
Single features only: C4.5 81.85, SMO 84.07, C5.0 87.41

TABLE VI. CLASSIFICATION RESULTS ON IRIS
T(%)     C4.5 P  C4.5 F∪P  SMO P   SMO F∪P  C5.0 P  C5.0 F∪P
5        95.33   95.33     94.67   94.67    96.00   96.00
10       95.33   95.33     94.67   94.67    96.00   96.00
15       94.00   94.00     94.67   94.67    96.00   96.00
20       94.00   94.00     94.67   94.67    96.00   96.00
25       94.00   94.00     94.00   94.00    96.00   96.00
30       94.00   94.00     94.67   94.67    96.00   96.00
35       94.67   94.67     95.33   94.67    96.00   96.00
Average  94.48   94.48     94.67   94.57    96.00   96.00
Single features only: C4.5 94.00, SMO 94.00, C5.0 96.00

TABLE VII. CLASSIFICATION RESULTS ON WAVEFORM
T(%)     C4.5 P  C4.5 F∪P  SMO P   SMO F∪P  C5.0 P  C5.0 F∪P
5        73.06   73.04     78.68   78.66    84.38   84.38
10       74.66   75.58     85.36   85.58    86.26   86.16
15       74.24   76.16     82.86   85.42    82.06   85.80
20       73.06   75.66     77.78   85.84    78.30   87.20
25       62.94   76.06     64.30   85.92    66.44   87.06
30       52.44   76.10     51.18   85.96    52.90   87.18
35       40.50   76.24     41.10   85.92    41.10   87.18
40       40.50   76.24     41.10   85.92    41.10   87.18
Average  61.43   75.64     65.30   84.90    66.57   86.82
Single features only: C4.5 76.44, SMO 85.96, C5.0 87.32

TABLE VIII. CLASSIFICATION RESULTS ON WINE
T(%)     C4.5 P  C4.5 F∪P  SMO P   SMO F∪P  C5.0 P  C5.0 F∪P
5        94.94   94.94     97.75   97.75    98.88   98.88
10       94.38   94.38     98.88   98.88    98.88   98.88
15       95.51   95.51     98.31   98.31    98.88   98.88
20       96.07   96.07     97.75   97.19    99.44   99.44
25       95.51   94.94     96.63   97.75    98.88   98.88
30       95.51   94.38     94.38   98.31    99.44   98.31
35       92.13   93.26     94.38   98.88    97.19   98.31
40       94.94   96.63     94.38   98.88    98.88   97.75
45       95.51   96.63     93.82   99.44    97.19   98.88
50       95.51   96.63     95.51   98.88    97.19   98.88
55       96.07   96.07     94.94   98.88    96.63   98.88
60       82.02   95.51     81.46   98.88    86.52   100
65       73.60   95.51     71.91   98.31    76.97   100
70       74.72   95.51     72.47   98.31    76.40   98.88
75       44.38   95.51     44.38   98.31    44.38   98.88
80       44.38   95.51     44.38   98.31    44.38   98.88
85       44.38   95.51     44.38   98.31    44.38   98.88
Average  82.91   95.44     83.28   98.45    85.56   98.91
Single features only: C4.5 95.51, SMO 98.31, C5.0 98.88

TABLE IX. CLASSIFICATION RESULTS ON ZOO
T(%)     C4.5 P  C4.5 F∪P  SMO P   SMO F∪P  C5.0 P  C5.0 F∪P
5        93.07   93.07     96.04   96.04    100     100
10       88.12   88.12     96.04   96.04    100     100
15       91.09   91.09     96.04   96.04    100     100
20       90.10   90.10     96.04   96.04    100     100
25       89.11   89.11     92.08   96.04    98.02   100
30       90.10   90.10     92.08   96.04    98.02   100
35       89.11   89.11     93.07   97.03    97.03   99.01
40       91.09   91.09     93.07   96.04    97.03   99.01
45       91.09   91.09     93.07   96.04    97.03   99.01
50       90.10   90.10     93.07   96.04    97.03   99.01
55       91.09   91.09     93.07   96.04    97.03   99.01
60       86.14   91.09     87.13   95.05    89.11   99.01
65       87.13   93.07     85.15   96.04    89.11   99.01
70       87.13   93.07     86.14   96.04    89.11   99.01
75       89.11   93.07     87.13   97.03    89.11   99.01
80       81.19   93.07     79.21   97.03    79.21   99.01
85       40.59   93.07     40.59   96.04    40.59   99.01
Average  86.20   91.21     88.18   96.16    91.61   99.36
Single features only: C4.5 92.08, SMO 97.03, C5.0 99.01

V. CONCLUSION AND FUTURE WORK

In this paper, we introduced closed frequent pattern-based feature space transformation. The experimental results on real-world datasets show that the transformed feature space built from the union of closed frequent patterns and original features achieves much higher classification accuracy than the one built only from the closed frequent patterns, with respect to various values of T.

In the future, we plan to extend our research to mining the most discriminative closed frequent patterns efficiently from a large number of frequent patterns, to address real-world application challenges and further improve the learning performance.

ACKNOWLEDGMENT

This research was financially supported by the Ministry of Education, Science and Technology (MEST) and the Korea Industrial Technology Foundation (KOTEF) through the Human Resource Training Project for Regional Innovation, and also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF No. 2010-0001732).

REFERENCES

[1] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[2] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, 20:273-297, 1995.
[3] R. Duda, P. Hart, and D. Stork, Pattern Classification, Wiley Interscience, 2nd edition, 2000.
[4] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” Proc. SIGMOD, 1993, pp. 207-216.
[5] B. Liu, W. Hsu, and Y. Ma, “Integrating classification and association rule mining,” Proc. Fourth International Conference on Knowledge Discovery and Data Mining (KDD), 1998, pp. 80-86.
[6] W. Li, J. Han, and J. Pei, “CMAR: Accurate and efficient classification based on multiple class-association rules,” Proc. International Conference on Data Mining (ICDM), 2001, pp. 369-376.
[7] X. Yin and J. Han, “CPAR: Classification based on predictive association rules,” Proc. SIAM International Conference on Data Mining (SDM'03), 2003, pp. 331-335.
[8] H. Cheng, X. Yan, J. Han, and C.-W. Hsu, “Discriminative frequent pattern analysis for effective classification,” Proc. International Conference on Data Engineering (ICDE'07), Turkey, 2007, pp. 716-725.
[9] W. Fan, K. Zhang, H. Cheng, J. Gao, X. Yan, J. Han, P. S. Yu, and O. Verscheure, “Direct mining of discriminative and essential frequent patterns via model-based search tree,” Proc. 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 230-238.
[10] D. Newman, S. Hettich, C. Blake, and C. Merz, “UCI Repository of Machine Learning Databases,” 1998.
[11] U. Fayyad and K. Irani, “Multi-interval discretization of continuous-valued attributes for classification learning,” Proc. IJCAI, 1993, pp. 1022-1027.
[12] I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, 2nd edition, 2005.
[13] G. Cong, K. Tan, A. Tung, and X. Xu, “Mining top-k covering rule groups for gene expression data,” Proc. ACM International Conference on Management of Data (SIGMOD), 2005, pp. 670-681.
[14] J. Wang and G. Karypis, “HARMONY: Efficiently mining the best rules for classification,” Proc. SIAM International Conference on Data Mining (SDM'05), 2005, pp. 205-216.
[15] F. Thabtah, P. Cowling, and Y. Peng, “MCAR: Multi-class classification based on association rule,” Proc. IEEE International Conference on Computer Systems and Applications, 2005, pp. 127-133.
[16] A. Zimmermann and L. De Raedt, “Corclass: Correlated association rule mining for classification,” Discovery Science, Lecture Notes in Computer Science, vol. 3245, 2004, pp. 60-72.
[17] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Wadsworth Intl., 1984.
[18] J. R. Quinlan and R. M. Cameron-Jones, “FOIL: A midterm report,” Proc. European Conference on Machine Learning, Vienna, Austria, 1993, pp. 3-20.
[19] D. Lo, H. Cheng, J. Han, S. Khoo, and C. Sun, “Classification of software behaviors for failure detection: a discriminative pattern mining approach,” Proc. 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009, pp. 557-566.
[20] G. Dong, X. Zhang, L. Wong, and J. Li, “CAEP: Classification by aggregating emerging patterns,” Proc. Second International Conference on Discovery Science (DS'99), 1999, pp. 43-55.
[21] J. Li, G. Dong, K. Ramamohanarao, and L. Wong, “DeEPs: A new instance-based lazy discovery and classification system,” Machine Learning, 54(2):99-124, 2002.
[22] F. Thabtah, P. Cowling, and Y. Peng, “MCAR: Multi-class classification based on association rule,” Proc. IEEE International Conference on Computer Systems and Applications, 2005, pp. 127-133.
[23] F. Thabtah, P. Cowling, and Y. Peng, “MMAC: A new multi-class, multi-label associative classification approach,” Proc. Fourth IEEE International Conference on Data Mining (ICDM), 2004, pp. 217-224.
[24] Z. Tang and Q. Liao, “A new class based associative classification algorithm,” Proc. International Multiconference of Engineers and Computer Scientists, 2007, pp. 685-689.
[25] G. Chen, H. Liu, et al., “A new approach to classification based on association rule mining,” Decision Support Systems, 42, 2006, pp. 674-689.
[26] R. Thonangi and V. Pudi, “ACME: An associative classifier based on maximum entropy principle,” Proc. 16th International Conference on Algorithmic Learning Theory (ALT), Singapore, Springer, 2005, pp. 122-134.
[27] X. Li, D. Qin, and C. Yu, “ACCF: Associative classification based on closed frequent itemsets,” Proc. Fifth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 2008, pp. 380-384.
[28] A. Veloso, W. Meira Jr., and M. Zaki, “Lazy associative classification,” Proc. Sixth International Conference on Data Mining (ICDM), 2006, pp. 645-654.
[29] B. Arunasalam and S. Chawla, “CCCS: A top-down associative classifier for imbalanced class distribution,” Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 517-522.
[30] J. Pearl and S. Russell, “Bayesian networks,” in Handbook of Brain Theory and Neural Networks, M. Arbib, ed., MIT Press, Cambridge, 2000, pp. 157-160.
[31] D. Aha, D. Kibler, and M. Albert, “Instance-based learning algorithms,” Machine Learning, 1991, pp. 37-66.
[32] I. Watson and F. Marir, “Case-based reasoning: A review,” The Knowledge Engineering Review, 1994.