An Interactive Visualization Methodology For Association Rules


Published by: ijcsis on Mar 08, 2011
Copyright:Attribution Non-commercial

AN INTERACTIVE VISUALIZATION METHODOLOGY FOR ASSOCIATION RULES
MOHAMMAD KAMRAN
Research Scholar, Integral University, Kursi Road, Lucknow, India
E-mail: mkamran_lko@hotmail.com
Dr. S. QAMAR ABBAS
Professor, Ambalika Institute of Management & Technology, Lucknow, India
Dr. MOHAMMAD RIZWAN BAIG
Professor, Department of Information Technology, Integral University, Lucknow, India
Abstract-
The task of the knowledge discovery and data mining process is to extract knowledge from data such that the resulting knowledge is useful in a given application. Only the user can determine whether the resulting knowledge satisfies this requirement, and what one user finds useful is not necessarily useful to another. Visual data mining tackles data mining tasks from this perspective, enabling human involvement and drawing on human perception. The objective of this paper is to present student performance through a visual mining method applied to data from an educational institute. This method, together with the novel visualization technique described here, allows the analyst to explore the data and view significant differences among the performance values of students. The results are immediately presented in graphical form, and the user can change settings in order to explore the data iteratively and find useful knowledge.
I. INTRODUCTION
For data mining [1] to be effective, it is important to include the human in the data exploration process and combine the flexibility, creativity, and general knowledge of the human with the enormous storage capacity and computational power of today's computers. Visual data exploration aims at integrating the human in the data exploration process, applying human perceptual abilities to the large data sets available in today's computer systems. The basic idea of visual data exploration is to present the data in some visual form, allowing the human to gain insight into the data, draw conclusions, and directly interact with the data. Visual data mining techniques have proven to be of high value in exploratory data analysis, and they also have a high potential for exploring large databases. These huge databases contain a wealth of data and constitute a potential goldmine of valuable information. As new courses and new colleges emerge, the structure of the educational database changes. Finding the valuable information hidden in those databases and identifying and constructing appropriate models is a difficult task. Data mining techniques play an important role at each step of the information discovery process, and visual data exploration usually allows faster data exploration and often provides better results, especially in cases where automatic algorithms fail. In addition, visual data exploration techniques provide a much higher degree of confidence in the findings of the exploration. This leads to a high demand for visual exploration techniques and makes them indispensable in conjunction with automatic exploration techniques.
The main contribution of this study is to address the capabilities and strengths of data mining technology in identifying the placement of students, and to guide teachers to concentrate on the appropriate associated attributes and to counsel students or arrange suitable placements for them. In this work, we propose a dynamic framework for association rule mining that integrates interactive visualization techniques in order to allow users to drive the association rule finding process, giving them control and visual cues to ease understanding of both the process and its results.
 
II. ASSOCIATION RULE MINING (ARM)
Association Rule Mining (ARM) [2] can be divided into two subproblems: the generation of the frequent itemset lattice and the generation of association rules. The complexity of the first subproblem is exponential. Let $|I| = m$ be the number of items; the search space to enumerate all possible frequent itemsets then has size $2^m$, and so is exponential in $m$ [2]. Let $I = \{a_1, a_2, \ldots, a_m\}$ be a set of items, and let $T = \{t_1, t_2, \ldots, t_n\}$ be a set of transactions establishing the database, where every transaction $t_i$ is composed of a subset $X \subseteq I$ of items. A set of items $X \subseteq I$ is called an itemset. A transaction $t_i$ contains an itemset $X$ in $I$ if $X \subseteq t_i$. Several published ARM papers are based on two main indices, support and confidence [2]. The support of an itemset is the percentage of transactions in the database that contain this itemset as a subset. The confidence is the conditional probability that a transaction contains an itemset given that it contains another itemset. An itemset $X$ is frequent if $\mathrm{support}(X) \geq \mathit{minsup}$, where $\mathit{minsup}$ is the user-specified minimum support. An association rule $r$ is strong if $\mathrm{confidence}(r) \geq \mathit{minconf}$, where $\mathit{minconf}$ is the user-specified minimum confidence. The left part of an association rule is called the antecedent and the right part is called the conclusion. Our motivations are described hereafter.
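The definitions of support, confidence, and frequent itemsets given in this section can be made concrete with a short sketch. The Python code below is illustrative only and is not taken from the paper: the toy transactions, the function names, and the brute-force enumeration are assumptions chosen for clarity. The enumeration visits every non-empty subset of the item set, which is exactly why the search space grows as $2^m$.

```python
from itertools import combinations

# Hypothetical toy database (invented for illustration, not the paper's data).
# Each transaction t_i is a set of items drawn from I = {a1, a2, a3}.
transactions = [
    {"a1", "a2", "a3"},
    {"a1", "a2"},
    {"a1", "a3"},
    {"a2", "a3"},
    {"a1", "a2", "a3"},
]


def support(itemset, transactions):
    """Fraction of transactions t with itemset ⊆ t."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, conclusion, transactions):
    """support(antecedent ∪ conclusion) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    joint = support(set(antecedent) | set(conclusion), transactions)
    return joint / base


def frequent_itemsets(transactions, minsup):
    """Brute-force scan of all 2^m - 1 non-empty candidate itemsets."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= minsup:
                result[frozenset(candidate)] = s
    return result


frequent = frequent_itemsets(transactions, minsup=0.6)
rule_conf = confidence({"a1"}, {"a2"}, transactions)
```

In practice the exhaustive scan is replaced by a pruning algorithm such as Apriori; the brute-force version is used here only because it mirrors the definitions directly.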
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 2, February 2011, p. 129, http://sites.google.com/site/ijcsis/, ISSN 1947-5500
 
III. MOTIVATION
The number of generated rules is a major problem in association rule mining. This number is too large and leads to another problem, called knowledge mining. The human cycles spent in analyzing knowledge are the real bottleneck in data mining. This issue can limit the final user's expertise because of the strong cognitive effort required. To solve it, visual data mining has become an important research area. Indeed, extracting relevant information is very difficult when it is hidden in a large amount of data. Visual data mining attempts to improve the KDD process by offering adapted visualization tools that tackle various known problems. Those tools can use several kinds of visualization techniques that simplify the acquisition of knowledge by the human mind, which can handle more data visually and extract relevant information quickly.
Indeed, in most real-life databases, thousands and even millions of high-confidence rules are generated, among which many are redundant. In this paper, we are interested in the most widely used category of visualization in data mining, i.e., the use of visualization techniques to present the information extracted by the mining process. Visualization tools become more appealing when handling large data sets with complex relationships, since information presented in the form of images is more direct and more easily understood by humans. Visualization tools allow users to work in an interactive environment in which rules are easy to understand. In a tabular view of association rules, all strong rules are presented in a tabular format (rule table), in which each entry corresponds to a rule. All rules can be displayed in different orders, such as by premise, conclusion, support or confidence. This helps users to have a clearer view of the rules and to locate a particular rule more easily.
IV. VISUAL DATA MINING
The rise of KDD revealed new problems such as knowledge mining. These large amounts of knowledge must be explored with specific advanced tools. Indeed, expertise requires significant cognitive work and, a fortiori, a harmful waste of time for industry. Extracting nuggets is a difficult task when relevant information is hidden in a large amount of data. In order to tackle this issue, visual data mining was conceived to propose visual tools adapted to several well-known KDD tasks. These tools contribute to the effectiveness of the implemented processes by giving understandable representations while facilitating interaction with experts. Visual data mining is present throughout the KDD process: upstream, to apprehend the data and carry out the first selections; during the mining; and downstream, to evaluate the obtained results and display them. Visual tools have become major components because of the increasing role of the expert within the KDD process. Visual data mining integrates concepts from various domains such as visual perception, cognitive psychology, visualization metaphors, and information visualization.
We focus on visualization during the post-processing stage, and we are interested in ARM. Independently of both context and task, ARM has a main drawback, which is the high number of generated rules. Several works on filtering rules have been proposed, and a state of the art was presented in [3]. Although filtering significantly reduces the set of generated rules, their number remains large. The expert must be able to interact easily with a data mining environment in order to understand the displayed results more easily. This point is essential for the global performance of the system. Visual tools for association rules have been proposed to reduce this cognitive analysis, but they remain limited [3].
V. VISUAL ASSOCIATION RULE MINING
Various works already exist to help expert analysis in text mode [4]. Several works on visual rule exploration have been published [2], [5], [6], [7]. The main principles of our interactive ARM are described hereafter. All these tools use several methods, which are textual, 2D or 3D. Choosing one of them proves to be difficult. Moreover, their interpretation can vary according to the expert. Each of these techniques presents advantages and drawbacks, which must be taken into account in the initial choice of representation. The effectiveness of these approaches depends on the input data files. These representations are understandable for small quantities of data but become complex when these quantities increase. Indeed, particular information may not be sufficiently perceptible in the mass. The common limitation of all the representations is that if they are global, they quickly become unreadable (size of the objects in 2D, occlusions in 3D), and if they are detailed, they do not give the expert an overall picture of the data.
VI. RELATED WORK
 
Traditionally, many simple methods, such as histograms, pie charts, and trees, have been designed to render small amounts of data or statistical features of big data sets. To visualize more complex data, modern scientific visualization utilizes more advanced techniques. Visualization techniques such as EXVIS [8], Chernoff Faces [9], icons [10] and m-Arm Glyph [11] are often called glyph-based methods. Glyphs are graphical entities whose visual features, such as shape, orientation, color and size, are used to encode attributes of an underlying dataset, and glyphs are often used for interactive exploration of data sets [12]. Glyph-based techniques range from representation via individual icons to the formation of texture and color patterns through the overlay of many thousands of glyphs [13]. Chernoff used facial characteristics to represent information in a multivariate dataset [14]. Each dimension of the data set is encoded by one facial feature, such as nose, eyes, eyebrows, mouth, or jowls. Glyphmaker, proposed by Foley and Ribarsky, visualizes multivariate datasets in an interactive fashion [14]. Levkowitz described a prototype system for combining colored squares to produce patterns that represent an underlying multivariate dataset [15]. In [10] an icon encodes six dimensions by six lines of different colors within a square icon. In [13] Levkowitz describes the combination of textures
and colors in a visualization system. The m-Arm Glyph by Pickett and Grinstein [11] consists of a main axis and m arms; the length and thickness of each arm and the angle between each arm and the main axis are used to encode different dimensions of a data set. [6] describes a glyph-based system for large high-dimensional datasets. These techniques are incapable of visualizing large amounts of high-dimensional data because of:
• Lack of human-computer interaction.
• Lack of integration with other data mining and knowledge discovery (KDD) tools.
VII. PROPOSED WORK
Nowadays, higher educational organizations face a highly competitive environment and aim to gain more competitive advantages over their competitors. These organizations should improve their methodology of teaching, placement and counseling of students. They consider students and teachers their main assets, and they want to improve their key process indicators through effective and efficient use of these assets.
Students' academic performance is critical for educational institutions because strategic programs can be planned to improve or maintain students' performance during their period of studies in the institution. The academic performance in this study is measured by the attributes indicated in Table I. This study presents the work of data mining in predicting the final placement of students. It applies the association rule mining technique to choose the best prediction and analysis. The list of students whom data mining predicts as likely to fall short of the selection criteria is then turned over to teachers and management for direct or indirect intervention.
For example, let us consider the transaction database of a few students from the students' repository of an institute, which shows the students' general and academic grades in the different courses they enrolled in during their years of attendance in the institution. A student's performance score is basically determined by the sum of the continuous assessment and the examination scores. In most institutions the continuous assessment, which includes various assignments, class tests, and group presentations, is weighted at 30% of the total score, while the main semester examination is weighted at 70%. To differentiate between students' performances, we have selected different attributes such as Attendance, Mark, and Activity, as shown in Table I.
Educational institutions using association rule mining can predict students' performance more accurately, which in turn can result in quality education.
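The 30%/70% weighting of continuous assessment and examination described above amounts to a simple weighted sum. The sketch below is a hedged illustration: the 0 to 100 score scale, the function names, and in particular the grade cutoffs used to turn a numeric score into the A/B/C categories are assumptions introduced here, not values stated in the paper.

```python
def total_score(continuous_assessment, examination,
                ca_weight=0.30, exam_weight=0.70):
    """Weighted total of two component scores, each assumed to be on 0-100."""
    if abs(ca_weight + exam_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return ca_weight * continuous_assessment + exam_weight * examination


def to_grade(score, cutoffs=((70.0, "A"), (50.0, "B"))):
    """Discretise a score into A/B/C.

    The cutoffs are hypothetical; the paper does not state how
    marks are mapped to categories.
    """
    for threshold, grade in cutoffs:
        if score >= threshold:
            return grade
    return "C"


score = total_score(continuous_assessment=80, examination=60)
grade = to_grade(score)
```

Discretising numeric scores into a few categories is what allows them to be treated as items in association rule mining.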
A. Student Level Analysis
Successfully training students requires analyzing the data at the student level. Using the association discovery data mining technique, educational institutions can more accurately select the kind of training to offer to different kinds of students. With the help of this technique, educational institutions can:
i. Segment the student database to create student profiles.
ii. Conduct analysis on a single student segment for a single factor. For example, the institution can perform in-depth analysis of the relationship between attendance and academic achievement.
iii. Analyze the student segments for multiple factors using group processing and multiple target variables. For example: What are the characteristics shared by students who drop out of college?
iv. Perform sequential (over time) basket analysis on student segments. For example: What percentage of students with high attendance also achieved well academically?
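Two of the steps listed above, segmenting the student database and asking what percentage of high-attendance students also achieved well, can be sketched in a few lines of Python. The records below are invented for illustration and are not the paper's data; the keys STD, ATT and MAR follow the attribute codes of the paper's attribute list, while the helper names `segment` and `share` are hypothetical.

```python
# Hypothetical student records (invented for illustration only).
students = [
    {"STD": "CS", "ATT": "Good", "MAR": "A"},
    {"STD": "CS", "ATT": "Good", "MAR": "B"},
    {"STD": "IT", "ATT": "Poor", "MAR": "C"},
    {"STD": "IT", "ATT": "Good", "MAR": "A"},
    {"STD": "ME", "ATT": "Average", "MAR": "B"},
    {"STD": "ME", "ATT": "Good", "MAR": "A"},
]


def segment(records, key):
    """Group records into segments by the value of one attribute."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record)
    return groups


def share(records, condition, outcome):
    """Percentage of records meeting `condition` that also meet `outcome`."""
    matching = [r for r in records if condition(r)]
    if not matching:
        return 0.0
    return 100.0 * sum(1 for r in matching if outcome(r)) / len(matching)


by_department = segment(students, "STD")
high_attendance_achievers = share(
    students,
    condition=lambda r: r["ATT"] == "Good",
    outcome=lambda r: r["MAR"] == "A",
)
```

The percentage computed by `share` is the confidence of the rule ATT=Good -> MAR=A expressed on a 0 to 100 scale, which is how such a question maps onto association analysis.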
B. Developing New Strategies
Teachers can increase the placement percentage by identifying the most lucrative student segments and organizing the training sessions accordingly. The results may suffer if teachers do not offer the right kind of training to the right student segment at the right time. With data mining operations such as segmentation or association analysis, institutions can now utilize all of their available information for the betterment of students.
TABLE I ATTRIBUTE LIST
ATTR NAME                                ATTR   Possible Values
Enrolment No.                            ENR    Yes, No
Attendance                               ATT    Poor, Good, Average
10+2 Grade                               INT    A, B, C
Area of expertise                        EXP    M, C, E
Gender                                   G      M, F
Fund                                     F      P, S, F
Student Department                       STD    ME, CS, IT
Activities performed by the student      ACT    A, B, C
Percentage of practical session          PSA    A, B, C
Exercise given by teacher                ET     A, B, C
Average mark of the experience report    ER     A, B, C
Final mark                               MAR    A, B, C
Evaluation                               EVL    A, B, C
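To show how the attributes of Table I could feed association rule mining, the sketch below encodes each student record as a transaction of "ATTR=value" items, mines single-antecedent rules, and returns them as a rule table that can be ordered by premise, conclusion, support or confidence. This is a minimal illustration under stated assumptions: the records are invented, the restriction to one item on each side of a rule is a simplification, and the function names are hypothetical rather than the authors' implementation.

```python
from itertools import permutations

# Hypothetical records using attribute codes from Table I
# (the values are invented for illustration).
records = [
    {"ATT": "Good", "ACT": "A", "MAR": "A"},
    {"ATT": "Good", "ACT": "B", "MAR": "A"},
    {"ATT": "Poor", "ACT": "C", "MAR": "C"},
    {"ATT": "Good", "ACT": "A", "MAR": "B"},
    {"ATT": "Average", "ACT": "B", "MAR": "B"},
]


def to_transaction(record):
    """Encode one record as a set of 'ATTR=value' items."""
    return frozenset(f"{attr}={value}" for attr, value in record.items())


def mine_pair_rules(transactions, minsup, minconf):
    """Strong rules x -> y where x and y are single items (a simplification)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        count_x = sum(1 for t in transactions if x in t)
        count_xy = sum(1 for t in transactions if x in t and y in t)
        sup = count_xy / n
        conf = count_xy / count_x  # count_x > 0: x occurs in some transaction
        if sup >= minsup and conf >= minconf:
            rules.append({"antecedent": x, "conclusion": y,
                          "support": sup, "confidence": conf})
    return rules


def rule_table(rules, order_by="confidence"):
    """Sort rules: numeric columns descending, text columns ascending."""
    numeric = order_by in ("support", "confidence")
    return sorted(rules, key=lambda r: r[order_by], reverse=numeric)


transactions = [to_transaction(r) for r in records]
rules = mine_pair_rules(transactions, minsup=0.4, minconf=0.6)
table = rule_table(rules, order_by="confidence")
```

Lowering `minsup` or `minconf` and re-running the two calls is the kind of setting change that an interactive tool would expose to the analyst.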