Mining association rules is one of the most important tasks in data mining. Several approaches generalizing association rules to fuzzy association rules have been proposed. In this paper we present a general framework for mining fuzzy association rules.
1.1.1. Fuzzy Logic
Fuzzy Logic [8] was initiated in 1965 by Lotfi A. Zadeh, professor of computer science at the University of California, Berkeley. Basically, Fuzzy Logic (FL) is a multivalued logic that allows intermediate values to be defined between conventional evaluations like true/false, yes/no, high/low, etc. Notions like rather tall or very fast can be formulated mathematically and processed by computers, in order to apply a more human-like way of thinking in the programming of computers [4]. Fuzzy Logic has emerged as a profitable tool for the controlling and steering of systems and complex industrial processes, for household and entertainment electronics, and for other expert systems and applications like the classification of SAR data.
1.1.2. Fuzzy Sets
The very basic notion of fuzzy systems is a fuzzy (sub)set. In classical mathematics we are familiar with what we call crisp sets. For example, the possible interferometric coherence values g form the set X of all real numbers between 0 and 1. From this set X a subset A can be defined (e.g. all values 0 <= g <= 0.2). The characteristic function of A (i.e. the function that assigns a number 1 or 0 to each element in X, depending on whether the element is in the subset A or not) is shown in Fig. 1. The elements which have been assigned the number 1 can be interpreted as the elements that are in the set A, and the elements which have been assigned the number 0 as the elements that are not in the set A.

Fig. 1. Characteristic function of a crisp set.
This concept is sufficient for many areas of application, but it can easily be seen that it lacks flexibility for some applications, such as the classification of remotely sensed data. For example, it is well known that water shows low interferometric coherence g in SAR images. Since g starts at 0, the lower range of this set ought to be clear. The upper range, on the other hand, is rather hard to define. As a first attempt, we set the upper range to 0.2 and therefore get B as a crisp interval, B = [0, 0.2]. But this means that a g value of 0.20 is low while a g value of 0.21 is not. Obviously, this is a structural problem: if we moved the upper boundary of the range from g = 0.20 to an arbitrary point, we could pose the same question. A more natural way to construct the set B would be to relax the strict separation between low and not low. This can be done by allowing not only the (crisp) decision yes/no, but more flexible rules like "fairly low". A fuzzy set allows us to define such a notion.
The aim is to use fuzzy sets in order to make computers more "intelligent"; therefore, the idea above has to be coded more formally. In the example, all the elements were coded with 0 or 1. A straightforward way to generalize this concept is to allow more values between 0 and 1. In fact, infinitely many alternatives can be allowed between the boundaries 0 and 1, namely the unit interval I = [0, 1]. The interpretation of the numbers now assigned to all elements is more difficult. Of course, again the number 1 assigned to an element means that the element is in the set B, and 0 means that the element is definitely not in the set B. All other values denote a gradual membership to the set B. This is shown in Fig. 2.
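To make the contrast concrete, the following minimal Python sketch compares a crisp characteristic function with a gradual membership function for the "low coherence" example; the 0.2 boundary comes from the text, while the 0.2-0.4 transition zone is an illustrative assumption.

def crisp_low(g):
    """Characteristic function of the crisp set B = [0, 0.2]."""
    return 1 if 0.0 <= g <= 0.2 else 0

def fuzzy_low(g):
    """Gradual membership: fully 'low' up to 0.2, fading to 0 at 0.4 (assumed)."""
    if g <= 0.2:
        return 1.0
    if g >= 0.4:
        return 0.0
    return (0.4 - g) / 0.2  # linear descent between 0.2 and 0.4

for g in (0.10, 0.20, 0.21, 0.35, 0.50):
    print("g=%.2f  crisp=%d  fuzzy=%.2f" % (g, crisp_low(g), fuzzy_low(g)))

Note that g = 0.21 now receives membership 0.95 instead of being excluded outright, which removes the artificial jump at the 0.20 boundary.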
The fuzzy set "between 5 and 8 OR about 4" is shown in the next figure; the negation of the fuzzy set A is shown below it.
Fuzzy logic provides a framework for dealing quantitatively, mathematically, and logically with semantic, ambiguous, elastic, and vague concepts. It is a well-proven and well-established logic of degree. In data mining we deal with large amounts of data which are uncertain or vague in nature. In particular, in medical, micro-economic, sociological, and marketing databases, the linguistic form of data representation is widely used.
Fuzzy set theory provides excellent tools to model the "fuzzy" boundaries of linguistic terms by introducing gradual membership. In classical set theory, an object is either a member of a given set or not. Fuzzy set theory makes it possible that an object or a case belongs to a set only to a certain degree, thus modeling the uncertainty of the linguistic term describing the property that defines the set.
Membership degrees of fuzzy sets can express similarity, preference, and uncertainty. They can state how similar an object or case is to an ideal one, they can indicate preferences between suboptimal solutions to a problem, or they can model uncertainty about a real-life situation that is described in an imprecise manner. Due to their closeness to human reasoning, solutions obtained using fuzzy approaches are easy to understand and to apply. Fuzzy systems are therefore good candidates if linguistic, vague, or imprecise information has to be modeled and analyzed.
Fuzzy set theory can be used in data mining systems for performing rule-based classification, clustering, and association rule mining.
The discovery of association rules constitutes a very important task in the process of data mining. Association rules are an important class of regularities within data which have been extensively studied by the data mining community. The general objective here is to find frequent co-occurrences of items within a set of transactions. The found co-occurrences are called associations. The idea of discovering such rules is derived from market basket analysis, where the goal is to mine patterns describing the customers' purchase behavior.
We state the problem of mining association rules as follows: let I = {i1, i2, ..., im} be a set of items and T = {t1, t2, ..., tn} a set of transactions, each of which contains items of the itemset I. Thus, each transaction t is a set of items such that t is a subset of I. An association rule is an implication of the form X -> Y, where X and Y are subsets of I and X and Y are disjoint. X (or Y) is a set of items, called an itemset.
An example of a simple association rule would be {bread} -> {butter}. This rule says that if bread was in a transaction, butter was in most cases in that transaction too. In other words, people who buy bread often buy butter as well. Such a rule is based on observations of customer behavior and results from the data stored in transaction databases. Looking at an association rule of the form X -> Y, X would be called the antecedent and Y the consequent. It is obvious that the value of the antecedent implies the value of the consequent. The antecedent, also called the "body" of a rule, can consist either of a single item or of a whole set of items. This applies to the consequent, also called the "head", as well.

The most complex task of the whole association rule mining process is the generation of frequent itemsets. Many different combinations of items have to be explored, which can be a very computation-intensive task, especially in large databases. As most business databases are very large, the need for efficient algorithms that can extract itemsets in a reasonable amount of time is high. Often, a compromise has to be made between discovering all itemsets and computation time. Generally, only those itemsets that fulfill a certain support requirement are taken into consideration. Support and confidence are the two most important quality measures for evaluating the interestingness of a rule.
Support
The support of a rule X -> Y is the percentage of transactions in T that contain the union of X and Y. It determines how frequently the rule is applicable to the transaction set T. The support of a rule is represented by the formula

supp(X -> Y) = |{t in T : (X union Y) is a subset of t}| / n

where the numerator is the number of transactions that contain all the items of the rule and n is the total number of transactions. The support is a useful measure to determine whether a set of items occurs frequently in a database or not. Rules covering only a few transactions might not be valuable to the business. The formula presented above computes the relative support value, but there also exists an absolute support. It works similarly, but simply counts the number of transactions in which the tested itemset occurs, without dividing it by the number of tuples.
Confidence

The confidence of a rule X -> Y describes the percentage of transactions containing X which also contain Y:

conf(X -> Y) = |{t in T : (X union Y) is a subset of t}| / |{t in T : X is a subset of t}|

This is a very important measure to determine whether a rule is interesting or not. It looks at all transactions which contain a certain item or itemset defined by the antecedent of the rule. Then, it computes the percentage of those transactions that also include all the items contained in the consequent.
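As a small worked example, the sketch below computes both measures over a hypothetical four-transaction basket database; the items and transactions are made up for illustration.

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk"},
]

def support(itemset, db):
    # fraction of transactions that contain every item of the itemset
    return sum(itemset <= t for t in db) / len(db)

def confidence(antecedent, consequent, db):
    # supp(X union Y) / supp(X) for the rule X -> Y
    return support(antecedent | consequent, db) / support(antecedent, db)

print(support({"bread", "butter"}, transactions))       # 0.5
print(confidence({"bread"}, {"butter"}, transactions))  # 0.666...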
The Association Rule Mining Process
The process of mining association rules consists of two main parts. First, we have to identify all the itemsets contained in the data that are adequate for mining association rules. These combinations have to show at least a certain frequency to be worth mining and are thus called frequent itemsets. The second step generates rules out of the discovered frequent itemsets.
"c#
Mining frequent patterns from a given dataset is not a trivial task. All sets of items that occur at least as frequently as a user-specified minimum support have to be identified at this step. An important issue is the computation time, because when it comes to large databases there might be a lot of possible itemsets, all of which need to be evaluated. Different algorithms attempt to allow efficient discovery of frequent patterns.
Rule Generation
After having generated all patterns that meet the minimum support requirement, rules can be generated out of them. For doing so, a minimum confidence has to be defined. The task is to generate all possible rules from the frequent itemsets and then compare their confidence values with the minimum confidence (which is again defined by the user). All rules that meet this requirement are regarded as interesting. Frequent sets that do not yield any interesting rules do not have to be considered anymore. All the discovered rules can in the end be presented to the user with their support and confidence values.
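A compact sketch of this second step is given below: for every frequent itemset F it emits all rules X -> (F minus X) whose confidence reaches the user-defined minimum; the tiny transaction list is illustrative only.

from itertools import combinations

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
]

def support(itemset):
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

def generate_rules(frequent_itemsets, min_conf):
    rules = []
    for f in frequent_itemsets:
        f = frozenset(f)
        for r in range(1, len(f)):              # all non-empty proper subsets of f
            for ante in combinations(f, r):
                x = frozenset(ante)
                conf = support(f) / support(x)
                if conf >= min_conf:
                    rules.append((set(x), set(f - x), conf))
    return rules

print(generate_rules([{"bread", "butter"}], min_conf=0.6))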
Fuzzy Association Rules

Basic Concepts of Fuzzy Sets
The following concepts are important when dealing with fuzzy sets [KlFo88]:
- Support: The support of a fuzzy set A is given by the crisp set that contains all of the elements whose membership degree in A is not 0: supp(A) = {x in X | uA(x) > 0}, where uA denotes the membership function of A. The empty fuzzy set has an empty support set.
- Height: The height of a fuzzy set is defined as the largest membership value attained by any element of the set. The fuzzy set is called normalized if at least one of its elements attains the highest membership grade.
- Alpha-cut: The alpha-cut of a fuzzy set is defined as the crisp set containing all elements that have a membership grade in the fuzzy set greater than or equal to alpha.
- Scalar cardinality: The summation of the membership grades of all elements in a fuzzy set is called its scalar cardinality.
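For a finite fuzzy set stored as an element-to-grade mapping, all four concepts reduce to one-liners; the membership values below are illustrative.

A = {"a": 0.0, "b": 0.3, "c": 0.7, "d": 1.0}   # element -> membership grade

support_set = {x for x, mu in A.items() if mu > 0}        # {'b', 'c', 'd'}
height      = max(A.values())                             # 1.0 -> A is normalized
alpha       = 0.5
alpha_cut   = {x for x, mu in A.items() if mu >= alpha}   # {'c', 'd'}
cardinality = sum(A.values())                             # 2.0

print(support_set, height, alpha_cut, cardinality)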
[7] Let T = {t1, t2, ..., tn} be a dataset characterized by a set I = {i1, i2, ..., im} of attributes. For each attribute ij in I there is a corresponding set of fuzzy terms, each with membership values in the interval [0, 1]. A fuzzy association rule is of the form X -> Y, where X and Y are subsets of I and X and Y are disjoint. Within the rule X -> Y, X is called the antecedent while Y is called the consequent of the rule.
Definition 1. A t-norm is a commutative, associative, and non-decreasing function T: [0,1]^2 -> [0,1] with the property T(x, 1) = x for all x in [0, 1].
Definition 2. Let X = {x1, x2, ..., xm} be a fuzzy itemset. The support of X on the i-th record is

sup_i(X) = T(x_1i, ..., x_mi)   if T(x_1i, ..., x_mi) >= a*ms,
sup_i(X) = 0                    otherwise,

where T is a t-norm and ms is the minimum support.
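Definitions 1 and 2 can be illustrated directly: the minimum and the product are the two most commonly used t-norms, and the threshold a*ms of Definition 2 is passed in as a parameter (the grades and threshold below are illustrative).

from functools import reduce

def t_min(grades):       # Goedel t-norm: T(x, y) = min(x, y)
    return min(grades)

def t_product(grades):   # product t-norm: T(x, y) = x * y
    return reduce(lambda x, y: x * y, grades, 1.0)

def record_support(grades, t_norm, threshold):
    """Fuzzy support of an itemset on one record (Definition 2)."""
    value = t_norm(grades)
    return value if value >= threshold else 0.0

print(record_support([0.8, 0.6], t_min, 0.3))      # 0.6
print(record_support([0.8, 0.6], t_product, 0.3))  # 0.48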
The support of a fuzzy rule X => Y is then

sup(X => Y) = sup(X union Y),   (4)

where X and Y are subsets of I and X and Y are disjoint.
Proposition 1. Let X = {x1, x2, ..., xl} be a frequent fuzzy itemset. Then any subset of X is a frequent fuzzy itemset.

Proposition 2. Let X = {x1, x2, ..., xl} be a fuzzy itemset and X a subset of X'. Given a minimum support ms, if X is not frequent, then X' is not a frequent fuzzy itemset.

Proposition 3. Let X and Y be non-empty, disjoint fuzzy itemsets. Given a minimum support ms and a minimum confidence mc, if X => Y is a strong fuzzy association rule, then for any non-empty subset Y' of Y, X => Y' is also a strong fuzzy association rule.

Proposition 4. Let X and Y be non-empty, disjoint fuzzy itemsets. Given a minimum support ms and a minimum confidence mc, if X => Y is a strong fuzzy association rule, then for any proper subset Z of Y, with X' = X union Z and Y' = Y minus Z, X' => Y' is also a strong fuzzy association rule.
Algorithms for Mining Fuzzy Association Rules
Many papers have been devoted to developing algorithms to mine fuzzy association rules. Most of them are extensions of the Apriori algorithm. The process of fuzzy association rule mining is divided into two phases. In the first step, frequent fuzzy 1-itemsets L1 are generated by scanning the database. By joining L1 to itself, we get the candidate fuzzy itemsets C2. According to the Apriori principle, C2 is pruned to form L2; accordingly, we generate L3, ..., Lk. The algorithm for generating fuzzy association rules from frequent fuzzy itemsets is similar to that for mining Boolean association rules.
Algorithm: GFFP (Generate Frequent Fuzzy Patterns)
Input: fuzzy database D, minimum support ms
Output: frequent fuzzy itemsets L

Step 1:  L1 = {frequent 1-fuzzy-itemsets}
Step 2:  for (k = 2; Lk-1 is not empty; k++) {
Step 3:    Ck = apriori_gen(Lk-1)
Step 4:    for each record t in D {
Step 5:      for each candidate c in Ck
Step 6:        sup = sup_t(c)
Step 7:        c.sup = c.sup + sup
Step 8:    }
Step 9:    Lk = {c in Ck | c.sup >= ms}
Step 10: }
Step 11: L = union of all Lk
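The following Python sketch condenses the GFFP loop above; the records, attribute names, and threshold are invented, the minimum serves as the t-norm, and fuzzy support is averaged over all records (one common choice, not the only one).

records = [
    {"income_high": 0.9, "age_young": 0.2, "spend_high": 0.7},
    {"income_high": 0.4, "age_young": 0.8, "spend_high": 0.5},
    {"income_high": 0.7, "age_young": 0.1, "spend_high": 0.9},
]
ms = 0.4  # minimum fuzzy support

def fuzzy_support(itemset, db):
    # average over all records of the t-norm (min) of the membership grades
    return sum(min(r.get(i, 0.0) for i in itemset) for r in db) / len(db)

items = sorted({i for r in records for i in r})
L = []
current = [frozenset([i]) for i in items if fuzzy_support([i], records) >= ms]
k = 1
while current:                                   # Steps 2-10
    L.extend(current)
    k += 1
    # candidate generation: unions of frequent (k-1)-itemsets that have size k
    candidates = {a | b for a in current for b in current if len(a | b) == k}
    current = [c for c in candidates if fuzzy_support(c, records) >= ms]

print(L)   # frequent fuzzy itemsets, Step 11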
In the course of generating L1, there are two methods to select the fuzzy itemsets:
(1) According to the principle of maximum membership value, only the fuzzy item with the maximum degree of membership [4] is taken into L1. This method reduces the computational work, but it loses some useful information.
(2) A fuzzy item is incorporated into L1 when its support is larger than or equal to a given threshold. In this case, the follow-up calculation is larger than with the first method, but the amount of lost information is effectively reduced.
In the same way as the Apriori-based algorithm above, an FP-tree-based approach can be taken to generate fuzzy association rules.
Fuzzy association rules are used in various applications, such as face recognition [9], intrusion detection systems [10], mining generalized fuzzy association rules from web logs [11], and mining fuzzy association rules from questionnaire data [12]. Many other methods are present as well.
Problem Definition
Fuzzy association rules use fuzzy logic to convert numerical attributes to fuzzy attributes, like "Income = High", thus maintaining the integrity of the information conveyed by such numerical attributes. On the other hand, crisp association rules use sharp partitioning to transform numerical attributes into binary ones, like "Income = [100K and above]", and can potentially introduce loss of information due to these sharp ranges. Fuzzy Apriori and its different variations are the only popular fuzzy association rule mining (ARM) algorithms available today. Like the crisp version of Apriori, it is a very slow and inefficient algorithm for very large datasets (on the order of millions of transactions).
As a solution to the above problem, I propose the use of fuzzy association rule mining, because fuzzy association rule mining gives better performance than classic association rule mining.
References:
[1] "Fuzzy Association Rule Mining Algorithm for Fast and Efficient Performance on Very Large Datasets", Ashish Mangalampalli, Vikram Pudi. Centre for Data Engineering (CDE), International Institute of Information Technology (IIIT), Hyderabad - 500032, India. FUZZ-IEEE 2009.
[2] "A Survey of Association Rules", Margaret H. Dunham, Yongqiao Xiao, Le Gruenwald, Zahid Hossain. Department of Computer Science and Engineering, Southern Methodist University; Department of Computer Science, University of Oklahoma. IEEE.
[3] "A Two-Phase Fuzzy Mining Approach", Chun-Wei Lin, Tzung-Pei Hong and Wen-Hsiang Lu. IEEE, 2010.
[4] "Fuzzy Association Rules: General Model and Applications", Miguel Delgado, Nicolas Marin, Daniel Sanchez, and Maria-Amparo Vila. IEEE Transactions on Fuzzy Systems, Vol. 11, No. 2, April 2003.
[5] "Fuzzy Association Rules: An Implementation in R", Master Thesis, Bakk. Lukas Helm, Matriculation Number 0251677, Vienna, 2.8.2007. Vienna University of Economics and Business Administration.
[6] "A Study on Effective Mining of Association Rules from Huge Databases", V. Umarani et al. IJCSR International Journal of Computer Science and Research, Vol. 1, Issue 1, 2010.
[7] "A General Framework for Fuzzy Data Mining", Jitao Zhao, Lin Yao. Department of Education Technology and Information, Xuchang University, Xuchang, China. IEEE, 2010.
[8] "Applications of Fuzzy Logic in Data Mining Process", Hang Chen.
[9] "Face Recognition using PCA with NP-fuzzy data mining". SICE Annual Conference 2010, August 18-21, 2010, The Grand Hotel, Taipei, Taiwan.
[10] "Intrusion detection using fuzzy association rules", KaiXing Wu, student, Department of Information & Electronic Engineering, Handan, Hebei Province, China. 2010 International Symposium on Intelligence Information Processing and Trusted Computing.
[11] "Mining Generalized Fuzzy Association Rules from Web Logs", Rui Wu. School of Mathematics and Computer Science, Shanxi Normal University, Shanxi 041004, China. 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2010).
[12] "Mining fuzzy association rules from questionnaire data", Yen-Liang Chen, Cheng-Hsiung Weng. Knowledge-Based Systems 22 (2009) 46-.
2. Text Categorization
The goal of text categorization is the classification of documents into a fixed number of predefined categories. Each document can be in multiple categories, exactly one, or no category at all. Using machine learning, the objective is to learn classifiers from examples which perform the category assignments automatically. This is a supervised learning problem. Since categories may overlap, each category is treated as a separate binary classification problem.
Formally, text categorization is the task of assigning a Boolean value to each pair (dj, ci) in D x C, where D is a domain of documents and C = {c1, ..., c|C|} is a set of predefined categories; a value of true for (dj, ci) indicates that document dj belongs to category ci, and false that it does not.
There are four main steps in text categorization [An Experiment System for Text Classification, Mathias Niepert, B652: Computer Models of Symbolic Learning, Final Write-Up, Spring 2005]:
1) Loading,
2) Indexing,
3) Classification,
4) Evaluation.

2.1 Loading
The first step towards the final classification and evaluation results is to load the training corpus, that is, to read all the corpus documents and count up the term and document frequencies for every term. The term and document frequencies are saved for every document as well as for every category and the entire corpus. This is necessary since these values will be needed in later calculations. The term frequency of a term in a document/category/corpus is the number of times the term occurs in the document/category/corpus. The document frequency of a term in a category/corpus is the number of documents which both contain the term at least once and belong to the category/corpus, whereas the document frequency of a term in a document is a binary value which indicates whether the term occurs in the document or not. For the actual implementation in Java, I used one class each for terms, documents, categories, and corpora.
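The counting logic can be sketched in a few lines (shown here in Python rather than the original Java; the corpus and categories are made up):

from collections import Counter

corpus = {
    "sports":   ["the team won the game", "a fast game"],
    "politics": ["the vote was close", "the team of ministers met"],
}

term_freq   = Counter()                      # term -> occurrences in the corpus
doc_freq    = Counter()                      # term -> number of documents with it
category_tf = {c: Counter() for c in corpus} # per-category term frequencies

for category, documents in corpus.items():
    for doc in documents:
        tokens = doc.split()
        term_freq.update(tokens)
        category_tf[category].update(tokens)
        doc_freq.update(set(tokens))         # a document counts once per term

print(term_freq["the"], doc_freq["the"])     # 4 3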
2.2 Stop Word Removal and Stemming
In most applications, it is practical to remove words which appear too often (in every or almost every document) and thus carry no information for the task. Good examples of this kind of words are prepositions, articles, and verbs like "be" and "go". If the box "Apply stop word removal" is checked, all the words in the file "swl.txt" are considered stop words and will not be loaded. This file currently contains the 100 most used words in the English language [National Literacy Trust, 2005], which on average account for half of all reading in English. If the box "Apply stop word removal" is unchecked, the stop word removal algorithm is disabled when the corpus is loaded. The system only considers words of length greater than one as valid tokens. This is due to the fact that all words of length one seem to be unimportant and would have been sorted out by the later dimensionality reduction step anyway. However, it is possible to change that easily in the code.
Stemming is the process of grouping words that share the same morphological root. E.g., "game" and "games" are stemmed to the root "game". The suitability of stemming for text classification is controversial. In some examinations, stemming has been reported to hurt accuracy. However, the recent tendency is to apply it, since it reduces both the dimensionality of the term space and the stochastic dependence between terms.
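A minimal preprocessing sketch combining both techniques is shown below; NLTK's PorterStemmer stands in for whatever stemmer the original system used, and the inline stop-word set replaces the "swl.txt" file.

from nltk.stem import PorterStemmer          # pip install nltk

stop_words = {"the", "a", "an", "be", "go", "of", "and", "to", "in"}
stemmer = PorterStemmer()

def preprocess(text):
    tokens = text.lower().split()
    tokens = [t for t in tokens if len(t) > 1]          # drop one-letter tokens
    tokens = [t for t in tokens if t not in stop_words]
    return [stemmer.stem(t) for t in tokens]

print(preprocess("The games and the game"))  # ['game', 'game']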
2.3 Indexing
The step of "re-representation" of a document is nothing but indexing, i.e. the process of representing a document as a set of keywords as structured data. In this stage we can use fuzzy sets [5].
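One simple way to realize this fuzzy-set view, sketched below on invented tokens, is to treat the normalized term frequency as the membership grade of each keyword (only one of many possible weighting schemes; [5] discusses fuzzy keyword weighting in depth).

from collections import Counter

def index_document(tokens):
    """Represent a document as a fuzzy set of keywords (term -> grade)."""
    counts = Counter(tokens)
    max_tf = max(counts.values())
    return {term: tf / max_tf for term, tf in counts.items()}

print(index_document(["game", "game", "team", "win"]))
# {'game': 1.0, 'team': 0.5, 'win': 0.5}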
2.4 Classification
Classification is the process which assigns one or more labels (or no label at all) to a new (unseen) document. There are many machine learning algorithms which have been applied to the problem of text categorization, ranging from statistical methods (e.g. Naive Bayes) to black-box algorithms. In this work we apply a support vector machine (SVM).
Support Vector Machine (SVM)
The dashed lines drawn parallel to the separating line mark the distance between the dividing line and the closest vectors to the line. The distance between the dashed lines is called the margin. The vectors (points) that constrain the width of the margin are the support vectors. The following figure illustrates this. An SVM analysis finds the line (or, in general, hyperplane) that is oriented so that the margin between the support vectors is maximized. In the figure above, the line in the right panel is superior to the line in the left panel.
If all analyses consisted of two-category target variables with two predictor variables, and the cluster of points could be divided by a straight line, life would be easy. Unfortunately, this is not generally the case, so SVM must deal with (a) more than two predictor variables, (b) separating the points with non-linear curves, (c) handling the cases where clusters cannot be completely separated, and (d) handling classifications with more than two categories.
In the previous example, we had only two predictor variables, and we were able to plot the points on a 2-dimensional plane. If we add a third predictor variable, then we can use its value for a third dimension and plot the points in a 3-dimensional cube. Points on a 2-dimensional plane can be separated by a 1-dimensional line. Similarly, points in a 3-dimensional cube can be separated by a 2-dimensional plane. As we add additional predictor variables (attributes), the data points can be represented in N-dimensional space, and an (N-1)-dimensional hyperplane can separate them.
When Straight Lines Go Crooked

The simplest way to divide two groups is with a straight line, flat plane, or an N-dimensional hyperplane. But what if the points are separated by a nonlinear region such as shown below? In this case we need a nonlinear dividing line. Rather than fitting nonlinear curves to the data, SVM handles this by using a kernel function to map the data into a different space where a hyperplane can be used to do the separation. The kernel function may transform the data into a higher dimensional space to make it possible to perform the separation.
The concept of a kernel mapping function is very powerful. It allows SVM models to perform separations even with very complex boundaries such as shown below. Many kernel mapping functions can be used (probably an infinite number), but a few kernel functions have been found to work well for a wide variety of applications. The default and recommended kernel function is the Radial Basis Function (RBF).
Linear: u'*v
RBF: exp(-gamma*|u-v|^2)
Sigmoid: tanh(gamma*u'*v + coef0)
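The kernels listed above can be tried out through scikit-learn's SVC, one concrete SVM implementation (the report itself does not prescribe a library; the synthetic data is illustrative):

from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for kernel in ("linear", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel, gamma=0.5, coef0=0.0, C=1.0).fit(X, y)
    print(kernel, clf.score(X, y))           # training accuracy per kernel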
Ideally, an SVM analysis should produce a hyperplane that completely separates the feature vectors into two non-overlapping groups. However, perfect separation may not be possible, or it may result in a model with so many feature vector dimensions that the model does not generalize well to other data; this is known as overfitting. To allow some flexibility in separating the categories, SVM models have a cost parameter, C, that controls the trade-off between allowing training errors and forcing rigid margins. It creates a soft margin that permits some misclassifications. Increasing the value of C increases the cost of misclassifying points and forces the creation of a more accurate model that may not generalize well.
Finding Optimal Parameter Values

The accuracy of an SVM model is largely dependent on the selection of the model parameters. A grid search and a pattern search are two methods for finding optimal parameter values.
Grid searches are computationally expensive because the model must be evaluated at many points within the grid for each parameter. For example, if a grid search is used with 10 search intervals and an RBF kernel function is used with two parameters (C and Gamma), then the model must be evaluated at 10*10 = 100 grid points. An Epsilon-SVR analysis has three parameters (C, Gamma and P), so a grid search with 10 intervals would require 10*10*10 = 1000 model evaluations. If cross-validation is used for each model evaluation, the number of actual SVM calculations is further multiplied by the number of cross-validation folds (typically 4 to 10). For large models, this approach may be computationally infeasible.
A pattern search generally requires far fewer evaluations of the model than a grid search. Beginning at the geometric center of the search range, a pattern search makes trial steps with positive and negative step values for each parameter. If a step is found that improves the model, the center of the search is moved to that point. If no step improves the model, the step size is reduced and the process is repeated. The search terminates when the step size is reduced to a specified tolerance. The weakness of a pattern search is that it may find a local rather than a global optimal point for the parameters.
In this case, the grid search is performed first. Once the grid search finishes, a pattern search is performed over a narrow search range surrounding the best point found by the grid search. Hopefully, the grid search finds a region near the global optimum point, and the pattern search then finds the global optimum by starting in the right region.
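The grid-search stage maps directly onto scikit-learn's GridSearchCV, used here as an illustrative stand-in (a 4x4 grid with 5-fold cross-validation gives 16*5 = 80 SVM fits, in line with the cost analysis above):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

param_grid = {
    "C":     [0.1, 1, 10, 100],         # 4 values
    "gamma": [0.001, 0.01, 0.1, 1],     # 4 values -> 16 grid points
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)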
Classification With More Than Two Categories
The idea of using a hyperplane to separate the feature vectors into two groups works well when there are only two target categories, but how does SVM handle the case where the target variable has more than two categories? Several approaches have been suggested, but two are the most popular: (1) "one against many", where each category is split out and all of the other categories are merged; and (2) "one against one", where k(k-1)/2 models are constructed, where k is the number of categories.
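Both strategies are available off the shelf; in the sketch below (illustrative data, scikit-learn assumed), SVC itself implements one-against-one, so a 3-class problem trains k(k-1)/2 = 3 models, while the OneVsRestClassifier wrapper builds one model per class instead.

from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=5, n_classes=3,
                           n_informative=3, random_state=0)

ovo = SVC(kernel="rbf").fit(X, y)                       # one-against-one
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)  # one-against-rest
print(ovo.score(X, y), ovr.score(X, y))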
[1] "The Algorithm of Text Classification Based on Rough Set and Support Vector Machine", Wang Zhuo. College of Business Administration, Liaoning Technical University. 978-1-4244-5824-0/$26.00 (c) 2010 IEEE.
[2] "Text categorization with SVM: Learning with many relevant features", Thorsten Joachims. IEEE, 1997.
[3] "An Experiment System for Text Classification", Mathias Niepert. B652: Computer Models of Symbolic Learning, Final Write-Up, Spring 2005.
[4] "Application for Web Text Categorization Based on Support Vector Machine", Pan Hao, Duan Ying, Tan Longyuan. School of Computer Science and Technology, Wuhan University of Technology (WUT), Wuhan 430070, China. 2009 International Forum on Computer Science-Technology and Applications.
[5] "Text categorization with concept of fuzzy set for informative keyword", Taecho C. Jo. S/W business team, Samsung SDS. IEEE, 2010.