Presently there is one issue related to large databases: nowadays databases are available in very large sizes, and association rule mining requires the whole data at one time, which cannot fit into main memory. An algorithm like Apriori is therefore very slow and inefficient for very large datasets (in the order of millions of transactions). Some algorithms have been proposed to solve this memory problem through various solutions such as partitioning and sampling approaches [2], a two-phase mining approach [3], or a novel combination of features like two-phased multiple-partition tidlist-style processing and byte-vector representation of tidlists [1].


 
 

Mining association rules is one of the most important tasks in data mining. Several approaches generalizing association rules to fuzzy association rules have been proposed. In this paper, we present a general framework for mining fuzzy association rules.

1.1.1. Fuzzy Logic

Fuzzy Logic [8] was initiated in 1965 by Lotfi A. Zadeh, professor for computer science at the University of California in Berkeley. Basically, Fuzzy Logic (FL) is a multivalued logic that allows intermediate values to be defined between conventional evaluations like true/false, yes/no, high/low, etc. Notions like rather tall or very fast can be formulated mathematically and processed by computers, in order to apply a more human-like way of thinking in the programming of computers [4]. Fuzzy Logic has emerged as a profitable tool for the controlling and steering of systems and complex industrial processes, as well as for household and entertainment electronics and for other expert systems and applications like the classification of SAR data.

1.1.2. Fuzzy Sets
The very basic notion of fuzzy systems is a fuzzy (sub)set. In classical mathematics we are familiar with what we call crisp sets. For example, the possible interferometric coherence g values are the set X of all real numbers between 0 and 1. From this set X a subset A can be defined (e.g. all values 0 <= g <= 0.2). The characteristic function of A (i.e. the function that assigns a number 1 or 0 to each element in X, depending on whether the element is in the subset A or not) is shown in Fig. 1.
The elements which have been assigned the number 1 can be interpreted as the elements that are in the set A, and the elements which have been assigned the number 0 as the elements that are not in the set.

Fig. 1: Characteristic Function of a Crisp Set

This concept is sufficient for many areas of application, but it can easily be seen that it lacks flexibility for some applications like classification of remotely sensed data. For example, it is well known that water shows low interferometric coherence g in SAR images. Since g starts at 0, the lower range of this set ought to be clear. The upper range, on the other hand, is rather hard to define. As a first attempt, we set the upper range to 0.2 and therefore get B as a crisp interval B = [0, 0.2]. But this means that a g value of 0.20 is low, while a g value of 0.21 is not. Obviously, this is a structural problem, for if we moved the upper boundary of the range from g = 0.20 to an arbitrary point we could pose the same question.
A more natural way to construct the set B would be to relax the strict separation between low and not low. This can be done by allowing not only the (crisp) decision yes/no, but more flexible statements like "fairly low". A fuzzy set allows us to define such a notion.
The aim is to use fuzzy sets in order to make computers more 'intelligent'; therefore, the idea above has to be coded more formally. In the example, all the elements were coded with 0 or 1. A straightforward way to generalize this concept is to allow more values between 0 and 1. In fact, infinitely many alternatives can be allowed between the boundaries 0 and 1, namely the unit interval I = [0, 1]. The interpretation of the numbers now assigned to all elements is much more difficult. Of course, again the number 1 assigned to an element means that the element is in the set B and 0 means that the element is definitely not in the set B. All other values mean a gradual membership to the set B. This is shown in Fig. 2.

Fig. 2: Characteristic Function of a Fuzzy Set


The membership function is a graphical representation of the magnitude of participation of each input. It associates a weighting with each of the inputs that are processed, defines functional overlap between inputs, and ultimately determines an output response. The rules use the input membership values as weighting factors to determine their influence on the fuzzy output sets of the final output conclusion. The membership function, operating in this case on the fuzzy set of interferometric coherence g, returns a value between 0.0 and 1.0. For example, an interferometric coherence g of 0.3 has a membership of 0.5 to the set low coherence (see Fig. 2).
It is important to point out the distinction between fuzzy logic and probability. Both operate over the same numeric range and have similar values: 0.0 representing False (or non-membership) and 1.0 representing True (or full membership). However, there is a distinction to be made between the two statements: the probabilistic approach yields the natural-language statement "There is a 50% chance that g is low", while the fuzzy terminology corresponds to "g's degree of membership within the set of low interferometric coherence is 0.50". The semantic difference is significant: the first view supposes that g is or is not low; it is just that we only have a 50% chance of knowing which set it is in. By contrast, fuzzy terminology supposes that g is "more or less" low, or in some other term corresponding to the value of 0.50.
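To make the idea concrete, here is a minimal Python sketch (not part of the original text) of a membership function for the fuzzy set "low coherence"; the breakpoints 0.2 and 0.4 are assumed so that g = 0.3 receives the membership 0.5 quoted above.

    def mu_low(gamma):
        # full membership up to 0.2, falling linearly to 0 at 0.4 (assumed breakpoints)
        if gamma <= 0.2:
            return 1.0
        if gamma >= 0.4:
            return 0.0
        return (0.4 - gamma) / (0.4 - 0.2)

    print(mu_low(0.10))  # 1.0  -> clearly "low"
    print(mu_low(0.30))  # 0.5  -> partially "low", as in the example above
    print(mu_low(0.45))  # 0.0  -> not "low"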

Operations on Fuzzy Sets
We can introduce basic operations on fuzzy sets. Similar to the operations on crisp sets, we also want to intersect, unify and negate fuzzy sets. In his very first paper about fuzzy sets [1], L. A. Zadeh suggested the minimum operator for the intersection and the maximum operator for the union of two fuzzy sets. It can be shown that these operators coincide with the crisp unification and intersection if we only consider the membership degrees 0 and 1. For example, let A be a fuzzy interval between 5 and 8 and B a fuzzy number about 4, as shown in the figure below.

Fig. 3: Example fuzzy sets


In this case, the fuzzy set between 5 and 8 AND about 4 is shown in Fig. 4.

Fig. 4: Example: Fuzzy AND

The fuzzy set between 5 and 8 OR about 4 is shown in the next figure, and the NEGATION of the fuzzy set A is shown below it.

Fig. 5: Example: Fuzzy OR

Fig. 6: Example: Fuzzy NEGATION
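The standard Zadeh operators can be written down directly; the Python sketch below applies min, max and complement to assumed membership values for A ("between 5 and 8") and B ("about 4") at a few sample points (the values are illustrative, not read from the figures).

    def fuzzy_and(mu_a, mu_b):
        return min(mu_a, mu_b)      # intersection

    def fuzzy_or(mu_a, mu_b):
        return max(mu_a, mu_b)      # union

    def fuzzy_not(mu_a):
        return 1.0 - mu_a           # negation

    A = {3: 0.0, 4: 0.5, 5: 1.0, 6: 1.0, 8: 1.0, 9: 0.0}   # fuzzy interval "between 5 and 8"
    B = {3: 0.5, 4: 1.0, 5: 0.5, 6: 0.0, 8: 0.0, 9: 0.0}   # fuzzy number "about 4"
    for x in sorted(A):
        print(x, fuzzy_and(A[x], B[x]), fuzzy_or(A[x], B[x]), fuzzy_not(A[x]))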

 
    
  

Fuzzy Logic in Data Mining

Fuzzy logic provides a framework for dealing quantitatively, mathematically and logically with semantic, ambiguous, elastic and vague concepts. It is a well proven and well established logic of degree. In data mining we deal with large amounts of data which are uncertain or vague in nature. In particular, in medical, micro-economic, sociological and marketing databases the linguistic form of data representation is widely used.

Fuzzy set theory provides excellent tools to model the "fuzzy" boundaries of linguistic terms by introducing gradual membership. In classical set theory, an object is either a member of a given set or not. Fuzzy set theory makes it possible that an object or a case belongs to a set only to a certain degree, thus modeling the uncertainty of the linguistic term describing the property that defines the set.
Membership degrees of fuzzy sets can express similarity, preference, and uncertainty. They can state how similar an object or case is to an ideal one, they can indicate preferences between suboptimal solutions to a problem, or they can model uncertainty about the real-life situation if the scenario is described in an imprecise manner. Due to their closeness to human reasoning, solutions obtained using fuzzy approaches are easy to understand and to apply. Fuzzy systems are therefore good candidates if linguistic, vague, or imprecise information has to be modeled and analyzed.
Fuzzy set theory can be used in data mining systems for performing rule-based classification, clustering, and association rule mining.



  
Association Rules

The discovery of association rules constitutes a very important task in the process of data mining. Association rules are an important class of regularities within data which have been extensively studied by the data mining community. The general objective is to find frequent co-occurrences of items within a set of transactions. The found co-occurrences are called associations. The idea of discovering such rules is derived from market basket analysis, where the goal is to mine patterns describing the customers' purchase behavior.
We state the problem of mining association rules as follows: I = {i1, i2, ..., im} is a set of items and T = {t1, t2, ..., tn} is a set of transactions, each of which contains items of the itemset I. Thus, each transaction t is a set of items such that t is a subset of I. An association rule is an implication of the form X -> Y, where X and Y are subsets of I and X and Y are disjoint. X (or Y) is a set of items, called an itemset.
An example of a simple association rule would be {bread} -> {butter}. This rule says that if bread was in a transaction, butter was in most cases in that transaction too. In other words, people who buy bread often buy butter as well. Such a rule is based on observations of the customer behavior and results from the data stored in transaction databases. Looking at an association rule of the form X -> Y, X is called the antecedent and Y the consequent. The value of the antecedent implies the value of the consequent. The antecedent, also called the "left-hand side" of a rule, can consist either of a single item or of a whole set of items, and the same applies to the consequent, also called the "right-hand side".
The most complex task of the whole association rule mining process is the generation of frequent itemsets. Many different combinations of items have to be explored, which can be a very computation-intensive task, especially in large databases. As most business databases are very large, the need for efficient algorithms that can extract itemsets in a reasonable amount of time is high. Often, a compromise has to be made between discovering all itemsets and computation time. Generally, only those itemsets that fulfill a certain support requirement are taken into consideration. Support and confidence are the two most important quality measures for evaluating the interestingness of a rule.
Support: The support of the rule X -> Y is the percentage of transactions in T that contain X union Y. It determines how frequently the rule is applicable to the transaction set T. The support of a rule is given by the formula

    supp(X -> Y) = |X union Y| / n

where |X union Y| is the number of transactions that contain all the items of the rule and n is the total number of transactions. The support is a useful measure to determine whether a set of items occurs frequently in a database or not. Rules covering only a few transactions might not be valuable to the business. The formula above computes the relative support value, but there also exists an absolute support. It works similarly but simply counts the number of transactions where the tested itemset occurs, without dividing by the number of tuples.
Confidence: The confidence of a rule describes the percentage of transactions containing X which also contain Y:

    conf(X -> Y) = |X union Y| / |X|

This is a very important measure to determine whether a rule is interesting or not. It looks at all transactions which contain a certain item or itemset defined by the antecedent of the rule. Then, it computes the percentage of those transactions that also include all the items contained in the consequent.
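The two measures can be checked on a toy transaction set; the following Python sketch (illustrative only) computes the relative support and the confidence of {bread} -> {butter}.

    transactions = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"bread", "jam"},
        {"milk"},
    ]

    def support(itemset, transactions):
        # relative support: fraction of transactions containing every item of the itemset
        hits = sum(1 for t in transactions if itemset <= t)
        return hits / len(transactions)

    def confidence(antecedent, consequent, transactions):
        # fraction of transactions with the antecedent that also contain the consequent
        return support(antecedent | consequent, transactions) / support(antecedent, transactions)

    print(support({"bread", "butter"}, transactions))       # 2/4 = 0.5
    print(confidence({"bread"}, {"butter"}, transactions))  # 0.5 / 0.75 = 0.67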
The Mining Process

The process of mining association rules consists of two main parts. First, we have to identify all the itemsets contained in the data that are adequate for mining association rules. These combinations have to show at least a certain frequency to be worth mining and are thus called frequent itemsets. The second step generates rules out of the discovered frequent itemsets.
"  c # 


 
Mining frequent patterns from a gi ven dataset is not a trivial task. All sets
of items that occur at least as frequently as a user -specified minimum
support have to be identified at this step. An important issue is the computation
time because when it comes to large databases there might b e a
lot of possible itemsets all of which need to be evaluated. Different algorithms
attempt to allow efficient discove ry of frequent patterns.
Rule Generation

After having generated all patterns that meet the minimum support requirement, rules can be generated out of them. For doing so, a minimum confidence has to be defined. The task is to generate all possible rules from the frequent itemsets and then compare their confidence value with the minimum confidence (which is again defined by the user). All rules that meet this requirement are regarded as interesting. Frequent sets that do not include any interesting rules do not have to be considered anymore. All the discovered rules can in the end be presented to the user with their support and confidence values.





Automatic Discovery of Fuzzy Sets

The traditional way to discover the fuzzy sets needed for a certain data set is to consult a domain expert who will define the sets and their membership functions. This requires access to domain knowledge, which can be difficult or expensive to acquire. In order to make an automatic discovery of fuzzy sets possible, an approach has been developed which generates fuzzy sets automatically by clustering [FWSY98]. This method can be used to divide quantitative attributes into fuzzy sets, which deals with the problem that it is not always easy to define the sets a priori. The proposed method uses a known clustering algorithm to find the medoids of k clusters. The whole process of automatically discovering fuzzy sets can be subdivided into four steps:
● Transform the database to make clustering possible (the value of all the attributes has to be a positive integer).
● Find the k medoids of the transformed database using a clustering method.
● For each quantitative attribute, construct fuzzy sets using the medoids.
● Generate the associated membership functions.
In [FWSY98], the CLARANS algorithm is proposed to conduct the clustering. After discovering the k medoids, we can compute k fuzzy sets out of them. We define {m1, m2, ..., mk} as the k medoids from a database; the i-th medoid is a vector mi = (mi1, mi2, ..., mip) over the p quantitative attributes. If we want to discover the fuzzy sets for the j-th attribute, ranging from minj to maxj, the mid-points will be {m1j, m2j, ..., mkj}. The fuzzy sets will then cover the following ranges:

    [minj, m2j], [m1j, m3j], ..., [m(i-1)j, m(i+1)j], ..., [m(k-1)j, maxj]

Finally, the membership functions for the fuzzy sets have to be computed. They follow from the definition of the sets above. For the fuzzy set with mid-point mkj, the membership function looks as follows: if x <= m(k-1)j or x >= m(k+1)j, the membership of x is 0, because in both cases the value lies outside the range of the fuzzy set; if x takes exactly the value of the mid-point mkj, the membership is 1. For all other cases, the membership is computed as

    mu(x) = (x - m(k-1)j) / (mkj - m(k-1)j)   for m(k-1)j < x < mkj
    mu(x) = (x - m(k+1)j) / (mkj - m(k+1)j)   for mkj < x < m(k+1)j
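A small Python sketch of the resulting triangular membership functions follows; the mid-points stand in for medoid coordinates (for the first and last set the attribute's min/max would play the role of the missing outer neighbour), and the numeric values are invented for illustration.

    def triangular_membership(x, left, centre, right):
        # membership in the fuzzy set centred at `centre`, reaching 0 at the neighbouring mid-points
        if x <= left or x >= right:
            return 0.0
        if x == centre:
            return 1.0
        if x < centre:
            return (x - left) / (centre - left)
        return (x - right) / (centre - right)

    mids = [20.0, 40.0, 70.0]   # mid-points for one quantitative attribute (e.g. from CLARANS medoids)
    print(triangular_membership(30.0, mids[0], mids[1], mids[2]))  # 0.5 for the middle fuzzy set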
A distinction between two types of fuzzy sets has been introduced in [XieD05]. These two types are called equal space fuzzy sets (Figure 15) and equal data points fuzzy sets (Figure 16). Equal space fuzzy sets are symmetrical and all occupy the same range in the universal set. On the contrary, equal data points fuzzy sets cover a certain number of instances and thus are not symmetrical.

Fig. 15: Equal space fuzzy sets. Fig. 16: Equal data points fuzzy sets.

The occurrence frequency of an itemset is the number of transactions that contain the itemset; this is also known simply as the frequency, or support, of the itemset.


The following concepts are important when dealing with fuzzy sets [KlFo88]:
● Support: The support of a fuzzy set A is the crisp set that contains all of the elements of X whose membership degree in A is not 0: supp(A) = {x in X | muA(x) > 0}. The empty fuzzy set has an empty support set.
● Height: The height of a fuzzy set is the largest membership value attained by any element of the set. The fuzzy set is called normalized if at least one of its elements attains the highest membership grade.
● alpha-cut: The alpha-cut of a fuzzy set A is the crisp set containing all elements that have a membership grade in A greater than or equal to alpha.
● Scalar cardinality: The summation of the membership grades of all elements in a fuzzy set is called its scalar cardinality.


[7] Let D = {t1, t2, ..., tn} be a database of records characterized by a set I = {i1, i2, ..., im} of attributes. For each attribute ij (1 <= j <= m) there is a corresponding collection of fuzzy sets, with membership values in the interval [0,1]. A fuzzy association rule is of the form A -> B, where A and B are subsets of I and A and B are disjoint. Within the rule A -> B, A is called the antecedent while B is called the consequent of the rule.
Definition 1: A t-norm is a commutative, associative and non-decreasing function T: [0,1] x [0,1] -> [0,1] with the property T(x, 1) = x for all x in [0,1].
Definition 2: Let X = {x1, x2, ..., xp} be a fuzzy itemset, X a subset of I. The support of X on the i-th record is

    sup_i(X) = T(x1i, ..., xpi)   if T(x1i, ..., xpi) >= a*ms
    sup_i(X) = 0                  if T(x1i, ..., xpi) < a*ms        (1)

where xji in [0,1] is the membership value of item xj on the i-th record (i = 1, 2, ..., n; j = 1, 2, ..., p), T is a t-norm operator, a is a constant called the adjusting factor, and ms is the minimum support threshold. Formula (1) reduces to the support degree used in paper [3] when T is taken to be the minimum operator.
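Definition 2 can be illustrated with a few lines of Python; the names (record_support, a, ms) and the example membership values are mine, and min or the algebraic product stand in for the t-norm T.

    def product_tnorm(values):
        out = 1.0
        for v in values:
            out *= v
        return out

    def record_support(memberships, a=1.0, ms=0.0, t_norm=min):
        # memberships: degrees of the itemset's items on a single record
        t = t_norm(memberships)
        return t if t >= a * ms else 0.0   # values below a*ms are cut down to 0

    # itemset {Income=High, Age=Young} on one record
    print(record_support([0.7, 0.4], ms=0.3, t_norm=min))            # 0.4
    print(record_support([0.7, 0.4], ms=0.3, t_norm=product_tnorm))  # 0.28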

Definition 3: Let X = {x1, x2, ..., xp} be a fuzzy itemset, X a subset of I. The support degree of X in D aggregates the per-record supports over all n records,

    sup(X) = (sum over i = 1..n of sup_i(X)) / n

The fuzzy itemset X is called a frequent fuzzy itemset when the support of X is greater than or equal to the given minimum support ms.
Definition 4: The support degree of a fuzzy association rule X -> Y is defined as sup(X -> Y) = sup(X union Y), where X and Y are disjoint subsets of I.
Definition 5: The confidence of a fuzzy association rule X -> Y is defined as

    conf(X -> Y) = sup(X union Y) / sup(X)        (4)

where X and Y are disjoint subsets of I.
Proposition 1: Let X = {x1, x2, ..., xp} be a frequent fuzzy itemset. Then any subset of X is also a frequent fuzzy itemset.
Proposition 2: Let X = {x1, x2, ..., xp} be a fuzzy itemset, and let X' be an itemset with X a non-empty subset of X' and X' a subset of I, with a given minimum support ms. If X is not frequent, then X' is not a frequent fuzzy itemset.
Proposition 3: Let X and Y be non-empty, disjoint subsets of I, with given minimum support ms and minimum confidence mc. If X -> Y is a strong fuzzy association rule, then for any non-empty Y' that is a subset of Y, X -> Y' is also a strong fuzzy association rule.
Proposition 4: Let X and Y be non-empty, disjoint subsets of I, with given minimum support ms and minimum confidence mc. If X -> Y is a strong fuzzy association rule, then for any X' with X a subset of X' and X' union Y' = X union Y, the rule X' -> Y' is also a strong fuzzy association rule.

Algorithm for Mining Fuzzy Association Rules [7]

Many papers have been devoted to developing algorithms to mine fuzzy association rules. Most of them are an extension of the Apriori algorithm. The process of fuzzy association rule mining is divided into two phases. In the first step, frequent fuzzy 1-itemsets L1 are generated by scanning the database. By joining L1 to itself, we get the candidate fuzzy itemsets C2. According to the Apriori principle, C2 is pruned to form L2; accordingly, L3, ..., Lk are generated. The algorithm for generating fuzzy association rules from frequent fuzzy itemsets is similar to that for mining Boolean association rules.
  
Algorithm: GFFP (Generate Frequent Fuzzy Patterns)
Input: database D, minimum support ms
Output: frequent fuzzy itemsets L
Step 1:  L1 = {frequent fuzzy 1-itemsets}
Step 2:  for (k = 2; Lk-1 is not empty; k++) {
Step 3:    Ck = apriori_gen(Lk-1, ms)
Step 4:    for each record t in D {
Step 5:      for each candidate c in Ck
Step 6:        sup = sup_t(c)            // support of c on record t (Definition 2)
Step 7:        c.sup = c.sup + sup
Step 8:    }
Step 9:    Lk = {c in Ck | c.sup >= ms}
Step 10: }
Step 11: L = union over k of Lk
In the course of generating L1, there are two methods to select the fuzzy itemsets:
(1) According to the principle of maximum membership value, only the fuzzy item with the maximum degree of membership is taken into L1 [4]. This method reduces the computational work, but it loses some useful information.
(2) A fuzzy item is incorporated into L1 when the support of the fuzzy item is larger than or equal to the given minimum support. In this case, the follow-up calculation is larger than with the first method, but the amount of lost information is effectively reduced. A sketch of this Apriori-style procedure is given below.
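The sketch below is an illustrative Python rendering of this Apriori-style loop (it is not the paper's GFFP code); it assumes min as the t-norm, relative support, and records given as dictionaries mapping fuzzy items to membership degrees.

    from itertools import combinations

    def fuzzy_support(records, itemset):
        # average over all records of the t-norm (min) of the items' membership degrees
        total = sum(min(rec.get(item, 0.0) for item in itemset) for rec in records)
        return total / len(records)

    def fuzzy_apriori(records, min_sup):
        items = sorted({item for rec in records for item in rec})
        frequent = {}                                   # frozenset -> support
        current = [frozenset([i]) for i in items]       # candidate 1-itemsets
        k = 1
        while current:
            survivors = []
            for cand in current:                        # count support of each candidate
                sup = fuzzy_support(records, cand)
                if sup >= min_sup:
                    frequent[cand] = sup
                    survivors.append(cand)
            k += 1                                      # join step: build (k+1)-candidates
            current = list({a | b for a, b in combinations(survivors, 2) if len(a | b) == k})
        return frequent

    records = [
        {("income", "high"): 0.8, ("age", "young"): 0.2},
        {("income", "high"): 0.6, ("age", "young"): 0.7},
        {("income", "high"): 0.9, ("age", "young"): 0.5},
    ]
    print(fuzzy_apriori(records, min_sup=0.4))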


In the same way as the above algorithm follows the approach of the Apriori algorithm, we can follow the approach of the FP-tree algorithm to generate fuzzy association rules. Fuzzy association rules are used in various applications such as face recognition [9], intrusion detection systems using FAR [10], mining generalized FAR from web logs [11], and mining FAR from questionnaire data [12]; many methods are available.


Crisp versus Fuzzy Association Rules

Fuzzy association rules use fuzzy logic to convert numerical attributes to fuzzy attributes, like "Income = High", thus maintaining the integrity of information conveyed by such numerical attributes. On the other hand, crisp association rules use sharp partitioning to transform numerical attributes to binary ones like "Income = [100K and above]", and can potentially introduce loss of information due to these sharp ranges. Fuzzy Apriori and its different variations are the only popular fuzzy association rule mining (ARM) algorithms available today. Like the crisp version of Apriori, it is a very slow and inefficient algorithm for very large datasets (in the order of millions of transactions).
So I propose one solution to the above problem: a fuzzy association rule mining approach based on the FP-tree algorithm, suited to very large datasets, because fuzzy association rule mining gives better performance than classic association rule mining.

References:
[1] "Fuzzy Association Rule Mining Algorithm for Fast and Efficient Performance on Very Large Datasets", Ashish Mangalampalli, Vikram Pudi, Centre for Data Engineering (CDE), International Institute of Information Technology (IIIT), Hyderabad - 500032, India. FUZZ-IEEE 2009
[2] "A Survey of Association Rules", Margaret H. Dunham, Yongqiao Xiao, Le Gruenwald, Zahid Hossain, Department of Computer Science and Engineering, Southern Methodist University / Department of Computer Science, University of Oklahoma. IEEE
[3] "A Two-Phase Fuzzy Mining Approach", Chun-Wei Lin, Tzung-Pei Hong and Wen-Hsiang Lu. IEEE 2010
[4] "Fuzzy Association Rules: General Model and Applications", Miguel Delgado, Nicolas Marin, Daniel Sanchez, and Maria-Amparo Vila. IEEE Transactions on Fuzzy Systems, Vol. 11, No. 2, April 2003
[5] "Fuzzy Association Rules: An Implementation in R", Master Thesis, Bakk. Lukas Helm, Matriculation Number 0251677, Vienna, 2.8.2007, Vienna University of Economics and Business Administration
[6] "A Study on Effective Mining of Association Rules from Huge Databases", V. Umarani et al., IJCSR International Journal of Computer Science and Research, Vol. 1 Issue 1, 2010
[7] "A General Framework for Fuzzy Data Mining", Jitao Zhao, Lin Yao, Department of Education Technology and Information, Xuchang University, Xuchang, China. IEEE 2010
[8] "Applications of Fuzzy Logic in Data Mining Process", Hang Chen
[9] "Face Recognition using PCA with NP-fuzzy data mining", SICE Annual Conference 2010, August 18-21, 2010, The Grand Hotel, Taipei, Taiwan
[10] Intrusion detection based on fuzzy association rules, KaiXing Wu (student), Department of Information & Electronic Engineering, Handan, Hebei Province, China. 2010 International Symposium on Intelligence Information Processing and Trusted Computing
[11] "Mining Generalized Fuzzy Association Rules from Web Logs", Rui Wu, School of Mathematics and Computer Science, Shanxi Normal University, Shanxi 041004, China. 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2010)
[12] "Mining fuzzy association rules from questionnaire data", Yen-Liang Chen, Cheng-Hsiung Weng, Knowledge-Based Systems 22 (2009) 46-



Text Categorization

The goal of text categorization is the classification of documents into a fixed number of predefined categories. Each document can be in multiple categories, exactly one, or no category at all. Using machine learning, the objective is to learn classifiers from examples which perform the category assignments automatically. This is a supervised learning problem. Since categories may overlap, each category is treated as a separate binary classification problem.


There are four main steps in text categorization [An Experiment System for Text Classification, Mathias Niepert, B652: Computer Models of Symbolic Learning, Final Write-Up, Spring 2005]:

1) Loading,
2) Indexing,
3) Classification,


Loading

The first step towards the final classification and evaluation results is to load the training corpus, that is, to read all the corpus documents and count up the term and document frequencies for every term. The term and document frequencies are saved for every document as well as for every category and the entire corpus. This is necessary since these values will be needed in later calculations. The term frequency of a term in a document/category/corpus is the number of times the term occurs in the document/category/corpus. The document frequency of a term in a category/corpus is the number of documents which both contain the term at least once and belong to the category/corpus, whereas the document frequency of a term in a document is a binary value which indicates whether the term occurs in the document or not. For the actual implementation in Java, I used one class for terms, documents, categories, and corpora, respectively.
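The counting described above can be sketched in a few lines; the original system is implemented in Java, so the Python below is only an illustration, and the toy corpus is invented.

    from collections import Counter

    corpus = {
        "sports":   ["the team won the game", "a fast game"],
        "politics": ["the election results", "the team of advisers"],
    }

    term_freq_category = {}       # term frequency per category
    term_freq_corpus = Counter()  # term frequency over the whole corpus
    doc_freq = Counter()          # number of documents containing each term

    for category, documents in corpus.items():
        term_freq_category[category] = Counter()
        for doc in documents:
            tokens = doc.split()
            term_freq_category[category].update(tokens)
            term_freq_corpus.update(tokens)
            doc_freq.update(set(tokens))   # each document counted at most once per term

    print(doc_freq["the"], term_freq_corpus["the"])   # 3 documents, 4 occurrences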

Stop Word Removal

In most applications, it is practical to remove words which appear too often (in every or almost every document) and thus carry no information for the task. Good examples of this kind of words are prepositions, articles and verbs like "be" and "go". If the box "Apply stop word removal" is checked, all the words in the file "swl.txt" are considered as stop words and will not be loaded. This file currently contains the 100 most used words in the English language [National Literacy Trust, 2005], which on average account for half of all reading in English. If the box "Apply stop word removal" is unchecked, the stop word removal algorithm is disabled when the corpus is loaded. The system only considers words which have length greater than one as valid tokens. This is due to the fact that all words with length one seem to be unimportant and would have been sorted out by the later dimensionality reduction step anyway. However, it is possible to change that easily in the code.
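A minimal Python sketch of the loading filter described above; the three stop words are only a stand-in for the 100-word list in swl.txt.

    def load_tokens(text, stop_words=None, apply_removal=True):
        tokens = [t.lower() for t in text.split() if len(t) > 1]   # drop one-character tokens
        if apply_removal and stop_words:
            tokens = [t for t in tokens if t not in stop_words]
        return tokens

    stop_words = {"the", "and", "of"}   # stand-in for the swl.txt stop-word list
    print(load_tokens("The rise and fall of a classifier", stop_words))
    # ['rise', 'fall', 'classifier']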




Stemming

Stemming is the process of grouping words that share the same morphological root. E.g. "game" and "games" are stemmed to the root "game". The suitability of stemming for text classification is controversial. In some examinations, stemming has been reported to hurt accuracy. However, the recent tendency is to apply it, since it reduces both the dimensionality of the term space and the stochastic dependence between terms.

Dimensionality Reduction and Term Weighting

For many machine learning algorithms it is necessary to reduce the dimensionality of the feature space if the original dimensionality of the space is very high. In most cases this improves not only the performance but also the accuracy of the classification itself. Term weighting is the process of assigning values to all the terms in the corpus according to their importance for the actual classification part. Here, importance is defined as the ability of the term to distinguish between different categories in the corpus. Usually, the more important a term is, the higher is the assigned weight value. There are already nine different weighting functions implemented; one common choice, tf-idf, is sketched below.
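The report does not list the nine functions, so the following Python sketch shows just one common weighting, tf-idf, purely as an illustration.

    import math

    def tf_idf(term, doc_tokens, all_docs):
        tf = doc_tokens.count(term)                  # term frequency in this document
        df = sum(1 for d in all_docs if term in d)   # document frequency in the corpus
        idf = math.log(len(all_docs) / df) if df else 0.0
        return tf * idf

    docs = [["fuzzy", "logic", "rules"], ["fuzzy", "svm"], ["svm", "kernel"]]
    print(tf_idf("fuzzy", docs[0], docs))    # 1 * ln(3/2)
    print(tf_idf("kernel", docs[0], docs))   # 0.0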

Indexing

The step of "re-representation" of a document is nothing but indexing, i.e. the process of representing a document as a set of keywords, as structured data. In this stage we can use fuzzy sets [5].

Classification

Classification is the process which assigns one or more labels (or no label at all) to a new (unseen) document. There are many machine learning algorithms which have been applied to the problem of text categorization, ranging from statistical methods (e.g. Naive Bayes) to black-box algorithms. In this work we are going to apply a support vector machine.
Support Vector Machine (SVM) Models

A Support Vector Machine (SVM) performs classification by constructing an N-dimensional hyperplane that optimally separates the data into two categories.
In the parlance of the SVM literature, a predictor variable is called an attribute, and a transformed attribute that is used to define the hyperplane is called a feature. The task of choosing the most suitable representation is known as feature selection. A set of features that describes one case (i.e., a row of predictor values) is called a vector. So the goal of SVM modeling is to find the optimal hyperplane that separates clusters of vectors in such a way that cases with one category of the target variable are on one side of the plane and cases with the other category are on the other side of the plane. The vectors near the hyperplane are the support vectors. The figure below presents an overview of the SVM process.

[Figure: overview of the SVM process]

A Two-Dimensional Example

Before considering N-dimensional hyperplanes, let's look at a simple 2-dimensional example. Assume we wish to perform a classification, and our data has a categorical target variable with two categories. Also assume that there are two predictor variables with continuous values. If we plot the data points using the value of one predictor on the X axis and the other on the Y axis, we might end up with an image such as shown below. One category of the target variable is represented by rectangles while the other category is represented by ovals.
In this idealized example, the cases with one category are in the lower left corner and the cases with the other category are in the upper right corner; the cases are completely separated. The SVM analysis attempts to find a 1-dimensional hyperplane (i.e. a line) that separates the cases based on their target categories. There are an infinite number of possible lines; two candidate lines are shown above. The question is which line is better, and how do we define the optimal line.

The dashed lines drawn parallel to the separating line mark the distance between the dividing line and the closest vectors to the line. The distance between the dashed lines is called the margin. The vectors (points) that constrain the width of the margin are the support vectors. The following figure illustrates this.
An SVM analysis finds the line (or, in general, hyperplane) that is oriented so that the margin between the support vectors is maximized. In the figure above, the line in the right panel is superior to the line in the left panel.

If all analyses consisted of two-category target variables with two predictor variables, and the cluster of points could be divided by a straight line, life would be easy. Unfortunately, this is not generally the case, so SVM must deal with (a) more than two predictor variables, (b) separating the points with non-linear curves, (c) handling the cases where clusters cannot be completely separated, and (d) handling classifications with more than two categories.
In the previous example, we had only two predictor variables, and we were able to plot the points on a 2-dimensional plane. If we add a third predictor variable, then we can use its value for a third dimension and plot the points in a 3-dimensional cube. Points on a 2-dimensional plane can be separated by a 1-dimensional line. Similarly, points in a 3-dimensional cube can be separated by a 2-dimensional plane. As we add additional predictor variables (attributes), the data points can be represented in N-dimensional space, and an (N-1)-dimensional hyperplane can separate them.

When Straight Lines Go Crooked

The simplest way to divide two groups is with a straight line, flat plane or an N-dimensional hyperplane. But what if the points are separated by a nonlinear region such as shown below?

In this case we need a nonlinear dividing line. Rather than fitting nonlinear curves to the data, SVM handles this by using a kernel function to map the data into a different space where a hyperplane can be used to do the separation. The kernel function may transform the data into a higher dimensional space to make it possible to perform the separation.
The concept of a kernel mapping function is very powerful. It allows SVM models to perform separations even with very complex boundaries such as shown below.

Many kernel mapping functions can be used, probably an infinite number. But a few kernel functions have been found to work well for a wide variety of applications. The default and recommended kernel function is the Radial Basis Function (RBF).

Linear: u'*v
Polynomial: (gamma*u'*v + coef0)^degree
Radial basis function: exp(-gamma*|u-v|^2)
Sigmoid (hyperbolic tangent): tanh(gamma*u'*v + coef0)

Ideally an SVM analysis should produce a hyperplane that completely separates the feature vectors into two non-overlapping groups. However, perfect separation may not be possible, or it may result in a model with so many feature vector dimensions that the model does not generalize well to other data; this is known as over-fitting.
To allow some flexibility in separating the categories, SVM models have a cost parameter, C, that controls the trade-off between allowing training errors and forcing rigid margins. It creates a soft margin that permits some misclassifications. Increasing the value of C increases the cost of misclassifying points and forces the creation of a more accurate model that may not generalize well.
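As an illustration (scikit-learn is my choice of tooling here, not something the text prescribes), the following sketch trains an RBF-kernel SVM where C and gamma correspond to the cost and kernel parameters discussed above.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # larger C penalizes misclassified training points more heavily (stiffer margin)
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))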

Finding Optimal Parameter Values
The accuracy of an SVM model is largely dependent on the selection of the model parameters. Two methods can be used to find optimal parameter values: a grid search and a pattern search. A grid search tries values of each parameter across the specified search range using geometric steps. A pattern search (also known as a "compass search" or a "line search") starts at the center of the search range and makes trial steps in each direction for each parameter. If the fit of the model improves, the search center moves to the new point and the process is repeated. If no improvement is found, the step size is reduced and the search is tried again. The pattern search stops when the search step size is reduced to a specified tolerance.

Grid searches are computationally expensive because the model must be evaluated at many points within the grid for each parameter. For example, if a grid search is used with 10 search intervals and an RBF kernel function with two parameters (C and Gamma), then the model must be evaluated at 10*10 = 100 grid points. An Epsilon-SVR analysis has three parameters (C, Gamma and P), so a grid search with 10 intervals would require 10*10*10 = 1000 model evaluations. If cross-validation is used for each model evaluation, the number of actual SVM calculations would be further multiplied by the number of cross-validation folds (typically 4 to 10). For large models, this approach may be computationally infeasible.

A pattern search generally requires far fewer evaluations of the model than a grid search. Beginning at the geometric center of the search range, a pattern search makes trial steps with positive and negative step values for each parameter. If a step is found that improves the model, the center of the search is moved to that point. If no step improves the model, the step size is reduced and the process is repeated. The search terminates when the step size is reduced to a specified tolerance. The weakness of a pattern search is that it may find a local rather than a global optimal point for the parameters.

In this case the grid search is performed first. Once the grid search finishes, a pattern search is performed over a narrow search range surrounding the best point found by the grid search. Hopefully, the grid search will find a region near the global optimum point and the pattern search will then find the global optimum by starting in the right region. A cross-validated grid search of this kind is sketched below.
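A hedged sketch of the coarse grid stage using scikit-learn's GridSearchCV (again my choice of tooling); the geometric C/gamma grid and the 5-fold cross-validation mirror the procedure described above, and a finer, pattern-style search could then be run around best_params_.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=10, random_state=0)

    param_grid = {
        "C":     [2 ** k for k in range(-5, 16, 2)],    # geometric steps
        "gamma": [2 ** k for k in range(-15, 4, 2)],
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # cross-validated fit per grid point
    search.fit(X, y)
    print(search.best_params_)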

Classification With More Than Two Categories

The idea of using a hyperplane to separate the feature vectors into two groups works well when there are only two target categories, but how does SVM handle the case where the target variable has more than two categories? Several approaches have been suggested, but two are the most popular: (1) "one against many", where each category is split out and all of the other categories are merged; and (2) "one against one", where k*(k-1)/2 models are constructed, with k the number of categories.
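Both strategies can be tried directly with scikit-learn's wrappers (my choice, not the report's); with k = 3 categories the one-against-one scheme builds k*(k-1)/2 = 3 binary models.

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)            # 3 target categories
    ovr = OneVsRestClassifier(SVC()).fit(X, y)   # "one against many": k binary models
    ovo = OneVsOneClassifier(SVC()).fit(X, y)    # "one against one": k*(k-1)/2 binary models
    print(len(ovr.estimators_), len(ovo.estimators_))   # 3, 3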

Avoiding Over-Fitting With Cross-Validation


To avoid over-fitting, cross-validation is used to evaluate the fitting provided by each parameter value set tried during the grid or pattern search process. The following figure illustrates how different parameter values affect model fitting and over-fitting:

We can also use SVMs for text classification; they work well because of the following properties of text data:
● high dimensional input space
● few irrelevant features
● document vectors are sparse
● most text categorization problems are linearly separable

The Proposed Text Classification Based on Fuzzy Sets and Support Vector Machines

The proposed text classification approach combines fuzzy sets and a support vector machine; its main steps are as follows:

text documents -> feature selection -> indexing/representation using fuzzy sets -> SVM classification -> final result
[1] "The Algorithm of Text Classification Based on Rough Set and Support Vector Machine", Wang Zhuo, College of Business Administration, Liaoning Technical University. 978-1-4244-5824-0/$26.00 (c) 2010 IEEE
[2] "Text categorization with SVM: Learning with many relevant features", Thorsten Joachims. IEEE 1997
[3] "An Experiment System for Text Classification", Mathias Niepert, B652: Computer Models of Symbolic Learning, Final Write-Up, Spring 2005
[4] "Application for Web Text Categorization Based on Support Vector Machine", Pan Hao, Duan Ying, Tan Longyuan, School of Computer Science and Technology, Wuhan University of Technology (WUHT), Wuhan 430070, China. 2009 International Forum on Computer Science-Technology and Applications
[5] "Text categorization with concept of fuzzy set for informative keyword", Taecho C. Jo, S/W business team, Samsung SDS. IEEE 2010
