Professional Documents
Culture Documents
WEKA Data Mining Tool
WEKA Data Mining Tool
1
Data Mining
Task Types Numerous Algorithms
Classification C4.5 Decision Tree
Clustering K-Means
K Means Clustering
Discovering Association
Rules
Discovering Sequential
Patterns – Sequence Analysis
Regression
R i
Detecting Deviations from
Normal
2
http://www cs waikato ac nz/ml/weka/
http://www.cs.waikato.ac.nz/ml/weka/
WEKA can be freelyy downloaded byy visitingg the Web site
3
WEKA – Data Mining Software
Developed by the Machine Learning Group, University of
Waikato , New Zealand
Vision: Build state-of-the-art software for developing
machine learning (ML) techniques and apply them to real-
real
world data-mining problems
Developed
p in Java
J
4
WELA’s Collection of Machine Learning
Alg ith
Algorithms
Algorithms for data mining tasks
WEKA is open source software issued under the GNU
General Public License
Tools
T l ffor:
Data pre-processing
Classification
Regression
Clustering
Association rules
Visualization
5
After Installing - Start WEKA
6
WEKA Main Interface
7
WEKA Sample Files
C:\Program Files\weka\data
WEKA formatted files (.arff)
Open the contact-lenses file
8
Example – Contact Lens Data
How manyy
data
instances
are in the
file?
How many
attributes?
Numerical
attributes?
Categorical
attributes?
9
Example – Contact Lens Data
Can you
think of
problems
that might
be solved
with
ith thi
this
data?
10
Example – Contact Lens Data
If
supervised
learning
were to be
done,
which
would be
the output
attribute,
attribute
do you
think?
11
Example – Contact Lens Data
12
Example – Contact Lens Data
13
Example – Contact Lens Data
14
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
15
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
16
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
Select the
rule
generator
named
PART from
th list
the li t that
th t
shows up
after you
select
Choose
17
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
18
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
19
10 Fold Cross-Validation
10-Fold Cross Validation
Data is partitioned into 10 equally (or nearly equally) sized
segments or folds
10 iterations of training and validation are completed
In
I eachh iteration a different
d ff ffold
ld off the
h data
d is held
h ld out ffor
validation, with the remaining 9 folds used for learning
20 http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
21
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
IF tear
tear-prod-
prod
rate = reduced
THEN contact-
llenses = none
IF astigmatism
tig ti
= no THEN
co tact e ses
contact-lenses
= soft
22
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
Coverage =
12
23
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
IF tear
tear-prod-
prod
rate = reduced
THEN contact-
llenses = none
IF astigmatism
tig ti
= no THEN
co tact e ses
contact-lenses
= soft
24
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
Coverage = 6
Misclassification = 1
Accuracy = 5/6 = 83.3%
25
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
26
WEKA
A Data Mining Tool
By Susan L. Miertschin
27