You are on page 1of 27

WEKA

A Data Mining Tool


By Susan L. Miertschin

1
Data Mining
Task Types Numerous Algorithms
 Classification  C4.5 Decision Tree
 Clustering  K-Means
K Means Clustering
 Discovering Association
Rules
 Discovering Sequential
Patterns – Sequence Analysis
 Regression
R i
 Detecting Deviations from
Normal

2
http://www cs waikato ac nz/ml/weka/
http://www.cs.waikato.ac.nz/ml/weka/
WEKA can be freelyy downloaded byy visitingg the Web site

3
WEKA – Data Mining Software
 Developed by the Machine Learning Group, University of
Waikato , New Zealand
 Vision: Build state-of-the-art software for developing
machine learning (ML) techniques and apply them to real-
real
world data-mining problems
 Developed
p in Java
J

4
WELA’s Collection of Machine Learning
Alg ith
Algorithms
 Algorithms for data mining tasks
 WEKA is open source software issued under the GNU
General Public License
 Tools
T l ffor:
 Data pre-processing
 Classification
 Regression
 Clustering
 Association rules
 Visualization

5
After Installing - Start WEKA

6
WEKA Main Interface

7
WEKA Sample Files
 C:\Program Files\weka\data
 WEKA formatted files (.arff)
 Open the contact-lenses file

8
Example – Contact Lens Data
How manyy
data
instances
are in the
file?
How many
attributes?
Numerical
attributes?
Categorical
attributes?

9
Example – Contact Lens Data
Can you
think of
problems
that might
be solved
with
ith thi
this
data?

10
Example – Contact Lens Data
If
supervised
learning
were to be
done,
which
would be
the output
attribute,
attribute
do you
think?

11
Example – Contact Lens Data

12
Example – Contact Lens Data

13
Example – Contact Lens Data

14
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t

15
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t

16
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
Select the
rule
generator
named
PART from
th list
the li t that
th t
shows up
after you
select
Choose
17
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t

18
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t

19
10 Fold Cross-Validation
10-Fold Cross Validation
 Data is partitioned into 10 equally (or nearly equally) sized
segments or folds
 10 iterations of training and validation are completed
 In
I eachh iteration a different
d ff ffold
ld off the
h data
d is held
h ld out ffor
validation, with the remaining 9 folds used for learning

20 http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t

21
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
IF tear
tear-prod-
prod
rate = reduced
THEN contact-
llenses = none

IF astigmatism
tig ti
= no THEN
co tact e ses
contact-lenses
= soft

22
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
Coverage =
12

23
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
IF tear
tear-prod-
prod
rate = reduced
THEN contact-
llenses = none

IF astigmatism
tig ti
= no THEN
co tact e ses
contact-lenses
= soft

24
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t
 Coverage = 6
 Misclassification = 1
 Accuracy = 5/6 = 83.3%

25
E
Example
l – Classify
Cl if - Contact
C t t Lens
L Data
D t

26
WEKA
A Data Mining Tool
By Susan L. Miertschin

27

You might also like