Professional Documents
Culture Documents
2 22/01/16
Download and Install WEKA
Website:
http://www.cs.waikato.ac.nz/~ml/weka/index.html
Support multiple platforms (written in java):
Windows, Mac OS X and Linux
3 22/01/16
Main Features
49 data preprocessing tools
76 classification/regression algorithms
8 clustering algorithms
3 algorithms for finding association rules
15 attribute/subset evaluators + 10 search algorithms
for feature selection
4 22/01/16
Main GUI
Three graphical user interfaces
“The Explorer” (exploratory data analysis)
“The Experimenter” (experimental
environment)
“The KnowledgeFlow” (new process model
inspired interface)
Simple CLI- provides users without a graphic
interface option the ability to execute commands
from a terminal window
5 22/01/16
Explorer
The Explorer:
Preprocess data
Classification
Clustering
Association Rules
Attribute Selection
Data Visualization
References and Resources
6 22/01/16
Explorer: pre-processing the data
Data can be imported from a file in various formats: ARFF,
CSV, C4.5, binary
Data can also be read from a URL or from an SQL database
(using JDBC)
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
Discretization, normalization, resampling, attribute selection,
transforming and combining attributes, …
7 22/01/16
WEKA only deals with “flat” files
@relation heart-disease-simplified
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
8 22/01/16
WEKA only deals with “flat” files
@relation heart-disease-simplified
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
9 22/01/16
10 University of Waikato 22/01/16
11 University of Waikato 22/01/16
IRIS dataset
5 attributes, one is the classification
3 classes: setosa, versicolor, virginica
15 University of Waikato 22/01/16
Attribute data
Min, max and average value of attributes
distribution of values :number of items for which:
ai = v j | ai ∈ A,v j ∈ V
37 22/01/16
Explore Conjunctive Rules learner
Need a simple dataset with few attributes , let’s select the weather dataset
Select a Classifier
Select training method
Right-click to select parameters
69 22/01/16
The K-Means Clustering Method
Given k, the k-means algorithm is implemented in four steps:
Partition objects into k nonempty subsets
Compute seed points as the centroids of the clusters of the
current partition (the centroid is the center, i.e., mean point, of
the cluster)
Assign each object to the cluster with the nearest seed point
Go back to Step 2, stop when no more new assignment
73 22/01/16
Basic Concepts: Frequent Patterns
Tid Items bought itemset: A set of one or more items
10 Beer, Nuts, Diaper k-itemset X = {x1, …, xk}
20 Beer, Coffee, Diaper
(absolute) support, or, support count of X:
30 Beer, Diaper, Eggs
Frequency or occurrence of an itemset
40 Nuts, Eggs, Milk X
50 Nuts, Coffee, Diaper, Eggs, Milk
(relative) support, s, is the fraction of
Customer Customer
transactions that contains X (i.e., the
buys both buys diaper probability that a transaction contains X)
An itemset X is frequent if X’s support is
no less than a minsup threshold
Customer
buys beer
74 gennaio 22, 2016
Basic Concepts: Association Rules
Tid Items bought Find all the rules X à Y with minimum
10 Beer, Nuts, Diaper
support and confidence
20 Beer, Coffee, Diaper
30 Beer, Diaper, Eggs support, s, probability that a
40 Nuts, Eggs, Milk transaction contains X ∪ Y
50 Nuts, Coffee, Diaper, Eggs, Milk
confidence, c, conditional probability
Customer
buys both
Customer that a transaction having X also
buys contains Y
diaper
Let minsup = 50%, minconf = 50%
Freq. Pat.: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer,
Diaper}:3
Customer
buys beer n Association rules: (many more!)
n Beer à Diaper (60%, 100%)
n Diaper à Beer (60%, 75%)
75 gennaio 22, 2016
77 University of Waikato 22/01/16
78 University of Waikato 22/01/16
79 University of Waikato 22/01/16
80 University of Waikato 22/01/16
81 University of Waikato 22/01/16
1. adoption-of-the-budget-resolution=y physician-fee-freeze=n 219 ==>
Class=democrat 219 conf:(1)
2. adoption-of-the-budget-resolution=y physician-fee-freeze=n aid-to-
nicaraguan-contras=y 198 ==> Class=democrat 198 conf:(1)
3. physician-fee-freeze=n aid-to-nicaraguan-contras=y 211 ==>
Class=democrat 210 conf:(1)
ecc.
Explorer: attribute selection
Panel that can be used to investigate which (subsets of)
attributes are the most predictive ones
Attribute selection methods contain two parts:
A search method: best-first, forward selection, random,
exhaustive, genetic algorithm, ranking
An evaluation method: correlation-based, wrapper, information
gain, chi-squared, …
Very flexible: WEKA allows (almost) arbitrary combinations
of these two
83 22/01/16
84 University of Waikato 22/01/16
85 University of Waikato 22/01/16
86 University of Waikato 22/01/16
87 University of Waikato 22/01/16
88 University of Waikato 22/01/16
89 University of Waikato 22/01/16
90 University of Waikato 22/01/16
91 University of Waikato 22/01/16
Explorer: data visualization
Visualization very useful in practice: e.g. helps to determine
difficulty of the learning problem
WEKA can visualize single attributes (1-d) and pairs of
attributes (2-d)
To do: rotating 3-d visualizations (Xgobi-style)
Color-coded class values
“Jitter” option to deal with nominal attributes (and to detect
“hidden” data points)
“Zoom-in” function
92 22/01/16
94 University of Waikato 22/01/16
95 University of Waikato 22/01/16
96 University of Waikato 22/01/16
97 University of Waikato 22/01/16
click on a cell