Data Mining With Weka: Ian H. Witten
Class 1 Lesson 1
Introduction
Ian H. Witten
Department of Computer Science
University of Waikato
New Zealand
weka.waikato.ac.nz
Data Mining with Weka
a practical course on how to use Weka for data mining
explains the basic principles of several popular algorithms
What's data mining?
We are overwhelmed with data
Data mining is about going from data to information,
information that can give you useful predictions
Examples?
You're at the supermarket checkout.
You're happy with your bargains
and the supermarket is happy you've bought some more stuff
Say you want a child, but you and your partner can't have one.
Can data mining help?
Data mining vs. machine learning
What's Weka?
A bird found only in New Zealand?
Data mining workbench
Waikato Environment for Knowledge Analysis
Machine learning algorithms for data mining tasks
100+ algorithms for classification
75 for data preprocessing
25 to assist with feature selection
20 for clustering, finding association rules, etc.
What will you learn?
Load data into Weka and look at it
Use filters to preprocess it
Explore it using interactive visualization
Apply classification algorithms
Interpret the output
Understand evaluation methods and their implications
Understand various representations for models
Explain how popular machine learning algorithms work
Be aware of common pitfalls with data mining
Use Weka on your own data
and understand what you are doing!
Class 1: Getting started with Weka
Install Weka
Explore the Explorer interface
Explore some datasets
Build a classifier
Interpret the output
Use filters
Visualize your dataset
Course organization
Class 1  Getting started with Weka
Class 2  Evaluation
Class 3  Simple classifiers
Class 4  More classifiers
Class 5  Putting it all together
Lessons and activities:
Lesson 1.1  Activity 1
Lesson 1.2  Activity 2
Lesson 1.3  Activity 3
Lesson 1.4  Activity 4
Lesson 1.5  Activity 5
Lesson 1.6  Activity 6
Course organization
Class 1  Getting started with Weka
Class 2  Evaluation
Class 3  Simple classifiers
Class 4  More classifiers
Class 5  Putting it all together
Mid-class assessment  1/3
Post-class assessment  2/3
Textbook
This textbook discusses data mining, and Weka, in depth:
Data Mining: Practical machine learning tools and techniques,
by Ian H. Witten, Eibe Frank and Mark A. Hall. Morgan Kaufmann, 2011
The publisher has made available parts relevant to this course in ebook format.
World Map by David Niblack, licensed under a Creative Commons Attribution 3.0 Unported License
Class 1 Lesson 2
Exploring the Explorer
Lesson 1.2: Exploring the Explorer
Class 1  Getting started with Weka
    Lesson 1.1  Introduction
    Lesson 1.2  Exploring the Explorer
    Lesson 1.3  Exploring datasets
    Lesson 1.4  Building a classifier
    Lesson 1.5  Using a filter
    Lesson 1.6  Visualizing your data
Class 2  Evaluation
Class 3  Simple classifiers
Class 4  More classifiers
Class 5  Putting it all together
Download from
http://www.cs.waikato.ac.nz/ml/weka
(for Windows, Mac, Linux)
Weka 3.6.10
(the latest stable version of Weka)
(includes datasets for the course)
(it's important to get the right version, 3.6.10)
[Screenshot labels: performance comparisons; graphical interface; command-line interface]
Weather data: the columns are attributes, the rows are instances
No.  Outlook   Temp  Humidity  Windy  Play
1    Sunny     Hot   High      False  No
2    Sunny     Hot   High      True   No
3    Overcast  Hot   High      False  Yes
4    Rainy     Mild  High      False  Yes
5    Rainy     Cool  Normal    False  Yes
6    Rainy     Cool  Normal    True   No
7    Overcast  Cool  Normal    True   Yes
8    Sunny     Mild  High      False  No
9    Sunny     Cool  Normal    False  Yes
10   Rainy     Mild  Normal    False  Yes
11   Sunny     Mild  Normal    True   Yes
12   Overcast  Mild  High      True   Yes
13   Overcast  Hot   Normal    False  Yes
14   Rainy     Mild  High      True   No
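The attribute/instance view of the weather data can be sketched in a few lines of plain Python. This is only an illustration of the data layout, not Weka code: the names ATTRIBUTES, WEATHER and value_counts are made up for this sketch, and counting values is roughly what the Explorer shows when you select an attribute.

```python
from collections import Counter

# Attribute names from the weather table; the last one ("Play") is the class.
ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Windy", "Play"]

# The 14 instances of the weather data, one tuple per row of the table.
WEATHER = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]


def value_counts(instances, attribute):
    """Count how often each value of the named attribute occurs."""
    index = ATTRIBUTES.index(attribute)
    return Counter(row[index] for row in instances)


print(len(WEATHER), "instances,", len(ATTRIBUTES), "attributes")
print(value_counts(WEATHER, "Outlook"))
print(value_counts(WEATHER, "Play"))
```

Counting the values of Play gives the class distribution of the dataset.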
open file weather.nominal.arff
[Screenshot labels: attributes; attribute values]
Install Weka
Get datasets
Open Explorer
Open a dataset (weather.nominal.arff)
Look at attributes and their values
Edit the dataset
Save it?
Course text
Section 1.2 The weather problem
Chapter 10 Introduction to Weka
Class 1 Lesson 3
Exploring datasets
Lesson 1.3: Exploring datasets
open file weather.nominal.arff
[Screenshot labels: attributes; attribute values; class]
Classification
sometimes called "supervised learning"
Dataset: classified examples
"Model" that classifies new examples
A classified example (an instance) is a fixed set of features:
    attribute 1, attribute 2, ..., attribute n: each discrete ("nominal") or continuous ("numeric")
    class
discrete class: "classification" problem
continuous class: "regression" problem
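The split between classification and regression depends only on the type of the class attribute. A minimal Python sketch of that rule follows; problem_type is a hypothetical helper written for this note, not part of Weka, and the numeric class values in the second call are invented.

```python
def problem_type(class_values):
    """Return 'regression' for a numeric class, 'classification' otherwise.

    Booleans are treated as discrete labels, not as numbers.
    """
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in class_values
    )
    return "regression" if numeric else "classification"


# Discrete (nominal) class, as in the weather data.
print(problem_type(["Yes", "No", "Yes"]))

# Continuous (numeric) class: invented values for illustration.
print(problem_type([12.5, 7.0, 9.25]))
```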
open file weather.numeric.arff
[Screenshot labels: attributes; attribute values; class]
open file glass.arff
The classification problem
weather.nominal, weather.numeric
Nominal vs. numeric attributes
ARFF file format
glass.arff dataset
Sanity checking attributes
Course text
Section 11.1 Preparing the data
Loading the data into the Explorer
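To make the ARFF file format concrete, here is a small Python sketch that reads a header and data section. The ARFF text is an abbreviated, hand-written example in the style of the weather files, not a copy of the course datasets, and parse_arff is a hypothetical helper that handles only the simple cases used here (no quoting, no sparse data, no missing values).

```python
# Abbreviated, hand-written ARFF text for illustration only.
ARFF_TEXT = """\
% lines starting with % are comments
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Parse a very simple ARFF text into (relation, attributes, data)."""
    relation, attributes, data = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        keyword = line.lower()
        if in_data:
            data.append([value.strip() for value in line.split(",")])
        elif keyword.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif keyword.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                # Nominal attribute: keep the list of allowed values.
                kind = [value.strip() for value in kind.strip("{}").split(",")]
            attributes.append((name, kind))
        elif keyword.startswith("@data"):
            in_data = True
    return relation, attributes, data


relation, attributes, data = parse_arff(ARFF_TEXT)
print(relation)
for name, kind in attributes:
    print(name, kind)
print(len(data), "instances")
```

Nominal attributes list their allowed values in braces, numeric ones are declared with the keyword numeric, and everything after @data is one instance per line.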
Class 1 Lesson 4
Building a classifier
Lesson 1.4: Building a classifier
Use J48 to analyze the glass dataset
Open file glass.arff (or leave it open from the last lesson)
Check the available classifiers
Choose the J48 decision tree learner (trees > J48)
Run it
Examine the output
Look at the correctly classified instances and the confusion matrix
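The number of correctly classified instances can be read off the confusion matrix: it is the sum of the diagonal. The Python sketch below shows the arithmetic on a made-up 3-class matrix (the counts are hypothetical, not J48's output on glass.arff), with rows as actual classes and columns as predicted classes.

```python
def correctly_classified(matrix):
    """Sum of the diagonal: instances whose predicted class equals the actual class."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def accuracy(matrix):
    """Fraction of all instances that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return correctly_classified(matrix) / total


# Hypothetical confusion matrix: rows are actual classes, columns are predictions.
matrix = [
    [50, 0, 0],
    [0, 47, 3],
    [0, 2, 48],
]

print(correctly_classified(matrix), "correct out of", sum(map(sum, matrix)))
print(f"accuracy = {accuracy(matrix):.3f}")
```

Every off-diagonal cell is an error: the row says what the instance really was, the column says what it was classified as.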
Investigate J48
Open the configuration panel
Check the More information
Examine the options
Use an unpruned tree
Look at leaf sizes
Set minNumObj to 15 to avoid small leaves
Visualize tree using right-click menu
From C4.5 to J48
ID3 (1979)
C4.5 (1993)
C4.8 (1996?)
C5.0 (commercial)
J48
Classifiers in Weka
Classifying the glass dataset
Interpreting J48 output
J48 configuration panel
option: pruned vs. unpruned trees
option: avoid small leaves
J48 ~ C4.5
Course text
Section 11.1 Building a decision tree
Examining the output
Class 1 Lesson 5
Using a filter
Lesson 1.5: Using a filter
Use a filter to remove an attribute
Open weather.nominal.arff (again!)
Check the filters
supervised vs. unsupervised
attribute vs. instance
Remove instances where humidity is high
Supervised or unsupervised?
Attribute or instance?
Look at them
Select RemoveWithValues
Set attributeIndex
Set nominalIndices
Apply
Undo
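The effect of this instance filter can be imitated in plain Python. The sketch below is not the RemoveWithValues filter itself: remove_with_value is a hypothetical function that selects the attribute and value by name, whereas the Weka filter is configured through its attributeIndex and nominalIndices options.

```python
ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Windy", "Play"]

# The 14 instances of the weather data.
WEATHER = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]


def remove_with_value(instances, attribute, value):
    """Return the instances whose named attribute does NOT have the given value."""
    index = ATTRIBUTES.index(attribute)
    return [row for row in instances if row[index] != value]


kept = remove_with_value(WEATHER, "Humidity", "High")
print(len(WEATHER), "instances before,", len(kept), "after")
for row in kept:
    print(row)
```

The function returns a new list and leaves WEATHER untouched, so the original data remains available, much as Undo restores it in the Explorer.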
Fewer attributes, better classification!
Open glass.arff
Run J48 (trees > J48)
Remove Fe
Remove all attributes except RI and Mg
Look at the decision trees
Use right-click menu to visualize decision trees
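Removing attributes amounts to dropping columns from every instance. The Python sketch below mimics the two removals named on the slide. The attribute names are assumed to match glass.arff, the two rows are placeholder values and not real measurements, and keep_attributes and remove_attributes are hypothetical helpers rather than Weka filters. The class attribute (assumed here to be called Type) is kept alongside RI and Mg, since a classifier still needs it.

```python
# Assumed attribute names of the glass dataset; "Type" is taken to be the class.
NAMES = ["RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "Type"]

# Placeholder rows for illustration only (NOT real values from glass.arff).
ROWS = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, "class-a"],
    [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, "class-b"],
]


def keep_attributes(names, rows, keep):
    """Keep only the named columns, preserving their original order."""
    indices = [i for i, name in enumerate(names) if name in keep]
    new_names = [names[i] for i in indices]
    new_rows = [[row[i] for i in indices] for row in rows]
    return new_names, new_rows


def remove_attributes(names, rows, remove):
    """Drop the named columns and keep everything else."""
    return keep_attributes(names, rows, [n for n in names if n not in remove])


# Remove Fe.
names_no_fe, rows_no_fe = remove_attributes(NAMES, ROWS, ["Fe"])
print(names_no_fe)

# Remove all attributes except RI and Mg (plus the class).
names_small, rows_small = keep_attributes(NAMES, ROWS, ["RI", "Mg", "Type"])
print(names_small)
print(rows_small)
```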
Filters in Weka
Supervised vs. unsupervised, attribute vs. instance
To find the right one, you need to look!
Filters can be very powerful
Judiciously removing attributes can
improve performance
increase comprehensibility
Course text
Section 11.2 Loading and filtering files
Class 1 Lesson 6
Visualizing your data
Lesson 1.6: Visualizing your data
Using the Visualize panel
Open iris.arff
Bring up Visualize panel
Click one of the plots; examine some instances
Set x-axis to petalwidth and y-axis to petallength
Click on Class colour to change the colour
Bars on the right correspond to attributes: click for x-axis; right-click for y-axis
Jitter slider
Show Select Instance: Rectangle option
Submit, Reset, Clear and Save
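Jitter adds a small random displacement to each plotted point so that instances with identical coordinates no longer hide one another. The Python sketch below illustrates the idea with uniform noise. That noise model, the jitter function and the sample values are assumptions made for illustration; the sketch does not describe how Weka implements its slider.

```python
import random


def jitter(values, amount, rng):
    """Shift each value by uniform random noise in [-amount, +amount]."""
    return [v + rng.uniform(-amount, amount) for v in values]


# Three points sharing the same x-coordinate (illustrative values).
xs = [0.2, 0.2, 0.2]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
spread = jitter(xs, 0.05, rng)
print(spread)
```

Only the plotted positions move; the underlying data values are unchanged.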
Visualizing classification errors
Run J48 (trees > J48)
Visualize classifier errors (from Results list)
Plot predicted class against class
Identify errors shown by confusion matrix
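Plotting predicted class against actual class is a visual form of the confusion matrix: points off the diagonal are the errors. The Python sketch below builds the matrix from two lists of labels and lists the misclassified instances. The predictions are invented for illustration and are not J48's output on iris.arff; confusion_matrix and error_indices are hypothetical helper names.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def error_indices(actual, predicted):
    """Positions of the instances whose predicted class differs from the actual class."""
    return [i for i, (a, p) in enumerate(zip(actual, predicted)) if a != p]


LABELS = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Invented labels for six instances (illustration only).
actual = ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
          "Iris-versicolor", "Iris-virginica", "Iris-virginica"]
predicted = ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
             "Iris-virginica", "Iris-virginica", "Iris-versicolor"]

matrix = confusion_matrix(actual, predicted, LABELS)
for label, row in zip(LABELS, matrix):
    print(row, label)
print("misclassified instances:", error_indices(actual, predicted))
```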
Get down and dirty with your data
Visualize it
Clean it up by deleting outliers
Look at classification errors
(there's a filter that allows you to add classifications as a new attribute)
Course text
Section 11.2 Visualization
Creative Commons Attribution 3.0 Unported License
creativecommons.org/licenses/by/3.0/
weka.waikato.ac.nz