
Data Mining with Weka

Class 1 Lesson 1
Introduction

Ian H. Witten
Department of Computer Science
University of Waikato
New Zealand

weka.waikato.ac.nz

Data Mining with Weka
a practical course on how to use Weka for data mining
explains the basic principles of several popular algorithms

Ian H. Witten
University of Waikato, New Zealand

Data Mining with Weka
What's data mining?
We are overwhelmed with data
Data mining is about going from data to information,
information that can give you useful predictions

Examples??
You're at the supermarket checkout.
You're happy with your bargains
and the supermarket is happy you've bought some more stuff
Say you want a child, but you and your partner can't have one.
Can data mining help?

Data mining vs. machine learning

Data Mining with Weka
What's Weka?
A bird found only in New Zealand?

Data mining workbench
Waikato Environment for Knowledge Analysis
Machine learning algorithms for data mining tasks
100+ algorithms for classification
75 for data preprocessing
25 to assist with feature selection
20 for clustering, finding association rules, etc.

Data Mining with Weka
What will you learn?

Load data into Weka and look at it
Use filters to preprocess it
Explore it using interactive visualization
Apply classification algorithms
Interpret the output
Understand evaluation methods and their implications
Understand various representations for models
Explain how popular machine learning algorithms work
Be aware of common pitfalls with data mining

Use Weka on your own data
and understand what you are doing!

Class 1: Getting started with Weka

Install Weka
Explore the Explorer interface
Explore some datasets
Build a classifier
Interpret the output
Use filters
Visualize your dataset

Course organization
Class 1 Getting started with Weka
  Lesson 1.1, Activity 1
  Lesson 1.2, Activity 2
  Lesson 1.3, Activity 3
  Lesson 1.4, Activity 4
  Lesson 1.5, Activity 5
  Lesson 1.6, Activity 6
Class 2 Evaluation
Class 3 Simple classifiers
Class 4 More classifiers
Class 5 Putting it all together

Course organization
Class 1 Getting started with Weka
Class 2 Evaluation
Class 3 Simple classifiers
Class 4 More classifiers
Class 5 Putting it all together

Mid-class assessment: 1/3
Post-class assessment: 2/3

Textbook
This textbook discusses data mining, and Weka, in depth:
Data Mining: Practical machine learning tools and techniques,
by Ian H. Witten, Eibe Frank and Mark A. Hall. Morgan Kaufmann, 2011

The publisher has made available parts relevant to this course in ebook format.

World Map by David Niblack, licensed under a Creative Commons Attribution 3.0 Unported License

Data Mining with Weka
Class 1 Lesson 2
Exploring the Explorer

Lesson 1.2: Exploring the Explorer
Class 1 Getting started with Weka
  Lesson 1.1 Introduction
  Lesson 1.2 Exploring the Explorer
  Lesson 1.3 Exploring datasets
  Lesson 1.4 Building a classifier
  Lesson 1.5 Using a filter
  Lesson 1.6 Visualizing your data
Class 2 Evaluation
Class 3 Simple classifiers
Class 4 More classifiers
Class 5 Putting it all together

Lesson 1.2: Exploring the Explorer
Download from
http://www.cs.waikato.ac.nz/ml/weka
(for Windows, Mac, Linux)

Weka 3.6.10
(the latest stable version of Weka)
(includes datasets for the course)
(it's important to get the right version, 3.6.10)

Lesson 1.2: Exploring the Explorer

[Screenshot callouts: performance comparisons; graphical interface; command-line interface]

Lesson 1.2: Exploring the Explorer
The weather data: columns are attributes, rows are instances.

No.  Outlook   Temp  Humidity  Windy  Play
1    Sunny     Hot   High      False  No
2    Sunny     Hot   High      True   No
3    Overcast  Hot   High      False  Yes
4    Rainy     Mild  High      False  Yes
5    Rainy     Cool  Normal    False  Yes
6    Rainy     Cool  Normal    True   No
7    Overcast  Cool  Normal    True   Yes
8    Sunny     Mild  High      False  No
9    Sunny     Cool  Normal    False  Yes
10   Rainy     Mild  Normal    False  Yes
11   Sunny     Mild  Normal    True   Yes
12   Overcast  Mild  High      True   Yes
13   Overcast  Hot   Normal    False  Yes
14   Rainy     Mild  High      True   No

Lesson 1.2: Exploring the Explorer

open file weather.nominal.arff

Lesson 1.2: Exploring the Explorer

[Screenshot callouts: attributes; attribute values]

Lesson 1.2: Exploring the Explorer

Install Weka
Get datasets
Open Explorer
Open a dataset (weather.nominal.arff)
Look at attributes and their values
Edit the dataset
Save it?

Course text
Section 1.2 The weather problem
Chapter 10 Introduction to Weka

Data Mining with Weka
Class 1 Lesson 3
Exploring datasets

Lesson 1.3: Exploring datasets

open file weather.nominal.arff
[Screenshot callouts: attributes; attribute values; class]

Lesson 1.3: Exploring datasets
Classification
sometimes called "supervised learning"

Dataset: classified examples
"Model" that classifies new examples

classified example (instance): fixed set of features
  attribute 1, attribute 2, ..., attribute n: discrete ("nominal") or continuous ("numeric")
  class
    discrete: "classification" problem
    continuous: "regression" problem
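
The distinction on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how Weka represents data internally; the `Attribute` class and the `problem_type` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical attribute description: nominal if it lists values, else numeric."""
    name: str
    nominal_values: tuple = ()  # empty tuple means the attribute is numeric

    @property
    def is_nominal(self) -> bool:
        return len(self.nominal_values) > 0


def problem_type(class_attribute: Attribute) -> str:
    """Discrete (nominal) class -> classification; continuous class -> regression."""
    return "classification" if class_attribute.is_nominal else "regression"


play = Attribute("play", ("yes", "no"))   # nominal class
temperature = Attribute("temperature")    # numeric class

print(problem_type(play))
print(problem_type(temperature))
```

The same instance layout (a fixed set of attributes plus a class) serves both kinds of problem; only the type of the class attribute changes.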

Lesson 1.3: Exploring datasets

open file weather.numeric.arff
[Screenshot callouts: attributes; attribute values; class]

Lesson 1.3: Exploring datasets

open file glass.arff

Lesson 1.3: Exploring datasets

The classification problem
weather.nominal, weather.numeric
Nominal vs numeric attributes
ARFF file format
glass.arff dataset
Sanity checking attributes

Course text
Section 11.1 Preparing the data
Loading the data into the Explorer
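
To make the ARFF file format concrete, here is a minimal Python sketch that reads a simplified ARFF text. It is an illustration under stated assumptions, not a full parser: it handles only `@relation`, `@attribute` (nominal value lists or a type keyword such as `numeric`) and `@data`, and ignores quoting, sparse data and other features of the format. The sample text is typed in by hand in the style of the weather data; the header of the file distributed with Weka may differ.

```python
def parse_arff(text):
    """Parse a simplified ARFF string into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):    # skip blank lines and comments
            continue
        if in_data:                             # after @data: one instance per line
            rows.append([v.strip() for v in line.split(",")])
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):            # nominal: list of allowed values
                values = [v.strip() for v in spec.strip("{}").split(",")]
                attributes.append((name, values))
            else:                               # numeric or another type keyword
                attributes.append((name, spec.lower()))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Hand-typed sample (an assumption, not a copy of the course file)
SAMPLE = """\
% Sample in the style of the weather data
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute humidity numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,86,yes
rainy,96,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
for name, kind in attributes:
    print(name, kind)
print(len(rows), "instances")
```

Note that all values are kept as strings here; a real loader would convert numeric attributes to numbers.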

Data Mining with Weka
Class 1 Lesson 4
Building a classifier

Lesson 1.4: Building a classifier
Use J48 to analyze the glass dataset
Open file glass.arff (or leave it open from the last lesson)
Check the available classifiers
Choose the J48 decision tree learner (trees > J48)
Run it
Examine the output
Look at the correctly classified instances and the confusion matrix
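
As a rough guide to reading that output, the following Python sketch shows how the fraction of correctly classified instances and a confusion matrix can be computed from actual and predicted class labels. The labels are made up for illustration (they are not results from J48 on the glass data), and the function names are hypothetical. Rows stand for the actual class and columns for the predicted class.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels for six instances
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
labels = ["yes", "no"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy(actual, predicted))
```

The diagonal of the matrix counts the correctly classified instances; every off-diagonal cell is a particular kind of error.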

Lesson 1.4: Building a classifier
Investigate J48

Open the configuration panel
Check the More information
Examine the options
Use an unpruned tree
Look at leaf sizes
Set minNumObj to 15 to avoid small leaves
Visualize tree using right-click menu

Lesson 1.4: Building a classifier
From C4.5 to J48
ID3 (1979)
C4.5 (1993)
C4.8 (1996?)
C5.0 (commercial)
J48

Lesson 1.4: Building a classifier

Classifiers in Weka
Classifying the glass dataset
Interpreting J48 output
J48 configuration panel
option: pruned vs unpruned trees
option: avoid small leaves
J48 ~ C4.5

Course text
Section 11.1 Building a decision tree
Examining the output

Data Mining with Weka
Class 1 Lesson 5
Using a filter

Lesson 1.5: Using a filter
Use a filter to remove an attribute
Open weather.nominal.arff (again!)
Check the filters
supervised vs unsupervised
attribute vs instance
Choose the unsupervised attribute filter Remove
Check the More information; look at the options
Set attributeIndices to 3 and click OK
Apply the filter
Recall that you can Save the result
Press Undo
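
What removing attribute 3 does to the data can be sketched in plain Python. This is an illustration of the effect only, not Weka's Remove filter itself; `remove_attributes` is a hypothetical helper, and the rows are a few instances taken from the weather data. Indices are 1-based, as in the attributeIndices option.

```python
def remove_attributes(header, rows, indices):
    """Drop the columns at the given 1-based indices from header and rows."""
    drop = {i - 1 for i in indices}                     # convert to 0-based
    keep = [j for j in range(len(header)) if j not in drop]
    new_header = [header[j] for j in keep]
    new_rows = [[row[j] for j in keep] for row in rows]
    return new_header, new_rows


header = ["outlook", "temperature", "humidity", "windy", "play"]
rows = [
    ["sunny", "hot", "high", "FALSE", "no"],
    ["overcast", "hot", "high", "FALSE", "yes"],
    ["rainy", "cool", "normal", "TRUE", "no"],
]

# Attribute 3 is humidity
new_header, new_rows = remove_attributes(header, rows, [3])
print(new_header)
for row in new_rows:
    print(row)
```

The original `header` and `rows` are left untouched, which loosely mirrors being able to press Undo after applying a filter.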

Lesson 1.5: Using a filter
Remove instances where humidity is high

Supervised or unsupervised?
Attribute or instance?
Look at them
Select RemoveWithValues
Set attributeIndex
Set nominalIndices
Apply
Undo
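
The effect of this instance filter can be sketched as follows. Again this is only an illustration, not the RemoveWithValues filter: the hypothetical `remove_with_value` helper selects the attribute and value by name for readability, whereas the filter's attributeIndex and nominalIndices options take index numbers. The rows are a few instances from the weather data.

```python
def remove_with_value(header, rows, attribute, value):
    """Return only the rows whose given attribute does not have the given value."""
    col = header.index(attribute)
    return [row for row in rows if row[col] != value]


header = ["outlook", "temperature", "humidity", "windy", "play"]
rows = [
    ["sunny", "hot", "high", "FALSE", "no"],
    ["rainy", "cool", "normal", "FALSE", "yes"],
    ["overcast", "mild", "high", "TRUE", "yes"],
    ["sunny", "cool", "normal", "FALSE", "yes"],
]

kept = remove_with_value(header, rows, "humidity", "high")
print(len(rows), "instances before,", len(kept), "after")
for row in kept:
    print(row)
```

Unlike the attribute filter in the previous slide, this changes the number of instances and leaves the set of attributes alone.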

Lesson 1.5: Using a filter
Fewer attributes, better classification!

Open glass.arff
Run J48 (trees > J48)
Remove Fe
Remove all attributes except RI and Mg
Look at the decision trees
Use right-click menu to visualize decision trees

Lesson 1.5: Using a filter
Filters in Weka
Supervised vs unsupervised, attribute vs instance
To find the right one, you need to look!
Filters can be very powerful
Judiciously removing attributes can
improve performance
increase comprehensibility

Course text
Section 11.2 Loading and filtering files

Data Mining with Weka
Class 1 Lesson 6
Visualizing your data

Lesson 1.6: Visualizing your data
Using the Visualize panel

Open iris.arff
Bring up Visualize panel
Click one of the plots; examine some instances
Set x axis to petalwidth and y axis to petallength
Click on Class colour to change the colour
Bars on the right correspond to attributes: click for x axis; right-click for y axis
Jitter slider
Show Select Instance: Rectangle option
Submit, Reset, Clear and Save
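
The idea behind jitter can be sketched in Python: a small random offset is added to each plotted value so that instances with identical values no longer sit exactly on top of each other. This is a generic illustration and an assumption about the general technique; how Weka computes its jitter may differ. The `jitter` function, the `amount` parameter and the values are all made up for this example.

```python
import random


def jitter(values, amount, seed=0):
    """Add a uniform random offset in [-amount, amount] to each value."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [v + rng.uniform(-amount, amount) for v in values]


# Made-up measurements with several identical values
petal_widths = [0.2, 0.2, 0.2, 1.3, 1.3]
jittered = jitter(petal_widths, amount=0.05)

for original, moved in zip(petal_widths, jittered):
    print(original, "->", round(moved, 3))
```

Jitter changes only where points are drawn, not the underlying data, so it should be kept small relative to the spread of the attribute.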

Lesson 1.6: Visualizing your data
Visualizing classification errors

Run J48 (trees > J48)
Visualize classifier errors (from Results list)
Plot predicted class against class
Identify errors shown by confusion matrix

Lesson 1.6: Visualizing your data

Get down and dirty with your data
Visualize it
Clean it up by deleting outliers
Look at classification errors
(there's a filter that allows you to add classifications as a new attribute)

Course text
Section 11.2 Visualization

Data Mining with Weka
Department of Computer Science
University of Waikato
New Zealand

Creative Commons Attribution 3.0 Unported License
creativecommons.org/licenses/by/3.0/

weka.waikato.ac.nz
