
Czech Technical University in Prague

Faculty of Information Technology

Department of Theoretical Computer Science


European Social Fund. Prague & EU: Investing in your future

MI-ADM Algorithms of data mining (2010/2011)

Seminar: Rapid Miner beginner's guide


Jan Černý, FIT, Czech Technical University in Prague

Installing Rapid Miner


Application download
Download and install from the Rapid-I home page: http://rapid-i.com/

Sources download
Check out from the repository:
https://rapidminer.svn.sourceforge.net/svnroot/rapidminer/Vega/

Add tools.jar (from your local JDK) as a project dependency.
Run the ant build script.
Run Rapid Miner with the class RapidMinerGUI.java in the package com.rapidminer.gui.

Creating an experiment
Rapid Miner uses nested graphs to describe the knowledge flow process. This process can contain loading data, preprocessing, modeling using different types of algorithms, performance measuring, report generating and so on. In this example we will learn step by step how to create a knowledge flow that reads data and performs a cross-validation to test the quality of our model.

Operators
Knowledge flows consist of Operators, each of which has a given number of inputs and outputs with type checking. Each Operator also has its own attributes, which can be set when you select the operator.
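
The idea of type-checked inputs and outputs can be sketched in a few lines of Python. This is only an illustration: the classes ExampleSet, Model and Operator and the function connect are hypothetical names invented here, not Rapid Miner's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical stand-ins for the kinds of objects that travel between operators.
class ExampleSet(list):
    """A table of examples (rows)."""


class Model:
    """A trained model."""


@dataclass
class Operator:
    name: str
    input_type: type
    output_type: type
    run: Callable[[Any], Any]


def connect(source: Operator, target: Operator) -> None:
    """Refuse a connection whose port types do not match."""
    if not issubclass(source.output_type, target.input_type):
        raise TypeError(
            f"{source.name} outputs {source.output_type.__name__}, "
            f"but {target.name} expects {target.input_type.__name__}"
        )


reader = Operator("Read ARFF", type(None), ExampleSet, lambda _: ExampleSet())
learner = Operator("SVM", ExampleSet, Model, lambda data: Model())

connect(reader, learner)       # accepted: the reader produces an ExampleSet
try:
    connect(learner, learner)  # rejected: a Model is not an ExampleSet
except TypeError as err:
    print(err)
```

A mismatched connection is rejected before anything runs, which is the same kind of early feedback the problems dialog gives in the GUI.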

Learning simple model


Let's construct a simple knowledge flow that will learn our model on all data and get its output.
This construct reads data from an arff file and passes it to the Support Vector Machine model. The model is then sent to the output (the right side), where we can view it in the report view.
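
For readers who have not seen an arff file, the following minimal sketch shows roughly what the reading step has to do. It assumes the plain ARFF layout (@attribute lines followed by a @data section of comma-separated rows) and ignores quoting, sparse rows and missing values. The function read_arff and the toy data are illustrative assumptions, not part of Rapid Miner.

```python
def read_arff(text):
    """Return (attribute names, rows) from a simple ARFF document."""
    attributes, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif line.lower().startswith("@attribute"):
            attributes.append(line.split()[1])  # second token is the name
        elif line.lower().startswith("@data"):
            in_data = True
    return attributes, rows


# Toy data invented for this example.
sample = """
@relation toy
@attribute width numeric
@attribute class {small,large}
@data
1.0,small
3.5,large
"""

names, rows = read_arff(sample)
print(names)  # ['width', 'class']
print(rows)   # [['1.0', 'small'], ['3.5', 'large']]
```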

Notice the red SVM input and the error messages in the problems dialog.

Most of the time you can just use the fixes suggested by Rapid Miner and it will work fine. The first error tells us that SVM cannot handle polynominal output attributes and offers us 3 fixes:
1) Convert them to numerical values, which is useful if the attribute values have a defined distance to each other (say a student's mark from the finite set A to F: we know that A is closer to B than to C, and so on).
2) Classification by regression, which uses 1 regression SVM model for each output (with this option we can use a regression model to solve a classification task).
3) Polynominal by binominal classification, which uses a binominal SVM classifier for each class (each classifies into 2 classes: "my class" and "others").
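
A minimal sketch of what fixes 1 and 3 do to the label column may help. The function names and the class names in the example are assumptions made for illustration. Rapid Miner performs these conversions internally with its own operators.

```python
def marks_to_numbers(marks):
    """Fix 1: map ordered nominal values to integers so distances make sense."""
    order = "ABCDEF"
    return [order.index(mark) for mark in marks]


def one_vs_rest_labels(labels):
    """Fix 3: build one two-class label column per class (my class vs. others)."""
    classes = sorted(set(labels))
    return {cls: [label == cls for label in labels] for cls in classes}


print(marks_to_numbers(["A", "C", "B"]))  # [0, 2, 1]

# Hypothetical class names; each column is the target of one binominal classifier.
columns = one_vs_rest_labels(["setosa", "virginica", "setosa", "versicolor"])
for cls, column in columns.items():
    print(cls, column)
```

With three classes this produces three true/false columns, so three binominal SVM classifiers would be trained, one per class.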

We add the label from the available fixes (the label identifies the output attribute; in our case the output attribute is named class in the arff files), select Polynominal by binominal classification and see what happens to the knowledge flow.

The knowledge flow has changed: one operator was added to set the role of the attribute class to label, and one nested operator was added to perform polynominal by binominal classification. Inside that operator is the logic behind the creation of the binominal classifiers, in our case the SVM operator (you can view it by double-clicking). Nested Operators are identified by a symbol in the bottom right corner.

Note: you may see the same errors as shown here, but this is a bug in Rapid Miner and the knowledge flow will work normally.

Results - model
Now we can switch to the results view and look at the model that was created.

Here we can see the model description, but we also want to know its quality (i.e. its error). For that we need to modify the knowledge flow even further.

Performance of model
Let's measure the performance of our model using 10-fold cross-validation. Add an X-Validation operator and plug it in place of the polynominal by binominal classification (PBC) operator. Then cut the PBC operator and paste it inside the learning part of the validation.

As you can see, the validation has 2 parts inside and divides the data automatically. The first part is executed when the model is learned, with the training data passed to its input. After learning, the testing part is executed and the model is tested on the test data.
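
The automatic division of the data can be sketched as follows. This is a simplified illustration that deals examples into folds in an interleaved fashion. Rapid Miner's validation operator has its own sampling options, which may behave differently. The function name k_fold_indices is an assumption.

```python
def k_fold_indices(n_examples, k=10):
    """Yield (train, test) index lists; each example is tested exactly once."""
    indices = list(range(n_examples))
    for fold in range(k):
        test = indices[fold::k]          # every k-th example, offset by fold
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


for train, test in k_fold_indices(20, k=10):
    # The learning part would see `train`, the testing part would see `test`.
    print(len(train), "training examples,", len(test), "test examples")
```

Each of the 10 iterations learns a model on 9/10 of the data and tests it on the remaining 1/10, so every example is used for testing exactly once.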

The testing part uses the Apply Model operator, which gets the output of the given model on the given data, followed by the Performance operator, which computes various statistics on the output of the model. We are mainly interested in classification accuracy, but you can select any other available measure.


Results - accuracy
Now we see the accuracy of our model, including the confusion matrix.
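
As a reminder of how the two numbers relate, here is a small sketch that computes accuracy and a confusion matrix from true and predicted labels. The labels are toy values invented for the example, and the function names are assumptions, not Rapid Miner's API.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true, predicted) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(true_labels, predicted_labels):
    """Fraction of examples whose prediction equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy labels, not results from a real experiment.
true = ["a", "a", "b", "b", "c"]
pred = ["a", "b", "b", "b", "c"]

print(accuracy(true, pred))   # 0.8
matrix = confusion_matrix(true, pred)
print(matrix[("a", "b")])     # 1: one "a" example was predicted as "b"
```

The accuracy is the sum of the diagonal entries of the confusion matrix (true equals predicted) divided by the total number of examples.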


Loop
To make the experiment statistically significant, we need to repeat it a large number of times. Go to the top level of the knowledge flow, insert a Loop operator in place of the X-Validation operator, and cut and paste the validation operator inside the loop. You can see that the last line coming from the loop is doubled, which indicates that it carries multiple values. We can therefore average the accuracy values from the different runs using the Average operator.
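
The loop-and-average pattern can be sketched like this. The function run_validation is a hypothetical placeholder that returns simulated accuracies. In a real experiment it would be one complete cross-validation run with a different random seed.

```python
import random
from statistics import mean, stdev


def run_validation(seed):
    """Placeholder for one cross-validation run; returns a SIMULATED accuracy."""
    rng = random.Random(seed)
    return 0.9 + rng.uniform(-0.05, 0.05)  # made-up value, not a real result


def repeat_and_average(run, repetitions):
    """Run the experiment several times and summarise the accuracies."""
    accuracies = [run(seed) for seed in range(repetitions)]
    return mean(accuracies), stdev(accuracies), accuracies


average, spread, accuracies = repeat_and_average(run_validation, 30)
print("runs:", len(accuracies))
print("mean accuracy:", average)
print("standard deviation:", spread)
```

Reporting the spread alongside the mean shows how much a single run can deviate, which is the reason for repeating the experiment in the first place.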


Knowledge flow overview
