Sep 11, 2013

© Attribution Non-Commercial (BY-NC)


by Tan, Steinbach, Kumar

- When dealing with class imbalance it is often useful to look at recall and precision separately.

                          PREDICTED CLASS
                          Class=Yes    Class=No
  ACTUAL    Class=Yes     a (TP)       b (FN)
  CLASS     Class=No      c (FP)       d (TN)

      Recall    = a / (a + b) = TP / (TP + FN)
      Precision = a / (a + c) = TP / (TP + FP)
      Accuracy  = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)

- The F measure combines recall and precision into one number.
- It equals the harmonic mean of recall and precision:

      F = 2rp / (r + p) = 2 / (1/r + 1/p)

- Your book calls it the F1 measure because it weights recall and precision equally.
- See http://en.wikipedia.org/wiki/Information_retrieval
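As a quick check, both forms of the F measure can be computed in R on made-up confusion-matrix counts (the counts below are purely illustrative):

```r
# Hypothetical confusion-matrix counts: a = TP, b = FN, c = FP, d = TN
a <- 40; b <- 10; c <- 20; d <- 30

recall    <- a / (a + b)                          # TP / (TP + FN)
precision <- a / (a + c)                          # TP / (TP + FP)

f1          <- 2 * recall * precision / (recall + precision)
f1_harmonic <- 2 / (1 / recall + 1 / precision)   # harmonic-mean form

c(recall = recall, precision = precision, f1 = f1)
```

Both forms of F agree (here about 0.727).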

- ROC stands for Receiver Operating Characteristic.
- Since we can turn the threshold for classifying observations as the positive class up or down, the same classifier can produce many different values of the true positive rate (TPR) and false positive rate (FPR):

      TPR = TP / (TP + FN)        FPR = FP / (FP + TN)

- The ROC curve plots TPR on the y-axis and FPR on the x-axis.

- The diagonal represents random guessing.
- A good classifier lies near the upper left.
- ROC curves are useful for comparing two classifiers: the better classifier will lie on top more often.
- The Area Under the Curve (AUC) is often used as a metric.
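A minimal R sketch of how sweeping the threshold traces out the ROC curve (the scores and labels below are made up for illustration):

```r
# Made-up classifier scores and true labels (1 = positive class)
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1)
y     <- c(1,   1,   0,   1,   1,    0,   0,   1,   0,   0)

# Each threshold gives one (FPR, TPR) point on the ROC curve
thresholds <- sort(unique(score), decreasing = TRUE)
tpr <- sapply(thresholds, function(t) sum(score >= t & y == 1) / sum(y == 1))
fpr <- sapply(thresholds, function(t) sum(score >= t & y == 0) / sum(y == 0))

plot(c(0, fpr), c(0, tpr), type = "b", xlab = "FPR", ylab = "TPR")
abline(0, 1, lty = 2)   # the diagonal: random guessing
```

Lowering the threshold classifies more observations as positive, so both TPR and FPR increase monotonically from (0, 0) toward (1, 1).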

In class exercise #32: This is textbook question #17 part (a) on page 322. It is part of your homework so we will not do all of it in class. We will just do the curve for M1.

In class exercise #33: This is textbook question #17 part (b) on page 322.

- Decision trees are just one method for classification.
- We will learn additional methods in this chapter:
  - Nearest Neighbor
  - Support Vector Machines
  - Bagging
  - Random Forests
  - Boosting
  - Naïve Bayes

- You can use nearest neighbor classifiers if you have some way of defining distances between points based on their attributes.
- The k-nearest neighbor classifier classifies a point based on the majority class among the k closest training points.


- Here is a plot I made using R showing the 1-nearest-neighbor classifier on a 2-dimensional data set. [The plot itself is not preserved in this copy.]


- Nearest neighbor methods work very poorly when the dimensionality is large (meaning there are a large number of attributes).
- The scales of the different attributes are important: if a single numeric attribute has a large spread, it can dominate the distance metric. A common practice is to scale all numeric attributes to have equal variance.
- The knn() function in the class package in R does a k-nearest neighbor classification using Euclidean distance.
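For instance, scale() in base R standardizes each column, which is one way to give all numeric attributes equal variance before computing Euclidean distances (the matrix below is a made-up example):

```r
# A made-up data matrix where the second attribute has a much larger spread
X <- cbind(c(1, 2, 3, 4), c(1000, 2000, 1500, 4000))

# scale() centers each column and divides by its standard deviation,
# so no single attribute dominates the Euclidean distance
X_scaled <- scale(X)
apply(X_scaled, 2, sd)   # both columns now have standard deviation 1
```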


In class exercise #34: Use knn() in R to fit the 1-nearest-neighbor classifier to the last column of the sonar training data at

http://sites.google.com/site/stats202/data/sonar_train.csv

Use all the default values. Compute the misclassification error on the training data and also on the test data at

http://sites.google.com/site/stats202/data/sonar_test.csv
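A sketch of one possible solution, assuming the two files have been downloaded to the working directory and that, as in the random forest solution later, the class label is in column 61:

```r
library(class)   # provides knn()

# assumes the two CSV files have been downloaded to the working directory
train <- read.csv("sonar_train.csv", header = FALSE)
test  <- read.csv("sonar_test.csv",  header = FALSE)

y      <- as.factor(train[, 61]);  x      <- train[, 1:60]
y_test <- as.factor(test[, 61]);   x_test <- test[, 1:60]

# with k = 1 each training point is its own nearest neighbor,
# so the training error should be 0 (barring tied points)
fit_train <- knn(x, x,      y, k = 1)
fit_test  <- knn(x, x_test, y, k = 1)
1 - sum(y      == fit_train) / length(y)          # training error
1 - sum(y_test == fit_test)  / length(y_test)     # test error
```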


- If the two classes can be separated perfectly by a line in the x space, how do we choose the best line?


- One solution is to choose the line (hyperplane) with the largest margin. The margin is the distance between the two parallel lines on either side.

[Figure: two candidate separating hyperplanes, B1 and B2, each drawn with its pair of margin boundaries (b11, b12 and b21, b22) and the margin labeled between them.]

- Here is the notation your book uses:

      w · x + b = 0    (the decision boundary)
      w · x + b = -1   and   w · x + b = +1   (the two margin boundaries)

      f(x) = 1   if w · x + b >= 1
      f(x) = -1  if w · x + b <= -1

- This can be formulated as a constrained optimization problem.
- We want to maximize the margin, 2 / ||w||.
- This is equivalent to minimizing

      L(w) = ||w||^2 / 2

  subject to

      f(x_i) = 1   if w · x_i + b >= 1
      f(x_i) = -1  if w · x_i + b <= -1

- So we have a quadratic objective function with linear constraints, which means it is a convex optimization problem and we can use Lagrange multipliers.

- What if the problem is not linearly separable?
- Then we introduce slack variables ξ_i and minimize

      L(w) = ||w||^2 / 2 + C Σ_{i=1}^{N} ξ_i^k

  subject to

      f(x_i) = 1   if w · x_i + b >= 1 - ξ_i
      f(x_i) = -1  if w · x_i + b <= -1 + ξ_i

- What if the boundary is not linear?
- Then we can use transformations of the variables to map the data into a higher-dimensional space, where a linear boundary may separate the classes.
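A small R illustration of the idea (the circular boundary here is a made-up example): points that are not linearly separable in the original coordinates become linearly separable once squared coordinates are added as extra features.

```r
# Points inside vs. outside a circle are not linearly separable in (x1, x2)
set.seed(1)
x <- matrix(runif(400, -1, 1), ncol = 2)          # 200 random 2-d points
y <- ifelse(x[, 1]^2 + x[, 2]^2 < 0.5, 1, -1)     # circular class boundary

# Adding the squared coordinates as extra features makes the boundary
# x1^2 + x2^2 = 0.5 a hyperplane in the 4-dimensional feature space
z <- cbind(x, x[, 1]^2, x[, 2]^2)

# In the new space the rule z3 + z4 < 0.5 is linear and classifies perfectly
all(y == ifelse(z[, 3] + z[, 4] < 0.5, 1, -1))    # TRUE
```

Kernels let SVMs use such higher-dimensional spaces without constructing the transformed features explicitly.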

- The function svm() in the e1071 package can fit support vector machines in R.
- Note that the default kernel is not linear; use kernel="linear" to get a linear kernel.


In class exercise #35: Use svm() in R to fit the default svm to the last column of the sonar training data at

http://sites.google.com/site/stats202/data/sonar_train.csv

Compute the misclassification error on the training data and also on the test data at

http://sites.google.com/site/stats202/data/sonar_test.csv
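A sketch of one possible solution, following the same pattern as the other exercises (files downloaded locally; class label assumed to be in column 61):

```r
library(e1071)   # provides svm()

# assumes the two CSV files have been downloaded to the working directory
train <- read.csv("sonar_train.csv", header = FALSE)
test  <- read.csv("sonar_test.csv",  header = FALSE)

y      <- as.factor(train[, 61]);  x      <- train[, 1:60]
y_test <- as.factor(test[, 61]);   x_test <- test[, 1:60]

fit <- svm(x, y)   # default settings (radial kernel)
1 - sum(y      == predict(fit, x))      / length(y)       # training error
1 - sum(y_test == predict(fit, x_test)) / length(y_test)  # test error
```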


In class exercise #36: Use svm() in R with kernel="linear" and cost=100000 to fit the toy 2-dimensional data below. Provide a plot of the resulting classification rule.

The data set has two numeric attributes, x1 and x2, and a class label y equal to -1 for the first five points and +1 for the last five. [The x1 and x2 values themselves were not preserved in this copy.]

Solution:

fit <- svm(x, y, kernel = "linear", cost = 100000)
big_x <- matrix(runif(200000), ncol = 2, byrow = TRUE)
plot(x, pch = 19, col = 2 * as.numeric(y), cex = 2)   # initial plot call; assumed, since points() below needs an open plot
points(big_x, col = rgb(.5, .5, .2 + .6 * as.numeric(predict(fit, big_x) == 1)), pch = 19)
points(x, pch = 19, col = 2 * as.numeric(y), cex = 2)


- Ensemble methods aim at improving classification accuracy by aggregating the predictions from multiple classifiers (page 276).
- One of the most obvious ways of doing this is simply by averaging classifiers which make errors somewhat independently of each other.


In class exercise #37: Suppose I have 5 classifiers which each classify a point correctly 70% of the time. If these 5 classifiers are completely independent and I take the majority vote, how often is the majority vote correct for that point?


Solution: the majority vote is correct when at least 3 of the 5 classifiers are correct:

10*.7^3*.3^2 + 5*.7^4*.3 + .7^5

or, in R,

1 - pbinom(2, 5, .7)

Both give about 0.837.


In class exercise #38: Suppose I have 101 classifiers which each classify a point correctly 70% of the time. If these 101 classifiers are completely independent and I take the majority vote, how often is the majority vote correct for that point?


Solution: in R,

1 - pbinom(50, 101, .7)

which is greater than 0.999.


- Ensemble methods include:
  - Bagging (page 283)
  - Random Forests (page 290)
  - Boosting (page 285)
- Bagging builds different classifiers by training on repeated samples (with replacement) from the data.
- Random Forests averages many trees which are constructed with some amount of randomness.
- Boosting combines simple base classifiers by upweighting data points which are classified incorrectly.
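The bagging idea can be sketched in a few lines of base R; here fit_one is a hypothetical plug-in function (not from any package) that trains on a bootstrap resample and returns predicted labels for x_new:

```r
# A bare-bones sketch of bagging: resample rows with replacement, train a
# classifier on each resample, and take the majority vote over resamples.
# fit_one(x, y, x_new) is a hypothetical plug-in that trains on (x, y) and
# returns a vector of predicted labels for the rows of x_new.
bag_votes <- function(x, y, x_new, fit_one, n_bags = 25) {
  votes <- replicate(n_bags, {
    idx <- sample(nrow(x), replace = TRUE)            # bootstrap sample
    fit_one(x[idx, , drop = FALSE], y[idx], x_new)
  })
  # majority vote across the n_bags columns, row by row
  apply(votes, 1, function(v) names(which.max(table(v))))
}
```

Any train-then-predict routine can be dropped in as fit_one; bagging helps most when its errors vary from resample to resample.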


- One way to create random forests is to grow decision trees top down, but at each node consider only a random subset of attributes for splitting instead of all the attributes.
- Random Forests are a very effective technique.
- They are based on the paper:

  L. Breiman. Random forests. Machine Learning, 45:5-32, 2001.

- They can be fit in R using the function randomForest() in the randomForest package.


In class exercise #39: Use randomForest() in R to fit the default Random Forest to the last column of the sonar training data at

http://sites.google.com/site/stats202/data/sonar_train.csv

Compute the misclassification error on the test data at

http://sites.google.com/site/stats202/data/sonar_test.csv


Solution:

install.packages("randomForest")
library(randomForest)
train <- read.csv("sonar_train.csv", header = FALSE)
test <- read.csv("sonar_test.csv", header = FALSE)
y <- as.factor(train[, 61])
x <- train[, 1:60]
y_test <- as.factor(test[, 61])
x_test <- test[, 1:60]
fit <- randomForest(x, y)
1 - sum(y_test == predict(fit, x_test)) / length(y_test)
