
Danilo Bzdok, Martin Krzywinski, Naomi Altman. Machine learning: Supervised methods, SVM and kNN. Nature Methods, Nature Publishing Group, 2018, pp. 1-6. <hal-01657491>

https://hal.archives-ouvertes.fr/hal-01657491

Submitted on 6 Dec 2017


POINTS OF SIGNIFICANCE

Machine learning: supervised methods, SVM and kNN

Supervised learning algorithms extract general principles from observed examples guided by a specific prediction objective. In supervised learning, a set of input variables, such as blood metabolite or gene expression levels, is used to predict a quantitative response variable, like hormone level, or a qualitative one, such as healthy versus diseased status. We have previously discussed several supervised learning algorithms, including logistic regression and random forests, and their typical behaviors with different sample sizes and numbers of predictor variables. This month, we look at two very common supervised methods in the context of machine learning: linear support vector machines (SVM) and k-nearest neighbors (kNN). Both have been successfully applied to challenging pattern-recognition problems in biology and medicine [1].

SVM and kNN exemplify several important trade-offs in machine learning (ML). SVM is less computationally demanding than kNN and is easier to interpret, but it can identify only a limited set of patterns. On the other hand, kNN can find very complex patterns, but its output is more challenging to interpret. To illustrate both algorithms, we will apply them to classification, because they tend to perform better at predicting categorical outputs (e.g., health versus disease) than at approximating target functions with numeric outputs (e.g., hormone level). Both learning techniques can be used to distinguish many classes at once, use multiple predictors and obtain probabilities for each class membership.

We'll illustrate SVM using a two-class problem and begin with a case in which the classes are linearly separable, meaning that a straight line can be drawn that perfectly separates the classes, with the margin being the perpendicular distance between the closest points to the line from each class (Fig. 1a). Many such separating lines are possible, and SVM can be used to find the one with the widest margin (Fig. 1b). When three or more predictors are used, the separating line becomes a hyperplane, but the algorithm remains the same. The closest points to the line are called support vectors [1] and are the only points that ultimately influence the position of the separating line: any points that are farther from the line can be moved, removed or added with no impact on the line. When the classes are linearly separable, the wider the margin, the higher our confidence in the classification, because it indicates that the classes are less similar.

Figure 1 | Separating two classes by maximizing the width of a margin that separates the classes. (a) Points from two classes (grey, blue) that are perfectly separable by various lines (black) illustrate the concept of a margin (light yellow highlight), which is the rectangular region that extends from the separating line to the perpendicularly closest point. (b) An SVM finds the line (black) that has the largest margin (0.48). Points at the margin's edge (black outlines) are called support vectors; the margin is not influenced by moving or adding other points outside it. (c) Imposing a separating line on linearly non-separable classes will incur margin violations and misclassification errors. Data are the same as in (b) but with two additional, misclassified points added. The margin is now 0.64 with 6 support vectors.
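The margin computation described above can be sketched in a few lines of Python. This is a minimal toy sketch (not from the article) that evaluates a fixed candidate line w·x + b = 0 in two dimensions; an actual SVM solver searches over w and b for the line with the widest margin:

```python
import math

def margin_width(points, labels, w, b):
    """Perpendicular distance from the line w.x + b = 0 to the closest
    point, after checking that the line really separates the two classes
    (labels are +1/-1)."""
    norm = math.hypot(w[0], w[1])
    dists = []
    for (x, y), label in zip(points, labels):
        signed = label * (w[0] * x + w[1] * y + b) / norm
        assert signed > 0, "line does not separate the classes"
        dists.append(signed)
    return min(dists)

# Two toy classes on either side of the line x - y = 0.
pts = [(2.0, 1.0), (3.0, 1.5), (1.0, 2.0), (1.5, 3.0)]
labs = [1, 1, -1, -1]
m = margin_width(pts, labs, (1.0, -1.0), 0.0)
```

Only the closest points on each side (the support vectors) determine this minimum; moving any other point leaves it unchanged.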

Practically, most data sets are not linearly separable, and any separating line will result in misclassification, no matter how narrow the margin is. We say that the margin is violated by a sample if it is on the wrong side of the separating line (Fig. 1c, red arrows) or is on the correct side but within the margin (Fig. 1c, orange arrow).

Even when the data are linearly separable, allowing a few points to be misclassified might improve the classifier by allowing a wider margin for the bulk of the data (Fig. 2a). To handle violations, we impose a penalty proportional to the distance between each violating point [1] and the separating line, with non-violating points having zero penalty. In SVM, the separating line is chosen by minimizing 1/m + C∑p_i, where m is the margin width, p_i is the penalty for each point and C is a hyperparameter (a parameter used to tune the overall fitting behavior of an algorithm) balancing the trade-off between margin width and misclassification. A point that has a non-zero penalty is considered a support vector because it impacts the position of the separating line and its margin.
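The minimized quantity 1/m + C∑p_i can be evaluated for a candidate line as follows. This is a sketch under the article's description, where p_i grows with the distance of the violation and is zero for points that respect the margin; a real SVM solver optimizes w, b and m jointly rather than scoring a fixed line:

```python
import math

def svm_objective(points, labels, w, b, C, m):
    """Evaluate 1/m + C * sum(p_i) for a candidate line w.x + b = 0 with
    target margin width m. The penalty p_i is zero for a point outside the
    margin on its correct side and grows with the size of the violation."""
    norm = math.hypot(w[0], w[1])
    penalty = 0.0
    for (x, y), label in zip(points, labels):
        signed = label * (w[0] * x + w[1] * y + b) / norm  # >0: correct side
        penalty += max(0.0, m - signed)                    # margin violation
    return 1.0 / m + C * penalty
```

Large C makes the penalty term dominate the sum, while small C lets the margin term 1/m drive the choice of line.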

When C is large, the margin width has a low impact on the minimization, and the line is placed to minimize the sum of the violation penalties (Fig. 2, C = 1000). When C is decreased, the misclassified points have lower impact, and the line is placed with more emphasis on maximizing the margin (Fig. 2, C = 50 and C = 5). When C is very small, classification penalties become insignificant and the margin can grow to encompass all points. Typically, C is chosen using cross-validation [2].
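Choosing C by cross-validation can be sketched as a grid search over candidate values, scoring each by average held-out performance. In this sketch, `score_on_fold` is a hypothetical user-supplied callback (not from the article) that would fit an SVM with the given C on the training indices and return its accuracy on the test indices:

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation
    over n samples."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def choose_C(candidates, score_on_fold, n, k=5):
    """Return the candidate C with the best average held-out score.
    `score_on_fold(C, train, test)` is a hypothetical callback that fits
    a classifier with that C and scores it on the held-out fold."""
    def mean_score(C):
        return sum(score_on_fold(C, tr, te)
                   for tr, te in k_fold_splits(n, k)) / k
    return max(candidates, key=mean_score)
```

The candidate grid is usually spaced logarithmically (e.g., 0.01, 0.1, 1, 10, 100), since the useful range of C spans orders of magnitude.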

Recall that we showed previously [3] how regularization can be used to guard against overfitting, which occurs when the prediction equation is too closely tailored to random variation in the training set. In that sense, the role of C is similar, except that here it tunes the fit by adjusting the balance of terms being minimized rather than the complexity of the shape of the boundary. Large values of C force the separating line to adjust to data far from the center of each class and thus encourage overfitting. Small values tolerate many margin violations and encourage underfitting.

Figure 2 | The balance between the width of the margin and penalties for margin violations is controlled by a regularization parameter, C. Smaller values of C place more weight on margin width and less on classification constraints.

We can avoid the explicit assumption of a linear class boundary by using the k-nearest neighbors (kNN) algorithm. This algorithm determines the class of an unclassified point by counting the majority class vote among its k nearest training points (Fig. 3a). For example, a patient whose symptoms closely match those of patients with a specific diagnosis would be classified with the same disease status. Because kNN does not assume a particular boundary between the classes, its boundary can be closer to the "true" relationship. However, for a given training set, predictions may be less stable than for SVMs, especially when k is small, and the algorithm will often overfit the training data.
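The majority-vote rule above is simple enough to sketch directly. This minimal sketch (not the article's implementation) classifies a query point by Euclidean distance to the training points:

```python
import math
from collections import Counter

def knn_predict(train_pts, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training
    points, measured by Euclidean distance."""
    by_dist = sorted(zip(train_pts, train_labels),
                     key=lambda pl: math.dist(pl[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Toy training set: one cluster per class.
train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ['healthy', 'healthy', 'healthy', 'diseased', 'diseased', 'diseased']
```

Note that prediction requires the entire training set at query time; nothing is "fit" in advance, which is why kNN is often called a lazy learner.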

The value of k acts to regularize kNN, analogous to C in SVM, and is generally selected by cross-validation. To avoid ties in the vote, k can be chosen to be odd. Small k gives a finely textured boundary that is sensitive to outliers and yields high model variance (k = 3, Fig. 3b). Larger k gives more rigid boundaries and high model bias (k = 7, Fig. 3b), pooling the effect of more distant neighbors. The largest possible value of k is the number of training points; at this extreme, any new observation is classified based on the majority in the entire training sample, incurring maximal model bias.

Figure 3 | Illustration of the k-nearest neighbors (kNN) classifier. (a) kNN assigns a class to an unclassified point (black) based on a majority vote of the k nearest neighbors in the training set (grey and blue points). Shown are cases for k = 1, 3, 5 and 7; the k neighbors are circumscribed by the circle, which is colored by the majority class vote. (b) For k = 3, the kNN boundaries are relatively rough (calculated by classifying each point in the plane) and give 10% misclassifications. The SVM separating line (black) and margin (dashed) are also shown for C = 1000, yielding 15% misclassification. As k is increased (here, k = 7, 13% misclassifications), single misclassifications have less impact on the emerging boundary, which becomes smoother.

Neither SVM nor kNN makes explicit model assumptions about the data-generating process, such as normality of the data. However, linear SVM is considered a parametric method because it can produce only linear boundaries. If the true class boundary is nonlinear, SVM will struggle to find a satisfactory fit even with an increased size of the training set. To help the algorithm capture nonlinear boundaries, functions of the input variables, such as polynomials, can be added to the set of predictor variables [1]. This extension of the algorithm is called kernel SVM.
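The feature-expansion idea can be sketched as follows: mapping 2-D inputs to degree-2 polynomial features makes a circular class boundary linear in the expanded space. This is a toy sketch of the principle only; kernel SVM performs the expansion implicitly through a kernel function rather than by materializing the features:

```python
def poly2_features(x):
    """Map (x1, x2) to degree-2 polynomial features."""
    x1, x2 = x
    return (x1, x2, x1 * x1, x2 * x2, x1 * x2)

def score_in_expanded_space(x, w, b):
    """A linear rule w.z + b on the expanded features z; the resulting
    boundary is nonlinear in the original coordinates x."""
    z = poly2_features(x)
    return sum(wi * zi for wi, zi in zip(w, z)) + b

# The unit circle x1^2 + x2^2 = 1 becomes the linear boundary w.z + b = 0
# in the expanded space, with w = (0, 0, 1, 1, 0) and b = -1.
w, b = (0.0, 0.0, 1.0, 1.0, 0.0), -1.0
```

A negative score places a point inside the circle and a positive score outside, even though no straight line in the original plane could separate those two regions.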

In contrast, kNN is a nonparametric algorithm because it avoids a priori assumptions about the shape of the class boundary and can thus adapt more closely to nonlinear boundaries as the amount of training data increases. kNN has higher variance than linear SVM, but it has the advantage of producing classification fits that adapt to any boundary. Even though the true class boundary is unknown in most real-world applications, kNN has been shown to approach the theoretically optimal classification boundary as the training set grows to massive size [1]. However, because kNN does not impose any structure on the boundary, it can create class boundaries that may be less interpretable than those of linear SVM. The simplicity of the linear SVM boundary also lends itself more directly to formal tests of statistical significance that give P values for the relevance of individual variables.

There are also trade-offs in the number of samples and the number of variables that can be handled by these approaches. SVM can achieve good prediction accuracy for new observations despite large numbers of input variables. SVM therefore serves as an off-the-shelf technique that is frequently used in genome-wide analysis and brain imaging, two application domains with often low sample sizes (e.g., hundreds of participants) but very high numbers of inputs (e.g., hundreds of thousands of genes or brain locations).

By contrast, the classification performance of kNN rapidly deteriorates when searching for patterns using high numbers of input variables [1], where many of the variables may be unrelated to the classification or contribute only small amounts of information. Because equal attention is given to all variables, the nearest neighbors may be defined by irrelevant variables. This so-called curse of dimensionality occurs for many algorithms that become more flexible as the number of predictors increases [1].
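The effect can be sketched numerically: appending coordinates that carry no class information shrinks the relative difference between near and far neighbors. This is a toy sketch with hand-picked values, not a result from the article:

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# One relevant coordinate: the query is 1 unit from point a, 3 units from b.
q, a, b = (0.0,), (1.0,), (3.0,)
ratio_relevant = dist(q, a) / dist(q, b)       # 1/3: a is clearly nearer

# Append 50 irrelevant coordinates that differ by 1 unit in every dimension.
q50 = q + (0.0,) * 50
a50 = a + (1.0,) * 50
b50 = b + (1.0,) * 50
ratio_noisy = dist(q50, a50) / dist(q50, b50)  # close to 1: nearly tied
```

With the irrelevant coordinates included, the two distances become almost equal, so the identity of the "nearest" neighbor is driven largely by noise.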

Finally, computation and memory resources are important practical considerations [4]. SVM needs only a small subset of training points (the support vectors) to define the classification rule, making it often more memory efficient and less computationally demanding when inferring the class of a new observation. In contrast, kNN typically requires more computation and memory because it must use all input variables and all training samples for each new observation to be classified.

The authors declare no competing financial interests.

Danilo Bzdok, Martin Krzywinski & Naomi Altman

[1] Hastie, T., Tibshirani, R. & Friedman, J. (2001) The Elements of Statistical Learning. Springer Series in Statistics, Heidelberg.

[2] Lever, J., Krzywinski, M. & Altman, N. (2016) Points of significance: Model selection and overfitting. Nature Methods 13:703–704.

[3] Lever, J., Krzywinski, M. & Altman, N. (2016) Points of significance: Regularization. Nature Methods 13:803–804.

[4] Bzdok, D. & Yeo, B.T.T. (2017) NeuroImage 155:549–564.

Danilo Bzdok is at the Department of Psychiatry, RWTH Aachen University, in Germany and a Visiting Professor at INRIA/Neurospin Saclay in France. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.
