Principles of Support Vector Machine


Classification

SVM is a pattern recognition method that is widely used in data mining applications, and provides a means of supervised classification, as do SIMCA and LDA. SVM was originally developed for the linear classification of separable data, but is applicable to nonlinear data with the use of kernel functions. SVMs are used in machine learning, optimization, statistics, bioinformatics, and other fields that use pattern recognition. The algorithm used within The Unscrambler is based on code developed and released under a modified BSD license by Chih-Chung Chang and Chih-Jen Lin of the National Taiwan University (Hsu et al., 2009).

SVM is a classification method based on statistical learning wherein a function that

describes a hyperplane for optimal separation of classes is determined. As the linear

function is not always able to model such a separation, data are mapped into a new

feature space and a dual representation is used with the data objects represented by their

dot product. A kernel function is used to map from the original space to the feature space,

and can be of many forms, thus providing the ability to handle nonlinear classification

cases. The kernels can be viewed as a mapping of nonlinear data to a higher dimensional

feature space, while providing a computation shortcut by allowing linear algorithms to

work with the higher dimensional feature space. The support vectors are the reduced set of training samples retained after applying the kernel. The figure below illustrates the principle of applying a kernel function to achieve separability.
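The effect of the kernel mapping can be sketched in a few lines with scikit-learn (an illustration only, not The Unscrambler's own implementation; the dataset is an arbitrary synthetic example):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric classes: not linearly separable in the original space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear kernel cannot separate the rings; the RBF kernel implicitly
# maps the data to a feature space where a separating hyperplane exists.
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

The linear classifier barely beats chance on this data, while the RBF kernel separates the classes almost perfectly without ever computing the high-dimensional mapping explicitly.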

In this new space SVM will search for the samples that lie on the borderline between the classes, i.e. the samples that are ideal for separating the classes; these samples are

named support vectors. The figure below illustrates this in that only the samples marked

with + for the two classes are used to generate the rule for classifying new samples.
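That only a subset of samples defines the classification rule can be verified programmatically; this scikit-learn sketch (illustrative, not The Unscrambler) exposes the support vectors of a fitted linear SVM:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated classes; an arbitrary synthetic example.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the borderline samples are retained as support vectors;
# they alone generate the rule for classifying new samples.
print("support vectors per class:", clf.n_support_)
print("indices of support vectors:", clf.support_)
print("retained", clf.support_vectors_.shape[0], "of", X.shape[0], "samples")
```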

A situation where SVM will perform well is when some classes are inhomogeneous and

partly overlapping, and thus, building local PCA models with all samples will not be

successful because one class may encompass other classes if all samples are used.

SVM will in this case find a set of the most relevant samples in terms of discriminating

between the classes and is invariant to samples far from the discrimination line.

SVM has advantages over classification methods such as neural networks: it has a unique solution, and has less tendency to overfit when compared to other nonlinear classification methodologies. Of course, model validation is the critical aspect in avoiding overfitting for any method. SVMs are effective for modeling of nonlinear data,

and are relatively insensitive to variation in parameters. SVM uses an iterative training

algorithm to achieve separation of different classes.

Two SVM classification types are available in The Unscrambler which are based on

different means of minimizing the error function of the classification.

In the c-SVM classification, a capacity factor, C, can be defined. The value of C should be

chosen based on knowledge of the noise in the data being modeled. Its value can be

optimized through cross-validation procedures. When using nu-SVM classification, the nu

value must be defined (default value = 0.5). Nu serves as the upper bound of the fraction of errors and is the lower bound for the fraction of support vectors.

Increasing nu will allow more errors, while increasing the margin of class separation.
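The role of nu as a lower bound on the fraction of support vectors can be checked empirically; the sketch below uses scikit-learn's NuSVC (an illustration, not The Unscrambler's dialog, with an arbitrary synthetic dataset):

```python
from sklearn.datasets import make_classification
from sklearn.svm import NuSVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# nu is an upper bound on the fraction of training errors and a
# lower bound on the fraction of support vectors.
fractions = {}
for nu in (0.1, 0.3, 0.5):
    clf = NuSVC(nu=nu, kernel="rbf").fit(X, y)
    fractions[nu] = clf.support_vectors_.shape[0] / X.shape[0]
    print(f"nu={nu}: fraction of support vectors = {fractions[nu]:.2f}")
```

As nu grows, the model keeps more support vectors and tolerates more margin errors, widening the margin of class separation.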

The kernel type to be used as a separation of classes can be chosen from the following four options:

Linear

Polynomial

Radial basis function

Sigmoid

The linear kernel is set as the default option. If the number of variables is very large, the data do not need to be mapped to a higher dimensional space, and the linear kernel function is preferred. The radial basis function is also a simple function, and can model systems of varying complexity; it is an extension of the linear kernel.

If a polynomial kernel is chosen, the order of the polynomial must also be given. In SVM

classification, the best value for C is often not known a priori. Through a grid search and

applying cross validation to reduce the chance of overfitting, one can identify an optimal value

of C so that unknowns can be properly classified using the SVM model.
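Such a search can be sketched with scikit-learn's GridSearchCV (an illustration only; the dataset and grid values are arbitrary choices, not from the source):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The best C is not known a priori; try an exponentially growing
# sequence and let cross validation pick the best-generalizing value.
param_grid = {"C": np.logspace(-2, 3, 6)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
search.fit(X, y)

print("best C:", search.best_params_["C"])
print(f"cross-validated accuracy: {search.best_score_:.2f}")
```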

SVM classification is a supervised method of classification. The data used for SVM must include a single category variable defining which classes are to be discriminated by the model. The X and Y matrices must have the same number of rows (samples), and the Y matrix must contain a single column of category variables. The X data must be numerical, and neither matrix may contain any missing data.
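These input requirements can be summarized as a small validation sketch (the function name and checks are illustrative, not part of The Unscrambler):

```python
import numpy as np

def check_svm_inputs(X, y):
    """Minimal sketch of the input requirements described above."""
    X = np.asarray(X, dtype=float)   # X must be numerical
    y = np.asarray(y)
    assert X.ndim == 2, "X must be a samples-by-variables matrix"
    assert y.ndim == 1, "Y must be a single column of category variables"
    assert X.shape[0] == y.shape[0], "X and Y need the same number of rows"
    assert not np.isnan(X).any(), "X must not contain missing data"
    assert len(np.unique(y)) >= 2, "SVM training requires at least two classes"
    return X, y

X, y = check_svm_inputs([[1.0, 2.0], [3.0, 4.0]], ["A", "B"])
```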

SVM have been used in drug discovery to identify compounds that may have efficacy, and

also to identify toxicity issues with drugs. They have been used in classification problems

such as that of classifying plastics from their FTIR spectra, meat and bone meal in feed

from NIR imaging spectroscopy, teas from HPLC chromatograms, and many other areas in

pattern recognition and data mining.

When an SVM model is created a new node is added in the project navigator with a folder

for the data used in the model, and the results folder. The results folder has the following

matrices:

Support vectors

Confusion matrix

Parameters

Probabilities

Prediction

The main results of the SVM are the confusion matrix, which indicates how many samples were classified in each class, and the prediction matrix, which indicates the classification determined for each sample in the training set.

It is advised to start with the RBF kernel with various settings of C for C-SVM and select

10-segment cross validation. If all samples are correctly classified, which means the

confusion matrix has no values outside the diagonal, one may select this model as

suitable for classifying future samples. Of course, for some data not all samples will be classified in the correct class during training.
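A minimal version of this suggested workflow, sketched with scikit-learn (the dataset, scaling step, and C values are arbitrary illustrations):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Start with the RBF kernel and various settings of C,
# evaluated with 10-segment cross validation.
for C in (0.1, 1, 10, 100):
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C))
    scores = cross_val_score(model, X, y, cv=10)
    print(f"C={C:>5}: mean CV accuracy = {scores.mean():.2f}")
```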

If the data are expected to be nonlinear, e.g. from looking at the classes in a scores plot

from PCA or PLS-DA, one may try other kernels and change the settings for C or nu.

SVM were used as a multivariate classification tool for the identification of meat and bone

meal in animal feed in response to legislation banning such substances following the

outbreak of mad cow disease (Fernandez Pierna et al., 2004). NIR imaging spectroscopy is

able to detect differences in feeds based on the chemical composition. SVM can be used to

classify feed samples, reducing the need for constant expert analysis of data, thus

providing a rapid tool for analysis that can be utilized for certification of animal feed.

SVM were applied for the classification of plastics in a recycling system (Belousov et al., 2002). A remote FTIR spectrometer was mounted on a conveyor where plastics were being

sorted for recycling. A two-tiered classification model was developed where at the first

level samples were divided into the classes of important plastics (ABS, PC, PC/ABS, SB

and PVC) and reject plastics (PA, PP and PE). The important plastics were then further

categorized into each individual type of plastic.

More details regarding Support Vector Machine classification are given in the method

reference.

Classification

The sections that follow describe the menu options, dialogs and results encountered when using Support Vector Machine classification in practice, accessible from the menu Tasks-Analyze-Support Vector Machine Classification.

Model input

First the input data for the classification is defined in the Support Vector Machine dialog.

Choose the data matrix which contains the data to be used for the classification as the

first matrix. This matrix of predictors should contain only numerical values, with no

missing values. The second matrix to define is that containing the category, and must

have a single column only. The SVM training requires at least two classes. This

classification information may be from the same matrix or another, but must have the

same number of rows as the first, and have only a single column of category data.

If the appropriate selection is not made for the classifier, a warning will be displayed. To build the SVM model, go to the column drop-down list and select a single column containing category variables.

Options

Here one can choose the SVM type of classification to use, either c-SVM or nu-SVM, from the drop-down list next to SVM type. The kernel type to be used to determine the hyperplane that best separates the classes can be selected from the following types from the drop-down list. The default setting, Radial basis function, is a simple kernel that can nonetheless model complex data.

Linear

Polynomial

Radial basis function

Sigmoid

For a polynomial kernel type, the degree of the polynomial should be defined. The C-SVM

has an input parameter named C, which is a capacity factor (also called penalty factor), a

measure of the robustness of the model. C must be greater than 0.

When using nu-SVM classification the nu value must be defined (default value = 0.5). Nu

serves as the upper bound of the fraction of errors and is the lower bound for the fraction

of support vectors.

Grid Search

In the Options tab, the Grid Search button will open a dialog for grid search. The figure below shows the grid search dialog after a grid search has been performed.

The dialog asks for input for the parameters Gamma and C in the case of c-SVM, and Gamma and Nu in the case of nu-SVM. It has been reported in the literature that an exponentially growing sequence of the parameters is good as a first coarse grid search. This is why the inputs Gamma and C are given on the log scale, but not nu, since it is between 0 and 1. However, in the grid table above the actual values are given. It is

recommended to use cross-validation in grid search to avoid overfitting when many

combinations of the parameters are tried. After an initial grid search it may be refined with

smaller ranges for the parameters once the best range has been found. Click on the Start

button for the calculations to commence. Note that it is possible to click on Stop during the computations, so that if the results become worse for higher parameter values one may stop to save time. The default is to start with five levels of each parameter. After completion, click on the best value for the Validation accuracy in the grid to see detailed results. The SVs entry lists how many support vectors were selected; this number should be judged relative to the number of samples in the data.
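The coarse-then-refined strategy can be sketched as follows with scikit-learn (the dataset and grid ranges are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Coarse grid: exponentially growing sequences of gamma and C,
# i.e. evenly spaced on a log scale, with cross validation.
coarse = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": np.logspace(-1, 3, 5), "gamma": np.logspace(-4, 0, 5)},
    cv=5,
).fit(X, y)
best_C = coarse.best_params_["C"]
best_g = coarse.best_params_["gamma"]

# Refined grid: smaller ranges centered on the best coarse values.
fine = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": best_C * np.logspace(-0.5, 0.5, 5),
     "gamma": best_g * np.logspace(-0.5, 0.5, 5)},
    cv=5,
).fit(X, y)

print("refined best parameters:", fine.best_params_)
print(f"validation accuracy: {fine.best_score_:.2f}")
```

Because the refined grid contains the best coarse point at its center, the refined search can only match or improve the cross-validated accuracy.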

Click on Use setting to return to the previous dialog and run the SVM classification again with these parameter settings. Notice that since the cross validation is random, the RMSE and

the R-square from validation may be different in the second run. This again is a function

of the distribution of the samples.

To understand more in detail how SVMC selects the support vectors (samples that are

lying on the boundary between the classes) one may run a PCA on the same data and

make use of the Sample Grouping option in the score plot to visualize the support vectors.

Weights

If the analysis calls for variables to be weighted for making realistic comparisons to each

other (particularly useful for process and sensory data), click on the Weights tab and the

following dialog box will appear.

Individual variables can be selected from the variable list table provided in this dialog by

holding down the control (Ctrl) key and selecting variables. Alternatively, the variable

numbers can be manually entered into the text dialog box. The Select button can be used

(which will bring up the Define Range dialog), or every variable in the table can be selected

by simply clicking on All.

Once the variables have been selected, to weight them, use the options in the Change

Selected Variable(s) dialog box, under the Select tab. The options include:

A/(SDev +B)

This is a standard deviation weighting process where the parameters A and B can

be defined. The default is A = 1 and B = 0.

Constant

This allows the weighting of selected variables by predefined constant values.

Downweight

This allows the multiplication of selected variables by a very small number, such

that the variables do not participate in the model calculation, but their correlation

structure can still be observed in the scores and loadings plots and in particular,

the correlation loadings plot.

Block weighting

This option is useful for weighting various blocks of variables prior to analysis so

that they have the same weight in the model. Check the Divide by SDev box

to weight the variables with standard deviation in addition to the block weighting.
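The weighting options above can be expressed numerically. The sketch below is a plain NumPy illustration; the block-weighting convention shown (dividing by the square root of the block size) is an assumption, since the exact formula used by The Unscrambler is not given here:

```python
import numpy as np

# Two variables on very different scales (arbitrary example data).
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

# A/(SDev + B): standard deviation weighting (defaults A=1, B=0).
A, B = 1.0, 0.0
sdev_weights = A / (X.std(axis=0, ddof=1) + B)

# Constant: weight selected variables by predefined constant values.
constant_weights = np.array([1.0, 0.5])

# Downweight: multiply by a very small number so the variable does not
# participate in the model but can still be observed in the plots.
downweights = np.array([1.0, 1e-6])

# Block weighting (assumed convention): each block of variables gets
# the same total weight by dividing by the square root of its size.
block_sizes = np.array([1, 1])
block_weights = np.repeat(1.0 / np.sqrt(block_sizes), block_sizes)

# Any of these weight vectors is applied column-wise the same way:
X_weighted = X * sdev_weights
print(X_weighted.std(axis=0, ddof=1))  # each column now has unit SDev
```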

Use the Advanced tab in the Weights dialog to apply predetermined weights to each

variable. To use this option, set up a row in the data set containing the weights (or create

a separate row matrix in the project navigator). Select the Advanced tab in the Weights

dialog and select the matrix containing the weights from the drop-down list. Use the Rows

option to define the row containing the weights and click on Update to apply the new

weights.

Another feature of the Advanced tab is the ability to use the results matrix of another analysis as weights, via the Select Results Matrix button, which opens an internal project navigator for selecting the appropriate results matrix to use as a weight.

The dialog box for the Advanced option is provided below.

Once the weighting and variables have been selected, click Update to apply them.

Validation

Validation is an important part of any method applied in modeling data. Settings for the

Validation of the SVM are set under the Validation tab as shown below. First select to cross

validate the model by checking the check box. The number of segments to use can be

chosen in the segments entry. Cross validation is helpful in model development but

should not be a replacement for full model validation using a test set.

There are six result matrices generated after creating an SVM model:

Support vectors

Confusion matrix

Parameters

Probabilities

Prediction

Accuracy

There is only one matrix generated when predicting with an SVM model: Classified range

SVM node

Support vectors

The support vector matrix comprises the support vectors, which are a subset of the

original samples that are closest to the boundary between classes and define the optimal

separation between classes.

Confusion matrix

The confusion matrix is used for visualization of classification results from supervised methods such as support vector machine classification or linear discriminant

analysis classification. It carries information about the predicted and actual classifications

of samples, with each row showing the instances in a predicted class, and each column

representing the instances in an actual class.

In the confusion matrix below, all the Setosa samples are nicely attributed to the Setosa

group.

Two samples with actual value Virginica are predicted as Versicolor.

In the same way two samples with actual value Versicolor are predicted as Virginica.
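This iris example can be reproduced with scikit-learn (an illustration, not The Unscrambler's output). Note that scikit-learn's confusion_matrix puts actual classes in rows, so it is transposed here to match the convention described above (rows = predicted, columns = actual):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# Cross-validated class predictions for every training sample.
y_pred = cross_val_predict(SVC(kernel="rbf"), X, y, cv=10)

# Transpose so that rows show predicted classes, columns actual classes.
cm = confusion_matrix(y, y_pred).T
print("classes:", list(iris.target_names))
print(cm)
```

The Setosa column is entirely on the diagonal, while a few Versicolor and Virginica samples are confused with each other, matching the pattern described above.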

Confusion matrix

Parameters

The parameters matrix carries information on the following parameters for all the

identified classes:

SVM type

Kernel type - as defined in the options for the SVM learning step

SV Count - the number of support vector needed for the classification of the data

Labels - the labels of the corresponding classes, given as numerical values starting

with 0

Parameters matrix

Probabilities

The probabilities matrix has three rows, for the Rho, and probabilities A and B for each of

the identified classes.

Probabilities matrix

Prediction

The prediction matrix exhibits the predicted class for each sample in the training set.

Prediction

Accuracy

Accuracy holds the % of correctly classified samples from calibration and validation. If cross validation was not chosen, this field is left blank. However, cross validation is highly

recommended to avoid overfitting. See the Confusion Matrix regarding details for false

positives and false negatives.

This plot shows the various classes as they were classified for a 2D scatter plot of the

original variables. Use the arrows or drop-down list to choose which of the original

variables to show. This is useful to see for which combinations of pairs of variables there

is good separation between the classes. Alternatively, perform PCA on the same data, visualize the support vectors with the sample grouping option in the score plot, and interpret the loading plot to find the most important variables. The Act and Pre buttons can be used to toggle whether one of them or both should be shown; the predicted are shown with a smaller marker size. If the predicted class differs from the actual, this is shown with a small

symbol with the color for the wrongly assigned class inside the larger marker for the

actual class. In the illustration below two samples (Batch19 and Batch21) are predicted to

belong to class Asia although the actual class is Europe.

Classified range

After an SVM model has been applied to new data to classify them, a new matrix with the

results is added to the project navigator. The Classified_Range matrix contains a category

variable giving the category predicted by the model for each sample.

Classified range

Autopretreatment may be used with SVM. This allows a user to automatically apply the

transforms used with the data in developing the SVM model to data used in the

classification of new samples with this model.

When all of the parameters have been defined, the SVM is run by clicking OK. A new node,

SVM, is added to the project navigator with a folder for Data, and another for Results.

More details regarding Support Vector Machine classification are given in the section SVM

Classify or in the link given under License.

After an SVM classification model has been developed, it can be used to classify new

samples by going to Tasks-Predict-Classification-SVM. In the dialog box, one first

chooses which SVM model to apply from the drop-down list. This requires a valid SVM

model in the current project. One then defines which samples to classify by selecting

samples from the appropriate data matrix, along with the X variables that are to be used

for the classification. The X-variables must contain only numerical data and have the same

number of variables as were used to develop the SVM model.

The SVM classification results are given in a new matrix in the project navigator named

Classified_Range. The matrix has the predicted class for each sample.
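Classifying new samples with a trained model can be sketched as follows (a scikit-learn illustration; the measurement values are made up for the example):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
clf = SVC(kernel="rbf").fit(iris.data, iris.target)

# New samples must be numerical and have the same number of
# variables (here 4) as the data used to develop the model.
new_X = np.array([[5.0, 3.4, 1.5, 0.2],   # setosa-like measurements
                  [6.7, 3.0, 5.2, 2.3]])  # virginica-like measurements
predicted = clf.predict(new_X)

# One predicted class per new sample, analogous to the
# Classified_Range matrix described above.
print([iris.target_names[i] for i in predicted])
```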

