0 views

Uploaded by ramanadk

lecture14

- Optimal Control With Discounting
- Learning Data Science_ Day 11 - Support Vector Machine – Medium
- Application of Data Mining Diabetes Health Care in Young and Old Patients
- Rainfall Prediction
- 04380267.pdf
- Paper 30-Expensive Optimisation a Metaheuristics Perspective
- Public Economics
- Constrained Optimization
- Km 2417821785
- Report
- 10.1080@13588265.2018.1454289
- InTech-Automatic Facial Feature Extraction for Face Recognition
- SVM-DSmT Combination for Off-Line Signature Verification
- 2008-Kukenys+Mccane-svms for Human Face Detection
- 2005 Jaime Nn
- KKTconditionsMATLAB
- Object Recognition From Image Using Grid Based Color Moments Feature Extraction Method
- 1-s2.0-S0957417410009930-main
- 2003An Algorithm of Estimating the Generalization Performance of RBF-SVM
- GottliebSalisburyShekVaidyanathan-DetectingCorporateFraud

You are on page 1of 19

Lecture 5

David Sontag

New York University

Multi-class SVM

As for the SVM, we introduce slack variables and maximize margin:

To predict, we use:

How to deal with imbalanced data?

imbalanced data sets

• We may want errors to be equally distributed

between the positive and negative classes

• A slight modification to the SVM objective

does the trick!

What’s Next!

• Learn one of the most interesting and

exciting recent advancements in machine

learning

– The “kernel trick”

– High dimensional feature spaces at no extra

cost!

• But first, a detour

– Constrained optimization!

Constrained optimization

No Constraint x ≥ -1 x≥1

Lagrange Multipliers!!!

Lagrange multipliers – Dual variables

Add Lagrange multiplier

Rewrite

Constraint

Introduce Lagrangian (objective):

We will solve:

Why is this equivalent?

• min is fighting max!

x<b (x-b)<0 maxα-α(x-b) = ∞

• min won’t let this happen!

Add new

x>b, α≥0 (x-b)>0 maxα-α(x-b) = 0, α*=0 constraint

• min is cool with 0, and L(x, α)=x2 (original objective)

The min on the outside forces max to behave, so constraints will be satisfied.

Dual SVM derivation (1) – the linearly

separable case

constraints per example

Lagrangian:

Dual SVM derivation (2) – the linearly

separable case

(Primal)

(Dual)

these two optimization problems are equivalent!

x(1)

...

Dual SVM derivation

x (n)

(3) – the linearly

x(1) x(2)

φ(x) = separable case

(1) (3)

x x

...

(Dual)

ex(1)

Can solve for optimal w,. .b. as function of α:

∂L �

∂w

=w− αj yj xj

j

(Dual)

x(1)

...

Dual SVM derivation

x (n)

(3) – the linearly

x(1) x(2)

φ(x) = separable case

(1) (3)

x x

...

(Dual)

ex(1)

Can solve for optimal w,. .b. as function of α:

∂L �

∂w

=w− αj yj xj

j

(Dual)

• w and b are computed from α (if needed)

Dual SVM derivation (3) – the linearly

separable case

Lagrangian:

is tight. We use this to obtain b:

(1)

(2)

(3)

Dual for the non-separable case – same basic

story (we will skip details)

Dual:

What changed?

• Added upper bound of C on αi!

• Intuitive explanation:

• Without slack, αi ∞ when constraints are violated (points

misclassified)

• Upper bound of C limits the αi, so misclassifications are allowed

Wait a minute: why did we learn about the dual

SVM?

algorithms that can solve the dual faster than the

primal

– At least for small datasets

Reminder: What if the data is not

linearly separable?

Use features of features

of features of features….

x(1)

...

x(n)

x(1) x(2)

φ(x) = (1) (3)

x x

...

ex(1)

...

Higher order polynomials

number of monomial terms

d=4

m – input features

d – degree of polynomial

d=3

grows fast!

d = 6, m = 100

d=2

about 1.6 billion terms

number of input dimensions

Dual formulation only depends on

dot-products, not on w!

Remember the

First, we introduce features: examples x only

appear in one dot

product

Next, replace the dot product with a Kernel:

∂w x . . .x

(1) j j j

φ(x) = ex x j�

(1)

(1)e∂Lx(3)

� x �. . .�= w − � αj yj xj

u1 .∂w .. v1 j

= �. .. . = u1 v1 + u2 v2 = u.v

Efficient dot-product

φ(u).φ(v)

∂L

∂L = w

=

− u2� (1)of

�

α �y

w − euα1j yj xj

x j

v2�polynomials

j xj

�

∂w

Polynomials of degree exactly d

v1

∂w 2 =

φ(u).φ(v) jju 2

. = u1 v1 + u2 v2 = u.v

u1 2 v1 v2

d=1 �� �� � � ��. . .

uu u 1u 2

v1 v 1 v 2

2 2

φ(u).φ(v)

φ(u).φ(v) =

= 11

φ(u).φ(v) = u u u. v1�.

u 2 1

. == uu v v2+ =

+u2uu

v221vv= + u.v

u.v

1= 2u1 v1 u2 v2 + u

∂Lu22 21 u1 uv222 v

1

2 v1

1 11

1 2

v v

φ(u).φ(v)

=

= u

w

2 − α

.

2

vj2y j1x 2

j = u21 v12 + 2u1 v1 u2 v2 +

d=2 u∂w

1

2 2

uv21u12 v2 v1

u1 u2 v1uv22 j 2

v2 v )2

φ(u).φ(v) = .

�u2 u1 �� = (u v 2

+ 2

u 2 2

v2 v1 �1 11 1 2 21 1 2 2

= u v + 2u v u v + u 2 v2

u

u2 1 v 2v1 = (u v + u v ) 2

φ(u).φ(v) = 2 . 2 = 1u11 v1 + 2 2u v = u.v

2 2

u2 v2 = (u.v) 2

φ(u).φ(v) = (u.v)d

For any d (we will skip proof): φ(u).φ(v) = (u.v) d

= (u.v)=d (u.v)d

φ(u).φ(v)φ(u).φ(v)

• Cool! Taking a dot product and exponentiating gives same

results as mapping into high dimensional space and then taking

dot produce

Finally: the “kernel trick”!

– Compute dot products in closed form

• Constant-time high-dimensional dot-

products for many classes of features

• But, O(n2) time in size of dataset to

compute objective

– Naïve implements slow

– much work on speeding up

Common kernels

• Polynomials of degree exactly d

• Polynomials of degree up to d

• Gaussian kernels

• Sigmoid

- Optimal Control With DiscountingUploaded byElphego Torres
- Learning Data Science_ Day 11 - Support Vector Machine – MediumUploaded byAri Clecius
- Application of Data Mining Diabetes Health Care in Young and Old PatientsUploaded byHandsome Rob
- Rainfall PredictionUploaded byoutrider
- 04380267.pdfUploaded bySudhakar Spartan
- Public EconomicsUploaded byLe Tung
- Paper 30-Expensive Optimisation a Metaheuristics PerspectiveUploaded byEditor IJACSA
- Constrained OptimizationUploaded bythavaselvan
- Km 2417821785Uploaded byAnonymous 7VPPkWS8O
- ReportUploaded bypivan hebbs
- 10.1080@13588265.2018.1454289Uploaded byCalabria Calabria
- InTech-Automatic Facial Feature Extraction for Face RecognitionUploaded bykhoaa7
- SVM-DSmT Combination for Off-Line Signature VerificationUploaded byAnonymous 0U9j6BLllB
- 2008-Kukenys+Mccane-svms for Human Face DetectionUploaded bykhoaa7
- 2005 Jaime NnUploaded byabhishekbmc
- KKTconditionsMATLABUploaded byKnosos
- Object Recognition From Image Using Grid Based Color Moments Feature Extraction MethodUploaded byesatjournals
- 1-s2.0-S0957417410009930-mainUploaded byAnonymous PsEz5kGVae
- 2003An Algorithm of Estimating the Generalization Performance of RBF-SVMUploaded byEmin Kültürel
- GottliebSalisburyShekVaidyanathan-DetectingCorporateFraudUploaded bycrackend
- Seminar Doc 2Uploaded byanjali9myneni
- Performance Assessment of Multiple Classifiers Based on Ensemble Feature Selection Scheme for Sentiment AnalysisUploaded byLas Ukcu
- 46 Prognostic Analysis Based on Hybrid Prediction Method for Axial Piston Pump (1)Uploaded byOliver Ruiz
- Notes Problem Set2Uploaded bysebollie
- Cheng ThesisUploaded byMichelle Synder
- Important Instructions to Be Followed When Thesis is WrittenUploaded byRam Ssk
- Tvc, IrelandUploaded byfil2011
- ch7Uploaded byNazmul-Hassan Sumon
- Nonconvex OptUploaded byayush89
- Hattori 2015Uploaded byjtoreroc

- Biology for 12th class cbseUploaded byramanadk
- Telugu Panchangam, Telugu Panchanga for Hyderabad, Telangana, IndiaUploaded byramanadk
- K meansUploaded bySaurabh Mishra
- knime_seventechniquesdatadimreductionUploaded byramanadk
- 12 Biology CBSE Sample Papers 2018Uploaded bySakshi Godara
- The Best of the Machine Learning Algorithms Used in Artificial IntelligenceUploaded byramanadk
- knime_seventechniquesdatadimreduction.pdfUploaded byramanadk
- Sattriya and BharatanatyamUploaded byramanadk
- A Study on Motivational Factors Formotivation in OrganizationsUploaded byramanadk
- A Tutorial on Principal Component AnalysisUploaded byCristóbal Alberto Campos Muñoz
- Olan..pdfUploaded byMazharul Islam
- Land Records and Titles in IndiaUploaded byramanadk
- Environment SetupUploaded byramanadk
- Selenium - A Brief Overview.pdfUploaded byramanadk
- Chapter 7 Edition 1Uploaded byramanadk
- 5952 Frequently Asked Questions 9001 2015Uploaded byramanadk
- agileqaprocess-110409110217-phpapp02Uploaded byramanadk
- 12-02 Ekagra QA FIREBIRD Monthly Status 02-29-2012 508 CompliantUploaded byramanadk
- Analytical Thinking TrainingUploaded bySachin Methree
- 20027-InfosysImprovingProductivity[1]_6e9a1b1c-2425-4a9d-a2bf-319feb9b3da4.pdfUploaded byramanadk
- STATISTICS for Mgt Summary of ChaptersUploaded byramanadk
- Statistics for ManagementUploaded byramanadk
- Statistics for Management by Levin and Rubin SolutionUploaded byramanadk
- KSUploaded byramanadk
- ls11Uploaded byramanadk
- LSUploaded byramanadk

- SAER-5565Uploaded bysethu1091
- Darcy Ripper User ManualUploaded byMichael Morrow
- 237654933-mathematics-t-form-6-150913054948-lva1-app6892Uploaded by绝醒
- Quantum Mechanics TheoryUploaded byPridana Ynwa
- I-105 3rd 2007 Fiscal Measurement Systems for Hydrocarbon LiquidUploaded bymechmohan26
- Client-Guide-to-3D-Scanning-and-Data-Capture.pdfUploaded byBorja Martin Melchor
- Moment Area Method.pdfUploaded byJose DAniel Figueroa
- EXPLAIN DemystifiedUploaded byRoxana Sarbu
- Geoprocessing data drone.docxUploaded byYockie Taupan Putra
- Block Trace Analysis and Storage System OptimizationUploaded byAndras Szilagyi
- Remote Controlled Fan Regulator Project ReportUploaded byMaruthi Jacs
- Machine Learning an introductionUploaded byAnoop Cadlord
- Final-solutions.pdfUploaded byMahendra Reddy J
- PostGIS Essentials - Sample ChapterUploaded byPackt Publishing
- ADVANCE 3.1-Flow DiagramUploaded byJuanSebastiánGuzmánFeria
- Improved System OperationUploaded byRafi Muhammed
- 14-2111Uploaded byAcie Lastri
- 02_Frobenius-31-12-14_ppUploaded byChristian Malacapo Mortel
- Trigo WorksheetUploaded byEdwin Tan Pei Ming
- 9.mechanical properties of fiber reinforced lightweight concrete composites.pdfUploaded byManuel Gutarra
- macc020ffUploaded byJay Lendra
- Application of Forces Acting on Jetty StructureUploaded byIJSTE
- BT151Uploaded byitamar_jr
- Chapter23 Television[1]Uploaded byVictor Gomez Dy Lampadio
- jawapan kertas 3 bio julai.docUploaded byzulhariszan abd manan
- Hyperbola.pdfUploaded byRaju Raju
- Combined Applications of Mems and Nano Technology in Obtaining High Data Storage DensityUploaded byvaalgatamilram
- Bertagnoli Giordano ManciniUploaded byAle85to
- 6th sem syllabus ip university btech maeUploaded byAshish
- CryogenicUploaded byzohaib_farooq