
Machine learning foundation lecture-01


Lecture 1

Kristiaan Pelckmans

September 8, 2015

Overview

Today:

- Overview of the course.
- Support Vector Machines (SVMs): the separable case.
- Convex optimization.
- Analysis.
- Kernels.
- SVMs: the inseparable case.

Overview (Ctd)

Organization:

- 10 lectures.
- 1 computer lab (mid October).
- Mini-projects (due end of October).
- Participants give lectures using the course material.

Overview (Ctd)

Course:

1. Introduction.

2. Support Vector Machines (SVMs).

3. Probably Approximately Correct (PAC) analysis.

4. Boosting.

5. Online Learning.

6. Multi-class classification (*).

7. Ranking (*).

8. Regression (*).

9. Stability-based analysis (*).

10. Dimensionality reduction (*).

11. Reinforcement learning (*).

12. Presentations of the results of the mini-projects.

Introduction

- Classification.
- Regression.
- Ranking.
- Clustering.
- Dimensionality reduction, or manifold learning.

Introduction (Ctd)

Definitions and Terminology:

- $m$ examples.
- Features $x_i \in \mathcal{X}$.
- Labels $y_i \in \mathcal{Y}$.
- Fixed, unknown distribution $\mathcal{D}$ underlying the samples.
- Training sample $S_m \subset \mathcal{X} \times \mathcal{Y}$.
- Validation sample $S'$.
- Test sample $S''$.
- Loss function $L$.
- Hypothesis set $\mathcal{H} = \{h : \mathcal{X} \to \mathcal{Y}\}$.
- Learning algorithm $A(\alpha) : \{S\} \to \mathcal{H} : S \mapsto h_S$, where $\alpha$ are all the free tuning parameters.
- Risk $R$ (true loss) and empirical risk $\hat{R}_m$ (average loss).

Introduction (Ctd)

n-fold Cross-validation

- Let $S_m = \{(x_i, y_i)\}_{i=1}^{m}$ be the original training set.
- Divide $S_m$ into $n$ disjoint folds so that every point is included exactly once.
- Make $n$ sets of $n-1$ folds each; denote them $S_i$.
- Let $S_i = \{(x_{ij}, y_{ij})\}_{j=1}^{m_i}$ be the training set of the $i$-th iteration.
- Hence $h_{S_i}$ is the outcome of $A(\alpha)$ applied to the $i$-th training set.

$$\hat{R}_{CV}(\alpha) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_i} \sum_{j=1}^{m_i} L\big(h_{S_i}(x_{ij}), y_{ij}\big)$$
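As a sketch, the procedure above can be written out in Python; the nearest-centroid learner is a hypothetical stand-in for $A(\alpha)$, included only to make the sketch runnable:

```python
import numpy as np

def cross_val_risk(X, y, train, n=5):
    """n-fold cross-validation estimate of the risk: split S_m into n
    disjoint folds; for each i, train A on the other n-1 folds (S_i)
    and average the 0/1 loss on the held-out fold."""
    folds = np.array_split(np.arange(len(y)), n)      # every point in exactly one fold
    risks = []
    for i in range(n):
        held_out = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n) if j != i])
        h = train(X[train_idx], y[train_idx])         # h_{S_i} = A(S_i)
        risks.append(np.mean([float(h(X[j]) != y[j]) for j in held_out]))
    return float(np.mean(risks))

# hypothetical learner standing in for A(alpha): nearest class centroid
def nearest_centroid(X, y):
    cents = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    return lambda x: min(cents, key=lambda c: np.linalg.norm(x - cents[c]))
```

On well-separated data the estimated risk should be close to zero; on noisy data it gives an honest estimate for tuning $\alpha$.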

Introduction (Ctd)

Learning Scenarios

- Supervised learning.
- Unsupervised learning.
- Semi-supervised learning.
- Transductive inference.
- Online learning.
- Reinforcement learning.
- Active learning.

SVM - separable case

Support Vector Machine (SVM)

- Assume that there is an $f$ s.t. $y = f(x)$.
- Find $h$ with minimal risk, characterised by $(w, b)$:

$$\begin{cases} w \cdot x + b > 0 \;\Rightarrow\; h(x) = +1 \\ w \cdot x + b < 0 \;\Rightarrow\; h(x) = -1 \end{cases}$$

or

$$\mathcal{H} = \big\{ x \mapsto \mathrm{sign}(w \cdot x + b) : w \in \mathbb{R}^N, \; b \in \mathbb{R} \big\}$$

- Separability means a correct $h$ exists: $f(x)\,h(x) = y\,h(x) > 0$ for all $x \sim \mathcal{D}$.

SVM - separable case (Ctd)

Maximal Margin

- Hyperplane $\{x : w \cdot x + b = 0\}$.
- Normalise such that $\min_i |w \cdot x_i + b| = 1$ (w.l.o.g.).
- Distance of a point $x_0$ to the hyperplane:

$$\frac{|w \cdot x_0 + b|}{\|w\|}$$

- Thus the margin is given as

$$\rho = \frac{\min_i |w \cdot x_i + b|}{\|w\|} = \frac{1}{\|w\|}.$$
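A quick numeric check of the distance formula (the hyperplane and point are made up for illustration):

```python
import numpy as np

# illustrative hyperplane {x : w.x + b = 0} and point (values made up)
w, b = np.array([3.0, 4.0]), -2.0
x0 = np.array([1.0, 1.0])

# distance of x0 to the hyperplane: |w.x0 + b| / ||w||
dist = abs(w @ x0 + b) / np.linalg.norm(w)     # |3 + 4 - 2| / 5 = 1.0
```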

SVM - separable case (Ctd)

Maximal Margin

- Maximal hyperplane:

$$\max_{\rho, w, b} \; \rho \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 0 \;\; \forall i, \quad \rho = \min_i \frac{|w \cdot x_i + b|}{\|w\|}, \quad \min_i |w \cdot x_i + b| = 1$$

- Equivalently:

$$\max_{\rho, w, b} \; \rho \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1 \;\; \forall i, \quad \rho = \frac{1}{\|w\|}$$

- Or:

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1 \;\; \forall i.$$
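This quadratic program is small enough to hand to a generic solver; a minimal sketch using `scipy.optimize.minimize` (SLSQP) on made-up toy data, with a dedicated QP solver being the usual choice in practice:

```python
import numpy as np
from scipy.optimize import minimize

# made-up separable toy data
X = np.array([[2.0, 2.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# variables: wb = (w_1, ..., w_N, b); objective (1/2)||w||^2
def objective(wb):
    w = wb[:-1]
    return 0.5 * w @ w

# one inequality constraint per point: y_i (w.x_i + b) - 1 >= 0
cons = [{'type': 'ineq',
         'fun': lambda wb, i=i: y[i] * (X[i] @ wb[:-1] + wb[-1]) - 1.0}
        for i in range(len(y))]

res = minimize(objective, np.zeros(X.shape[1] + 1),
               method='SLSQP', constraints=cons)
w, b = res.x[:-1], res.x[-1]
margin = 1.0 / np.linalg.norm(w)               # rho = 1 / ||w||
```

For this symmetric data the maximal margin separator passes through the origin and the margin equals the distance to the nearest points.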

SVM - separable case (Ctd)

Maximal Margin

- Convex objective.
- Affine inequality constraints.
- Hence a Quadratic Programming (QP) problem.
- Dual problem: useful properties!

Convex Optimization

Convex

- A set $X$ is convex iff for any two points $x, x' \in X$, the segment $\{\lambda x + (1-\lambda) x' : 0 \le \lambda \le 1\}$ lies in $X$.
- A function $f : X \to \mathbb{R}$ is convex iff for all $x, x' \in X$ and all $0 \le \lambda \le 1$ one has

$$f(\lambda x + (1-\lambda) x') \le \lambda f(x) + (1-\lambda) f(x').$$

- A constrained optimisation problem is convex if and only if the set $X$ is convex and the objective and constraint functions are convex.

Convex Optimization (Ctd)

Convex Programming

- Constrained optimisation problem:

$$p^* = \min_{x \in X} f(x) \quad \text{s.t.} \quad g_i(x) \le 0 \;\; \forall i.$$

- Lagrangian:

$$\forall x \in X, \; \lambda \ge 0 : \quad L(x, \lambda) = f(x) + \sum_i \lambda_i g_i(x).$$

- Dual function:

$$\forall \lambda \ge 0 : \quad F(\lambda) = \inf_{x \in X} L(x, \lambda),$$

so that $F(\lambda) \le p^*$.
- Dual problem:

$$d^* = \max_{\lambda \ge 0} F(\lambda).$$

Convex Optimization (Ctd)

Convex Programming

- Weak duality: $d^* \le p^*$.
- Strong duality: $d^* = p^*$.
- Duality gap: $p^* - d^*$.
- Strong duality holds when constraint qualifications hold.
- Strong constraint qualification (Slater): $\exists x \in \mathrm{int}(C) : g_i(x) < 0 \;\; \forall i$.
- Weak constraint qualification (weak Slater): $\exists x \in \mathrm{int}(C) : \forall i, \; g_i(x) < 0$, or $g_i$ is affine and $g_i(x) = 0$.

Convex Optimization (Ctd)

Assume that $f$ and all $g_i : X \to \mathbb{R}$ are convex and differentiable, and that the constraints are qualified. Then $\bar{x}$ is a solution of the constrained program if and only if there exists a $\bar{\lambda} \ge 0$ such that (KKT conditions):

$$\begin{cases} \nabla_x L(\bar{x}, \bar{\lambda}) = 0 \\ \nabla_\lambda L(\bar{x}, \bar{\lambda}) \le 0 \\ \bar{\lambda}_i \, g_i(\bar{x}) = 0 \;\; \forall i \end{cases}$$

Analysis of SVMs

- Lagrangian:

$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \big( y_i (w \cdot x_i + b) - 1 \big)$$

- KKT conditions:

$$\nabla_w L = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i$$

$$\nabla_b L = 0 \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i y_i = 0$$

$$\forall i : \; \alpha_i \big( y_i (w \cdot x_i + b) - 1 \big) = 0$$

Analysis of SVMs (Ctd)

- Dual problem: $\max_{\alpha \ge 0} \inf_{w, b} L(w, b, \alpha)$.
- Eliminate $w$ and $b$ using the KKT conditions:

$$\max_{\alpha \ge 0} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0 \tag{1}$$

- Then $w = \sum_{i=1}^{m} \alpha_i y_i x_i$,
- and $b = y_i - \sum_{j=1}^{m} \alpha_j y_j \, (x_j \cdot x_i)$ for any support vector $x_i$.
- Hence we can predict $h(x) = \mathrm{sign}(w \cdot x + b)$.
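The dual (1) can be solved the same way; a sketch on made-up toy data, recovering $w$ and $b$ from the support vectors as described above:

```python
import numpy as np
from scipy.optimize import minimize

# made-up separable toy data
X = np.array([[2.0, 2.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)

# negated dual objective: (1/2) a'Ga - sum(a), with G_ij = y_i y_j (x_i . x_j)
G = (y[:, None] * X) @ (y[:, None] * X).T
res = minimize(lambda a: 0.5 * a @ G @ a - a.sum(),
               np.zeros(m), method='SLSQP',
               bounds=[(0.0, None)] * m,
               constraints=[{'type': 'eq', 'fun': lambda a: a @ y}])
alpha = res.x

w = (alpha * y) @ X                    # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                      # support vectors have alpha_i > 0
b = float(np.mean(y[sv] - X[sv] @ w))  # b = y_i - sum_j alpha_j y_j (x_j . x_i)
pred = np.sign(X @ w + b)              # h(x) = sign(w . x + b)
```

Only the nearest points should receive nonzero $\alpha_i$; the primal and dual solutions coincide here since strong duality holds.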

Analysis of SVMs (Ctd)

Generalization error

- Leave-one-out analysis.
- A bound in terms of $N_{SV}$, the number of support vectors.
- Margin-based analysis.

Analysis of SVMs (Ctd)

Leave-one-out analysis

$$\hat{R}_{LOO}(A(\alpha), S) = \frac{1}{m} \sum_{i=1}^{m} 1\big( h_{S \setminus (x_i, y_i)}(x_i) \ne y_i \big)$$

- $A(\alpha)(S) = h_S$.
- $1(z) = 1$ iff $z$ is true, $1(z) = 0$ otherwise.
- Goal: a bound in terms of $N_{SV}$.
- First: in expectation, $\hat{R}_{LOO}$ on a sample of size $m$ equals the risk of hypotheses trained on $m-1$ points.

Analysis of SVMs (Ctd)

Proof:

$$\begin{aligned}
E_{S \sim D^m}\big[\hat{R}_{LOO}(A(\alpha), S)\big] &= \frac{1}{m} \sum_{i=1}^{m} E_{S \sim D^m}\big[ 1\big( h_{S \setminus (x_i, y_i)}(x_i) \ne y_i \big) \big] \\
&= E_{S \sim D^m}\big[ 1\big( h_{S \setminus (x_1, y_1)}(x_1) \ne y_1 \big) \big] \\
&= E_{S' \sim D^{m-1}}\big[ E_{(x_1, y_1) \sim D}\big[ 1\big( h_{S'}(x_1) \ne y_1 \big) \big] \big] \\
&= E_{S' \sim D^{m-1}}\big[ R(h_{S'}) \big].
\end{aligned}$$

Analysis of SVMs (Ctd)

Theorem: let $h_S$ be the hypothesis returned by the SVM for a sample $S$, and let $N_{SV}(S)$ be the number of support vectors that define $h_S$. Then

$$E_{S \sim D^m}\big[ R(h_S) \big] \le E_{S' \sim D^{m+1}}\left[ \frac{N_{SV}(S')}{m+1} \right].$$

Indeed, if $x_i$ is not a support vector, removing it does not change the solution, so $h_{S \setminus (x_i, y_i)}(x_i) = f(x_i) = y_i$; hence for a sample $S$ of size $m+1$:

$$\hat{R}_{LOO}(A(\alpha), S) \le \frac{N_{SV}(S)}{m+1}.$$

SVM - Margin analysis (Ctd)

Vapnik-Chervonenkis (VC) dimension:

- Distance of a point $x_0$ with label $y_0$ to a hyperplane $\{x : w \cdot x + b = 0\}$:

$$\rho(x_0) = \frac{y_0 (w \cdot x_0 + b)}{\|w\|}$$

- The margin is given as

$$\rho = \min_i \frac{y_i (w \cdot x_i + b)}{\|w\|}$$

- The VC dimension measures the capacity of $\mathcal{H}$ (Structural Risk Minimisation: see next lecture).
- The VC dimension of hyperplanes in $\mathbb{R}^N$ is $N + 1$ (try!) ...
- But what about high dimensions?

SVM - Margin analysis (Ctd)

- Margin $\rho = \frac{1}{\|w\|}$.
- $\mathcal{H}_\Lambda = \{ h(x) = \mathrm{sign}(w \cdot x + b) : \|w\| \le \Lambda \}$.
- How many points can be shattered?
- Measures the capacity of $\mathcal{H}_\Lambda$.
- Relates to the (empirical) Rademacher complexity:

$$\hat{\mathfrak{R}}_S(\mathcal{H}) = E_{\sigma_1, \ldots, \sigma_m}\left[ \sup_{h \in \mathcal{H}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i h(x_i) \right].$$

SVM - non-separable case

Maximal Soft Margin:

- Non-separable case: no $(w, b)$ satisfies all constraints, i.e. for every $(w, b)$ there is an $i$ with $y_i (w \cdot x_i + b) \not\ge 1$.
- Relax with slack variables $\xi_i \ge 0$, requiring $y_i (w \cdot x_i + b) \ge 1 - \xi_i$ for all $i$:

$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad \begin{cases} y_i (w \cdot x_i + b) \ge 1 - \xi_i & \forall i \\ \xi_i \ge 0 & \forall i. \end{cases} \tag{2}$$
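Program (2) is equivalent to the unconstrained hinge-loss problem $\min_{w,b} \frac{1}{2}\|w\|^2 + C \sum_i \max(0, 1 - y_i(w \cdot x_i + b))$; a minimal sketch minimizing it by subgradient descent (step size and iteration count are illustrative assumptions):

```python
import numpy as np

def soft_margin_svm(X, y, C=0.1, lr=1e-3, epochs=2000):
    """Subgradient descent on the unconstrained hinge-loss form
    (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b)).
    Points with margin below 1 are exactly those with slack xi_i > 0."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1                     # violated-margin points
        w -= lr * (w - C * (y[active][:, None] * X[active]).sum(axis=0))
        b -= lr * (-C * y[active].sum())
    return w, b
```

Larger $C$ penalizes slack more heavily, approaching the hard-margin solution on separable data.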

SVM - non-separable case (Ctd)

Dual problem:

- Dual problem:

$$\max_{0 \le \alpha \le C} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0 \tag{3}$$

- Then $w = \sum_{i=1}^{m} \alpha_i y_i x_i$,
- and $b = y_i - \sum_{j=1}^{m} \alpha_j y_j \, (x_j \cdot x_i)$ for any $i$ with $0 < \alpha_i < C$ (then $\xi_i = 0$).

SVM - Analysis.

Rademacher complexity:

- Empirical margin loss:

$$\hat{R}_\rho(h) = \frac{1}{m} \sum_{i=1}^{m} \Phi_\rho\big( y_i \, h(x_i) \big)$$

- Rademacher complexity:

$$\hat{\mathfrak{R}}_S(\mathcal{H}) = E_{\sigma_1, \ldots, \sigma_m}\left[ \sup_{h \in \mathcal{H}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i h(x_i) \right]$$

- $\mathcal{H} = \{ h(x) = \mathrm{sign}(w \cdot x + b) : \|w\| \le \Lambda, \; b \in \mathbb{R} \}$.
- Theorem: let $\mathcal{H}$ be a set of real-valued functions and fix $\rho > 0$. For any $\delta > 0$, with probability exceeding $1 - \delta$ one has

$$\forall h \in \mathcal{H} : \quad R(h) \le \hat{R}_\rho(h) + \frac{2}{\rho} \hat{\mathfrak{R}}_S(\mathcal{H}) + 3 \sqrt{\frac{\log \frac{2}{\delta}}{2m}}.$$

SVM - Analysis (Ctd).

Rademacher complexity:

$$\hat{\mathfrak{R}}_S(\mathcal{H}) = E_{\sigma_1, \ldots, \sigma_m}\left[ \sup_{h \in \mathcal{H}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i h(x_i) \right]$$

- Theorem: let $S$ be a sample of size $m$ with $\|x_i\| \le R$ for all $i$; then

$$\hat{\mathfrak{R}}_S(\mathcal{H}) \le \sqrt{\frac{R^2 \Lambda^2}{m}}.$$

SVM - Analysis (Ctd)

Proof:

$$\begin{aligned}
\hat{\mathfrak{R}}_S(\mathcal{H}) &\triangleq E_\sigma\left[ \sup_{\|w\| \le \Lambda} \frac{1}{m} \sum_{i=1}^{m} \sigma_i (w \cdot x_i) \right] \\
&= E_\sigma\left[ \sup_{\|w\| \le \Lambda} \frac{1}{m} \, w \cdot \sum_{i=1}^{m} \sigma_i x_i \right] \\
&\le \frac{\Lambda}{m} \, E_\sigma\left[ \Big\| \sum_{i=1}^{m} \sigma_i x_i \Big\| \right] \\
&\le \frac{\Lambda}{m} \left( E_\sigma\Big[ \sum_{i=1}^{m} \|x_i\|^2 \Big] \right)^{1/2} \\
&\le \frac{\Lambda}{m} \sqrt{m R^2} = \sqrt{\frac{R^2 \Lambda^2}{m}}.
\end{aligned} \tag{4}$$
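The bound can be checked by Monte Carlo: for linear functions the supremum is attained in closed form at $w = \Lambda v / \|v\|$ with $v = \sum_i \sigma_i x_i$. The data here are synthetic, with $R = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, Lam = 200, 5, 1.0                    # sample size, dimension, ||w|| <= Lam

X = rng.normal(size=(m, N))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)  # enforce ||x_i|| <= R = 1

# closed-form sup: sup_{||w|| <= Lam} (1/m) sum_i s_i (w . x_i)
#                = (Lam / m) * || sum_i s_i x_i ||
ests = []
for _ in range(500):
    s = rng.choice([-1.0, 1.0], size=m)    # Rademacher variables sigma_i
    ests.append(Lam / m * np.linalg.norm(s @ X))
rad = float(np.mean(ests))

bound = (1.0 ** 2 * Lam ** 2 / m) ** 0.5   # sqrt(R^2 Lam^2 / m)
```

The Monte Carlo estimate should land slightly below the bound, with the gap coming from the Jensen step in the proof.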

SVM - Analysis (Ctd)

Talagrand's contraction lemma: for an $L$-Lipschitz function $\Phi$ and any hypothesis set $\mathcal{H}$,

$$\hat{\mathfrak{R}}_S(\Phi \circ \mathcal{H}) \le L \, \hat{\mathfrak{R}}_S(\mathcal{H}).$$

Proof sketch:

$$\hat{\mathfrak{R}}_S(\Phi \circ \mathcal{H}) \triangleq E_{\sigma_1, \ldots, \sigma_m}\left[ \sup_{h \in \mathcal{H}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i (\Phi \circ h)(x_i) \right]$$

$$= \frac{1}{m} E_{\sigma_1, \ldots, \sigma_{m-1}}\left[ E_{\sigma_m}\left[ \sup_{h \in \mathcal{H}} \big( u_{m-1}(h) + \sigma_m (\Phi \circ h)(x_m) \big) \right] \right] \tag{5}$$

with $u_{m-1}(h) = \sum_{i=1}^{m-1} \sigma_i (\Phi \circ h)(x_i)$.

SVM - Analysis (Ctd).

For the inner expectation,

$$E_{\sigma_m}\left[ \sup_{h \in \mathcal{H}} u_{m-1}(h) + \sigma_m (\Phi \circ h)(x_m) \right] = \frac{1}{2} \sup_{h_1} \big[ u_{m-1}(h_1) + \Phi(h_1(x_m)) \big] + \frac{1}{2} \sup_{h_2} \big[ u_{m-1}(h_2) - \Phi(h_2(x_m)) \big]$$

$$\le \sup_{h_1, h_2} \frac{1}{2} \big[ u_{m-1}(h_1) + u_{m-1}(h_2) \big] + \frac{1}{2} s L \big[ h_1(x_m) - h_2(x_m) \big]$$

with $s = \mathrm{sign}\big(h_1(x_m) - h_2(x_m)\big)$, using the Lipschitz property of $\Phi$. Regrouping,

$$\le E_{\sigma_m}\left[ \sup_{h \in \mathcal{H}} u_{m-1}(h) + \sigma_m L \, h(x_m) \right], \tag{6}$$

and repeating the argument for $\sigma_{m-1}, \ldots, \sigma_1$ proves the claim.

Kernels.

- So far, the data entered the dual only through inner products $(x_i \cdot x_j)$.
- Generalise to $(\phi(x_i) \cdot \phi(x_j))$ with a feature map $\phi : \mathbb{R}^N \to \mathbb{R}^{N'}$.
- No explicit mapping is needed, just the inner product!
- $(\phi(x_i) \cdot \phi(x_j)) = K(x_i, x_j)$.
- Such a $\phi$ exists iff $K$ is positive semi-definite (PSD)!
- Typical choice:

$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$$

- Rademacher complexity of

$$\mathcal{H} = \Big\{ \sum_{i=1}^{m} \alpha_i y_i K(x_i, \cdot) : \|\alpha\| \le \Lambda \Big\}?$$
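The PSD requirement can be illustrated on the Gaussian kernel above: its Gram matrix on any sample has nonnegative eigenvalues (sample and bandwidth are arbitrary choices):

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
K = rbf_kernel(X, X)                 # Gram matrix: K_ij = K(x_i, x_j)
eigs = np.linalg.eigvalsh(K)         # PSD <=> all eigenvalues >= 0
```

Replacing $(x_i \cdot x_j)$ by $K(x_i, x_j)$ in dual (1) or (3) gives the kernelized SVM without ever computing $\phi$ explicitly.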

Conclusions

- SVMs: optimisation and analysis.
- Separable case, non-separable case.
- Linear models + kernels.
- Margin-based analysis, also in high dimensions.
