CS6208 : Advanced Topics in Artificial Intelligence


Graph Machine Learning

Lecture 4 : Graph SVM


Semester 2 2022/23

Xavier Bresson
https://twitter.com/xbresson

Department of Computer Science


National University of Singapore (NUS)


Course lectures

Introduction to Graph Machine Learning

Part 1 : GML without feature learning (before 2014)
Introduction to Graph Science
Graph Analysis Techniques without Feature Learning
Graph clustering
Graph SVM
Recommendation
Dimensionality reduction

Part 2 : GML with shallow feature learning (2014-2016)
Shallow graph feature learning

Part 3 : GML with deep feature learning, a.k.a. GNNs (after 2016)
Graph Convolutional Networks (spectral and spatial)
Weisfeiler-Lehman GNNs
Graph Transformer & Graph ViT/MLP-Mixer
Benchmarking GNNs
Molecular science and generative GNNs
GNNs for combinatorial optimization
GNNs for recommendation
GNNs for knowledge graphs
Integrating GNNs and LLMs

Outline

Supervised classification

Linear SVM

Soft-margin SVM

Kernel techniques

Non-linear/kernel SVM

Graph SVM

Conclusion



Learning techniques

As of Feb 2023, there are five main classes of learning algorithms :

Supervised learning (SL) : Algorithms that use labeled data, i.e. data annotated by humans.
Unsupervised learning : Algorithms that learn the underlying data distribution without relying on label information, e.g. for data generation.
Semi-supervised learning : Algorithms that use both labeled and unlabeled data.
Reinforcement learning (RL) : Algorithms that learn sequences of actions to maximize a future reward over time, e.g. winning games.
Self-supervised learning (SSL) : Algorithms that learn data representations by self-labeling, without requiring human annotations.


Support vector machine

In this lecture, we will focus on two specific topics :

Supervised classification using the Support Vector Machine (SVM).
Semi-supervised learning that leverages graph structure to improve learning from partially labeled data.

SVM stands as a theoretically robust and widely successful technique deployed across various applications.
It was the prevailing machine learning model prior to the advent of deep learning.
Its decline in popularity can be attributed primarily to the absence of a feature learning mechanism : SVM relies on features engineered by humans, which were surpassed by features learned by neural network architectures.


Support vector machine

SVM elegantly connects important topics in machine learning :


Geometric interpretation of classification tasks.
Ability to handle non-linear class boundaries using higher-dimensional feature maps.
Efficient use of the kernel trick to maintain the complexity of the input data.
High-dimensional interpolation with the representer theorem.
Use of graph representations to capture the data distribution regardless of labels.
Incorporation of graph regularization to propagate label information throughout the graph domain.
Primal and dual optimization methods for solving quadratic programming problems.



SVM formulation

Goal : Given a set V of labeled data with two classes, the goal is to construct a classification function f that assigns the class of a new, previously unseen data point by maximizing the margin between the two classes [1].

f : x ∈ ℝ^d → {−1, +1}
with V = {x_i, ℓ_i}_{i=1}^n, x_i ∈ ℝ^d (data features), ℓ_i ∈ {−1, +1} (data labels)

Positive label : x_i with ℓ_i = +1, classification function f(x) = +1, x ∈ C+
Negative label : x_i with ℓ_i = −1, classification function f(x) = −1, x ∈ C−

(Figure: two point classes C+ and C− in ℝ^d separated with a margin.)

[1] Vapnik, Chervonenkis, On a perceptron class, 1964


Linear SVM

Assumption [1] : Training and test datasets are linearly separable, i.e. the data can be separated by a straight line in 2D, a plane in 3D, and a hyper-plane in higher dimensions.
A hyper-plane is parameterized by two variables (w, b), where w is the normal vector of the hyper-plane, which determines its slope, and b is the offset or bias term :

Hyper-plane equation : {x : w^T x + b = cte}, x, w ∈ ℝ^d, b ∈ ℝ (cte denotes a constant)

(Figure: hyper-plane w^T x + b = cte acting as a separator between the two classes C+ and C−, with normal vector w and bias b.)

[1] Vapnik, Chervonenkis, On a perceptron class, 1964


SVM classifier

Classification function :

f_{w,b}(x) = sign(w^T x + b) = +1 for x ∈ C+ (i.e. w^T x + b > 0)
                               −1 for x ∈ C− (i.e. w^T x + b < 0)

(Figure: the hyper-plane {x : w^T x + b = 0} splits ℝ^d into the half-space {x : w^T x + b > 0} containing C+ and the half-space {x : w^T x + b < 0} containing C−.)
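To make the decision rule concrete, here is a minimal NumPy sketch; the weights (w, b) below are made-up numbers, not fitted values :

```python
import numpy as np

# Hypothetical, hand-picked hyper-plane parameters (not fitted).
w = np.array([1.0, -2.0])   # normal vector of the hyper-plane
b = 0.5                     # bias / offset

def f_wb(x):
    """Linear SVM classifier: sign(w^T x + b) in {-1, +1}."""
    return 1 if w @ x + b > 0 else -1

print(f_wb(np.array([2.0, 0.0])))   # w^T x + b = 2.5 > 0  ->  +1
print(f_wb(np.array([0.0, 1.0])))   # w^T x + b = -1.5 < 0 ->  -1
```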


Maximizing class margin

The hyper-plane w^T x + b = 0 is the class separator.
The hyper-planes w^T x + b = ±1 are the class margins.
Why do we want to maximize the margin?
Multiple hyper-plane solutions exist to separate the two classes.
Let us select the solution that generalizes best, i.e. the solution with the largest margin between the classes.

(Figure: several separating hyper-planes w^T x + b = 0 with their margins w^T x + b = +1 and w^T x + b = −1 between C+ and C−; the best one maximizes the margin.)

Maximizing class margin

What are the parameters (w, b) that maximize the margin d between the training points?

The margin is defined with the vector d = x+ − x−, where x± ∈ ℝ^d lie on the two margin hyper-planes.
Given that w^T x+ + b = +1 and w^T x− + b = −1, subtracting these two lines gives :
w^T (x+ − x−) = 2  ⟹  w^T d = 2
Then taking the norm : ‖w‖_2 · ‖d‖_2 = 2  ⟹  ‖d‖_2 = 2 / ‖w‖_2
Finally,
max_d ‖d‖_2 = 2/‖w‖_2  ⟺  min_w ‖w‖_2^2  s.t.  w^T x_i + b ≥ +1 if x_i ∈ C+, w^T x_i + b ≤ −1 if x_i ∈ C−

Maximizing the class margin is equivalent to minimizing the norm of w while satisfying the label constraints.

(Figure: margin vector d = x+ − x− joining the margin hyper-planes w^T x + b = +1 and w^T x + b = −1.)
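A quick numeric check of the identity ‖d‖_2 = 2/‖w‖_2; the values of (w, b) are arbitrary illustrations :

```python
import numpy as np

w = np.array([3.0, 4.0])            # ||w||_2 = 5
b = -2.0

# Two points on the margin hyper-planes, taken along the normal direction w:
x_plus  = (1 - b) * w / (w @ w)     # satisfies w^T x+ + b = +1
x_minus = (-1 - b) * w / (w @ w)    # satisfies w^T x- + b = -1

d = x_plus - x_minus                # margin vector
print(np.linalg.norm(d), 2 / np.linalg.norm(w))   # both print 0.4
```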

Primal optimization problem

The optimal value w* is the solution of a constrained quadratic programming (QP) problem :

min_w ‖w‖_2^2  s.t.  ℓ_i · s_i ≥ 1, ∀ i ∈ V

with the score s_i = w^T x_i + b, constrained as s_i ≥ +1 for x_i ∈ C+ and s_i ≤ −1 for x_i ∈ C−,
and the label ℓ_i = +1 if x_i ∈ C+, ℓ_i = −1 if x_i ∈ C−,
which can be compactly expressed as ℓ_i · s_i ≥ 1, ∀ i ∈ V.

(Figure: margin planes s_i = w^T x_i + b = +1 on the C+ side (ℓ_i = +1) and s_i = −1 on the C− side (ℓ_i = −1).)
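A minimal sketch of this primal QP, assuming the cvxpy package is available; the toy data is made up and any QP solver would do :

```python
import cvxpy as cp
import numpy as np

# Made-up, linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
l = np.array([1.0, 1.0, -1.0, -1.0])     # labels in {-1, +1}
n, d = X.shape

w = cp.Variable(d)
b = cp.Variable()
constraints = [cp.multiply(l, X @ w + b) >= 1]   # l_i (w^T x_i + b) >= 1
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
prob.solve()

print(w.value, b.value)   # maximum-margin separator (w*, b*)
```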


Primal optimization problem

There exists a unique solution to the QP optimization problem [1,2,3], if the assumption of linearly separable data points is satisfied.
Variable w is called the primal variable.

w* = argmin_w ‖w‖_2^2  s.t.  ℓ_i · s_i ≥ 1, ∀ i ∈ V   ⟹   f_SVM(x) = sign((w*)^T x + b*)
(quadratic function, convex set (polytope), SVM classifier)

[1] Dantzig, Orden, Wolfe, The generalized simplex method for minimizing a linear form under linear inequality restraints, 1955
[2] Wolfe, The Simplex Method for Quadratic Programming, 1959
[3] Boyd, Vandenberghe, Convex Optimization, 2004


Support vectors

Support vectors are the data points exactly localized on the margin hyper-planes :

ℓ_i · s_i = 1,  i.e.  ℓ_i · ((w*)^T x_i^sv + b*) = 1   (support vectors x_i^sv)

which gives b* = ℓ_i − (w*)^T x_i^sv, and on expectation over the support vectors :

b* = (1 / |x^sv|) Σ_{x_i^sv} ( ℓ_i − (w*)^T x_i^sv )

(Figure: support vectors sitting on the margin planes s_i = w^T x_i + b = +1 (ℓ_i = +1) and s_i = −1 (ℓ_i = −1).)


Dual variable

We can represent the weight vector w as a linear combination α of the training data points x_i.
The coefficient vector α is referred to as the dual variable of w.
The dual problem naturally introduces the linear kernel matrix K(x, y) = x^T y :

Given w = Σ_i α_i ℓ_i x_i ∈ ℝ^d, α_i ∈ ℝ,
we have w^T x = Σ_i α_i ℓ_i x_i^T x ∈ ℝ
              = Σ_i α_i ℓ_i K(x_i, x)  with K(x_i, x) = x_i^T x
              = α^T L K(x),  α, K(x) ∈ ℝ^n, L ∈ ℝ^{n×n}

Classification function : f_SVM(x) = sign(w^T x + b) ∈ ±1 (with the primal variable)
                                   = sign(α^T L K(x) + b) ∈ ±1 (with the dual variable)
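In NumPy, the linear kernel matrix and the dual form of the classifier look as follows; α and b are set by hand here to the analytic solution of this two-point toy problem, in practice they come from the dual QP solver of the next slides :

```python
import numpy as np

X = np.array([[2.0, 2.0], [-2.0, -2.0]])   # made-up training points
l = np.array([1.0, -1.0])                  # labels in {-1, +1}

K = X @ X.T        # linear kernel matrix, K_ij = x_i^T x_j
L = np.diag(l)

# Dual variables: hand-picked to match the analytic max-margin solution
# w = sum_i alpha_i l_i x_i = [0.25, 0.25], b = 0 for this symmetric pair.
alpha = np.array([0.0625, 0.0625])
b = 0.0

def f_dual(x):
    """f(x) = sign(alpha^T L K(x) + b) with K(x)_i = x_i^T x."""
    Kx = X @ x                              # K(x) in R^n
    return np.sign(alpha @ L @ Kx + b)

print(f_dual(np.array([1.0, 3.0])))        # +1
```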


Dual optimization problem

The primal optimization problem can be solved with the dual problem [1,2,3] :

min_w ‖w‖_2^2  s.t.  ℓ_i · s_i ≥ 1, ∀ i ∈ V   (primal QP problem)

is equivalent to

min_{α ≥ 0} (1/2) α^T Q α − α^T 1_n  s.t.  α^T ℓ = 0   (dual QP problem)

with Q = L K L ∈ ℝ^{n×n}
     L = diag(ℓ) ∈ ℝ^{n×n}
     ℓ = (ℓ_1, ..., ℓ_n) ∈ ℝ^n
     K ∈ ℝ^{n×n}, K_ij = x_i^T x_j ∈ ℝ (linear kernel)

[1] Kantorovich, The Mathematical Method of Production Planning and Organization, 1939
[2] Dantzig, Orden, Wolfe, The generalized simplex method for minimizing a linear form under linear inequality restraints, 1955
[3] Boyd, Vandenberghe, Convex Optimization, 2004
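A sketch of the dual QP with cvxpy; psd_wrap (available in recent cvxpy versions) tells the solver that Q = LKL is PSD by construction :

```python
import cvxpy as cp
import numpy as np

# Same made-up toy data as in the primal sketch.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
l = np.array([1.0, 1.0, -1.0, -1.0])
n = len(l)

K = X @ X.T                  # linear kernel, K_ij = x_i^T x_j
L = np.diag(l)
Q = L @ K @ L                # Q = LKL, PSD since K is a Gram matrix

alpha = cp.Variable(n)
objective = 0.5 * cp.quad_form(alpha, cp.psd_wrap(Q)) - cp.sum(alpha)
constraints = [alpha >= 0, alpha @ l == 0]
cp.Problem(cp.Minimize(objective), constraints).solve()

w = X.T @ (alpha.value * l)  # recover the primal variable w = sum_i alpha_i l_i x_i
print(alpha.value, w)
```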


Optimization algorithm

Solution α* can be computed with a simple primal-dual [1,2] iterative scheme :

Initialization : α^{k=0} = β^{k=0} = 0_n ∈ ℝ^n
Time steps satisfy τ_α τ_β ≤ 1/(‖Q‖ ‖L‖), e.g. τ_α = 1/‖Q‖, τ_β = 1/‖L‖
Iterate :
α^{k+1} = P_{≥0}( (τ_α Q + I_n)^{−1} (α^k + τ_α 1_n − τ_α L β^k) ) ∈ ℝ^n
β^{k+1} = β^k + τ_β L α^{k+1} ∈ ℝ^n
At convergence, we have : α^k → α*

Classification function : f_SVM(x) = sign((α*)^T L K(x) + b*) ∈ ±1
[1] Karmarkar, A new polynomial-time algorithm for linear programming, 1984


[2] Boyd, Vandenberghe, Convex Optimization, 2004
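A NumPy sketch of the iteration. The extracted formulas above are ambiguous about β, so this sketch uses β as a scalar multiplier for the single constraint α^T ℓ = 0; treat it as an illustration of the scheme, not the lecture's exact code :

```python
import numpy as np

# Same made-up toy data as before.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
l = np.array([1.0, 1.0, -1.0, -1.0])
n = len(l)

K = X @ X.T
L = np.diag(l)
Q = L @ K @ L

tau_a = 1.0 / np.linalg.norm(Q, 2)          # tau_alpha = 1/||Q||
tau_b = 1.0 / np.linalg.norm(l)             # step size for the multiplier
A = np.linalg.inv(tau_a * Q + np.eye(n))    # (tau_alpha Q + I_n)^{-1}

alpha, beta = np.zeros(n), 0.0
for _ in range(2000):                       # fixed iteration count for simplicity
    alpha = A @ (alpha + tau_a * np.ones(n) - tau_a * beta * l)
    alpha = np.maximum(alpha, 0.0)          # projection P_{>=0}
    beta += tau_b * (l @ alpha)             # ascent on the constraint alpha^T l = 0

w = X.T @ (alpha * l)   # recovered primal variable; compare with the primal QP sketch
print(alpha, w)
```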


Lab 1 : Linear SVM


Run code01.ipynb and analyze the linear SVM result on :
Linearly separable data points
Non-linear data points

(Figures: linear SVM on linearly separable data points and on non-linear data points.)



Noise

Real-world data often contains noise and outliers, which do not satisfy the assumption of linearly separable data points.
When dealing with non-linearly separable data, there is no mathematical solution for the standard or hard-margin SVM, because there does not exist a linear separator that can split the two classes perfectly, i.e. without errors.
A new technique is necessary, referred to as soft-margin SVM [1].

(Figure: two classes C+ (x_i with ℓ_i = +1, f(x) = +1) and C− (x_i with ℓ_i = −1, f(x) = −1) in ℝ^d, with errors or outliers that no linear separator can classify perfectly.)

[1] Cortes, Vapnik, Support-vector networks, 1995


Soft-margin SVM

Slack variables e_i quantify the error for each data point x_i to be an outlier.
These errors e_i will be minimized while simultaneously maximizing the margin :

min_w ‖w‖_2^2  s.t.  w^T x_i + b ≥ +1 for x_i ∈ C+
                     w^T x_i + b ≤ −1 for x_i ∈ C−            (standard SVM)

min_{w,e} ‖w‖_2^2 + λ Σ_{i=1}^n e_i  s.t.  w^T x_i + b ≥ +1 − e_i for x_i ∈ C+
                                           w^T x_i + b ≤ −1 + e_i for x_i ∈ C−
                                           e_i ≥ 0 for x_i ∈ V   (soft-margin SVM)

The parameter λ controls the trade-off between a large margin and small errors.

(Figure: slack variables e_i measure how far the points x_i violate their margin; points with e_i = 0 satisfy the margin constraint exactly.)


Regularization

What is the effect of varying λ, the regularization parameter?

For small λ values, more misclassification errors are allowed and the margin is larger.
For large λ values, misclassification errors are heavily penalized, leading to either no errors or very few, and a smaller margin.

(Figure: decision margins between C− and C+ for a small, an intermediate, and a large λ value; the margin shrinks as λ grows.)
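This effect can be reproduced with scikit-learn, whose SVC parameter C plays the role of λ (it weights the error term); a sketch on made-up noisy data :

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1.5, (50, 2)), rng.normal(2, 1.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)          # two noisy, overlapping classes

for C in [0.01, 1.0, 100.0]:                # small C ~ small lambda: wide margin
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_)  # margin width 2/||w||
    print(f"C={C:>6}: margin width = {margin:.3f}, "
          f"#support vectors = {len(clf.support_)}")
```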


Hinge loss

The soft-margin SVM technique penalizes :


Misclassifications of training data points.
Correct classifications of training points that fall inside the margin area.
The constrained optimization problem can be reformulated as an unconstrained problem :
min_{w,e} ‖w‖_2^2 + λ Σ_{i=1}^n e_i  s.t.  w^T x_i + b ≥ +1 − e_i for x_i ∈ C+, w^T x_i + b ≤ −1 + e_i for x_i ∈ C−, e_i ≥ 0 for x_i ∈ V

⇕

min_w ‖w‖_2^2 + λ Σ_{i=1}^n max(0, 1 − ℓ_i s_i),

where s_i = w^T x_i + b (score function)
and L_Hin(d_i) = max(0, 1 − d_i), d_i = ℓ_i s_i (hinge loss).

L_Hin(d) > 0 for d < 1 (misclassified), L_Hin(d) = 0 for d ≥ 1 (correctly classified).

(Figure: hinge loss L_Hin(d) = max(0, 1 − d) as a function of d.)


Loss functions

There exist multiple loss functions [1] :

L2 loss :          L_2(d_i) = (1 − d_i)^2 if d_i < 1, 0 otherwise
Elastic net loss : L_EN(d_i) = (1 − d_i)^2 + |1 − d_i| if d_i < 1, 0 otherwise
Huber loss :       L_Hub(d_i) = 1/2 − d_i if d_i ≤ 0, (1/2)(1 − d_i)^2 if 0 < d_i < 1, 0 otherwise
Logistic loss :    L_Log(d_i) = exp(1 − d_i)
Hinge loss :       L_Hin(d_i) = max(0, 1 − d_i)

(Figure: the loss curves L_Hub, L_EN, L_Log, L_2, L_Hin as functions of d = ℓ_i s_i; d < 1 misclassified, d ≥ 1 correctly classified.)

[1] Rosasco, De Vito, Caponnetto, Are loss functions all the same? 2004
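A small NumPy sketch of these losses as functions of the score d = ℓ_i s_i; the logistic form follows the slide as printed :

```python
import numpy as np

# d >= 1: correctly classified with margin (no penalty); d < 1: penalized.
def l2_loss(d):     return np.where(d < 1, (1 - d) ** 2, 0.0)
def elastic_net(d): return np.where(d < 1, (1 - d) ** 2 + np.abs(1 - d), 0.0)
def huber(d):       return np.where(d <= 0, 0.5 - d,
                           np.where(d < 1, 0.5 * (1 - d) ** 2, 0.0))
def logistic(d):    return np.exp(1 - d)     # as printed on the slide
def hinge(d):       return np.maximum(0.0, 1 - d)

d = np.linspace(-2.0, 2.0, 5)
print(hinge(d))     # [3. 2. 1. 0. 0.]
```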


Dual optimization problem

As previously, the primal optimization problem can be solved with the dual problem :

min_{w,e} ‖w‖_2^2 + λ Σ_{i=1}^n e_i  s.t.  ℓ_i · s_i ≥ 1 − e_i, e_i ≥ 0, ∀ i ∈ V   (primal QP problem)

is equivalent to

min_{0 ≤ α ≤ λ} (1/2) α^T Q α − α^T 1_n  s.t.  α^T ℓ = 0   (dual QP problem)

The only modification w.r.t. the hard-margin dual is the upper bound λ on α.

with Q = L K L ∈ ℝ^{n×n}
     L = diag(ℓ) ∈ ℝ^{n×n}
     ℓ = (ℓ_1, ..., ℓ_n) ∈ ℝ^n
     K ∈ ℝ^{n×n}, K_ij = x_i^T x_j ∈ ℝ (linear kernel)


Optimization algorithm

Solution α* can be computed with the following iterative scheme :

Initialization : α^{k=0} = β^{k=0} = 0_n ∈ ℝ^n
Time steps satisfy τ_α τ_β ≤ 1/(‖Q‖ ‖L‖), e.g. τ_α = 1/‖Q‖, τ_β = 1/‖L‖
Iterate :
α^{k+1} = P_{[0,λ]}( (τ_α Q + I_n)^{−1} (α^k + τ_α 1_n − τ_α L β^k) )
β^{k+1} = β^k + τ_β L α^{k+1}
At convergence, we have : α^k → α*

Classification function : f_SVM(x) = sign((α*)^T L K(x) + b*) ∈ ±1
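Relative to the hard-margin scheme, only the projection changes: α is now projected onto the box [0, λ]. A tiny self-contained sketch of that projection (the upper bound λ is inferred from the soft-margin dual above) :

```python
import numpy as np

def project_box(alpha, lam):
    """Projection P_[0,lam] used by the soft-margin iteration; the hard-margin
    scheme is the special case lam = +inf, i.e. np.maximum(alpha, 0)."""
    return np.clip(alpha, 0.0, lam)

print(project_box(np.array([-0.5, 0.3, 2.0]), lam=1.0))   # [0.  0.3 1. ]
```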


Lab 2 : Soft-margin SVM


Run code02.ipynb and analyze the SVM result on :
Noisy linearly separable data points
Noisy non-linear data points

(Figures: soft-margin SVM on noisy linearly separable data points and on noisy non-linear data points.)


Xavier Bresson 29
30

Outline

Supervised classification

Linear SVM

Soft-margin SVM

Kernel techniques

Non-linear/kernel SVM

Graph SVM

Conclusion

Xavier Bresson 30
31

High-dimensional interpolation

How can we perform function interpolation in high-dimensional spaces?


Reproducing Kernel Hilbert Space[1] (RKHS) : A space associated with a bounded, symmetric, positive semidefinite (PSD) operator, called a kernel K(x, x′) : ℝ^d × ℝ^d → ℝ, that can reproduce any smooth function h(x) : ℝ^d → ℝ.
Representer theorem[1,2] : Any continuous smooth function h in a RKHS can be represented as a linear combination of the kernel function K evaluated at the training data points x_i :

  h(x) = Σ_{i=1}^n α_i K(x, x_i) + b,   x, x_i ∈ ℝ^d,  b ∈ ℝ

[1] Beurling, On two problems concerning linear transformations in Hilbert space, 1948
[2] Scholkopf, Herbrich, Smola, A generalized representer theorem, 2001
[Portraits: David Hilbert (1862-1943), Bernhard Schölkopf]
Xavier Bresson 31
32

Representer Theorem

Illustration of the Representer theorem to interpolate functions in high-dimensional spaces :

<latexit sha1_base64="OURsv9uv+PuD6/MiBH76Ih94LjI=">AAAB63icdVDLSsNAFJ3UV62vqks3g0Wom5CU2NZdwY3LCvYBbSiT6bQZOjMJMxOxhP6CGxeKuPWH3Pk3TtoIKnrgwuGce7n3niBmVGnH+bAKa+sbm1vF7dLO7t7+QfnwqKuiRGLSwRGLZD9AijAqSEdTzUg/lgTxgJFeMLvK/N4dkYpG4lbPY+JzNBV0QjHSmRRW789H5Ypj15tNz72Eju0skRG3ceE1oJsrFZCjPSq/D8cRTjgRGjOk1MB1Yu2nSGqKGVmUhokiMcIzNCUDQwXiRPnp8tYFPDPKGE4iaUpouFS/T6SIKzXngenkSIfqt5eJf3mDRE+afkpFnGgi8GrRJGFQRzB7HI6pJFizuSEIS2puhThEEmFt4imZEL4+hf+Tbs1267Z341VatTyOIjgBp6AKXNAALXAN2qADMAjBA3gCzxa3Hq0X63XVWrDymWPwA9bbJ8S9jgg=</latexit>

h(x)
Rd , d
<latexit sha1_base64="LCsEvIo2wLu38JbujGcVlBGhyYo=">AAAB/XicbVDLSsNAFL3xWesrPnZuBovgQkpSirosuHFZxT6giWUymaRDJw9mJkINxV9x40IRt/6HO//GSduFth4YOJxzL/fM8VLOpLKsb2NpeWV1bb20Ud7c2t7ZNff22zLJBKEtkvBEdD0sKWcxbSmmOO2mguLI47TjDa8Kv/NAhWRJfKdGKXUjHMYsYAQrLfXNQyfCauB5+e343j9DvhOGyO6bFatqTYAWiT0jFZih2Te/HD8hWURjRTiWsmdbqXJzLBQjnI7LTiZpiskQh7SnaYwjKt18kn6MTrTioyAR+sUKTdTfGzmOpBxFnp4sssp5rxD/83qZCi7dnMVppmhMpoeCjCOVoKIK5DNBieIjTTARTGdFZIAFJkoXVtYl2PNfXiTtWtU+r9Zv6pVGbVZHCY7gGE7BhgtowDU0oQUEHuEZXuHNeDJejHfjYzq6ZMx2DuAPjM8fTBGUbQ==</latexit>

1
↵l
<latexit sha1_base64="yqtZ0z4BT2VuzNBPahoZfuqsp7A=">AAAB73icbVBNS8NAEJ3Ur1q/qh69LBbBg5REinosePFYwX5AG8pku2mXbjZxdyOU0D/hxYMiXv073vw3btsctPXBwOO9GWbmBYng2rjut1NYW9/Y3Cpul3Z29/YPyodHLR2nirImjUWsOgFqJrhkTcONYJ1EMYwCwdrB+Hbmt5+Y0jyWD2aSMD/CoeQhp2is1OmhSEbYF/1yxa26c5BV4uWkAjka/fJXbxDTNGLSUIFadz03MX6GynAq2LTUSzVLkI5xyLqWSoyY9rP5vVNyZpUBCWNlSxoyV39PZBhpPYkC2xmhGellbyb+53VTE974GZdJapiki0VhKoiJyex5MuCKUSMmliBV3N5K6AgVUmMjKtkQvOWXV0nrsupdVWv3tUr9Io+jCCdwCufgwTXU4Q4a0AQKAp7hFd6cR+fFeXc+Fq0FJ585hj9wPn8ACpyP6g==</latexit>

<latexit sha1_base64="EyDuB3k2OnFj8xBcrEt6rTkJZsg=">AAAB73icbVBNS8NAEJ34WetX1aOXxSJ4kJJIUY8FLx4r2A9oQ5lsN+3azSbuboQS+ie8eFDEq3/Hm//GbZuDtj4YeLw3w8y8IBFcG9f9dlZW19Y3Ngtbxe2d3b390sFhU8epoqxBYxGrdoCaCS5Zw3AjWDtRDKNAsFYwupn6rSemNI/lvRknzI9wIHnIKRortbsokiH2HnqlsltxZyDLxMtJGXLUe6Wvbj+macSkoQK17nhuYvwMleFUsEmxm2qWIB3hgHUslRgx7Wezeyfk1Cp9EsbKljRkpv6eyDDSehwFtjNCM9SL3lT8z+ukJrz2My6T1DBJ54vCVBATk+nzpM8Vo0aMLUGquL2V0CEqpMZGVLQheIsvL5PmRcW7rFTvquXaeR5HAY7hBM7AgyuowS3UoQEUBDzDK7w5j86L8+58zFtXnHzmCP7A+fwBB5SP6A==</latexit>

↵j
xi k22 / 2
<latexit sha1_base64="x6RiFuuiNYQCY+IBb8ffGyg121o=">AAACEnicbVDLSsNAFJ3UV62vqEs3g0VowdYkFHUjFNwIbirYBzRpmEwn7dDJg5mJtLT9Bjf+ihsXirh15c6/cfpYaOuBC4dz7uXee7yYUSEN41tLrayurW+kNzNb2zu7e/r+QU1ECcekiiMW8YaHBGE0JFVJJSONmBMUeIzUvd71xK8/EC5oFN7LQUycAHVC6lOMpJJcPX+b65/2XZqHV9Am/ThXsEf9ghLskWu1rDNb0E6AWlYeunrWKBpTwGVizkkWzFFx9S+7HeEkIKHEDAnRNI1YOkPEJcWMjDN2IkiMcA91SFPREAVEOMPpS2N4opQ29COuKpRwqv6eGKJAiEHgqc4Aya5Y9Cbif14zkf6lM6RhnEgS4tkiP2FQRnCSD2xTTrBkA0UQ5lTdCnEXcYSlSjGjQjAXX14mNatonhdLd6Vs2ZrHkQZH4BjkgAkuQBncgAqoAgwewTN4BW/ak/aivWsfs9aUNp85BH+gff4AZ3KbZA==</latexit>

<latexit sha1_base64="h/NhpxYQVTpI12PKlj8RqKO4hiA=">AAAB73icbVBNS8NAEJ3Ur1q/qh69BIvgQUoiRT0WvHisYD+gDWWy3bRLdzdxdyOU0D/hxYMiXv073vw3btsctPXBwOO9GWbmhQln2njet1NYW9/Y3Cpul3Z29/YPyodHLR2nitAmiXmsOiFqypmkTcMMp51EURQhp+1wfDvz209UaRbLBzNJaCBwKFnECBordXrIkxH2Rb9c8areHO4q8XNSgRyNfvmrN4hJKqg0hKPWXd9LTJChMoxwOi31Uk0TJGMc0q6lEgXVQTa/d+qeWWXgRrGyJY07V39PZCi0nojQdgo0I73szcT/vG5qopsgYzJJDZVksShKuWtid/a8O2CKEsMnliBRzN7qkhEqJMZGVLIh+Msvr5LWZdW/qtbua5X6RR5HEU7gFM7Bh2uowx00oAkEODzDK7w5j86L8+58LFoLTj5zDH/gfP4ADCCP6w==</latexit>

↵m K(x, xi ) = exp( kx )
↵i
<latexit sha1_base64="umECOXPUZMecpKlhGSFM+/uaHcw=">AAAB73icbVBNS8NAEJ3Ur1q/qh69LBbBg5REinosePFYwX5AG8pku2mXbjZxdyOU0D/hxYMiXv073vw3btsctPXBwOO9GWbmBYng2rjut1NYW9/Y3Cpul3Z29/YPyodHLR2nirImjUWsOgFqJrhkTcONYJ1EMYwCwdrB+Hbmt5+Y0jyWD2aSMD/CoeQhp2is1OmhSEbY5/1yxa26c5BV4uWkAjka/fJXbxDTNGLSUIFadz03MX6GynAq2LTUSzVLkI5xyLqWSoyY9rP5vVNyZpUBCWNlSxoyV39PZBhpPYkC2xmhGellbyb+53VTE974GZdJapiki0VhKoiJyex5MuCKUSMmliBV3N5K6AgVUmMjKtkQvOWXV0nrsupdVWv3tUr9Io+jCCdwCufgwTXU4Q4a0AQKAp7hFd6cR+fFeXc+Fq0FJ585hj9wPn8ABhCP5w==</latexit>

xi xj xl xm

n
X
<latexit sha1_base64="OMElD5xjVclIg0OqbKrp+zLeUT4=">AAADQ3ichVJLb9NAELbNq4RXCkcuIyqqRNAQh/K4VKrEASQuBfWFuok1Xm/iVXfXZncNttz8Ny78AW78AS4cQIgrEus0VH1JjLTSp5mdb755xLngxvb7X/3gwsVLl68sXG1du37j5q324u1tkxWasi2aiUzvxmiY4IptWW4F2801QxkLthPvv2jiOx+YNjxTm7bK2VDiRPExp2idK1r03y1D2im7sAbEFDKq+Vo4HSkgKPIUIw6vO+XDMuJdeAAx4YpItGkc12+nQEhrGYhlpdWy/shtCjZlIDNjgWZSZgr2mVZMGEDNIGFjJzEBNP8y3xeYzOirpno52qyApCZHyupB7wmV0yNy6DTtoZ4TduF8CsLKvLNCDsqVihxEg9HgETF8InE06AI5RvYSC2M4qv/QdXCmyfXdHVE40hb2Hp/UlmeiUpnkKE4QQtRe6vf6M4OzIJyDJW9uG1H7C0kyWkimLBVozF7Yz+2wRm05FWzaIoVhTsI+TtiegwolM8N6dgNTuO88CYwz7Z5qNuC8xzNqlMZUMnY/mxWa07HGeV5sr7Dj58Oaq7ywTNHDQuNCgM2gOShIuGbUisoBpJo7rUBT1EitO7uWG0J4uuWzYHvQC5/2Vt+sLq0P5uNY8O5697yOF3rPvHXvlbfhbXnU/+R/83/4P4PPwffgV/D78Gvgz3PueCcs+PMXU3cDUA==</latexit>

h(x) = ↵i K(x, xi ) + b 2 R
i=1
with the most common kernels are defined as
K(x, y) = xT y (linear kernel)
K(x, y) = exp( kx yk22 / 2
) (Gaussian kernel)
K(x, y) = (axT y + b)c (polynomial kernel)
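A minimal sketch of this interpolation with the Gaussian kernel: the coefficients α are obtained by solving the linear system (K + εI)α = y on training values y (the small ridge ε and the choice b = 0 are assumptions made for a simple, stable demo):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """K(x, y) = exp(-||x - y||_2^2 / sigma^2) for all pairs of rows."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

# Fit h(x) = sum_i alpha_i K(x, x_i) + b on noisy 1D samples (b = 0 here).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + 1e-6 * np.eye(30), y)   # (K + eps I) alpha = y

X_new = np.linspace(-3, 3, 200)[:, None]
h = gaussian_kernel(X_new, X) @ alpha               # smooth interpolant of y
```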

Xavier Bresson 32
33

Feature map, kernel trick and interpolation

Any feature map φ defines a reproducing kernel K, and conversely.

Any kernel K can be used to design a smooth high-dimensional function h.

[Diagram: the reproducing kernel K, the feature map φ, and a bounded continuous function h form three connected views]

  Kernel trick :  K(x, y) = φ(x)ᵀφ(y)   (links the kernel K and the feature map φ)

  Representer theorem :  h(x) = Σ_{i=1}^n α_i K(x, x_i) + b   (links the kernel K and the function h)
  Norm of h :  ‖h‖²_{H_K} = hᵀKh

  Combining both :  h(x) = Σ_{i=1}^n α_i φ(x)ᵀφ(x_i) + b   (links the feature map φ and the function h)

Xavier Bresson 33
34

Outline

Supervised classification

Linear SVM

Soft-margin SVM

Kernel techniques

Non-linear/kernel SVM

Graph SVM

Conclusion

Xavier Bresson 34
35

Feature engineering for non-linear data

Linear models, s.a. the original and soft-margin SVMs, assume linearly separable data points.
But in many real-world scenarios, datasets are not linearly separable, i.e. a hyper-plane
cannot distinguish between distinct classes.
How to address this challenge and classify complex/non-linear datasets with linear separators?
Feature engineering approach[1] : Project the data into a higher-dimensional space using a
feature map 𝜙 where the data becomes linearly separable.

[Figure: the feature map φ(·) maps x ∈ ℝ^d to φ(x) ∈ ℝ^{d′} with d′ > d (possibly d′ ≫ d); left, a non-linear dataset and separator in ℝ^d with classes C+ and C−; right, a linear dataset and separator in ℝ^{d′}]
[1] Aizerman et-al, Theoretical foundations of the potential function method in pattern recognition learning, 1964

Xavier Bresson 35
36

Kernel trick

Non-linear mapping 𝜙 enables the separation of non-linear data points.


However, this approach entails operating in a larger feature space compared to the original one, resulting in an increased complexity of O(d′) where d′ ≫ d.
To address this issue, the kernel trick was devised[1,2], offering a solution without the explicit use of the mapping φ.
With this approach, the kernel operator/matrix is computed as K = φᵀφ, rather than φ individually, making the exact expression of φ irrelevant.
Some standard kernel operators include :
  K(x_i, x_j) = x_iᵀx_j   (linear kernel, as used for linear k-means)
  K(x_i, x_j) = φ(x_i)ᵀφ(x_j) = exp(−‖x_i − x_j‖₂²/σ²)   (Gaussian kernel)
  K(x_i, x_j) = (a x_iᵀx_j + b)^c   (polynomial kernel)

Computing φ(x_i)ᵀφ(x_j) explicitly is time-consuming; evaluating the closed-form kernels is efficient.
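The saving is easy to verify on the degree-2 homogeneous polynomial kernel (a = 1, b = 0, c = 2), whose explicit feature map for x ∈ ℝ² is φ(x) = (x₁², √2 x₁x₂, x₂²); a minimal sketch:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for x in R^2 (dimension d' = 3)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs = phi(x) @ phi(y)        # time-consuming route: build phi explicitly
rhs = (x @ y) ** 2           # efficient route: closed-form kernel
print(np.isclose(lhs, rhs))  # True: phi(x)'phi(y) = (x'y)^2
```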

[1] Aizerman et-al, Theoretical foundations of the potential function method in pattern recognition learning, 1964
[2] Guyon, Boser, Vapnik, Automatic capacity tuning of very large VC-dimension classifiers, 1993
Xavier Bresson 36
37

Non-linear/kernel SVM

Primal optimization problem[1] w.r.t. 𝑤 :

<latexit sha1_base64="eAuGzP1erWyQjKZ/AmV2Wmqjf7k=">AAADIHichVJLb9NAELbNq4RHUzhyGRFRpQqx7KgqXJAq9YKEkIpo0krZxFpvxsmq67XZXdNErn8KF/4KFw4gBDf4NWzciEdbxJw+zXyP2UecC65NEHx3vStXr12/sXazcev2nbvrzY17A50VimGfZSJTRzHVKLjEvuFG4FGukKaxwMP4eG85P3yLSvNMHphFjqOUTiVPOKPGtqINd2cTSMplVJ48xooAOT0hp+Ne1IMOEGF9JhSILtKo5M/CaiwBIw6WZnBuVFqC9o0PlaViYkjZIDFOuSypUnRRlUJUjZPxAZB8xtvziG9Z0xjIFN9AJ4Ru7bX52yvJlLWaLwO4hL2oQ8hlcmHl3dDi/8i7Vl5vu8wL/s0cgCUSlJPV2g2i+HRm/F/09gtUEgW8HrzcskclYCtqtgI/qAsugnAFWs6q9qPmNzLJWJGiNExQrYdhkJuRjTScCbShhcacsmM6xaGFkqaoR2X9wBU8sp1JvXaSSQN1909FSVOtF2lsmSk1M31+tmxeNhsWJnk6KrnMC4OSnQUlhQCTwfK3wIQrZEYsLKBMcbsrsBlVlBn7pxr2EsLzR74IBj0/3PG3X223dnur61hzHjgPnbYTOk+cXee5s+/0Hea+cz+4n9zP3nvvo/fF+3pG9dyV5r7zV3k/fgJWrvUD</latexit>

8 T
n
X < w (xi ) + b +1 ei for xi 2 C+
min kwk22 + ei s.t. wT (xi ) + b  1 + ei for xi 2 C (Kernel SVM)
w,e :
i=1 ei 0 for xi 2 V

Dual optimization problem w.r.t. α :


  min_{0≤α} ½ αᵀQα − αᵀ1_n   s.t.   αᵀℓ = 0

  with  Q = LKL ∈ ℝ^{n×n}
        L = diag(ℓ) ∈ ℝ^{n×n}
        ℓ = (ℓ₁, ..., ℓ_n) ∈ ℝⁿ
        K ∈ ℝ^{n×n},  K_ij = φ(x_i)ᵀφ(x_j) ∈ ℝ   (generalized kernel)

The feature map φ is never used explicitly.

[1] Boser, Guyon, Vapnik, A training algorithm for optimal margin classifiers, 1992

Xavier Bresson 37
38

Non-linear/kernel SVM

Decision function 𝑓(𝑥) :


<latexit sha1_base64="5CqodeVg0p1hH7pLgfA74b75ZJo=">AAAESXicjVNbb9MwGM3SASPcOnjk5RMVqBVV1VbjIqRKE3sAiSENWLtJdRc5rttYc5wodnpRlL/HC2+88R944QGEeMJOL7TdQFiKcvJdjs93HHsRZ1LV61+27ML2lavXdq47N27eun2nuHu3I8MkJrRNQh7Gpx6WlDNB24opTk+jmOLA4/TEOz8w+ZMRjSULxbGaRrQX4KFgA0aw0iF313YfAVJ0ouIgfcVGVEAGMIYWIJkELgMECPPIxwZSzs0r8ll54rIKICZQgJXveen77KwPCDl/yMYUfDyiOd3ZMUz+i1IXzvAG+ZzalxEmNG2SIPsX3RvDVTUccy0wZsrXShaJ1soQq1tetknOrYsOdXOlutytaj7XDRBVOFyPpAIpFlAJIptpyc8rHftM0SxFL9mwkoGzZtsBx1IuzwcGiSA5eKHlIxi4i7oPnbeZ0dxadko2FFlZe70Y5zF4FaN3YUI5dyGKWYA5jHDMsP5JKhlCACtzP6ktJl/jXffhL9z9ZIPZcRy3WKrX6vmCi6AxByVrvo7c4mfUD0kSUKGIMaPbqEeql+JYMcJp5qBEUq30HA9pV0OBtcG9NHc2g4c60odBGOtHKMijqx0pDqScBp6uNMckN3MmeFmum6jB817KRJQoKshso0HCQYVgrhX0WUyJ4lMNMImZ1grExzEmSl8+Y0Jjc+SLoNOsNZ7W9t7tlfabczt2rPvWA6tsNaxn1r712jqy2haxP9pf7e/2j8KnwrfCz8KvWam9Ne+5Z62t7cJvA9Jgjw==</latexit>

X
Given w = ↵i `i (xi ) 2 Rd
i
X
T
we have w x = ↵i `i (xi )T (x) 2 R
i
X
= ↵i `i K(xi , x) with K(xi , x) = (xi )T (x)
i

T n n⇥n
= ↵ LK(x), ↵, K(x) 2 R , L 2 R
Classification function : fSVM (x) = sign(wT (x) + b) (with primal variable)
= sign(↵T LK(x) + b) (with dual variable)
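A minimal sketch of the dual-form decision function with a Gaussian kernel (alpha, ell, b are assumed to come from the dual solver; sigma is the kernel bandwidth):

```python
import numpy as np

def f_svm(x_new, X_train, alpha, ell, b, sigma=1.0):
    """f_SVM(x) = sign(alpha' L K(x) + b), with K(x)_i = K(x_i, x)."""
    d2 = ((X_train - x_new) ** 2).sum(-1)
    Kx = np.exp(-d2 / sigma**2)              # K(x) in R^n (Gaussian kernel)
    return np.sign(alpha @ (ell * Kx) + b)   # alpha' L K(x) + b, L = diag(ell)
```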

Xavier Bresson 38
39

Optimization algorithm

Solution α* can be computed with the following iterative scheme :

Initialization :  α^{k=0} = β^{k=0} = 0_n ∈ ℝⁿ

Time steps satisfy  τ_α τ_β ≤ 1/(‖Q‖‖L‖),  s.a.  τ_α = 1/‖Q‖,  τ_β = 1/‖L‖

Iterate :
  α^{k+1} = P_{0≤·}( (τ_α Q + I_n)⁻¹ (α^k + τ_α 1_n − τ_α L β^k) )
  β^{k+1} = β^k + τ_β L α^{k+1}

At convergence, we have α^k → α*.

Classification function :  f_SVM(x) = sign(α*ᵀ L K(x) + b*) ∈ ±1,  where K is now the generalized kernel K_ij = φ(x_i)ᵀφ(x_j).

Xavier Bresson 39
40

Supervised learning for classification

SVM belongs to the class of supervised classification algorithms.


In general, algorithms of this class can be described as follows:
  min_{f ∈ H_K} ‖f‖²_{H_K} + λ Σ_{i=1}^n L_data(f_i, ℓ_i)

with

  Representer theorem :  f(x) = sign( Σ_{i=1}^n α_i K(x, x_i) ) ∈ ±1

  Norm of f in RKHS :  ‖f‖²_{H_K} = ⟨f, f⟩_{H_K} = Σ_{ij} f_i f_j K_ij = fᵀKf   (smoothness/regularity of f)
                       ‖f‖²_{H_K} = ‖w‖₂²  for f(x) = wᵀx   (linear SVM)

  Misclassification error :  L_data(s_i, ℓ_i) = L_Hin(d_i = s_i ℓ_i) = max(0, 1 − d_i)   (Hinge loss)

  Hyper-parameter λ > 0 controls the trade-off between regularization and data fidelity.

Xavier Bresson 40
41

Lab 3 : Kernel/non-linear SVM


Run code03.ipynb and analyze the kernel SVM results on :
  Noisy non-linear data points
  Real-world text documents

[Figures: decision boundaries of linear SVM vs kernel SVM]
Xavier Bresson 41
42

Outline

Supervised classification

Linear SVM

Soft-margin SVM

Kernel techniques

Non-linear/kernel SVM

Graph SVM

Conclusion

Xavier Bresson 42
43

Semi-supervised classification

Semi-supervised classification (SSC) leverages both labeled and unlabeled data to boost the
classification process.
Labeled data, annotated by humans, provide precise insights into class membership, offering rich information for learning.
However, human annotation is time-consuming, costly, and susceptible to human biases and errors.
In contrast, unlabeled data depict the underlying structure of the data distribution.
Collecting unlabeled data is efficient and cheap, but the data are inherently noisy.
SSC proves particularly beneficial when labeled data are scarce, i.e. the situation where n ≪ m: the number n of labeled instances is significantly smaller than the number m of unlabeled instances.
An extreme scenario is when each class has only one labeled instance, n = 1.

Xavier Bresson 43
44

Geometric structure

Unlabeled data encapsulate valuable statistical information, particularly the geometric structure of the data distribution.
How to leverage this information within the supervised SVM classification framework?

[Figure: three panels in ℝ^d — left and middle, supervised classification with labeled data (+, −) and classes C+/C−; right, semi-supervised classification with labeled data (+, −) and unlabeled data (o)]

Xavier Bresson 44
45

Manifold and graph

The data distribution remains unchanged regardless of whether labels are present or absent.
Both labeled and unlabeled data points are assumed to belong to a manifold within the
d-dimensional feature space.
This manifold is estimated using a k-nearest neighbor graph constructed from the data points,
serving as an approximation of the underlying manifold structure.
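A minimal sketch of this construction with scikit-learn (the symmetrization step is an assumption made to obtain an undirected graph):

```python
from sklearn.neighbors import kneighbors_graph

def knn_graph(X, k=10):
    """Sparse adjacency A of the k-NN graph of the rows of X,
    symmetrized so that G = (V, E, A) is undirected."""
    A = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
    A = 0.5 * (A + A.T)   # average directed edges -> symmetric weights
    return A
```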

[Figure: four panels — (1) labeled (+, −) and unlabeled (o) data in ℝ^d for semi-supervised classification on a manifold; (2) manifold M embedded in ℝ^d, with data points sampled from M; (3) manifold M represented by a k-NN graph of the data points; (4) the k-NN graph G = (V, E, A)]
Xavier Bresson 45
46

Manifold regularization

We aim to ensure that the classification function 𝑓(𝑥) exhibits smoothness across the
manifold, which is approximated by the k-NN graph.
This smoothness constraint will propagate the label information throughout the graph, i.e.
neighboring data points will tend to share the same label.

<latexit sha1_base64="mkPZTEOGd0peiZA9YvD/oq0UsyA=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahopRdKepFKHjxWMF+SLuUbJptQ5PskmTFsvRXePGgiFd/jjf/jWm7B60+GHi8N8PMvCDmTBvX/XJyS8srq2v59cLG5tb2TnF3r6mjRBHaIBGPVDvAmnImacMww2k7VhSLgNNWMLqe+q0HqjSL5J0Zx9QXeCBZyAg2VroPy489dnx14vWKJbfizoD+Ei8jJchQ7xU/u/2IJIJKQzjWuuO5sfFTrAwjnE4K3UTTGJMRHtCOpRILqv10dvAEHVmlj8JI2ZIGzdSfEykWWo9FYDsFNkO96E3F/7xOYsJLP2UyTgyVZL4oTDgyEZp+j/pMUWL42BJMFLO3IjLEChNjMyrYELzFl/+S5lnFO69Ub6ul2mkWRx4O4BDK4MEF1OAG6tAAAgKe4AVeHeU8O2/O+7w152Qz+/ALzsc3QkOPVw==</latexit>

f (xi ) = +1 Labeled data +,


C+ C-
<latexit sha1_base64="mkPZTEOGd0peiZA9YvD/oq0UsyA=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahopRdKepFKHjxWMF+SLuUbJptQ5PskmTFsvRXePGgiFd/jjf/jWm7B60+GHi8N8PMvCDmTBvX/XJyS8srq2v59cLG5tb2TnF3r6mjRBHaIBGPVDvAmnImacMww2k7VhSLgNNWMLqe+q0HqjSL5J0Zx9QXeCBZyAg2VroPy489dnx14vWKJbfizoD+Ei8jJchQ7xU/u/2IJIJKQzjWuuO5sfFTrAwjnE4K3UTTGJMRHtCOpRILqv10dvAEHVmlj8JI2ZIGzdSfEykWWo9FYDsFNkO96E3F/7xOYsJLP2UyTgyVZL4oTDgyEZp+j/pMUWL42BJMFLO3IjLEChNjMyrYELzFl/+S5lnFO69Ub6ul2mkWRx4O4BDK4MEF1OAG6tAAAgKe4AVeHeU8O2/O+7w152Qz+/ALzsc3QkOPVw==</latexit>

<latexit sha1_base64="84b001QnW8g9Egp0fO7ziLRMp/I=">AAAB6nicbVBNS8NAEJ34WetX1aOXxSJ4kJJIUY8FLx4r2g9oQ9lsJ+3SzSbsbsQS+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IBFcG9f9dlZW19Y3Ngtbxe2d3b390sFhU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWjm6nfekSleSwfzDhBP6IDyUPOqLHS/VOP90plt+LOQJaJl5My5Kj3Sl/dfszSCKVhgmrd8dzE+BlVhjOBk2I31ZhQNqID7FgqaYTaz2anTsipVfokjJUtachM/T2R0UjrcRTYzoiaoV70puJ/Xic14bWfcZmkBiWbLwpTQUxMpn+TPlfIjBhbQpni9lbChlRRZmw6RRuCt/jyMmleVLzLSvWuWq6d53EU4BhO4Aw8uIIa3EIdGsBgAM/wCm+OcF6cd+dj3rri5DNH8AfO5w9c4I3L</latexit>

xi Unlabeled data o f (xi ) = +1 <latexit sha1_base64="84b001QnW8g9Egp0fO7ziLRMp/I=">AAAB6nicbVBNS8NAEJ34WetX1aOXxSJ4kJJIUY8FLx4r2g9oQ9lsJ+3SzSbsbsQS+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IBFcG9f9dlZW19Y3Ngtbxe2d3b390sFhU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWjm6nfekSleSwfzDhBP6IDyUPOqLHS/VOP90plt+LOQJaJl5My5Kj3Sl/dfszSCKVhgmrd8dzE+BlVhjOBk2I31ZhQNqID7FgqaYTaz2anTsipVfokjJUtachM/T2R0UjrcRTYzoiaoV70puJ/Xic14bWfcZmkBiWbLwpTQUxMpn+TPlfIjBhbQpni9lbChlRRZmw6RRuCt/jyMmleVLzLSvWuWq6d53EU4BhO4Aw8uIIa3EIdGsBgAM/wCm+OcF6cd+dj3rri5DNH8AfO5w9c4I3L</latexit>

xi
<latexit sha1_base64="RD1LM+QIjrmwdKnHIXTPNf0ZbPA=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahgpZdKepFKHjxWMF+SLuUbJptQ5PskmTFsvRXePGgiFd/jjf/jWm7B60+GHi8N8PMvCDmTBvX/XJyS8srq2v59cLG5tb2TnF3r6mjRBHaIBGPVDvAmnImacMww2k7VhSLgNNWMLqe+q0HqjSL5J0Zx9QXeCBZyAg2VroPy489dnx16vWKJbfizoD+Ei8jJchQ7xU/u/2IJIJKQzjWuuO5sfFTrAwjnE4K3UTTGJMRHtCOpRILqv10dvAEHVmlj8JI2ZIGzdSfEykWWo9FYDsFNkO96E3F/7xOYsJLP2UyTgyVZL4oTDgyEZp+j/pMUWL42BJMFLO3IjLEChNjMyrYELzFl/+S5lnFO69Ub6ul2kkWRx4O4BDK4MEF1OAG6tAAAgKe4AVeHeU8O2/O+7w152Qz+/ALzsc3RU2PWQ==</latexit>

f (xi ) ⇡ +1 xi f (xi ) = 1
<latexit sha1_base64="aEZKs2PkR6bQQ2gX26PHPaRkzL8=">AAAB+XicbVDLSgMxFM3UV62vUZdugkWoKGVGirosuHFZwT6gHYZMmmlDM0lIMqVl6J+4caGIW//EnX9j2s5CqwcuHM65l3vviSSj2njel1NYW9/Y3Cpul3Z29/YP3MOjlhapwqSJBROqEyFNGOWkaahhpCMVQUnESDsa3c399pgoTQV/NFNJggQNOI0pRsZKoevGlUlIz3tISiUm8MIP3bJX9RaAf4mfkzLI0Qjdz15f4DQh3GCGtO76njRBhpShmJFZqZdqIhEeoQHpWspRQnSQLS6fwTOr9GEslC1u4EL9OZGhROtpEtnOBJmhXvXm4n9eNzXxbZBRLlNDOF4uilMGjYDzGGCfKoINm1qCsKL2VoiHSCFsbFglG4K/+vJf0rqq+tfV2kOtXL/M4yiCE3AKKsAHN6AO7kEDNAEGY/AEXsCrkznPzpvzvmwtOPnMMfgF5+MbLQSSpw==</latexit>

f (xi ) ⇡ +1
<latexit sha1_base64="aEZKs2PkR6bQQ2gX26PHPaRkzL8=">AAAB+XicbVDLSgMxFM3UV62vUZdugkWoKGVGirosuHFZwT6gHYZMmmlDM0lIMqVl6J+4caGIW//EnX9j2s5CqwcuHM65l3vviSSj2njel1NYW9/Y3Cpul3Z29/YP3MOjlhapwqSJBROqEyFNGOWkaahhpCMVQUnESDsa3c399pgoTQV/NFNJggQNOI0pRsZKoevGlUlIz3tISiUm8MIP3bJX9RaAf4mfkzLI0Qjdz15f4DQh3GCGtO76njRBhpShmJFZqZdqIhEeoQHpWspRQnSQLS6fwTOr9GEslC1u4EL9OZGhROtpEtnOBJmhXvXm4n9eNzXxbZBRLlNDOF4uilMGjYDzGGCfKoINm1qCsKL2VoiHSCFsbFglG4K/+vJf0rqq+tfV2kOtXL/M4yiCE3AKKsAHN6AO7kEDNAEGY/AEXsCrkznPzpvzvmwtOPnMMfgF5+MbLQSSpw==</latexit>

<latexit sha1_base64="84b001QnW8g9Egp0fO7ziLRMp/I=">AAAB6nicbVBNS8NAEJ34WetX1aOXxSJ4kJJIUY8FLx4r2g9oQ9lsJ+3SzSbsbsQS+hO8eFDEq7/Im//GbZuDtj4YeLw3w8y8IBFcG9f9dlZW19Y3Ngtbxe2d3b390sFhU8epYthgsYhVO6AaBZfYMNwIbCcKaRQIbAWjm6nfekSleSwfzDhBP6IDyUPOqLHS/VOP90plt+LOQJaJl5My5Kj3Sl/dfszSCKVhgmrd8dzE+BlVhjOBk2I31ZhQNqID7FgqaYTaz2anTsipVfokjJUtachM/T2R0UjrcRTYzoiaoV70puJ/Xic14bWfcZmkBiWbLwpTQUxMpn+TPlfIjBhbQpni9lbChlRRZmw6RRuCt/jyMmleVLzLSvWuWq6d53EU4BhO4Aw8uIIa3EIdGsBgAM/wCm+OcF6cd+dj3rri5DNH8AfO5w9c4I3L</latexit>

<latexit sha1_base64="RD1LM+QIjrmwdKnHIXTPNf0ZbPA=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahgpZdKepFKHjxWMF+SLuUbJptQ5PskmTFsvRXePGgiFd/jjf/jWm7B60+GHi8N8PMvCDmTBvX/XJyS8srq2v59cLG5tb2TnF3r6mjRBHaIBGPVDvAmnImacMww2k7VhSLgNNWMLqe+q0HqjSL5J0Zx9QXeCBZyAg2VroPy489dnx16vWKJbfizoD+Ei8jJchQ7xU/u/2IJIJKQzjWuuO5sfFTrAwjnE4K3UTTGJMRHtCOpRILqv10dvAEHVmlj8JI2ZIGzdSfEykWWo9FYDsFNkO96E3F/7xOYsJLP2UyTgyVZL4oTDgyEZp+j/pMUWL42BJMFLO3IjLEChNjMyrYELzFl/+S5lnFO69Ub6ul2kkWRx4O4BDK4MEF1OAG6tAAAgKe4AVeHeU8O2/O+7w152Qz+/ALzsc3RU2PWQ==</latexit>

f (xi ) = 1
f (xi ) ⇡
<latexit sha1_base64="XqEhnZ9jse7hqgikKwfTR12Rn+k=">AAAB+XicbVDLSgMxFM3UV62vUZdugkWooGVGirosuHFZwT6gHYZMmmlDM0lIMqVl6J+4caGIW//EnX9j2s5CqwcuHM65l3vviSSj2njel1NYW9/Y3Cpul3Z29/YP3MOjlhapwqSJBROqEyFNGOWkaahhpCMVQUnESDsa3c399pgoTQV/NFNJggQNOI0pRsZKoevGlUlIz3tISiUm8NIP3bJX9RaAf4mfkzLI0Qjdz15f4DQh3GCGtO76njRBhpShmJFZqZdqIhEeoQHpWspRQnSQLS6fwTOr9GEslC1u4EL9OZGhROtpEtnOBJmhXvXm4n9eNzXxbZBRLlNDOF4uilMGjYDzGGCfKoINm1qCsKL2VoiHSCFsbFglG4K/+vJf0rqq+tfV2kOtXL/I4yiCE3AKKsAHN6AO7kEDNAEGY/AEXsCrkznPzpvzvmwtOPnMMfgF5+MbMA6SqQ==</latexit>

1
f (xi ) ⇡
<latexit sha1_base64="XqEhnZ9jse7hqgikKwfTR12Rn+k=">AAAB+XicbVDLSgMxFM3UV62vUZdugkWooGVGirosuHFZwT6gHYZMmmlDM0lIMqVl6J+4caGIW//EnX9j2s5CqwcuHM65l3vviSSj2njel1NYW9/Y3Cpul3Z29/YP3MOjlhapwqSJBROqEyFNGOWkaahhpCMVQUnESDsa3c399pgoTQV/NFNJggQNOI0pRsZKoevGlUlIz3tISiUm8NIP3bJX9RaAf4mfkzLI0Qjdz15f4DQh3GCGtO76njRBhpShmJFZqZdqIhEeoQHpWspRQnSQLS6fwTOr9GEslC1u4EL9OZGhROtpEtnOBJmhXvXm4n9eNzXxbZBRLlNDOF4uilMGjYDzGGCfKoINm1qCsKL2VoiHSCFsbFglG4K/+vJf0rqq+tfV2kOtXL/I4yiCE3AKKsAHN6AO7kEDNAEGY/AEXsCrkznPzpvzvmwtOPnMMfgF5+MbMA6SqQ==</latexit>

Manifold M is represented by a Graph G=(V,E,A)


k-NN graph of the data points. k-NN graph

Xavier Bresson 46
47

Graph regularization

Graph regularization is usually implemented through loss minimization techniques.


A widely used regularization loss is the Dirichlet energy[1], which is defined as:
  ∫_M |∇f|²   (continuous Dirichlet energy)
  ≈ Σ_{ij ∈ V} A_ij |f(x_i) − f(x_j)|²   (discrete Dirichlet energy)
  ≈ fᵀLf ∈ ℝ,  f ∈ ℝⁿ,  L = I − D^{−1/2} A D^{−1/2} ∈ ℝ^{n×n}   (Laplacian matrix)
    D = diag(d) ∈ ℝ^{n×n},  d = A1_n ∈ ℝⁿ   (degree vector)

Minimizing the Dirichlet energy enforces the smoothness of the function on the graph domain, i.e. f(x_i) ≈ f(x_j) for j ∈ 𝒩_i, ensuring that the function values at neighboring data points are similar.
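A minimal NumPy sketch of the discrete Dirichlet energy for a dense adjacency matrix A (small graphs assumed; the small guard against isolated nodes is an implementation detail):

```python
import numpy as np

def dirichlet_energy(A, f):
    """f' L f with L = I - D^{-1/2} A D^{-1/2} and D = diag(A 1_n)."""
    d = A.sum(axis=1)                                # degree vector d = A 1_n
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12)) # guard isolated nodes
    L = np.eye(len(d)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return float(f @ L @ f)   # small value <=> f is smooth on the graph
```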

[1] Belkin, Niyogi, Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, 2001

Xavier Bresson 47
48

Semi-supervised classification with graphs

SSC optimization problem with graph smoothness :


<latexit sha1_base64="c/V/Tsfo/WqA+KKE1SBT4q+8oFo=">AAACfXicbVFNa9wwEJWdfqTbr2177GXokpI0YbGX0PQSCPQSaAspdJPAaiPGsrwRkWQjyaWL1/+iv6y3/pVeEnmzlCbpgODxZt7M6E1WKel8kvyO4rV79x88XH/Ue/zk6bPn/Rcvj11ZWy7GvFSlPc3QCSWNGHvplTitrECdKXGSXXzs8iffhXWyNN/8vBJTjTMjC8nRB4r1f74FqqVhTUGloRr9OUfVHLbsU0uBLgq6YM1N+mwE20BVGJEjUFdr1sj9tD0z8JlRL354q5scPbabBZM7VCjF5BZs0xlqHQTSePa34ZcWKCyowUwhFIvQGijt9Vh/kAyTZcBdkK7AgKziiPV/0bzktRbGc4XOTdKk8tMGrZdcibZHaycq5Bc4E5MADWrhps3SvRY2ApNDUdrwjIcl+6+iQe3cXGehstvb3c515P9yk9oXH6aNNFXtheHXg4pagS+hOwXk0gru1TwA5FaGXYGfo0Xuw8E6E9LbX74LjkfD9P1w9+vu4GC0smOdvCZvyCZJyR45IIfkiIwJJ38iiLaid9FlvBHvxMPr0jhaaV6RGxHvXQHWscC5</latexit>

n
X Z
min kf k2HK + Ldata (fi , `i ) + |rf |2
f 2HK M
i=1

Graph SVM[1] :
<latexit sha1_base64="RNX9qOo4q5hUCgJNDYjjnvNX9yM=">AAAC03icbVFdaxNBFJ1dv2r8ivroy8WgJKSEbCkqQqHgSyB9iNK0hUy6zE7uJkNnZpeZ2ZqwLIr46p/zzb/gr3A2DWJaLwwczj1z77n3JrkU1vX7v4Lw1u07d+/t3G88ePjo8ZPm02cnNisMxzHPZGbOEmZRCo1jJ5zEs9wgU4nE0+TiQ50/vURjRaaP3SrHqWJzLVLBmfNU3Pz9GqgSOi5TKjRVzC04k+WgiocVhfT8eJgCdIFKX3HGgNpCxaU4iKpzDUcxdbh0RpUDoat2GotdilLGogNdOmdKsboAHIEvQWnDN9rIPwu3gGqb+4TetkXt0IBbYGZQwXsvStvLDhz8lVkx1xVNxLy9ZYUymS9YLGDYXu4uvYNuUos69Uy5gsj3AoibrX6vvw64CaINaJFNjOLmTzrLeKG8Ky6ZtZOon7tpyYwTXGLVoIXFnPELNseJh5optNNyfZMKXnlmBmlm/NMO1uy/P0qmrF2pxCvrrdvruZr8X25SuPTdtBQ6LxxqftUoLSS4DOoDw0wY5E6uPGDcCO8V+IIZxv1qbcMvIbo+8k1wsteL3vT2P+63Dvc269ghL8hL0iYReUsOyYCMyJjwYBRcBl+Cr+E4LMNv4fcraRhs/jwnWxH++AN8ZN4T</latexit>

n
X
min f T Kf + LHin (fi , `i ) + f T Lf
f 2HK
i=1
with
n
X
Representer theorem : f (x) = sign ↵i K(x, xi ) + b 2 ±1
i=1
Misha Belkin

[1] Belkin, Niyogi, Sindhwani, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, 2006

Xavier Bresson 48
49

Optimization algorithm

Dual optimization problem :


  min_{f ∈ H_K} fᵀKf + λ Σ_{i=1}^n L_Hin(f_i, ℓ_i) + γ fᵀLf

with

  Representer theorem :  f(x) = sign( Σ_{i=1}^n ξ_i* K(x, x_i) + b ) ∈ ±1

  Optimization problem :  α* = argmin_{0≤α} ½ αᵀQα − αᵀ1_n   s.t.   αᵀℓ = 0

  with  Q = LHK(I + LK)⁻¹HL ∈ ℝ^{n×n}

  Solution :  ξ* = (I + LK)⁻¹HLα*
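The matrix H is not redefined on this slide; a minimal sketch of these quantities assuming H is the 0/1 diagonal indicator of the labeled points (in the spirit of Belkin et al.'s Laplacian SVM) and ℓ_i = 0 on unlabeled points:

```python
import numpy as np

def graph_svm_matrices(K, ell, labeled):
    """Q = L H K (I + L K)^{-1} H L  and  xi* = (I + L K)^{-1} H L alpha*.
    Assumptions: ell_i in {-1, +1} on labeled points and 0 otherwise,
    and H = diag(labeled) selects the labeled points."""
    n = K.shape[0]
    L = np.diag(ell)
    H = np.diag(labeled.astype(float))
    M_inv = np.linalg.inv(np.eye(n) + L @ K)        # (I + L K)^{-1}
    Q = L @ H @ K @ M_inv @ H @ L
    xi_from_alpha = lambda alpha: M_inv @ H @ L @ alpha   # xi* from alpha*
    return Q, xi_from_alpha
```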

Xavier Bresson 49
50

Lab 4 : Graph SVM

Run code04.ipynb and analyze the Graph SVM results on :
  Noisy non-linear data points
  Real-world text documents

[Figures: decision boundaries of kernel SVM vs Graph SVM]
Xavier Bresson 50
51

Outline

Supervised classification

Linear SVM

Soft-margin SVM

Kernel techniques

Non-linear/kernel SVM

Graph SVM

Conclusion

Xavier Bresson 51
52

History of SVM techniques

[Figure: four panels in ℝ^d, each showing classes C+ and C− and the separating margin —
  Linear SVM[1] (supervised learning)
  Soft-Margin SVM[2] (supervised learning)
  Non-Linear/Kernel SVM[3] (supervised learning)
  Graph SVM[4] (semi-supervised learning)]

[1] Vapnik, Chervonenkis, On a perceptron class, 1964


[2] Cortes, Vapnik, Support-vector networks, 1995
[3] Boser, Guyon, Vapnik, A training algorithm for optimal margin classifiers, 1992
[4] Belkin, Niyogi, Sindhwani, Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, 2006

Xavier Bresson 52
53

Summary

General class of semi-supervised optimization techniques :


  min_{f ∈ H_K} ‖f‖²_{H_K} + λ Σ_{i=1}^n L_data(f_i, ℓ_i) + γ L_graph(f)

where

  Norm of f in RKHS :  ‖f‖²_{H_K} = fᵀKf   (smoothness/regularity of f)

  Misclassification error :  L_data(f_i, ℓ_i)   (training prediction)

  Graph regularization :  L_graph(f)   (smoothness of f on graph domain)

with

  L_data ∈ { Hinge, L2, L1, Huber, Logistic }

  L_graph ∈ { Dirichlet : ‖∇·‖₂²,  Total variation[1] : ‖∇·‖₁,  Wavelets : ‖∇_wav ·‖₂ }

[1] Bresson, Zhang, TV-SVM: Total variation support vector machine for semi-supervised data classification, 2012

Xavier Bresson 53
54

Questions?

Xavier Bresson 54
