(Pattern Classification)
Support Vector Machine (SVM)
Hypothesis set and Algorithm
Second Edition
Recall from Chapter 1: true error bound
• Linear in input space:

$\operatorname{argmin}_{h \in H} \big( \hat{R}_S(h) + \lambda \mathcal{R}(h) \big) = \operatorname{argmin}_{h \in H} L(W)$

where $\hat{R}_S(h)$ is the empirical risk and $\lambda \mathcal{R}(h)$ is the complexity (regularizing) term:

$L(W) = \frac{1}{m} \sum_{i=1}^{m} L_i\big( h(x_i, W), y_i \big) + \lambda \mathcal{R}(W)$,  $\mathcal{R}(W) = \|W\|_2^2 = w_1^2 + w_2^2 + w_3^2 + \dots + w_N^2$  (5.2)
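As a concrete illustration (not from the slides), a minimal NumPy sketch of $L(W)$ with the hinge loss (defined just below) as the per-example loss $L_i$ and the $L_2$ regularizer of (5.2); the data layout is an assumption:

```python
import numpy as np

def hinge(scores):
    """Per-example hinge loss max(0, 1 - s_i), with score s_i = y_i * h(x_i)."""
    return np.maximum(0.0, 1.0 - scores)

def regularized_loss(w, b, X, y, lam):
    """L(W) = (1/m) sum_i max(0, 1 - y_i (w . x_i + b)) + lam * ||w||_2^2   (5.2)."""
    scores = y * (X @ w + b)            # s_i = y_i h(x_i)
    return hinge(scores).mean() + lam * np.dot(w, w)
```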
• Consistent case: $h$ can be learned such that $y_i h(x_i) \ge 0$ (zero empirical error) for all training examples.

Hinge loss function:

$\Phi_\rho\big( \rho_h(x_i, y_i) \big) = \max\big( 0, 1 - y_i h(x_i) \big)$, where $y_i h(x_i) = s_i$ is the score of $x_i$.

[Figure: the hinge loss plotted against the score $s_i = y_i h(x_i)$; it is zero for $s_i \ge 1$ and grows linearly as $s_i$ decreases below 1.]
Weighted regularizer

$\min_{\boldsymbol{w}, b, \boldsymbol{\alpha}} L(W) = \|\boldsymbol{w}\|_2^2 + \sum_{i=1}^{m} \alpha_i \max\big( 0, 1 - y_i h(x_i) \big)$,  $\boldsymbol{\alpha} = [\alpha_1 \dots \alpha_i \dots \alpha_m]$

with each hinge loss term weighted by $\alpha_i$. This yields the Lagrangian function, an upper bound of the true risk:

$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2} \|\boldsymbol{w}\|^2 + \sum_{i=1}^{m} \alpha_i \big( 1 - y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) \big)$

where the $\alpha_i$ are the Lagrange variables.
• The solution $\boldsymbol{\alpha}$ of the dual problem and the corresponding minimizer $(\boldsymbol{w}, b)$ of $\min_{\boldsymbol{w}, b, \boldsymbol{\alpha}} \mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha})$ give the solution of the primal (5.7).

$\nabla_b \mathcal{L} = -\sum_{i=1}^{m} \alpha_i y_i = 0 \;\rightarrow\; \sum_{i=1}^{m} \alpha_i y_i = 0$  (5.10)
$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \mathcal{L}_{dual}(\boldsymbol{\alpha}) = \frac{1}{2} \Big\| \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i \Big\|^2 - \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) - \sum_{i=1}^{m} \alpha_i y_i b + \sum_{i=1}^{m} \alpha_i$  (5.12)

The first two terms combine to $-\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j)$, and the $b$ term is 0 by (5.10).
• which simplifies to

$\mathcal{L}_{dual}(\boldsymbol{\alpha}) = -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) + \sum_{i=1}^{m} \alpha_i$  (5.13)

subject to: $\alpha_i \ge 0 \;\wedge\; \sum_{i=1}^{m} \alpha_i y_i = 0, \; i \in [1, m]$
$h(x) = \operatorname{sgn}(\boldsymbol{w} \cdot \boldsymbol{x} + b) = \operatorname{sgn}\Big( \sum_{i=1}^{m} \alpha_i y_i (\boldsymbol{x}_i \cdot \boldsymbol{x}) + b \Big)$  (5.15)

$\boldsymbol{w} = \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i$  (5.9)
• Since support vectors lie on the marginal hyperplanes, for any support vector $x_j$ we have $\boldsymbol{w} \cdot \boldsymbol{x}_j + b = y_j$, and thus $b$ can be obtained via

$b = y_j - \sum_{i=1}^{m} \alpha_i y_i (\boldsymbol{x}_i \cdot \boldsymbol{x}_j)$  (5.16)
$\|\boldsymbol{w}\|_2^2 = \sum_{i=1}^{m} \alpha_i = \|\boldsymbol{\alpha}\|_1$  (5.19)

Leave-One-Out error: $\hat{R}_{LOO}(\mathrm{SVM}) \le \dfrac{N_{SV}(S)}{m + 1}$

where $N_{SV}(S)$ is the number of support vectors for training set $S$.
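A numeric sketch (an illustration, not the slides' algorithm) that solves the dual (5.13) with a generic constrained optimizer and recovers $\boldsymbol{w}$ and $b$ via (5.9) and (5.16); the toy data and the choice of SciPy's SLSQP solver are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data (assumed for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)
G = (y[:, None] * X) @ (y[:, None] * X).T    # G_ij = y_i y_j (x_i . x_j)

def neg_dual(alpha):
    # Negative of L_dual(alpha) from (5.13), since the solver minimizes.
    return 0.5 * alpha @ G @ alpha - alpha.sum()

res = minimize(neg_dual, np.zeros(m), method="SLSQP",
               bounds=[(0.0, None)] * m,                           # alpha_i >= 0
               constraints={"type": "eq", "fun": lambda a: a @ y})  # (5.10)
alpha = res.x
w = (alpha * y) @ X                           # w = sum_i alpha_i y_i x_i   (5.9)
j = int(np.argmax(alpha))                     # any support vector (alpha_j > 0)
b = y[j] - np.sum(alpha * y * (X @ X[j]))     # b from (5.16)
print(w, b)
```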
Which one is the best?
• Answer: To keep the upper bound of the true risk as low as possible, we look for the maximum $\rho$-margin in the loss function while the empirical error is zero.

Separating hyperplane: $h(\boldsymbol{x}) = \boldsymbol{w} \cdot \boldsymbol{x} + b = 0$, with $\|\boldsymbol{w}\|_2 = \sqrt{w_1^2 + w_2^2 + \dots + w_N^2}$.
• On the $y = +1$ side: $h(x_i) \ge 0$, so $y_i h(x_i) \ge 0$.
• On the $y = -1$ side: $h(x_i) \le 0$, so $y_i h(x_i) \ge 0$.
• Score of $x_i$: $s_i = y_i h(x_i)$.

[Figure: a separating hyperplane $h(x) = 0$ with every training point $(x_i, y_i)$ correctly classified, i.e. $y_i h(x_i) \ge 0$ on both sides.]
Geometric margin of $h$: $2\rho_h = \dfrac{2}{\|\boldsymbol{w}\|_2}$

For a point $x_i$ with $h(x_i) \ge 0$, $y_i = +1$:

$\rho_{i,+1} = \dfrac{y_i \big( h(x_i) - 1 \big)}{\|\boldsymbol{w}\|_2} = \rho_i - \dfrac{1}{\|\boldsymbol{w}\|_2}$  (distance to the marginal hyperplane $h(x) - 1 = 0$)

$\rho_{i,-1} = \dfrac{y_i \big( h(x_i) + 1 \big)}{\|\boldsymbol{w}\|_2} = \rho_i + \dfrac{1}{\|\boldsymbol{w}\|_2}$  (distance to the marginal hyperplane $h(x) + 1 = 0$)

$\rho_{i,-1} - \rho_{i,+1} = \dfrac{2}{\|\boldsymbol{w}\|_2} = 2\rho_h$
Margin of $x_i$ based on $\rho_h$

$\dfrac{\rho_i}{\rho_h} = y_i h(x_i) = s_i$, with $2\rho_h = \dfrac{2}{\|\boldsymbol{w}\|_2}$

• It means: the margin of $x_i$ normalized by the hyperplane margin $\rho_h$ equals the score $s_i$.
• $\rho_i \ge \rho_h \rightarrow \rho_i / \rho_h \ge 1$: no training margin loss on the $+1$ examples; $\rho_j \ge \rho_h \rightarrow \rho_j / \rho_h \ge 1$: no training margin loss on the $-1$ examples.

Plotted against $\rho_i / \rho_h = y_i h(x_i) = s_i$: the zero-one loss, the hinge loss $\max\big( 0, 1 - y_i h(x_i) \big)$, and the quadratic hinge loss $\max\big( 0, 1 - y_i h(x_i) \big)^2$.

Figure 5.5: Both hinge loss and quadratic hinge loss provide convex upper bounds on the binary zero-one loss.
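A small sketch (illustrative; the score grid is assumed) evaluating the three losses of Figure 5.5 as functions of the score $s = y\,h(x)$, which makes the upper-bound relation easy to check numerically:

```python
import numpy as np

def zero_one(s):
    # Convention assumed here: a score s <= 0 counts as an error.
    return (s <= 0).astype(float)

def hinge(s):
    return np.maximum(0.0, 1.0 - s)        # max(0, 1 - y h(x))

def quadratic_hinge(s):
    return np.maximum(0.0, 1.0 - s) ** 2   # max(0, 1 - y h(x))^2

s = np.linspace(-2.0, 2.0, 9)
# Both hinge variants dominate the zero-one loss at every score.
assert np.all(hinge(s) >= zero_one(s)) and np.all(quadratic_hinge(s) >= zero_one(s))
```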
Dual of Algorithm1: Lagrangian function
• The Lagrangian function associated to problem (5.48) is an upper bound of the true risk $R(h)$:

$R(h) \le \mathcal{L}(\boldsymbol{w}, b) = \frac{1}{m} \sum_{i=1}^{m} \max\big( 0, 1 - y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) \big) + \lambda \mathcal{R}(W)$  (5.48.1)

where $\mathcal{R}(W) = \|\boldsymbol{w}\|_2^2 = w_1^2 + w_2^2 + w_3^2 + \dots + w_N^2$ and $\lambda$ is the regularization parameter (Lagrange variable).

• The solution $(\boldsymbol{w}, b)$ of the dual problem $\min_{\boldsymbol{w}, b} \mathcal{L}(\boldsymbol{w}, b)$ is the solution for the primal.
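One simple way to carry out $\min_{\boldsymbol{w},b} \mathcal{L}(\boldsymbol{w}, b)$ of (5.48.1) is batch subgradient descent; this sketch is an assumption (the slides do not fix an optimizer), with step size and iteration count chosen arbitrarily:

```python
import numpy as np

def primal_subgradient(X, y, lam=0.1, lr=0.1, epochs=500):
    """Minimize (1/m) sum_i max(0, 1 - y_i (w . x_i + b)) + lam * ||w||_2^2."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                    # examples with nonzero hinge loss
        # Subgradient of the hinge term is -y_i x_i on violated examples, else 0.
        gw = 2 * lam * w - (y[viol, None] * X[viol]).sum(axis=0) / m
        gb = -y[viol].sum() / m
        w -= lr * gw
        b -= lr * gb
    return w, b
```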
SVM Primal Algorithm2
• Finding $h$ that has the maximum geometric margin and no empirical loss:

$\max_{\boldsymbol{w}, b} \rho_h = \dfrac{1}{\|\boldsymbol{w}\|_2}$  (5.7.1)

subject to no empirical loss.

• Or, equivalently, minimizing the regularizer (a convex optimization problem and a specific instance of quadratic programming (QP)), keeping scores 1 or more:

$\min_{\boldsymbol{w}, b} \frac{1}{2} \|\boldsymbol{w}\|_2^2$  (5.7)

subject to: $y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) \ge 1, \; i \in [1, m]$

Lagrangian function (with Lagrange variables $\alpha_i \ge 0$):

$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2} \|\boldsymbol{w}\|^2 - \sum_{i=1}^{m} \alpha_i \big( y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 \big)$

• The solution of the dual problem $\min_{\boldsymbol{w}, b, \boldsymbol{\alpha}} \mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha})$ is the solution for the primal.
• Note that:

$\rho_h^2 = \dfrac{1}{\|\boldsymbol{w}\|_2^2} = \dfrac{1}{\sum_{i=1}^{m} \alpha_i} = \dfrac{1}{\|\boldsymbol{\alpha}\|_1}$  (5.19)

and the VC dimension satisfies

$d \le \min\left( \left\lceil \dfrac{R^2}{\rho^2} \right\rceil, N \right) + 1$  (5.22)
Figure 5.4: A separating hyperplane with one point classified incorrectly and one point classified correctly but with margin less than 1. Both points are outliers, with slack variables $\xi_i > 0$ and $\xi'_j > 0$; the marginal hyperplanes are $h(x) - 1 = 0$ and $h(x) + 1 = 0$.
Loss of the inconsistent case
• With $\rho = 1$, the slack variable $\xi_i$ represents the loss for $x_i$ based on the hinge loss:

$L\big( y_i h(x_i) \big) = 1 - y_i h(x_i) = \xi_i$,  i.e.  $y_i h(x_i) = 1 - \xi_i$

Total empirical loss $= \sum_{i=1}^{m} \xi_i$

[Figure: a separating hyperplane $h(x) = 0$ with marginal hyperplanes $h(x) - 1 = 0$ and $h(x) + 1 = 0$; outliers $x_i$ and $x_j$ shown with their slack values $\xi_i$ and $\xi'_j$.]
Slack (error) terms
• There are many possible choices for $p$ in the slack penalty $\sum_{i=1}^{m} \xi_i^p$, leading to more or less aggressive penalizations of the slack terms.
• The choices $p = 1$ and $p = 2$ lead to the most straightforward solutions. The loss functions associated with $p = 1$ and $p = 2$ are called the hinge loss and the quadratic hinge loss, respectively:

zero-one loss: $\mathbb{1}_{y h(x) \le 0}$
hinge loss: $\max\big( 0, 1 - y h(x) \big)$
quadratic hinge: $\max\big( 0, 1 - y h(x) \big)^2$

Figure 5.5: Both hinge loss and quadratic hinge loss provide convex upper bounds on the binary zero-one loss (plotted against $y h(x)$).
Two conflicting objectives: loss and margin
• On one hand, we wish to limit the total amount of empirical loss (slack penalty) due to misclassified examples and outliers, which can be measured by $\sum_{i=1}^{m} \xi_i$ or, more generally, by $\sum_{i=1}^{m} \xi_i^p$ for some $p \ge 1$. On the other hand, we wish to keep the margin large, i.e. $\|\boldsymbol{w}\|$ small. This gives the soft-margin primal:

$\min_{\boldsymbol{w}, b, \boldsymbol{\xi}} \frac{1}{2} \|\boldsymbol{w}\|^2 + C \sum_{i=1}^{m} \xi_i$  subject to: $y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) \ge 1 - \xi_i \;\wedge\; \xi_i \ge 0, \; i \in [1, m]$

• The analysis is presented for the hinge loss, which is the most widely used loss function for SVMs.
• We introduce Lagrange variables $\alpha_i \ge 0$, associated to the margin constraints, and $\beta_i \ge 0$, associated to the non-negativity constraints of the slack variables.
• We denote by $\boldsymbol{\alpha}$ the vector $(\alpha_1, \dots, \alpha_m)^\top$ and by $\boldsymbol{\beta}$ the vector $(\beta_1, \dots, \beta_m)^\top$.
• The Lagrangian can then be defined for all $\boldsymbol{w}, b, \boldsymbol{\xi}$ and $\boldsymbol{\alpha}, \boldsymbol{\beta}$, by

$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = \frac{1}{2} \|\boldsymbol{w}\|^2 + C \sum_{i=1}^{m} \xi_i - \sum_{i=1}^{m} \alpha_i \big( y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 + \xi_i \big) - \sum_{i=1}^{m} \beta_i \xi_i$  (5.25)

where setting the gradients to zero gives

$\boldsymbol{w} = \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i$  (5.27)
$\sum_{i=1}^{m} \alpha_i y_i = 0$  (5.28)
$\alpha_i + \beta_i = C$  (5.29)
$\alpha_i \big( y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 + \xi_i \big) = 0 \;\wedge\; \beta_i \xi_i = 0$  (5.30)

• As in the separable case, the weight vector solution $\boldsymbol{w}$ is unique, but the support vectors are not. Support vectors are the points $x_i$ with $\alpha_i \ne 0$: they either lie on a marginal hyperplane or are outliers.
$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \mathcal{L}_{dual}(\boldsymbol{\alpha}) = \frac{1}{2} \Big\| \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i \Big\|^2 - \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) - \sum_{i=1}^{m} \alpha_i y_i b + \sum_{i=1}^{m} \alpha_i$  (5.31)

where again the first two terms combine to $-\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j)$ and the $b$ term is 0.

• Remarkably, we find that the objective function is no different than in the separable case:

$\mathcal{L}_{dual}(\boldsymbol{\alpha}) = -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) + \sum_{i=1}^{m} \alpha_i$  (5.32)

subject to: $0 \le \alpha_i \le C \;\wedge\; \sum_{i=1}^{m} \alpha_i y_i = 0, \; i \in [1, m]$

$h(x) = \operatorname{sgn}(\boldsymbol{w} \cdot \boldsymbol{x} + b) = \operatorname{sgn}\Big( \sum_{i=1}^{m} \alpha_i y_i (\boldsymbol{x}_i \cdot \boldsymbol{x}) + b \Big)$  (5.34)

$\boldsymbol{w} = \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i$

• $b$ can be obtained from any support vector lying on a marginal hyperplane:

$b = y_j - \sum_{i=1}^{m} \alpha_i y_i (\boldsymbol{x}_i \cdot \boldsymbol{x}_j)$  (5.35)

• Important property of SVMs: the hypothesis solution depends only on inner products between vectors and not directly on the vectors themselves.
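In code, the only change from the separable dual sketch given earlier is the box constraint on $\boldsymbol{\alpha}$ implied by $\alpha_i + \beta_i = C$ with $\beta_i \ge 0$; the value of C below is a hypothetical choice:

```python
# Non-separable case: reuse the earlier dual QP sketch, changing only the bounds.
C = 1.0                          # hypothetical regularization constant
bounds = [(0.0, C)] * m          # 0 <= alpha_i <= C instead of alpha_i >= 0
```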
$y_i h(x_i) = \rho_i / \rho_h$

Figure 5.6: the $\rho$-margin loss function $\Phi_\rho$, illustrated in red.

$\hat{R}_{S,\rho}(h) = \frac{1}{m} \sum_{i=1}^{m} \Phi_\rho\big[ y_i h(x_i) \big]$  (5.37)

$R(h) \le \hat{R}_{S,\rho}(h) + 2 \dfrac{r \Lambda}{\rho \sqrt{m}} + \sqrt{\dfrac{\log 1/\delta}{2m}}$  (5.44)

• In the separable case, for a linear $h$ with geometric margin $\rho_h$ and the choice $\rho = \rho_h$, the empirical margin loss vanishes; in the non-separable case it is bounded by the average slack, $\hat{R}_{S,\rho}(h) \le \frac{1}{m} \sum_{i=1}^{m} \xi_i$.
Contents
1. Binary support vector machine
2. Binary SVM: Separable case (Consistent case)
3. Binary SVM: Non-separable case (Inconsistent case)
4. Kernel Methods
5. Multiclass SVM
$\max_{\boldsymbol{\alpha}} \mathcal{L}_{dual}(\boldsymbol{\alpha}) = -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j) + \sum_{i=1}^{m} \alpha_i$  (6.13)

subject to: $0 \le \alpha_i \le C \;\wedge\; \sum_{i=1}^{m} \alpha_i y_i = 0, \; i \in [1, m]$

$\forall x, x' \in X, \; K(x, x') = \langle \Phi(x), \Phi(x') \rangle$  (6.1)
• Example: $\forall x, x' \in \mathbb{R}^N, \; K(x, x') = (x_1 x'_1 + x_2 x'_2 + c)^2$

$K(x, x') = \langle \Phi(x), \Phi(x') \rangle = \begin{bmatrix} x_1^2 & x_2^2 & \sqrt{2}\, x_1 x_2 & \sqrt{2c}\, x_1 & \sqrt{2c}\, x_2 & c \end{bmatrix} \cdot \begin{bmatrix} x_1'^2 \\ x_2'^2 \\ \sqrt{2}\, x'_1 x'_2 \\ \sqrt{2c}\, x'_1 \\ \sqrt{2c}\, x'_2 \\ c \end{bmatrix} = (x_1 x'_1 + x_2 x'_2 + c)^2$
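A quick numeric check (illustrative points assumed) that the explicit feature map above reproduces the kernel, i.e. that (6.1) holds for this $\Phi$:

```python
import numpy as np

def phi(x, c=1.0):
    """Explicit feature map for the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2, c])

def K(x, xp, c=1.0):
    """K(x, x') = (x . x' + c)^2, computed without the feature map."""
    return (np.dot(x, xp) + c) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(phi(x) @ phi(xp), K(x, xp))   # (6.1) holds
```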
Example: XOR and 2nd-degree polynomial

Feature map: $\Phi(x) = \begin{bmatrix} x_1^2 & x_2^2 & \sqrt{2}\, x_1 x_2 & \sqrt{2}\, x_1 & \sqrt{2}\, x_2 & 1 \end{bmatrix}$, e.g. $\Phi(-1, -1) = (1, 1, +\sqrt{2}, -\sqrt{2}, -\sqrt{2}, 1)$ and $\Phi(1, 1) = (1, 1, +\sqrt{2}, +\sqrt{2}, +\sqrt{2}, 1)$.

The SVM solution $h(\Phi(x))$ separates the mapped points linearly.

Figure 6.3: Illustration of the XOR classification problem and the use of polynomial kernels. (a) The XOR problem is linearly non-separable in the input space. (b) It becomes linearly separable using a second-degree polynomial kernel.
• Gaussian (RBF) kernel: $\forall x, x' \in \mathbb{R}^N, \; K(x, x') = \exp\left( -\dfrac{\|x' - x\|^2}{2 \sigma^2} \right)$  (6.5)

• Sigmoid kernel: $\forall x, x' \in \mathbb{R}^N, \; K(x, x') = \tanh\big( a (x \cdot x') + b \big)$  (6.6)
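Direct transcriptions of (6.5) and (6.6) as functions, for reference (a minimal sketch):

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """K(x, x') = exp(-||x' - x||^2 / (2 sigma^2))   (6.5)"""
    d = np.asarray(xp) - np.asarray(x)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, xp, a=1.0, b=0.0):
    """K(x, x') = tanh(a (x . x') + b)   (6.6)"""
    return np.tanh(a * np.dot(x, xp) + b)
```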
1D example with the kernel $K(x, x_i) = (x x_i + 1)^2$:

data point   | 1  | 2  | 3  | 4  | 5
input $x$    | 1  | 2  | 4  | 5  | 6
class label  | +1 | +1 | -1 | -1 | +1

Feature map: $\Phi(x_i) = \begin{bmatrix} x_i^2 \\ \sqrt{2}\, x_i \\ 1 \end{bmatrix}$, so that $K(x, x_i) = (x x_i + 1)^2 = \langle \Phi(x), \Phi(x_i) \rangle$.

[Figure: the five mapped points $\varphi(x_1), \dots, \varphi(x_5)$ in feature space, with the support vectors marked "sv".]

$\mathcal{L}_{dual}(\boldsymbol{\alpha}) = \sum_{i=1}^{5} \alpha_i - \frac{1}{2} \sum_{i=1}^{5} \sum_{j=1}^{5} \alpha_i \alpha_j y_i y_j (x_i x_j + 1)^2$

with kernel entries such as $K(x_5, x_5) = (6 \cdot 6 + 1)^2$ and $K(x_5, x_1) = (6 \cdot 1 + 1)^2$.
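A short check (illustrative) that the kernel matrix of this 1D dataset reproduces the coefficients appearing in the derivative $\partial \mathcal{L} / \partial \alpha_1$ below:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
K = (np.outer(x, x) + 1.0) ** 2    # K_ij = (x_i x_j + 1)^2

# Coefficients of alpha_j in dL/dalpha_1 = 1 - sum_j alpha_j y_1 y_j K_1j:
print(y[0] * y * K[0])             # [  4.   9. -25. -36.  49.]
```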
Example-1D
• Finding $\boldsymbol{\alpha}$ to maximize $\mathcal{L}_{dual}(\boldsymbol{\alpha})$ subject to $\sum_i y_i \alpha_i = 0$, e.g. for $\alpha_1$:

$\dfrac{\partial \mathcal{L}}{\partial \alpha_1} = 1 - 0.5\, ( 2 \times 4\, \alpha_1 + 2 \times 9\, \alpha_2 - 2 \times 25\, \alpha_3 - 2 \times 36\, \alpha_4 + 2 \times 49\, \alpha_5 ) = 0$
Example-1D
• We get $\alpha_1 = 0$, $\alpha_2 = 2.5$, $\alpha_3 = 0$, $\alpha_4 = 7.333$, $\alpha_5 = 4.833$; indeed $\sum_i y_i \alpha_i = 2.5 + 4.833 - 7.333 = 0$.
• Note that $\alpha_i < C$, so there is no training error.
• The support vectors (nonzero $\alpha_i$) are $\{x_2 = 2, x_4 = 5, x_5 = 6\}$.
• The weight vector is obtained using $\boldsymbol{w} = \sum_i \alpha_i y_i \Phi(x_i)$.

[Figure: the mapped points $\varphi(x_1), \dots, \varphi(x_4)$ and the weight vector $\boldsymbol{w}$ in the feature-space coordinates $(z_1, z_2)$.]
Example-1D, $K(x, x_i) = (x \cdot x_i + 1)^2$
• Support vectors $\{x_2 = 2, x_4 = 5, x_5 = 6\}$ give the solution

$h(x) = 0.6667\, x^2 - 5.333\, x + 9$,  i.e.  $h(\Phi(x)) = 0.663\, z_1 - 3.77\, z_2 + 9$ with $z_1 = x^2$, $z_2 = \sqrt{2}\, x$.

[Figure: $h(x)$ plotted over the input axis; $h(x) = +1$ at the positive support vectors and $h(x) = -1$ at the negative ones.]
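A sanity check (illustrative) of the recovered decision function on the five training points: all scores are at least 1, with equality exactly at the support vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 6.0])
y = np.array([1, 1, -1, -1, 1])
h = 0.6667 * x**2 - 5.333 * x + 9    # recovered decision function
print(y * h)                          # approx [4.33, 1.00, 1.67, 1.00, 1.00]
```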
For comparison, in the XOR example the kernel matrix with entries $k(x_i, x_j) = (x_i \cdot x_j + 1)^2$ over the four points $(\pm 1, \pm 1)$ is

$K = \begin{bmatrix} 9 & 1 & 1 & 1 \\ 1 & 9 & 1 & 1 \\ 1 & 1 & 9 & 1 \\ 1 & 1 & 1 & 9 \end{bmatrix}$
• C = 100: a higher C yields lower training error and a smaller margin; decreasing C increases the margin.
• C ≫ 1: too complex a model (little regularization).
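To see the effect of C on the 1D example, a sketch using scikit-learn (the library choice is an assumption; its polynomial kernel $(\gamma\, x \cdot x' + c_0)^d$ matches $(x x_i + 1)^2$ with $\gamma = 1$, $c_0 = 1$, $d = 2$):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0], [2.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, -1, 1])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=C).fit(X, y)
    # Larger C -> harder margin: fewer margin violations, smaller margin.
    print(C, clf.n_support_, clf.score(X, y))
```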
Koby Crammer and Yoram Singer. On the algorithmic implementation of multiclass kernel-based
vector machines. Journal of Machine Learning Research, 2, 2002.
Multiclass SVM
• Let $X$ denote the input space and $Y$ the output space, and let $D$ be an unknown distribution over $X$ according to which input points are drawn. We distinguish between two cases:
• the mono-label case, where $Y$ is a finite set of $k$ classes that we mark with numbers for convenience, $Y = \{1, \dots, k\}$;
• the multi-label case, where $Y = \{-1, +1\}^k$.
• Learning: given a dataset $S = ((x_1, y_1), \dots, (x_m, y_m))$.
• In the mono-label case, each example is labeled with a single class, while in the multi-label case it can be labeled with several. Text documents, for example, can be labeled with several different relevant topics, e.g., sports, business, and society. The positive components of a vector in $\{-1, +1\}^k$ indicate the classes associated with an example.
[Figure: example per-class scores (3.1 for one label vs. 2.8, 2.8, 2.8, 2.2 for the others), illustrating the score constraint.]

Score constraint: $s_i - s_l \ge 1 \;\equiv\; \big( 1 - (s_i - s_l) \big) \le 0$ — the score for the true label is higher than the score for any other label by at least 1. The primal problem (5.25) is subject to these score constraints.

• Let $\mathbf{1}_{y_i}$ be the vector whose components are all zero except for the $y_i$-th component, which is equal to 1.
• Let $\mathbf{1}$ be the vector whose components are all 1.
$\max_{A} \mathcal{L}_{dual} = \sum_{i=1}^{m} A_i \cdot \mathbf{1}_{y_i} - \frac{C}{2} \sum_{i,j=1}^{m} (A_i \cdot A_j)\, K(x_i, x_j)$

where, with $\alpha_i = \{ \alpha_{i,1}, \alpha_{i,2}, \dots, \alpha_{i,k} \}$ and $\mathbf{1}_{y_i} = [\, 0 \dots 0 \;\; 1 \;\; 0 \dots 0 \,]$:

$A_i = \mathbf{1}_{y_i} - \alpha_i = [\, -\alpha_{i,1} \;\; -\alpha_{i,2} \;\dots\; 1 - \alpha_{i, y_i} \;\dots\; -\alpha_{i,k} \,]$

$A_i \cdot \mathbf{1}_{y_i} = 1 - \alpha_{i, y_i}$,  $A_i \cdot \mathbf{1} = 1 - \sum_{l=1}^{k} \alpha_{i,l} = 0$
$h(x) = \operatorname{argmax}_{l=1}^{k} \Big\{ \sum_{i=1}^{m} A_{i,l}\, K(x, x_i) + b_l \Big\}$

$\boldsymbol{w}_l = \beta \Big[ \sum_{\substack{i=1 \\ y_i = l}}^{m} (1 - \alpha_{i,l})\, \Phi(x_i) + \sum_{\substack{i=1 \\ y_i \ne l}}^{m} (-\alpha_{i,l})\, \Phi(x_i) \Big]$

• for which the generalization bound is

$R(h) \le \frac{1}{m} \sum_{i=1}^{m} \xi_i + 4k \frac{r \Lambda}{\sqrt{m}} + \sqrt{\frac{\log 1/\delta}{2m}}$  (9.12)

for hypotheses in

$h \in H_K = \Big\{ (x, y) \longrightarrow W_y \cdot \Phi(x) : \sum_{l=1}^{k} \|\boldsymbol{w}_l\|^2 \le \Lambda^2 \Big\}$
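A minimal mono-label sketch using scikit-learn's implementation of the Crammer and Singer formulation (the library and toy data are assumptions; prediction is $\operatorname{argmax}_l \boldsymbol{w}_l \cdot x + b_l$ as above):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy 3-class mono-label problem: one Gaussian blob per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in ((0, 0), (3, 0), (0, 3))])
y = np.repeat([0, 1, 2], 20)

# One weight vector w_l per class; h(x) = argmax_l w_l . x + b_l.
clf = LinearSVC(multi_class="crammer_singer").fit(X, y)
print(clf.predict([[2.9, 0.1], [0.1, 2.8]]))   # expected: [1 2]
```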