X ⊆ R^n . . . set of observations
w ∈ R^n . . . vector of parameters

Two-class linear classifier with parameters u ∈ R^n, v ∈ R:

h(x; u, v) = sign(⟨u, x⟩ + v) = +1 if ⟨u, x⟩ + v ≥ 0
                                −1 if ⟨u, x⟩ + v < 0
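A minimal sketch of this decision rule in Python, keeping the parameter names u, v from the definition above:

```python
import numpy as np

def h(x, u, v):
    """Two-class linear classifier h(x; u, v) = sign(<u, x> + v),
    with the convention sign(0) = +1 as in the case split above."""
    return 1 if np.dot(u, x) + v >= 0 else -1
```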
Multi-class linear classifier:

h(x; w) ∈ Argmax_{y ∈ Y} ⟨w_y, x⟩ = Argmax_{y ∈ Y} ⟨w, φ(x, y)⟩

where φ : R^n × Y → R^{n·|Y|} is

φ(x, y) = (0, . . . , x, . . . , 0)   with x placed in the y-th slot
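The feature map and the resulting multi-class rule can be sketched directly; names phi and num_classes are illustrative:

```python
import numpy as np

def phi(x, y, num_classes):
    """Joint feature map: copy x into the y-th slot of an n*|Y| zero vector."""
    x = np.asarray(x, dtype=float)
    out = np.zeros(len(x) * num_classes)
    out[y * len(x):(y + 1) * len(x)] = x
    return out

def h(x, w, num_classes):
    """Multi-class rule: argmax_y <w, phi(x, y)>, equal to argmax_y <w_y, x>."""
    return max(range(num_classes), key=lambda y: np.dot(w, phi(x, y, num_classes)))
```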
Example: sequence classifier for OCR
x = (x1, . . . , xL) ∈ I^L . . . sequence of images with characters

p(x1, . . . , xL, y1, . . . , yL) = p(y1) ∏_{i=2}^{L} p(yi | yi−1) ∏_{i=1}^{L} p(xi | yi)

[Figure: graphical model of the hidden Markov chain]
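The factorization above can be evaluated term by term; a sketch with the probability tables as nested dicts (the names p_init, p_trans, p_emit are assumptions):

```python
def hmm_joint(p_init, p_trans, p_emit, x, y):
    """Joint probability of the hidden Markov chain:
    p(y1) * prod_{i=2..L} p(y_i | y_{i-1}) * prod_{i=1..L} p(x_i | y_i)."""
    p = p_init[y[0]]
    for i in range(1, len(y)):
        p *= p_trans[y[i - 1]][y[i]]      # transition factors
    for xi, yi in zip(x, y):
        p *= p_emit[yi][xi]               # emission factors
    return p
```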
The MAP estimate from HMC:

ŷ ∈ Argmax_{y ∈ A^L} [ log p(y1) + Σ_{i=2}^{L} log p(yi | yi−1) + Σ_{i=1}^{L} log p(xi | yi) ]

ŷ = Argmax_{(y1, . . . , yL) ∈ A^L} ⟨w, φ(y1) + Σ_{i=2}^{L} φ(yi−1, yi) + Σ_{i=1}^{L} φ(xi, yi)⟩

where the sum inside the inner product is the joint feature map φ(x, y).
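Because the score decomposes along the chain, this Argmax can be computed by dynamic programming (Viterbi). A sketch over a small alphabet; the probability-table names match the previous sketch and are assumptions:

```python
import math

def viterbi(p_init, p_trans, p_emit, x, alphabet):
    """MAP labels: argmax_y log p(y1) + sum log p(yi|yi-1) + sum log p(xi|yi)."""
    # delta[y]: best log-score of a label prefix ending in state y
    delta = {y: math.log(p_init[y] * p_emit[y][x[0]]) for y in alphabet}
    back = []  # backpointers, one dict per position i >= 2
    for xi in x[1:]:
        prev, delta, ptr = delta, {}, {}
        for y in alphabet:
            best = max(alphabet, key=lambda q: prev[q] + math.log(p_trans[q][y]))
            delta[y] = prev[best] + math.log(p_trans[best][y] * p_emit[y][xi])
            ptr[y] = best
        back.append(ptr)
    y_hat = [max(delta, key=delta.get)]     # best final state
    for ptr in reversed(back):              # follow backpointers
        y_hat.append(ptr[y_hat[-1]])
    return y_hat[::-1]
```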
Learning by Empirical Risk Minimization
ℓ : Y × Y → [0, ∞) . . . loss such that ℓ(y, y′) = 0 iff y = y′

Find w of h(x; w) ∈ Argmax_{y ∈ Y} ⟨w, φ(x, y)⟩ which minimizes the risk

R(w) = E_{(x,y)∼p} ℓ(y, h(x; w))

w* ∈ Argmin_{w ∈ R^n} R_{T^m}(w)

where the empirical risk is

R_{T^m}(w) = (1/m) Σ_{i=1}^{m} ℓ(y^i, h(x^i; w))
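The empirical risk is a plain average; a one-function sketch, where `predict` stands for h(·; w):

```python
def empirical_risk(loss, predict, T):
    """R_Tm(w) = (1/m) * sum_i loss(y_i, h(x_i; w)) over the training set T."""
    return sum(loss(y, predict(x)) for x, y in T) / len(T)
```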
For a linearly separable training set this is equivalent to finding
w ∈ R^n such that

⟨w, a_k⟩ > 0 , k = 1, . . . , K   (1)
Algorithm:
1. w ← 0
2. Find k ∈ {1, 2, . . . , K} such that ⟨w, a_k⟩ ≤ 0.
3. If there is no such k, return w; otherwise update
   w ← w + a_k
   and go to step 2.
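A direct transcription of these steps; the iteration cap is an added safeguard for the case that (1) is unsolvable:

```python
import numpy as np

def perceptron(A, max_iter=10_000):
    """Find w with <w, a_k> > 0 for every row a_k of A, if such w exists."""
    w = np.zeros(A.shape[1])
    for _ in range(max_iter):
        violated = next((a for a in A if np.dot(w, a) <= 0), None)
        if violated is None:
            return w            # all inequalities strictly satisfied
        w = w + violated        # Perceptron update
    return None                 # no solution found within the iteration cap
```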
If the set of inequalities (1) is solvable, then the Perceptron algorithm
finds a solution after a finite number of updates.
Algorithm:
1. w ← 0
2. Find a misclassified example (x^k, y^k) ∈ T^m such that
   y^k ≠ ŷ ∈ Argmax_{y ∈ Y} ⟨w, φ(x^k, y)⟩
3. If there is no such example, return w; otherwise update
   w ← w + φ(x^k, y^k) − φ(x^k, ŷ)
   and go to step 2.
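The same loop for the structured rule, sketched with an explicit search over Y (tractable only when Y is small; in the OCR case the inner Argmax would be Viterbi):

```python
import numpy as np

def structured_perceptron(T, phi, Y, dim, max_iter=1000):
    """Perceptron for h(x; w) in Argmax_{y in Y} <w, phi(x, y)>."""
    w = np.zeros(dim)
    for _ in range(max_iter):
        mistakes = 0
        for x, y in T:
            y_hat = max(Y, key=lambda z: np.dot(w, phi(x, z)))
            if y_hat != y:                        # misclassified example
                w = w + phi(x, y) - phi(x, y_hat)
                mistakes += 1
        if mistakes == 0:
            return w                              # all examples classified correctly
    return w
```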
Structured Output Support Vector Machines
Learning h(x; w) ∈ Argmax_{y ∈ Y} ⟨w, φ(x, y)⟩ from examples:

w* ∈ Argmin_{w ∈ W_r} R^ψ(w) where R^ψ(w) = (1/m) Σ_{i=1}^{m} ψ(x^i, y^i, w)

where
• W_r = {w ∈ R^n | ‖w‖ ≤ r} . . . a ball of radius r
• ψ : X × Y × R^n → R . . . proxy approximating the true loss ℓ
Margin rescaling loss
The score for the correct label of an example (x^i, y^i) should be above the
scores for incorrect labels increased by a margin proportional to the loss ℓ(y^i, y).

Example: sequential OCR with Hamming distance ℓ(y, y′) = Σ_{i=1}^{L} [yi ≠ y′i].
For y^i = JOHN:

ψ(x^i, y^i, w) = max { 0,
    4 + ⟨φ(x^i, AAAA), w⟩ − ⟨φ(x^i, JOHN), w⟩,
    3 + ⟨φ(x^i, JAAA), w⟩ − ⟨φ(x^i, JOHN), w⟩,
    2 + ⟨φ(x^i, JOAA), w⟩ − ⟨φ(x^i, JOHN), w⟩,
    1 + ⟨φ(x^i, JOHA), w⟩ − ⟨φ(x^i, JOHN), w⟩,
    . . . }
ψ(x^i, y^i, w) = max { 0, max_{y ∈ Y∖{y^i}} [ ℓ(y^i, y) + ⟨w, φ(x^i, y)⟩ − ⟨w, φ(x^i, y^i)⟩ ] }
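This definition translates line by line, again assuming Y is small enough to enumerate (for sequences the inner max would itself be a loss-augmented Viterbi search):

```python
import numpy as np

def margin_rescaling_psi(x_i, y_i, w, phi, Y, loss):
    """psi = max(0, max_{y != y_i} loss(y_i, y)
                    + <w, phi(x_i, y)> - <w, phi(x_i, y_i)>)."""
    s_true = np.dot(w, phi(x_i, y_i))
    worst = max(loss(y_i, y) + np.dot(w, phi(x_i, y)) - s_true
                for y in Y if y != y_i)
    return max(0.0, worst)
```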
w* ∈ Argmin_{w ∈ W_r} R_r^ψ(w) where R_r^ψ(w) = (1/m) Σ_{i=1}^{m} ψ(x^i, y^i, w)

and W_r = {w ∈ R^n | ‖w‖ ≤ r}
w* ∈ Argmin_{w ∈ R^n} R_λ^ψ(w) where R_λ^ψ(w) = (λ/2)‖w‖² + (1/m) Σ_{i=1}^{m} ψ(x^i, y^i, w)
                                              = (1/m) Σ_{i=1}^{m} ψ_λ(x^i, y^i, w)

with ψ_λ(x^i, y^i, w) = (λ/2)‖w‖² + ψ(x^i, y^i, w)
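The regularized objective is a one-liner on top of any ψ; a sketch where `psi` is a callable as above:

```python
import numpy as np

def regularized_risk(T, w, psi, lam):
    """R_lambda(w) = (lam/2) * ||w||^2 + (1/m) * sum_i psi(x_i, y_i, w)."""
    reg = 0.5 * lam * float(np.dot(w, w))
    return reg + sum(psi(x, y, w) for x, y in T) / len(T)
```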
Algorithm: