However, if you can solve one RLS problem over your entire
data set using a matrix factorization, you get multiclass
classification essentially for free (see RLS lecture). So
with RLS, OVA’s a great choice.
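To make the "for free" point concrete, here is a minimal numpy sketch (data, kernel, and regularization choices are illustrative, not from the lecture): the matrix (K + λℓI) is factored once, and all N one-vs-all coefficient vectors come out of a single multi-right-hand-side solve.

```python
import numpy as np

# Toy data: ell = 30 points in 2-D, N = 3 classes (illustrative only).
rng = np.random.default_rng(0)
ell, N, lam = 30, 3, 0.1
X = rng.standard_normal((ell, 2))
y = rng.integers(0, N, size=ell)

# Gaussian kernel matrix K on the training points.
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)

# One right-hand side per class; (K + lam*ell*I) is factored once and
# back-solved for all N columns in a single call.
Y = (y[:, None] == np.arange(N)[None, :]).astype(float)
C = np.linalg.solve(K + lam * ell * np.eye(ell), Y)  # column i is c^i

# Each column solves its own one-vs-all system (K + lam*ell*I) c^i = y^i.
print(np.allclose((K + lam * ell * np.eye(ell)) @ C, Y))  # → True
```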
Other Approaches
\[
\begin{aligned}
\min_{f_1, \ldots, f_N \in \mathcal{H},\; \xi \in \mathbb{R}^{\ell(N-1)}} \quad
& \sum_{i=1}^{N} \|f_i\|_K^2 + C \sum_{i=1}^{\ell} \sum_{j \neq y_i} \xi_{ij} \\
\text{subject to:} \quad
& f_{y_i}(x_i) + b_{y_i} \geq f_j(x_i) + b_j + 2 - \xi_{ij} \\
& \xi_{ij} \geq 0
\end{aligned}
\]
leads us to…
Lee, Lin and Wahba: Formulation
\[
\begin{aligned}
\min_{f_1, \ldots, f_N \in \mathcal{H}_K} \quad
& \frac{1}{\ell} \sum_{i=1}^{\ell} \sum_{\substack{j=1 \\ j \neq y_i}}^{N}
\left( f_j(x_i) + \frac{1}{N-1} \right)_+ + \lambda \sum_{j=1}^{N} \|f_j\|_K^2 \\
\text{subject to:} \quad
& \sum_{j=1}^{N} f_j(x) = 0 \quad \forall x
\end{aligned}
\]
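As an illustration of the data-fit term above, a small sketch (function and variable names are ours, not from the paper) that evaluates (1/ℓ) Σᵢ Σ_{j≠yᵢ} (f_j(xᵢ) + 1/(N−1))₊ for given function values satisfying the sum-to-zero constraint:

```python
import numpy as np

def llw_data_term(F, y):
    """Lee-Lin-Wahba empirical term: (1/ell) sum_i sum_{j != y_i} (f_j(x_i) + 1/(N-1))_+ .
    F is an (ell, N) array with F[i, j] = f_j(x_i); rows must sum to zero."""
    ell, N = F.shape
    assert np.allclose(F.sum(axis=1), 0.0)      # the sum-to-zero constraint
    hinge = np.maximum(F + 1.0 / (N - 1), 0.0)  # (f_j(x_i) + 1/(N-1))_+
    hinge[np.arange(ell), y] = 0.0              # drop the j = y_i terms
    return hinge.sum() / ell

# The target encoding f_{y_i} = 1, f_j = -1/(N-1) for j != y_i gives zero loss.
F = np.full((4, 3), -0.5)
y = np.array([0, 1, 2, 0])
F[np.arange(4), y] = 1.0
print(llw_data_term(F, y))  # → 0.0
```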
Lee, Lin and Wahba: Analysis
\[
\begin{aligned}
\min_{f_1, \ldots, f_N \in \mathcal{H},\; \xi \in \mathbb{R}^{\ell}} \quad
& \sum_{i=1}^{N} \|f_i\|_K^2 + C \sum_{i=1}^{\ell} \xi_i \\
\text{subject to:} \quad
& f_{y_i}(x_i) \geq f_j(x_i) + 1 - \xi_i \quad \forall j \neq y_i \\
& \xi_i \geq 0
\end{aligned}
\]
\[
(K + \lambda \ell I) c = y.
\]
Because this is a linear system, the vector c is a linear function of the right-hand side y. Define the vector y^i as
\[
y^i_j =
\begin{cases}
1 & \text{if } y_j = i \\
0 & \text{otherwise.}
\end{cases}
\]
Towards a Theory, II
\[
(K + \lambda \ell I) c^i = y^i,
\]
and denote the associated functions f^{c^1}, . . . , f^{c^N}. Now, for
any possible right-hand side y^* for which y_i and y_j are equal
whenever x_i and x_j are in the same class, we can calculate
the associated c vector from the c^i without solving a new
RLSC system. In particular, if we let m_i (i ∈ {1, . . . , N}) be
the y value for points in class i, then the associated solution
vector c is given by
\[
c = \sum_{i=1}^{N} c^i m_i.
\]
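This linearity claim is easy to check numerically. A sketch with synthetic data (all choices illustrative): solve the N indicator systems once, then compare c = Σᵢ cⁱ mᵢ against a direct solve for a class-consistent right-hand side y*.

```python
import numpy as np

rng = np.random.default_rng(1)
ell, N, lam = 20, 3, 0.1
y = rng.integers(0, N, size=ell)
X = rng.standard_normal((ell, 2))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
A = K + lam * ell * np.eye(ell)

# Solve (K + lam*ell*I) c^i = y^i for the N class-indicator right-hand sides.
Y = (y[:, None] == np.arange(N)[None, :]).astype(float)
C = np.linalg.solve(A, Y)              # column i is c^i

# Pick arbitrary per-class target values m_1, ..., m_N and build y*,
# which is constant on each class by construction.
m = np.array([2.0, -1.0, 0.5])
y_star = m[y]

# Direct solve vs. the linear combination c = sum_i c^i m_i.
c_direct = np.linalg.solve(A, y_star)
c_combo = C @ m
print(np.allclose(c_direct, c_combo))  # → True
```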
Towards a Theory, III
For any code matrix M not containing any zeros, we can
compute the output of the coding system using only the
entries of the coding matrix and the outputs of the underlying one-vs-all classifiers. In particular, we do not need
to actually train the classifiers associated with the coding
matrix. We can simply use the appropriate linear combi-
nation of the “underlying” one-vs-all classifiers. For any
r ∈ {1, . . . , N }, it can be shown that
\[
\sum_{i=1}^{F} L(M_{ri} f_i(x)) = \sum_{\substack{j=1 \\ j \neq r}}^{N} D_{rj} f^j(x), \qquad (1)
\]
where C_{rj} ≡ Σ_{i=1}^F M_{ri} M_{ji} and D_{rj} ≡ (F − C_{rj}).
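The "no retraining" claim can be illustrated directly: taking m_j = M_{ji} in the earlier linearity result gives c_i = Σ_j M_{ji} c^j for the classifier attached to column i of a ±1 code matrix. A sketch on synthetic data (all names and parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
ell, N, F, lam = 25, 3, 4, 0.1
y = rng.integers(0, N, size=ell)
X = rng.standard_normal((ell, 2))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
A = K + lam * ell * np.eye(ell)

# One-vs-all solutions c^1, ..., c^N (class-indicator right-hand sides).
Y_ova = (y[:, None] == np.arange(N)[None, :]).astype(float)
C_ova = np.linalg.solve(A, Y_ova)

# A +/-1 code matrix M (N classes x F bits), no zero entries.
M = np.where(rng.random((N, F)) < 0.5, -1.0, 1.0)

# Training the F coding classifiers directly: the RHS for bit i is M[y_j, i].
C_direct = np.linalg.solve(A, M[y])   # shape (ell, F)

# Recovering them for free as linear combinations of the OVA solutions.
C_free = C_ova @ M                    # c_i = sum_j M_{ji} c^j
print(np.allclose(C_direct, C_free))  # → True
```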
Towards a Theory, IV