Linear functions

Let V be a vector space. A function f : V → R is a linear function if
  f(αx + βy) = αf(x) + βf(y),  ∀α, β ∈ R and x, y ∈ V.
Example 1: The mean of vectors in R^n, i.e.
  f(x) = (x₁ + x₂ + ⋯ + xₙ)/n,  ∀x ∈ R^n,
is linear.
Example 2: The maximum element of vectors in R^n, i.e.
  f(x) = max_{1≤i≤n} xᵢ,
is NOT linear. For instance, with n = 2, max(1, 0) + max(0, 1) = 2 ≠ 1 = max(1, 1).
Example 3: f : R^n → R with f(x) = aᵀx is a linear function.
Example 4: F : C([a, b]) → R defined by
  F(f) = f(x₀), where x₀ ∈ [a, b] is a given number,
is linear.
Example 5: F : L²([a, b]) → R defined by
  F(f) = ∫ₐᵇ f(x) dx
is linear.

Remark: Linearity is equivalent to additivity, f(x + y) = f(x) + f(y) ∀x, y ∈ V, together with homogeneity, f(αx) = αf(x) ∀α ∈ R, x ∈ V. By induction,
  f(α₁x₁ + ⋯ + αₖxₖ) = α₁f(x₁) + ⋯ + αₖf(xₖ),  ∀αᵢ ∈ R, xᵢ ∈ V.
The representation f(x) = ⟨a, x⟩ of Example 3 extends to any bounded linear function on a Hilbert space; this is widely known as the Riesz representation theorem.

Theorem (Riesz representation). Let H be a Hilbert space. A function f : H → R is linear and bounded if and only if
  f(x) = ⟨a, x⟩  ∀x ∈ H
for some unique a ∈ H.
Example 1 revisited: We know mean : R^n → R is linear. Since R^n is a Hilbert space, the theorem gives mean(x) = ⟨a, x⟩ with a = (1/n, …, 1/n)ᵀ.

On the other hand, for a linear function that is not bounded, there doesn't exist a ∈ H such that f(x) = ⟨a, x⟩ ∀x ∈ H.
Hyperplanes

Again we consider R^n as a Hilbert space, where any linear function is written as ⟨a, x⟩ for some a ∈ R^n. Consider the set
  S₀ = {x : ⟨a, x⟩ = 0}.
Then ∀x, y ∈ S₀ and α, β ∈ R,
  ⟨a, αx + βy⟩ = α⟨a, x⟩ + β⟨a, y⟩ = 0  ⟹  αx + βy ∈ S₀.
Therefore S₀ is a linear subspace (a plane). Since the codimension of S₀ is 1 (because it is defined by one equation), S₀ is called a hyperplane.
This concept can be generalized to any inner product space V:
  S = {x ∈ V : ⟨a, x⟩ = b},
where a ∈ V with a ≠ 0 and b ∈ R are given.

Projection onto hyperplanes
Consider a Hilbert space V and a hyperplane S:
  S = {x ∈ V : ⟨a, x⟩ = b}.
Let y ∈ V be a given vector. The vector on S that is the closest to y is called the projection of y onto S, denoted by P_S y:
  P_S y = arg min_{x∈S} ‖x − y‖.
Let us find an explicit expression of P_S y in terms of a, b, and y.
Theorem: z ∈ S is a solution of min_{x∈S} ‖x − y‖ if and only if
  ⟨y − z, x − z⟩ = 0  ∀x ∈ S.

Proof: Since S is an affine set, z + t(x − z) ∈ S for all x ∈ S and t ∈ R, and
  ‖y − z − t(x − z)‖² = ‖y − z‖² − 2t⟨y − z, x − z⟩ + t²‖x − z‖².
If z is a minimizer, the right-hand side is ≥ ‖y − z‖² for all t. Choosing t > 0 small gives ⟨y − z, x − z⟩ ≤ 0; choosing t < 0 small gives ⟨y − z, x − z⟩ ≥ 0. Hence ⟨y − z, x − z⟩ = 0.
Conversely, if ⟨y − z, x − z⟩ = 0 ∀x ∈ S, then taking t = 1 above,
  ‖y − x‖² = ‖y − z‖² + ‖x − z‖² ≥ ‖y − z‖²,
so z is a minimizer. ∎
Now let
  z = y − (⟨a, y⟩ − b)/‖a‖² · a.
Then ⟨a, z⟩ = ⟨a, y⟩ − (⟨a, y⟩ − b) = b, so z ∈ S. Moreover, ∀x ∈ S,
  ⟨y − z, x − z⟩ = (⟨a, y⟩ − b)/‖a‖² · ⟨a, x − z⟩ = 0,
because ⟨a, x⟩ = ⟨a, z⟩ = b. By the previous theorem, z is a solution of min_{x∈S} ‖x − y‖.

It remains to show the uniqueness. If z₁ and z₂ are both solutions, then
  ⟨y − z₁, z₂ − z₁⟩ = 0 and ⟨y − z₂, z₁ − z₂⟩ = 0.
Adding the two equations gives ‖z₁ − z₂‖² = 0, i.e., z₁ = z₂.
In summary, the projection of y onto the hyperplane S = {x : ⟨a, x⟩ = b} exists and is unique. Furthermore,
  P_S y = y − (⟨a, y⟩ − b)/‖a‖² · a.
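As a sanity check, the closed-form projection can be verified numerically. A minimal NumPy sketch (the function name and the sample vectors are illustrative choices, not from the notes):

```python
import numpy as np

def project_onto_hyperplane(y, a, b):
    """P_S y = y - (<a, y> - b) / ||a||^2 * a  for  S = {x : <a, x> = b}."""
    return y - (a @ y - b) / (a @ a) * a

a = np.array([1.0, 2.0, 2.0])
b = 3.0
y = np.array([5.0, -1.0, 4.0])
z = project_onto_hyperplane(y, a, b)
# z lies on S, and the residual y - z is parallel to the normal a
```

The two defining properties, ⟨a, z⟩ = b and y − z parallel to a, can be checked directly on z.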
Affine functions

A linear function plus a constant is called an affine function. That is, f : V → R is affine if
  f(x) = ⟨a, x⟩ + b,
where a ∈ V and b ∈ R are constants.
Properties: If f : V → R is affine, then
  f(αx + βy) = αf(x) + βf(y)  ∀x, y ∈ V and α, β ∈ R s.t. α + β = 1.
Proof: Write f(x) = g(x) + b with g linear. Then, using α + β = 1,
  f(αx + βy) = g(αx + βy) + b = αg(x) + βg(y) + (α + β)b = α(g(x) + b) + β(g(y) + b) = αf(x) + βf(y). ∎
If f : V → R is affine, where V is a Hilbert space, then f(x) = ⟨a, x⟩ + b for some a ∈ V and b ∈ R.

4.2.2 Regression

Given training data:
  xᵢ ∈ R^n is an input feature vector, i = 1, 2, …, N;
  yᵢ ∈ R is the corresponding response to xᵢ.
Given a new input feature vector x ∈ R^n, how to predict the corresponding response y ∈ R?

For example, xᵢ ∈ R^n represents n attributes of a house and yᵢ ∈ R is the selling price. We want to predict the selling price of a house with feature x ∈ R^n.

This problem is called regression. In this context,
  the xᵢ are called regressors / independent variables;
  the yᵢ are called dependent variables / outcomes / labels.

The class of all functions R^n → R is too large, and the given dataset (xᵢ, yᵢ) is not enough to determine a function uniquely. So we need to find a function class F in which we search for f. Intuitively, the larger N is, the larger the function class F can be.
Linear model

We search for f in the class of all affine functions, i.e.,
  f(x) = ⟨a, x⟩ + b for some a ∈ R^n, b ∈ R.
Thus we find a ∈ R^n and b ∈ R s.t.
  ⟨a, xᵢ⟩ + b ≈ yᵢ,  i = 1, 2, …, N,
by solving the least-squares problem
  min_{a∈R^n, b∈R} Σᵢ₌₁ᴺ (⟨a, xᵢ⟩ + b − yᵢ)².
Write
  X = [x₁ᵀ 1; x₂ᵀ 1; …; x_Nᵀ 1] ∈ R^{N×(n+1)},  β = [a; b] ∈ R^{n+1},  y = [y₁; …; y_N] ∈ R^N.
Then the LS problem becomes
  min_{β∈R^{n+1}} ‖Xβ − y‖².
Since we have N equations to fit and n + 1 unknowns, we typically need N ≥ n + 1 (and in practice N ≫ n + 1).
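The least-squares fit above can be sketched in a few lines of NumPy, on synthetic data (the true parameters a_true, b_true and the noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 3
X_feat = rng.normal(size=(N, n))              # rows are the feature vectors x_i
a_true, b_true = np.array([1.0, -2.0, 0.5]), 0.7
y = X_feat @ a_true + b_true + 0.01 * rng.normal(size=N)

# Append a column of ones so that beta = [a; b]
X = np.hstack([X_feat, np.ones((N, 1))])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # min ||X beta - y||^2
a_hat, b_hat = beta[:n], beta[n]
```

With N = 50 ≫ n + 1 = 4, the recovered (a_hat, b_hat) is close to the true parameters.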
Regularization

In many applications we don't have enough data (e.g., N < n + 1), so the class of all affine functions is too large to search for f. We then search for f in a subclass of the affine functions instead. Therefore, instead of searching over β ∈ R^{n+1}, we search over β ∈ S ⊂ R^{n+1}:
  min_{β∈S} ‖Xβ − y‖².
In other words, we search for f in
  F = {f : R^n → R | f(x) = ⟨a, x⟩ + b, β = (a, b) ∈ S}.
By choosing different S, we obtain different approaches.

Ridge regression

We choose S = {β : ‖β‖ ≤ c} for some c > 0. By Lagrangian duality, this is equivalent to the penalized problem min_β ‖Xβ − y‖² + λ‖β‖² for a suitable λ > 0.
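The penalized form min_β ‖Xβ − y‖² + λ‖β‖² has the closed-form solution β = (XᵀX + λI)⁻¹Xᵀy, which is usable even when N < n + 1. A minimal sketch (the function name ridge and the data are mine):

```python
import numpy as np

def ridge(X, y, lam):
    """Minimize ||X beta - y||^2 + lam * ||beta||^2 via the normal equations."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Underdetermined example: N = 3 samples, n = 5 unknowns
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 5))
y = rng.normal(size=3)
beta = ridge(X, y, lam=0.1)
beta_big_lam = ridge(X, y, lam=1e3)   # heavier regularization shrinks beta
```

Note that XᵀX alone is singular here; the λI term makes the system solvable.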
Kernel ridge regression: map the features into a Hilbert space H by a feature map φ : R^n → H with kernel K(x, x') = ⟨φ(x), φ(x')⟩, and fit f(x) = ⟨a, φ(x)⟩ + b by
  min_{a∈H, b∈R} Σᵢ₌₁ᴺ (⟨a, φ(xᵢ)⟩ + b − yᵢ)² + λ‖a‖².

Representer Theorem: The solution is in the form of
  a = Σᵢ₌₁ᴺ cᵢφ(xᵢ) for some c ∈ R^N.

Proof: For any a ∈ H, we claim that a can be decomposed as
  a = a⊥ + Σᵢ₌₁ᴺ cᵢφ(xᵢ),
where c ∈ R^N and ⟨a⊥, φ(xᵢ)⟩ = 0 for i = 1, 2, …, N.

Indeed, consider N = 1 for simplicity. Let S = {tφ(x₁) : t ∈ R}. Then S is a closed subspace, so the projection P_S a exists, and P_S a = c₁φ(x₁) for some c₁ ∈ R. Set a⊥ = a − P_S a. By the characterization of the projection, ⟨a − P_S a, x − P_S a⟩ = 0 ∀x ∈ S; taking x = P_S a + φ(x₁), which lies in S because S is a subspace (0 ∈ S), gives ⟨a⊥, φ(x₁)⟩ = 0. So a = P_S a + a⊥ = c₁φ(x₁) + a⊥. For general N, it can be done similarly.
Now substitute a = a⊥ + Σⱼ₌₁ᴺ cⱼφ(xⱼ) into the objective (for simplicity we drop the offset b; it can be carried along unchanged). Using
  ⟨a, φ(xᵢ)⟩ = Σⱼ₌₁ᴺ cⱼ⟨φ(xⱼ), φ(xᵢ)⟩ = Σⱼ₌₁ᴺ cⱼK(xⱼ, xᵢ),
  ‖a‖² = ‖Σⱼ cⱼφ(xⱼ)‖² + ‖a⊥‖² = cᵀKc + ‖a⊥‖²,
the objective becomes
  Σᵢ₌₁ᴺ (Σⱼ₌₁ᴺ cⱼK(xⱼ, xᵢ) − yᵢ)² + λ(cᵀKc + ‖a⊥‖²) = ‖Kc − y‖² + λcᵀKc + λ‖a⊥‖²,
where K = [K(xᵢ, xⱼ)] ∈ R^{N×N} and c = (c₁, …, c_N)ᵀ ∈ R^N.
Let
  F₁(c) = ‖Kc − y‖² + λcᵀKc, which depends only on c,
  F₂(a⊥) = λ‖a⊥‖², which depends only on a⊥.
Then the minimization is the same as minimizing the two parts separately:
  min_c F₁(c)  and  min_{a⊥} F₂(a⊥).
The latter is minimized by a⊥ = 0. Hence the solution satisfies a = Σᵢ₌₁ᴺ cᵢφ(xᵢ), which proves the theorem. ∎

Let the solution of
  min_{c∈R^N} ‖Kc − y‖² + λcᵀKc
be c ∈ R^N.
Then the predicted output y for the input x ∈ R^n is
  y = ⟨a, φ(x)⟩ = Σᵢ₌₁ᴺ cᵢ⟨φ(xᵢ), φ(x)⟩ = Σᵢ₌₁ᴺ cᵢK(xᵢ, x).
Note that only the kernel K(·, ·) is needed, both for solving for c and for prediction; no explicit feature map φ is required. This is called kernel regression.
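A minimal sketch of kernel regression with a Gaussian kernel (the kernel choice, σ = 1, λ = 1e-6, and the sine test function are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def kernel_ridge_fit(X, y, lam):
    """min_c ||K c - y||^2 + lam c^T K c  is solved by  (K + lam I) c = y."""
    K = gaussian_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def kernel_ridge_predict(X_train, c, X_new):
    """Prediction y(x) = sum_i c_i K(x_i, x)."""
    return gaussian_kernel(X_new, X_train) @ c

X_train = np.linspace(0.0, 3.0, 10).reshape(-1, 1)
y_train = np.sin(X_train).ravel()
c = kernel_ridge_fit(X_train, y_train, lam=1e-6)
pred = kernel_ridge_predict(X_train, c, X_train)
```

With a small λ, the fitted function nearly interpolates the training data.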
4.2.3 Classification

Classification: Given training data
  (xᵢ, yᵢ), i = 1, …, N, where xᵢ ∈ R^n and yᵢ ∈ {−1, 1},
find a classifier, i.e., a function f such that
  yᵢ = 1 if f(xᵢ) ≥ 1, and yᵢ = −1 if f(xᵢ) ≤ −1.

Consider affine classifiers f(x) = ⟨a, x⟩ + b and the two supporting hyperplanes
  S₊ = {x : ⟨a, x⟩ + b = 1} and S₋ = {x : ⟨a, x⟩ + b = −1}.
(The training points lying on S₊ and S₋ are called support vectors.)
Let x₊ ∈ S₊ and x₋ ∈ S₋ be such that
  ‖x₊ − x₋‖ = dist(S₊, S₋).
Since x₊ is the projection of x₋ onto S₊ = {x : ⟨a, x⟩ = 1 − b}, the projection formula gives
  x₊ = x₋ − (⟨a, x₋⟩ − (1 − b))/‖a‖² · a = x₋ + 2/‖a‖² · a,
since ⟨a, x₋⟩ = −1 − b (because x₋ ∈ S₋). Thus
  ‖x₊ − x₋‖ = ‖2a/‖a‖²‖ = 2/‖a‖,
i.e., the margin is 2/‖a‖.
Support Vector Machine (SVM)

  max_{a, b} 2/‖a‖
  s.t. ⟨a, xᵢ⟩ + b ≥ 1 if yᵢ = 1,
       ⟨a, xᵢ⟩ + b ≤ −1 if yᵢ = −1,
which is equivalent to
  min_{a, b} ‖a‖²  s.t. yᵢ(⟨a, xᵢ⟩ + b) ≥ 1, i = 1, …, N.  (SVM)
The above SVM is NOT robust to noise: if the data are not linearly separable (e.g., a single mislabeled point crosses the margin), there is no solution to (SVM).
Instead, we penalize violations of the constraints. Consider
  min_{a, b} Σᵢ₌₁ᴺ h(yᵢ(⟨a, xᵢ⟩ + b) − 1),
where the function
  h(t) = 0 if t ≥ 0,  h(t) = −t if t < 0,
i.e., h(t) = max(0, −t). In other words, if yᵢ(⟨a, xᵢ⟩ + b) ≥ 1, the error is 0; otherwise the error is 1 − yᵢ(⟨a, xᵢ⟩ + b). Altogether, keeping the margin term, we solve
  min_{a, b} Σᵢ₌₁ᴺ h(yᵢ(⟨a, xᵢ⟩ + b) − 1) + λ‖a‖².  (2)
We call this SVM with a soft margin.

h can be approximated by some smooth function. If h is chosen as the so-called logistic function, we call it logistic regression.
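A minimal sketch of the soft-margin objective Σᵢ h(yᵢ(⟨a, xᵢ⟩ + b) − 1) + λ‖a‖², minimized by subgradient descent on synthetic two-cluster data (λ, step size, iteration count, and the data are ad-hoc choices):

```python
import numpy as np

def svm_soft_margin(X, y, lam=0.01, lr=0.1, iters=2000):
    """Subgradient descent on sum_i max(0, 1 - y_i(<a, x_i> + b)) + lam ||a||^2."""
    N, n = X.shape
    a, b = np.zeros(n), 0.0
    for _ in range(iters):
        margin = y * (X @ a + b)
        active = margin < 1                      # points violating the margin
        grad_a = -(y[active, None] * X[active]).sum(0) + 2 * lam * a
        grad_b = -y[active].sum()
        a -= lr / N * grad_a
        b -= lr / N * grad_b
    return a, b

# Two well-separated Gaussian clusters labeled +1 / -1
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=+3.0, size=(20, 2)),
               rng.normal(loc=-3.0, size=(20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)
a, b = svm_soft_margin(X, y)
pred = np.sign(X @ a + b)
```

On separable data like this, the learned (a, b) classifies the training points correctly.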
Kernel SVM

The linear SVMs don't work for curved (nonlinearly separable) data. The kernel method can be used again. Let φ : R^n → H be a feature map, and use linear functions on H to classify the points: by the Riesz representation theorem, any bounded linear function on H is of the form ⟨a, x⟩ for some a ∈ H. Writing a = Σⱼ₌₁ᴺ cⱼφ(xⱼ) as before, SVM (2) becomes
  min_{c, b} Σᵢ₌₁ᴺ h(yᵢ(Σⱼ₌₁ᴺ K(xᵢ, xⱼ)cⱼ + b) − 1) + λcᵀKc,
where K = [K(xᵢ, xⱼ)]ᵢⱼ ∈ R^{N×N}. The prediction for an input x is given by
  sgn(Σⱼ₌₁ᴺ K(x, xⱼ)cⱼ + b).
Again, only K(·, ·) is needed in the kernel SVM, and no explicit feature map φ is required.
4.3 Linear approximation and differentiation

Near a point x₀, we approximate f by an affine function:
  f(x) ≈ f(x₀) + ⟨∇f(x₀), x − x₀⟩.
The approximation error is
  error = |f(x) − f(x₀) − ⟨∇f(x₀), x − x₀⟩|.
The error should be of higher order than ‖x − x₀‖, i.e.,
  lim_{x→x₀} |f(x) − f(x₀) − ⟨∇f(x₀), x − x₀⟩| / ‖x − x₀‖ = 0.
A vector ∇f(x₀) satisfying this limit is called the gradient of f at x₀.

Example: f(x) = ‖x‖². Since
  ‖x‖² = ‖x₀‖² + 2⟨x₀, x − x₀⟩ + ‖x − x₀‖²
and
  lim_{x→x₀} ‖x − x₀‖² / ‖x − x₀‖ = 0,
we get ∇f(x₀) = 2x₀.

Example: f(x) = ⟨a, x⟩. At any x₀,
  ⟨a, x⟩ = ⟨a, x₀⟩ + ⟨a, x − x₀⟩,
so the approximation error is exactly 0. Therefore ∇f(x₀) = a.
Properties

Linearity: ∇(αf + βg)(x) = α∇f(x) + β∇g(x).

Chain rule: Let f : R^n → R and g : R → R. Then g∘f : R^n → R and
  ∇(g∘f)(x) = g'(f(x)) ∇f(x),
if f and g are differentiable at x and f(x), respectively.
Example: f(x) = ‖x‖, ∀x ∈ V. This is a composition of f₁(x) = ‖x‖² from V → R and f₂(t) = √t from R → R. When ‖x‖ ≠ 0, both f₁ and f₂ are differentiable, with ∇f₁(x) = 2x and f₂'(t) = 1/(2√t) for t ≠ 0. So
  ∇f(x) = ∇(f₂ ∘ f₁)(x) = f₂'(f₁(x)) ∇f₁(x) = 2x / (2√(‖x‖²)) = x/‖x‖.
When ‖x‖ = 0 (i.e., x = 0), f₂(t) = √t is NOT differentiable at f₁(x) = 0, and indeed ∇f(0) does not exist.

In R^n, this (Fréchet) differentiation is the same as the standard differentiation in multivariate calculus.
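The closed form ∇f(x) = x/‖x‖ can be checked against finite differences; a small sketch (the helper numerical_gradient is mine):

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central finite differences: g_i = (f(x + eps e_i) - f(x - eps e_i)) / (2 eps)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([3.0, 4.0])
g = numerical_gradient(np.linalg.norm, x)   # closed form: x / ||x|| = (0.6, 0.8)
```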
Taylor's expansion

From the definition, we see that
  f(x) ≈ f(x₀) + ⟨∇f(x₀), x − x₀⟩,
or, more precisely,
  f(x) = f(x₀) + ⟨∇f(x₀), x − x₀⟩ + o(‖x − x₀‖).
In particular, if f : R^n → R,
  f(x) = f(x₀) + Σᵢ₌₁ⁿ ∂f/∂xᵢ(x₀) (xᵢ − (x₀)ᵢ) + o(‖x − x₀‖).
Differentiation on normed vector spaces

Definition: f is differentiable at x₀ ∈ V if ∃ a linear function L : V → R such that
  lim_{h→0} |f(x₀ + h) − f(x₀) − L(h)| / ‖h‖ = 0.
The linear function L is called the differential of f at x₀.