
Chapter 4: Linear Functions and Differentiation

4.1 Linear functions


Let $f: V \to \mathbb{R}$ be a function on a vector space $V$.

$f$ is a linear function if
$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$, $\forall \alpha, \beta \in \mathbb{R}$ and $x, y \in V$.
Example 1: The mean of vectors in $\mathbb{R}^n$, i.e.,
$f(x) = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$ for $x \in \mathbb{R}^n$,
is linear.
Example 2: The maximum element of vectors in $\mathbb{R}^n$, i.e.,

$f(x) = \max\{x_1, \ldots, x_n\}$ for $x \in \mathbb{R}^n$,

is NOT linear.
Example 3: $f: \mathbb{R} \to \mathbb{R}$ with $f(x) = ax$ is a linear function.
Example 4: $F: C[a,b] \to \mathbb{R}$ defined by
$F(f) = f(x_0)$, where $x_0 \in [a,b]$ is a given number,
is linear.
Example 5: $F: C[a,b] \to \mathbb{R}$ defined by

$F(f) = \int_a^b f(x)\,dx$ for $f \in C[a,b]$ is linear.


Example 6: $f: V \to \mathbb{R}$, where $V$ is an inner product space, defined by

$f(x) = \langle x, z \rangle$, where $z \in V$ is a given vector in $V$, is linear.

Example 7: A norm function on a vector space $V$ is NOT linear.

To see this, note that
$\|\alpha x\| = |\alpha|\,\|x\|$,
which contradicts
$f(\alpha x) = \alpha f(x)$
for $f$ linear on $V$ (take, e.g., $\alpha = -1$ and $x \neq 0$).

Properties of linear functions

Homogeneity: $f(\alpha x) = \alpha f(x)$, $\forall \alpha \in \mathbb{R}$ and $x \in V$.
Because
$f(\alpha x) = f(\alpha x + 0 \cdot x) = \alpha f(x) + 0 \cdot f(x) = \alpha f(x)$.

$f(0) = 0$: because $0 \cdot x = 0$, homogeneity implies $f(0) = f(0 \cdot x) = 0 \cdot f(x) = 0$, $\forall x \in V$.

Additivity: $f(x + y) = f(x) + f(y)$, $\forall x, y \in V$. More generally,
$f(\alpha_1 x_1 + \cdots + \alpha_k x_k) = \alpha_1 f(x_1) + \cdots + \alpha_k f(x_k)$, $\forall \alpha_1, \ldots, \alpha_k \in \mathbb{R}$ and $x_1, \ldots, x_k \in V$.

To see this, we note that

$f(\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_k x_k) = \alpha_1 f(x_1) + f(\alpha_2 x_2 + \cdots + \alpha_k x_k) = \cdots = \alpha_1 f(x_1) + \alpha_2 f(x_2) + \cdots + \alpha_k f(x_k)$.

Inner product representation of a linear function on Hilbert spaces


For simplicity, let's consider a linear function on $\mathbb{R}^n$ equipped with the
standard inner product $\langle x, y \rangle = x^T y$ and the induced norm $\|x\| = \sqrt{\langle x, x \rangle}$.
From the discussion above:

For any given $a \in \mathbb{R}^n$, the function $f(x) = \langle a, x \rangle$ is linear.


The reverse is also true, i.e.,

any linear function $f: \mathbb{R}^n \to \mathbb{R}$ must be of the form

$f(x) = \langle a, x \rangle$ for some $a \in \mathbb{R}^n$.

To see this, let $e_1, e_2, \ldots, e_n$ be the standard basis of $\mathbb{R}^n$,
so that any $x \in \mathbb{R}^n$ is written as $x = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n$.
Therefore, if $f$ is a linear function, then

$f(x) = f(x_1 e_1 + x_2 e_2 + \cdots + x_n e_n)$
$\quad\;\; = x_1 f(e_1) + x_2 f(e_2) + \cdots + x_n f(e_n)$  (by the properties of linear functions)
$\quad\;\; = \langle a, x \rangle$,

where $a = \big(f(e_1), f(e_2), \ldots, f(e_n)\big)^T \in \mathbb{R}^n$.

Furthermore, the representation $f(x) = \langle a, x \rangle$ of a linear function is unique,
which means there is only one vector $a \in \mathbb{R}^n$ for which $f(x) = \langle a, x \rangle$ holds for all $x$.
Indeed, suppose that $a$ is not unique, i.e., we have two vectors $a, b$
such that $f(x) = \langle a, x \rangle$ and $f(x) = \langle b, x \rangle$ for all $x \in \mathbb{R}^n$.
Then let $x = e_i$: $f(e_i) = \langle a, e_i \rangle = a_i$ and $f(e_i) = \langle b, e_i \rangle = b_i$.
So $a_i = b_i$, $i = 1, 2, \ldots, n$. Therefore $a = b$.

Altogether, we see that

a function $f: \mathbb{R}^n \to \mathbb{R}$ is linear if and only if $f(x) = \langle a, x \rangle$ for some unique $a \in \mathbb{R}^n$.

The above holds true for any bounded linear function on a Hilbert space;
this is widely known as the Riesz representation theorem.

Theorem (Riesz representation theorem):

Let $H$ be a Hilbert space and $f$ be a function $H \to \mathbb{R}$. Then

$f$ is linear and bounded if and only if $f(x) = \langle a, x \rangle$ for some unique $a \in H$.
Example 1: We know $\mathrm{mean}: \mathbb{R}^n \to \mathbb{R}$, $x \mapsto \mathrm{mean}(x)$, is linear on $\mathbb{R}^n$. Since $\mathbb{R}^n$ is a
Hilbert space, we can find a unique $a \in \mathbb{R}^n$ s.t.
$\mathrm{mean}(x) = \langle a, x \rangle$.
Indeed,
$\mathrm{mean}(x) = \tfrac{1}{n}(x_1 + x_2 + \cdots + x_n) = \tfrac{1}{n}x_1 + \tfrac{1}{n}x_2 + \cdots + \tfrac{1}{n}x_n = \langle a, x \rangle$,
where $a = \left(\tfrac{1}{n}, \tfrac{1}{n}, \ldots, \tfrac{1}{n}\right)^T$.
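As a tiny numerical illustration of this representation (a sketch; the variable names are ours):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0])
n = x.size
a = np.full(n, 1.0 / n)          # a = (1/n, ..., 1/n)
print(np.mean(x), np.dot(a, x))  # both equal 4.0
```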

Example 2: Let $H$ be a Hilbert space and $\|\cdot\|$ be the norm on $H$.

It is known that the norm function is NOT linear.

Therefore, there doesn't exist an $a \in H$ such that $\|x\| = \langle a, x \rangle$ for all $x \in H$.

Hyperplanes

Again, we consider $\mathbb{R}^n$ as a Hilbert space, where any linear
function is written as $\langle a, x \rangle$ for some $a \in \mathbb{R}^n$.
Consider the set
$S_{a,0} = \{x \mid \langle a, x \rangle = 0\}$.
Then $\forall x, y \in S_{a,0}$ and $\alpha, \beta \in \mathbb{R}$:
$\langle a, \alpha x + \beta y \rangle = \alpha \langle a, x \rangle + \beta \langle a, y \rangle = 0$, so $\alpha x + \beta y \in S_{a,0}$.
Therefore, $S_{a,0}$ is a linear subspace (a plane).
Since the codimension of $S_{a,0}$ is 1 (because it is defined by one equation),

$S_{a,0}$ is called a hyperplane.


Now, let's consider

$S_{a,b} = \{x \mid \langle a, x \rangle = b\}$, where $b \in \mathbb{R}$ is given.

Let $x_0 \in S_{a,b}$, i.e., $\langle a, x_0 \rangle = b$, be fixed.

Then $S_{a,b} = S_{a,0} + x_0$, because:

$\forall x \in S_{a,b}$: $\langle a, x - x_0 \rangle = \langle a, x \rangle - \langle a, x_0 \rangle = 0 \Rightarrow x - x_0 \in S_{a,0}$, i.e., $x \in S_{a,0} + x_0$;

$\forall x \in S_{a,0}$: $\langle a, x + x_0 \rangle = \langle a, x \rangle + \langle a, x_0 \rangle = b \Rightarrow x + x_0 \in S_{a,b}$.

In other words,
$S_{a,b}$ is a shift of a hyperplane, still called a hyperplane.

This concept can be generalized to any inner product space $V$:
$S_{a,b} = \{x \in V \mid \langle a, x \rangle = b\}$, where $a \in V$ with $a \neq 0$ and $b \in \mathbb{R}$ are given.

Projection onto hyperplanes
Consider a Hilbert space $V$ and a hyperplane $S$:
$S = \{x \in V \mid \langle a, x \rangle = b\}$.

Let $y \in V$ be a given vector.
The vector on $S$ that is closest to $y$ is called the projection of $y$ onto $S$, denoted by $P_S y$:

$P_S y = \arg\min_{x \in S} \|x - y\|$.

Let us find an explicit expression of $P_S y$ in terms of $a$, $b$, and $y$.

Theorem: $z$ is a solution of $\min_{x \in S} \|x - y\|$ if and only if

$z \in S$ and $\langle z - y, z - x \rangle = 0$, $\forall x \in S$.
Proof: (1) We first prove that if $z \in S$ is a solution of $\min_{x \in S} \|x - y\|$,
then $\langle z - y, z - x \rangle = 0$, $\forall x \in S$.
Since $z$ is a solution, $z \in S$, i.e., $\langle a, z \rangle = b$.
$\forall x \in S$ and $t \in \mathbb{R}$, it is easy to see that
$\langle a, (1+t)z - tx \rangle = (1+t)\langle a, z \rangle - t\langle a, x \rangle = (1+t)b - tb = b$.
Therefore, $(1+t)z - tx \in S$.
Since $z$ is closest to $y$ on $S$, we have

$\|z - y\|^2 \le \|(1+t)z - tx - y\|^2 = \|(z - y) + t(z - x)\|^2 = \|z - y\|^2 + 2t\langle z - y, z - x \rangle + t^2\|z - x\|^2$,

i.e.,
$0 \le 2t\langle z - y, z - x \rangle + t^2\|z - x\|^2$, $\forall t \in \mathbb{R}$.
If we choose $t > 0$, dividing by $t$ and letting $t \to 0^+$ gives $\langle z - y, z - x \rangle \ge 0$.
If we choose $t < 0$, dividing by $t$ and letting $t \to 0^-$ gives $\langle z - y, z - x \rangle \le 0$.

Altogether, $z$ satisfies $\langle z - y, z - x \rangle = 0$, $\forall x \in S$.


(2) We then show that if $z \in S$ satisfies $\langle z - y, z - x \rangle = 0$, $\forall x \in S$, then $z$ is
a solution of $\min_{x \in S} \|x - y\|$, by direct calculation.

Since $\langle z - y, z - x \rangle = 0$, $\forall x \in S$:

$\|x - y\|^2 = \|(x - z) + (z - y)\|^2 = \|x - z\|^2 + 2\langle x - z, z - y \rangle + \|z - y\|^2 = \|x - z\|^2 + \|z - y\|^2 \ge \|z - y\|^2$, $\forall x \in S$.

This, together with $z \in S$, implies $z$ minimizes $\|x - y\|$ over $x \in S$.  ∎

Theorem: The solution of $\min_{x \in S} \|x - y\|$ exists and is unique, and it is
given by $P_S y = y - \dfrac{\langle a, y \rangle - b}{\|a\|^2}\, a$.

Proof: Denote $z = y - \dfrac{\langle a, y \rangle - b}{\|a\|^2}\, a$.

$\langle a, z \rangle = \langle a, y \rangle - \dfrac{\langle a, y \rangle - b}{\|a\|^2}\langle a, a \rangle = \langle a, y \rangle - (\langle a, y \rangle - b) = b$. So $z \in S$.

$\forall x \in S$:
$\langle z - y, z - x \rangle = \Big\langle -\dfrac{\langle a, y \rangle - b}{\|a\|^2}\, a,\; z - x \Big\rangle = -\dfrac{\langle a, y \rangle - b}{\|a\|^2}\big(\langle a, z \rangle - \langle a, x \rangle\big) = 0$,
because $\langle a, z \rangle = \langle a, x \rangle = b$.

By the previous theorem, $z$ is a solution of $\min_{x \in S} \|x - y\|$.
It remains to show the uniqueness.

Suppose we have two solutions $z_1$ and $z_2$. Then:

$z_1$ is a solution $\Rightarrow \langle z_1 - y, z_1 - z_2 \rangle = 0$ (taking $x = z_2$);
$z_2$ is a solution $\Rightarrow \langle z_2 - y, z_2 - z_1 \rangle = 0$ (taking $x = z_1$), i.e., $\langle z_2 - y, z_1 - z_2 \rangle = 0$.

Taking the difference leads to $\langle z_1 - z_2, z_1 - z_2 \rangle = 0$,
i.e., $\|z_1 - z_2\|^2 = 0 \Rightarrow z_1 = z_2$.  ∎

In summary, the projection of $y$ onto the hyperplane $S = \{x \mid \langle a, x \rangle = b\}$
exists and is unique. Furthermore,

$P_S y = y - \dfrac{\langle a, y \rangle - b}{\|a\|^2}\, a$.
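As a quick check of this formula, here is a minimal NumPy sketch (the function name `project_onto_hyperplane` is ours for illustration):

```python
import numpy as np

def project_onto_hyperplane(y, a, b):
    """Project y onto the hyperplane S = {x : <a, x> = b}."""
    return y - (np.dot(a, y) - b) / np.dot(a, a) * a

a = np.array([1.0, 2.0, -1.0])
b = 3.0
y = np.array([4.0, 0.0, 2.0])
z = project_onto_hyperplane(y, a, b)
x = np.array([3.0, 0.0, 0.0])                 # another point of S, since <a, x> = 3 = b
print(np.dot(a, z))                            # = b, so z lies on S
print(np.dot(z - y, z - x))                    # = 0, the orthogonality condition of the theorem
```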

Affine functions

A linear function plus a constant is called an affine function.

That is, a function $f: V \to \mathbb{R}$ is affine if $f(x) = g(x) + b$,
where $g: V \to \mathbb{R}$ is linear and $b \in \mathbb{R}$ is a constant.
Properties:

If $f: V \to \mathbb{R}$ is affine, then
$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$, $\forall x, y \in V$ and $\alpha, \beta \in \mathbb{R}$ s.t. $\alpha + \beta = 1$.

To see this, write $f(x) = g(x) + b$ with $g$ linear. Then
$f(\alpha x + \beta y) = g(\alpha x + \beta y) + b = \alpha g(x) + \beta g(y) + (\alpha + \beta)b = \alpha\big(g(x) + b\big) + \beta\big(g(y) + b\big) = \alpha f(x) + \beta f(y)$.
If $f: V \to \mathbb{R}$, where $V$ is a Hilbert space, is affine, then

$f$ must be of the form

$f(x) = \langle a, x \rangle + b$, where $a \in V$ and $b \in \mathbb{R}$.
4.2 Case Studies: Regression and Classification

4.2.1 Linear Regression


Given a set of data
$(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$,
where

$x_i \in \mathbb{R}^n$ is an input feature vector, $i = 1, 2, \ldots, N$;
$y_i \in \mathbb{R}$ is the corresponding response to $x_i$.

Given a new input feature vector $x \in \mathbb{R}^n$, how do we predict the
corresponding response $y \in \mathbb{R}$?

For example, $x_i \in \mathbb{R}^n$ represents $n$ attributes of a house,
and $y_i \in \mathbb{R}$ is the selling price.
We want to predict the selling price of a house with feature $x \in \mathbb{R}^n$.

This task is called regression. In this context:
$x_i$ are called regressors (independent variables);
$y_i$ are called dependent variables (outcomes / labels).

The class of all functions $\mathbb{R}^n \to \mathbb{R}$ is too large, and the given
dataset $\{(x_i, y_i)\}$ is not enough to determine a function uniquely.
So we need to choose a function class within which we search for $f$.
Intuitively, a larger $N$ allows a larger function class.
Linear model

We search for $f$ in the class of all affine functions,
i.e., $f(x) = \langle a, x \rangle + b$ for some $a \in \mathbb{R}^n$, $b \in \mathbb{R}$.
Thus we find $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ s.t.

$\langle a, x_i \rangle + b \approx y_i$, $i = 1, 2, \ldots, N$,

by minimizing the error of the linear equations.

While there are many possible definitions of error, it is popular to
consider the squared error as follows:
$\big(\langle a, x_i \rangle + b - y_i\big)^2$, $i = 1, 2, \ldots, N$.

Therefore, we find $a \in \mathbb{R}^n$, $b \in \mathbb{R}$ by solving

$\min_{a \in \mathbb{R}^n,\, b \in \mathbb{R}} \sum_{i=1}^N \big(\langle a, x_i \rangle + b - y_i\big)^2$.

This problem is called the least squares (LS) problem.

Write
$X = \begin{bmatrix} x_1^T & 1 \\ \vdots & \vdots \\ x_N^T & 1 \end{bmatrix} \in \mathbb{R}^{N \times (n+1)}$, $\quad \beta = \begin{bmatrix} a \\ b \end{bmatrix} \in \mathbb{R}^{n+1}$, $\quad y = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} \in \mathbb{R}^N$.
Then the LS problem becomes
$\min_{\beta} \|X\beta - y\|_2^2$.

Since we have $N$ equations to fit and $n+1$ unknowns,

$N \ge n+1$ is required to have a unique solution.
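As an illustration, here is a minimal NumPy sketch of solving the LS problem on synthetic data (all names and data are ours):

```python
import numpy as np

# Toy data: N samples, n features
N, n = 50, 3
rng = np.random.default_rng(0)
X_feat = rng.normal(size=(N, n))
y = X_feat @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.01 * rng.normal(size=N)

# Build the design matrix [x_i^T, 1] and solve min_beta ||X beta - y||_2^2
X = np.hstack([X_feat, np.ones((N, 1))])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b = beta[:n], beta[n]
print(a, b)  # close to (1, -2, 0.5) and 3
```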

However, in practice, $N$ can be much smaller than $n$.

For example, if the $x_i$ are images of about 10M pixels each, then $n \approx 10^7$,
and it is very difficult to have a database of $10^7$ images
(i.e., $N \ll 10^7$).

Regularization

In many applications we don't have enough data, e.g., $N < n+1$,
so that the class of all affine functions is too large to search for $f$.
We search for $f$ in a subclass of the affine functions instead.

Therefore, instead of searching over $\beta \in \mathbb{R}^{n+1}$, we search over $\beta \in S \subset \mathbb{R}^{n+1}$:
$\min_{\beta \in S} \|X\beta - y\|_2^2$.
In other words, we search for $f$ in
$\mathcal{F} = \{f: \mathbb{R}^n \to \mathbb{R} \mid f(x) = \langle a, x \rangle + b,\ \beta = (a, b) \in S\}$.
By choosing different $S$, we obtain different approaches.

Ridge regression

We choose $S = \{\beta = (a, b) \mid \|a\|_2 \le c\}$ for some $c > 0$.

Then we solve $\min_{\beta \in S} \|X\beta - y\|_2^2$.

By convex optimization theory, this is equivalent to

$\min_{\beta}\ \|X\beta - y\|_2^2 + \lambda \|a\|_2^2$,

where $\lambda > 0$ depends on $c$ and the data.
Here $\lambda\|a\|_2^2$ is the regularization term,
and $\|X\beta - y\|_2^2$ is the data-fitting term.
In other words, we find $\beta = (a, b) \in \mathbb{R}^{n+1}$ such that

the data-fitting error and $\|a\|_2^2$

are minimized simultaneously.

Therefore, ridge regression gives $(a, b)$ such that

$X\beta \approx y$ and $\|a\|_2$ is small, so that $\beta \in S$.
The parameter $\lambda$ is tuned s.t. $\|a\|_2 \le c$.
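A minimal sketch of ridge regression via its closed-form normal equations, penalizing only $a$ and not the intercept $b$, as in the formulation above (the helper name is ours):

```python
import numpy as np

def ridge_fit(X_feat, y, lam):
    """Solve min_{a,b} ||X_feat a + b - y||_2^2 + lam * ||a||_2^2."""
    N, n = X_feat.shape
    X = np.hstack([X_feat, np.ones((N, 1))])
    # Penalize only the first n coefficients (a), not the intercept b.
    D = np.eye(n + 1)
    D[n, n] = 0.0
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return beta[:n], beta[n]
```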
LASSO regression

We choose $S = \{\beta = (a, b) \mid \|a\|_1 \le c\}$.

Then we solve $\min_{\beta \in S} \|X\beta - y\|_2^2$.

By convex optimization theory, this is equivalent to

$\min_{\beta}\ \|X\beta - y\|_2^2 + \lambda \|a\|_1$,

where $\lambda > 0$ depends on $c$ and the data.
Here the regularization term is $\lambda\|a\|_1$.
So we find $\beta = (a, b)$ such that
the data-fitting error and $\|a\|_1$
are minimized simultaneously.

A small $\|a\|_1$ tends to give a sparse vector $a \in \mathbb{R}^n$

(i.e., many entries of $a$ are zeros).
Consequently, LASSO gives a solution $(a, b)$ such that
$X\beta \approx y$ and $a$ is sparse.

Thus, given $x \in \mathbb{R}^n$, the prediction is given by

$\langle a, x \rangle + b = \sum_{i=1}^n a_i x_i + b = \sum_{i \in I} a_i x_i + b$,
where $I = \{i \mid a_i \neq 0\}$.
Since $I$ is a small set, the prediction depends only on
a small portion of the entries of $x$.
This is preferred because the prediction is interpretable.
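A minimal sketch using scikit-learn's Lasso (assuming scikit-learn is available; its alpha plays the role of $\lambda$ up to a scaling of the data-fitting term):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X_feat = rng.normal(size=(30, 10))
# Only features 0 and 3 matter in this synthetic example.
y = 2.0 * X_feat[:, 0] - 1.5 * X_feat[:, 3] + 0.5

model = Lasso(alpha=0.1)            # alpha ~ lambda (sklearn scales the fit term by 1/(2N))
model.fit(X_feat, y)
a, b = model.coef_, model.intercept_
print(np.nonzero(a)[0])             # sparse: mostly the informative indices 0 and 3
```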
4.2.2 Kernel regression

Linear regression has its limitations.
We extend it to nonlinear regression by the kernel trick.

Feature map: $\Phi: \mathbb{R}^n \to H$, where $H$ is some Hilbert space.
Then we do regression in $H$.
However, since $H$ is very large, the set of all linear functions on $H$
is also too large. We need regularization.
We solve
$\min_{a \in H}\ \dfrac{1}{N}\sum_{i=1}^N \big(\langle a, \Phi(x_i) \rangle - y_i\big)^2 + \lambda \|a\|^2$.

Representer Theorem

Theorem: Any solution of the above minimization is of the form $a = \sum_{i=1}^N c_i \Phi(x_i)$
for some $c = (c_1, \ldots, c_N)^T \in \mathbb{R}^N$.

Proof: For any $a \in H$, we claim that $a$ can be decomposed as

$a = a_S + \sum_{i=1}^N c_i \Phi(x_i)$,
where $c = (c_1, \ldots, c_N)^T \in \mathbb{R}^N$ and $\langle a_S, \Phi(x_i) \rangle = 0$ for $i = 1, 2, \ldots, N$.

Indeed, consider $N = 1$ for simplicity.
Let $S = \{z \in H \mid \langle z, \Phi(x_1) \rangle = 0\}$. Then $S$ is a hyperplane through 0.
Decompose $a = P_S a + (a - P_S a)$. We have
$\langle P_S a - a, P_S a - 0 \rangle = 0$, because $0 \in S$,
i.e., $\langle P_S a - a, P_S a \rangle = 0$.
Also, by direct calculation (using the projection formula),
$a - P_S a = c_1 \Phi(x_1)$ for some $c_1 \in \mathbb{R}$.
So $a_S = P_S a$ and $a = a_S + c_1 \Phi(x_1)$.
For general $N$, it can be done similarly.


Substituting $a = a_S + \sum_{j=1}^N c_j \Phi(x_j)$ into the objective, and using $\langle a_S, \Phi(x_i) \rangle = 0$:

$\langle a, \Phi(x_i) \rangle = \sum_{j=1}^N c_j \langle \Phi(x_j), \Phi(x_i) \rangle = \sum_{j=1}^N c_j K(x_j, x_i)$,
$\|a\|^2 = \Big\|\sum_{j=1}^N c_j \Phi(x_j)\Big\|^2 + \|a_S\|^2 = c^T K c + \|a_S\|^2$,

so the objective becomes

$\dfrac{1}{N}\sum_{i=1}^N \Big(\sum_{j=1}^N c_j K(x_j, x_i) - y_i\Big)^2 + \lambda\, c^T K c + \lambda \|a_S\|^2 = \dfrac{1}{N}\|Kc - y\|^2 + \lambda\, c^T K c + \lambda \|a_S\|^2$,

where $K(x, y) = \langle \Phi(x), \Phi(y) \rangle$ is the kernel function,
$K = \big[K(x_i, x_j)\big] \in \mathbb{R}^{N \times N}$, and $c = (c_1, \ldots, c_N)^T \in \mathbb{R}^N$.

Let $F(c) = \dfrac{1}{N}\|Kc - y\|^2 + \lambda\, c^T K c$ (depends only on $c$)
and $E(a_S) = \lambda \|a_S\|^2$ (depends only on $a_S$).
Then the minimization is the same as

$\min_{c \in \mathbb{R}^N,\ a_S \in H,\ \langle a_S, \Phi(x_i) \rangle = 0}\ F(c) + E(a_S) = \min_{c \in \mathbb{R}^N} F(c) + \min_{a_S} E(a_S)$.

Obviously, $\min_{a_S} E(a_S)$ is solved by $a_S = 0$, because $E(a_S) = \lambda\|a_S\|^2 \ge 0$.

Thus, the solution of the original minimization is

$a = \sum_{i=1}^N c_i \Phi(x_i)$,

where $c \in \mathbb{R}^N$ is a solution of $\min_{c} F(c)$.  ∎
From the above proof, we see that the problem reduces to

$\min_{c \in \mathbb{R}^N}\ \dfrac{1}{N}\|Kc - y\|^2 + \lambda\, c^T K c$.

Let the solution be $c \in \mathbb{R}^N$.
Then the predicted output $y$ for an input $x \in \mathbb{R}^n$ is
$y = \langle a, \Phi(x) \rangle = \Big\langle \sum_{i=1}^N c_i \Phi(x_i), \Phi(x) \Big\rangle = \sum_{i=1}^N c_i K(x_i, x)$.
All the computation involves only the kernel function $K(\cdot, \cdot)$;

no explicit feature map $\Phi(\cdot)$ is needed.

(Figure: a kernel regression result on nonlinear data.)
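A minimal sketch of this kernel regression with a Gaussian (RBF) kernel; assuming $K$ is positive definite, one way to minimize $F(c)$ is to solve its normal equation $(K + \lambda N I)c = y$ (all names and data are ours):

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian kernel K(x, x') = exp(-gamma * ||x - x'||^2)."""
    d2 = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * d2)

# Toy 1-D nonlinear data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=40)

lam, N = 0.1, X.shape[0]
K = rbf_kernel(X, X)
c = np.linalg.solve(K + lam * N * np.eye(N), y)   # minimizer of F(c) when K is positive definite

# Prediction at new points: y(x) = sum_i c_i K(x_i, x)
X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
y_pred = rbf_kernel(X_new, X) @ c
print(y_pred)
```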
4.2.3 Classification

Classification: Given training data
$(x_1, y_1), \ldots, (x_N, y_N)$, $x_i \in \mathbb{R}^n$, $y_i \in \{-1, +1\}$, $i = 1, \ldots, N$,
find a classifier, i.e., a function $f$ such that

$y_i = \begin{cases} +1 & \text{if } f(x_i) \ge 1, \\ -1 & \text{if } f(x_i) \le -1. \end{cases}$

We use hyperplanes to separate the points:

$f(x) = \langle a, x \rangle + b$, where $a \in \mathbb{R}^n$, $b \in \mathbb{R}$.
The weights $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ are normalized such that
$\langle a, x_i \rangle + b \ge +1$ if $y_i = +1$;
$\langle a, x_i \rangle + b \le -1$ if $y_i = -1$.
Support Vector Machine (SVM)

There exist many such linear classifiers.

(Figure: two candidate classifiers; for each, the separating hyperplane $\langle a, x \rangle + b = 0$ and the margin between $\langle a, x \rangle + b = +1$ and $\langle a, x \rangle + b = -1$; the data points lying on these two hyperplanes are the support vectors.)

Which one is better?

From the figure, the blue one is better, because it has a larger margin
and hence a larger buffer zone against misclassification.

Therefore, we want to maximize the margin among all candidates.


Let us calculate the margin in terms of $a$ and $b$.
The margin is the distance between the two hyperplanes

$S_+ = \{x \mid \langle a, x \rangle + b = +1\}$ and $S_- = \{x \mid \langle a, x \rangle + b = -1\}$.

Let $x_+ \in S_+$ and $x_- \in S_-$ be such that
$\|x_+ - x_-\| = \mathrm{dist}(S_+, S_-)$.
Since $x_+$ is the projection of $x_-$ onto $S_+ = \{x \mid \langle a, x \rangle = 1 - b\}$,

$x_+ = x_- - \dfrac{\langle a, x_- \rangle - (1 - b)}{\|a\|^2}\, a = x_- + \dfrac{2}{\|a\|^2}\, a$,

since $x_- \in S_-$, i.e., $\langle a, x_- \rangle = -1 - b$.

Thus $\|x_+ - x_-\| = \Big\|\dfrac{2}{\|a\|^2}\, a\Big\| = \dfrac{2}{\|a\|}$,
i.e., the margin is $\dfrac{2}{\|a\|}$.
Support Vector Machine (SVM):

$\max_{a, b}\ \dfrac{2}{\|a\|}$ s.t. $\langle a, x_i \rangle + b \ge +1$ if $y_i = +1$, and $\langle a, x_i \rangle + b \le -1$ if $y_i = -1$,

which is equivalent to

$\min_{a, b}\ \|a\|^2$ s.t. $y_i\big(\langle a, x_i \rangle + b\big) \ge 1$, $i = 1, \ldots, N$.   (SVM-1)

The above SVM (SVM-1) is NOT robust to noise.

For example (as shown in the figure on the right), even if we
have only two noisy points, there may be no solution to (SVM-1) at all.

We consider a soft version of (SVM-1) as follows.

We minimize the error to the separation:

$\sum_{i=1}^N h\big(y_i(\langle a, x_i \rangle + b) - 1\big)$,

where the function $h$ is
$h(t) = \begin{cases} 0 & \text{if } t \ge 0, \\ |t| & \text{if } t < 0. \end{cases}$

In other words, if $y_i(\langle a, x_i \rangle + b) \ge 1$, the error is 0;
if $y_i(\langle a, x_i \rangle + b) < 1$, the error is the absolute value of $y_i(\langle a, x_i \rangle + b) - 1$.

Also, maximizing the margin $\dfrac{2}{\|a\|}$, i.e., keeping $\|a\|^2$ small, can be viewed as a regularization.

Altogether, we solve

$\min_{a, b}\ \dfrac{1}{N}\sum_{i=1}^N h\big(y_i(\langle a, x_i \rangle + b) - 1\big) + \lambda \|a\|^2$.   (SVM-2)

We call this SVM with a soft margin.

$h$ can be approximated by some smooth function. If $h$ is replaced by
the so-called logistic function, we call it logistic regression.
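A minimal sketch of the soft-margin SVM using scikit-learn (assuming it is available); `LinearSVC` with the hinge loss minimizes an objective of the same hinge-loss-plus-$\|a\|^2$ form as (SVM-2), with the trade-off expressed through $C$ (roughly $C \approx 1/(2\lambda N)$):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+2.0, size=(50, 2))
X_neg = rng.normal(loc=-2.0, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(50), -np.ones(50)])

clf = LinearSVC(C=1.0, loss='hinge')   # hinge loss h plus ||a||^2 regularization
clf.fit(X, y)
a, b = clf.coef_[0], clf.intercept_[0]
print(np.sign(X @ a + b))              # predicted labels sgn(<a, x> + b)
```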
Kernel SVM

The linear SVMs don't work for curved (nonlinearly separable) data.
The kernel method can be used here as well.
Let $\Phi: \mathbb{R}^n \to H$ be a feature map.
We use linear functions on $H$ to classify the points.
By the Riesz representation theorem,
any bounded linear function on $H$ is of the form
$\langle a, x \rangle$ for some $a \in H$.
Therefore, (SVM-2) becomes

$\min_{a \in H,\, b \in \mathbb{R}}\ \dfrac{1}{N}\sum_{i=1}^N h\big(y_i(\langle a, \Phi(x_i) \rangle + b) - 1\big) + \lambda \|a\|_H^2$.   (K-SVM)


Again, one can prove the following representer theorem.

Theorem: Any solution of (K-SVM) is of the form

$a = \sum_{i=1}^N c_i \Phi(x_i)$.

Proof: Write $a = \sum_{i=1}^N c_i \Phi(x_i) + a_S$ for some $c \in \mathbb{R}^N$ and $a_S \in H$ with $\langle a_S, \Phi(x_i) \rangle = 0$.
The rest is the same as in the kernel regression case.  ∎

Thus, (K-SVM) becomes

$\min_{c \in \mathbb{R}^N,\, b \in \mathbb{R}}\ \dfrac{1}{N}\sum_{i=1}^N h\Big(y_i\Big(\sum_{j=1}^N K(x_i, x_j)\, c_j + b\Big) - 1\Big) + \lambda\, c^T K c$,

where $K = \big[K(x_i, x_j)\big]_{i,j} \in \mathbb{R}^{N \times N}$.

The prediction for an input $x$ is given by

$\mathrm{sgn}\Big(\sum_{j=1}^N K(x, x_j)\, c_j + b\Big)$.

Again, only $K(\cdot, \cdot)$ is needed in the kernel SVM,
and no explicit feature map $\Phi(\cdot)$ is required.
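A minimal sketch with scikit-learn's SVC (assuming scikit-learn is available), which solves a kernelized soft-margin problem of this kind using only the kernel; the trade-off is again expressed through $C$ rather than $\lambda$:

```python
import numpy as np
from sklearn.svm import SVC

# Nonlinearly separable toy data: label by distance from the origin
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(np.linalg.norm(X, axis=1) > 1.0, 1, -1)

clf = SVC(kernel='rbf', C=1.0)   # Gaussian kernel K(x, x') = exp(-gamma ||x - x'||^2)
clf.fit(X, y)
print(clf.predict([[0.1, 0.1], [2.0, 2.0]]))   # expect roughly [-1, 1]
```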
4.3 Linear approximation and Differentiation

Recall that for a function $f: \mathbb{R} \to \mathbb{R}$, the derivative at $x_0$ is

$f'(x_0) = \lim_{x \to x_0} \dfrac{f(x) - f(x_0)}{x - x_0}$,

which is the same as

$\lim_{x \to x_0} \dfrac{\big|f(x) - f(x_0) - f'(x_0)(x - x_0)\big|}{|x - x_0|} = 0$.

Notice that $f(x_0) + f'(x_0)(x - x_0)$ is an affine function on $\mathbb{R}$ that

passes through $(x_0, f(x_0))$.

(Figure: the graph of $f$ and the affine approximation $f(x_0) + f'(x_0)(x - x_0)$ at $x_0$.)

In other words, in differentiation at $x_0$:

(1) $f$ is approximated by an affine function that passes through $(x_0, f(x_0))$;
(2) the error of the approximation is $o(|x - x_0|)$,
i.e., $\lim_{x \to x_0} \dfrac{|\mathrm{error}(x)|}{|x - x_0|} = 0$.

This can be used to define differentiation of functions on Hilbert spaces.

(We use Hilbert spaces for simplicity. It can be easily adapted to Banach spaces.)

Let $f: V \to \mathbb{R}$, with $V$ a Hilbert space.

Consider the differentiation of $f$ at $x_0 \in V$:

(1) By the Riesz representation theorem, any affine function is of the form
$\langle v, x \rangle + a$ for some $v \in V$ and $a \in \mathbb{R}$. Since it passes through
$(x_0, f(x_0))$: $\langle v, x_0 \rangle + a = f(x_0)$. Therefore, the affine function
is of the form
$\langle v, x \rangle + a = \langle v, x - x_0 \rangle + \langle v, x_0 \rangle + a = f(x_0) + \langle v, x - x_0 \rangle$.

(2) The approximation error is
$\mathrm{error}(x) = f(x) - f(x_0) - \langle v, x - x_0 \rangle$.
The error should be of order $o(\|x - x_0\|)$, i.e.,
$\lim_{x \to x_0} \dfrac{|\mathrm{error}(x)|}{\|x - x_0\|} = 0$.

Definition: Let $V$ be a Hilbert space and let $f: V \to \mathbb{R}$. Then $f$ is

said to be Fréchet differentiable at $x_0$ if there exists a $v \in V$ such that

$\lim_{x \to x_0} \dfrac{\big|f(x) - f(x_0) - \langle v, x - x_0 \rangle\big|}{\|x - x_0\|} = 0$.

If $f$ is differentiable at $x_0$, then $v$ is called the gradient of $f$

at $x_0$, denoted by $\nabla f(x_0)$.

Example 1: $f(x) = \|x\|^2$, where $\|\cdot\|$ is the norm on $V$.

At any $x_0 \in V$:

$\|x\|^2 = \|(x - x_0) + x_0\|^2 = \|x_0\|^2 + 2\langle x_0, x - x_0 \rangle + \|x - x_0\|^2$.

Therefore, $\|x_0\|^2 + \langle 2x_0, x - x_0 \rangle$ is the affine approximation, and

$\lim_{x \to x_0} \dfrac{\big|\,\|x\|^2 - \|x_0\|^2 - \langle 2x_0, x - x_0 \rangle\,\big|}{\|x - x_0\|} = \lim_{x \to x_0} \dfrac{\|x - x_0\|^2}{\|x - x_0\|} = \lim_{x \to x_0} \|x - x_0\| = 0$.

Thus $\nabla f(x_0) = 2x_0$.

Example 2: $f(x) = \langle a, x \rangle$ for some $a \in V$.

At any $x_0 \in V$:
$\langle a, x \rangle = \langle a, x_0 \rangle + \langle a, x - x_0 \rangle$.
Therefore,

$\dfrac{\big|f(x) - f(x_0) - \langle a, x - x_0 \rangle\big|}{\|x - x_0\|} = 0$.

Thus $\nabla f(x_0) = a$.

Example 3: $f(x) = \|x - a\|^2$ for some $a \in V$.

At any $x_0 \in V$:

$f(x) = \|x - a\|^2 = \|(x - x_0) + (x_0 - a)\|^2 = \|x_0 - a\|^2 + 2\langle x_0 - a, x - x_0 \rangle + \|x - x_0\|^2$
$\quad\;\;\, = f(x_0) + \langle 2(x_0 - a), x - x_0 \rangle + \|x - x_0\|^2$.

So

$\lim_{x \to x_0} \dfrac{\big|f(x) - f(x_0) - \langle 2(x_0 - a), x - x_0 \rangle\big|}{\|x - x_0\|} = \lim_{x \to x_0} \dfrac{\|x - x_0\|^2}{\|x - x_0\|} = 0$.

Therefore, $\nabla f(x_0) = 2(x_0 - a)$.
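A quick numerical check of this gradient in NumPy by finite differences (all names are ours):

```python
import numpy as np

a = np.array([1.0, -2.0, 0.5])
f = lambda x: np.sum((x - a) ** 2)          # f(x) = ||x - a||^2
grad = lambda x: 2.0 * (x - a)              # claimed gradient 2(x - a)

x0 = np.array([0.3, 0.7, -1.2])
eps = 1e-6
# Finite-difference approximation of each partial derivative at x0
num_grad = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
print(np.allclose(num_grad, grad(x0), atol=1e-5))   # True
```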

Properties

Fréchet differentiation is linear, i.e.,
$\nabla(\alpha f + \beta g)(x) = \alpha \nabla f(x) + \beta \nabla g(x)$.

Chain rule: Let $f: V \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$. Then $g \circ f: V \to \mathbb{R}$ and
$\nabla(g \circ f)(x) = g'\big(f(x)\big)\, \nabla f(x)$,
if $f$ and $g$ are differentiable at $x$ and $f(x)$, respectively.

Example 4: $f(x) = \|x\|$, $\forall x \in V$.

This is a composition of $f_1(x) = \|x\|^2$ from $V \to \mathbb{R}$
and $f_2(t) = \sqrt{t}$ from $\mathbb{R} \to \mathbb{R}$.
When $\|x\| \neq 0$, both $f_1$ and $f_2$ are differentiable.
Also, $\nabla f_1(x) = 2x$ and $f_2'(t) = \dfrac{1}{2\sqrt{t}}$ if $t \neq 0$.
So $\nabla f(x) = \nabla (f_2 \circ f_1)(x) = f_2'\big(f_1(x)\big)\, \nabla f_1(x) = \dfrac{1}{2\sqrt{\|x\|^2}}\, 2x = \dfrac{x}{\|x\|}$.
When $\|x\| = 0$ (i.e., $x = 0$), $f_2$ is NOT differentiable at $f_1(x) = 0$.

It can be shown that $f(x) = \|x\|$ is NOT differentiable at $x_0 = 0$.

For functions on $\mathbb{R}^n$, $f: \mathbb{R}^n \to \mathbb{R}$:

$\nabla f(x) = \Big(\dfrac{\partial f}{\partial x_1}(x), \ldots, \dfrac{\partial f}{\partial x_n}(x)\Big)^T$,

i.e., Fréchet differentiation is the same as the standard
differentiation (the gradient) in multivariate calculus.

Taylor's expansion

From the definition, we see that
$f(x) \approx f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle$,

or, more precisely,

$f(x) = f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle + o(\|x - x_0\|)$.

This is a generalization of Taylor's expansion.

In particular, if $f: \mathbb{R}^n \to \mathbb{R}$,
$f(x) = f(x_0) + \sum_{i=1}^n \dfrac{\partial f}{\partial x_i}(x_0)\,(x_i - x_{0,i}) + o(\|x - x_0\|)$.
Differentiation on normed vector spaces

Let $V$ be a normed vector space.

Let $f: V \to \mathbb{R}$ be a function, and let $x_0 \in V$.

To define differentiation, we still use an affine function approximation,
and the affine function passes through $(x_0, f(x_0))$.
Thus, we use $f(x_0) + L(x - x_0)$, where $L: V \to \mathbb{R}$ is a linear
function, to approximate $f(x)$.

However, because there is no Riesz representation theorem on normed spaces, we
keep the linear function $L$ in the definition of differentiation.

Definition: $f$ is differentiable at $x_0 \in V$ if
there exists a linear function $L: V \to \mathbb{R}$ such that

$\lim_{x \to x_0} \dfrac{\big|f(x) - f(x_0) - L(x - x_0)\big|}{\|x - x_0\|} = 0$.

The linear function $L$ is called the differential of $f$ at $x_0$.

You might also like