
Fundamentals of Neural Networks

Khanh Nguyen
University of Maryland, College Park, MD 20742
kxnguyen@cs.umd.edu

Hieu Pham
Stanford University, Stanford, CA 94305
hyhieu@cs.stanford.edu

Preface

A neural network is a machine learning model capable of approximating any function or relation [6]. Although it is not the only model with this ability (another example is the decision tree), the neural network is currently the only model that can do so with a moderate number of parameters, which computers can handle within a reasonable amount of time. This property has made neural networks widely used and highly successful in many different areas of artificial intelligence. This article aims to introduce students and researchers in artificial intelligence, especially in machine learning, to the basics of neural networks. By using the simplest possible language and requiring only minimal background in linear algebra and multivariate calculus, the authors hope that, after reading this article, you will be able to program a few basic neural networks to solve some problems of your own.

1 Introduction

Although neural networks were invented about 60 years ago, the decade of 2006-2015 witnessed their strong revival. Today, the model is widely applied and achieves excellent results in almost every area of artificial intelligence, including computer vision [14, 7, 8], natural language processing [1, 13, 12], and speech recognition [4, 9].
Depending on the specific application, neural networks can take on different architectures, which allow information to propagate between the neurons of the network in appropriate ways and directions. This article introduces the simplest architecture: the fully connected network. In this architecture, the network consists of several neuron layers arranged in a linear order. Neurons within the same layer are not connected to each other. Neurons in two consecutive layers are connected to each other, forming a complete bipartite graph whose edge weights are represented by a weight matrix. This structure is reminiscent of the neuron model of the human brain, with the neurons of the network playing the role of the neurons in the brain and the connecting edges acting like synapses¹. From here on, unless stated otherwise, the term "neural network" refers to a "fully connected neural network".
¹ However, since the workings of the human brain are still a mystery to science, one should not claim that the operation of a neural network simulates that of the brain.

[Figure 1: Illustration of the connections around a neuron in a neural network. Notation convention: the weight between neuron $i$ in layer $L^{(l+1)}$ and neuron $j$ in layer $L^{(l)}$ is denoted $W^{(l)}_{ij}$.]

Information propagates along two paths in a fully connected neural network. In the feed-forward step, information flows from the input layer, through the hidden layers, to the output layer. The output layer is the network's result: it represents the value of the function the network is approximating, evaluated at the data point received at the input layer. Of course, the network's output can be inaccurate, producing errors. In the back-propagation step, these errors are propagated through the layers of the network in the reverse order of the feed-forward step, allowing the network to compute the derivatives with respect to its parameters, which can then be adjusted using a function-optimization algorithm.
This article will help you understand these concepts and walk you through the basics of training a neural network. In what follows, Section 2 introduces the architecture of fully connected neural networks. Section 3 describes the feed-forward algorithm used to perform inference on the network. Section 4 introduces activation functions. Section 5 discusses how to apply neural networks to classification problems. Section 6 describes how to find an optimal parameter configuration for a neural network. Section 7 discusses a common problem when using neural networks, overfitting, and ways to mitigate it. Finally, Section 8 suggests directions for exploring advanced topics beyond the scope of this article.

2 Architecture of fully connected neural networks

As mentioned in the introduction, the neurons of a fully connected neural network are divided into layers. Each neuron in a layer receives the values output by the neurons in the previous layer, combines them into an intermediate value, and finally passes this intermediate value through an activation function to produce its output for the neurons in the next layer.

More concretely, consider a neural network with $L - 1$ hidden layers. We denote by $L^{(l)}$ the set of neurons in the $l$-th layer, for $l = 0, 1, \dots, L$. Layer $L^{(0)}$ is the input layer. Layer $L^{(L)}$ is the output layer. The remaining layers are called hidden layers. A neuron in the $l$-th layer receives information only from neurons in layer $l - 1$ and sends information only to neurons in layer $l + 1$. Naturally, the neurons in layer $L^{(0)}$ receive no input from other neurons, and the neurons in layer $L^{(L)}$ send no output to other neurons. Figure 1 illustrates the connections around a sample neuron in a neural network.

Algorithm 1 The feed-forward algorithm.
1: function FEEDFORWARD($x^{(0)} \in \mathbb{R}^{|L_0|}$)
2:  for $l = 1$ to $L$ do
3:   $z^{(l)} \leftarrow W^{(l-1)} x^{(l-1)}$
4:   $x^{(l)} \leftarrow f(z^{(l)})$
5:  end for
6:  return $x^{(L)}$, $\mathrm{Loss}(z^{(L)})$
7: end function

Between two consecutive layers $L^{(l)}$ and $L^{(l+1)}$ in a fully connected network, we set up a weight matrix $W^{(l)}$ of size $|L^{(l+1)}| \times |L^{(l)}|$². The entry $W^{(l)}_{ij}$ of this matrix represents the influence of neuron $j$ in layer $l$ on neuron $i$ in layer $l + 1$. The collection of weight matrices $W = \{W^{(0)}, W^{(1)}, \dots, W^{(L-1)}\}$ is called the parameter set of the neural network. Determining the values of this parameter set is known as learning or training the neural network.
² The notation $|S|$ denotes the number of elements of the set $S$.
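As an illustration, the following short Python sketch sets up such a parameter set with NumPy. The layer sizes and the random initialization scale are assumptions made purely for the example, not prescriptions from the text:

import numpy as np

# Assumed layer sizes for illustration: |L_0| = 2, |L_1| = 3, |L_2| = 1.
layer_sizes = [2, 3, 1]

# W[l] has shape (|L_{l+1}|, |L_l|), matching the convention above.
rng = np.random.default_rng(0)
W = [rng.normal(scale=0.1, size=(layer_sizes[l + 1], layer_sizes[l]))
     for l in range(len(layer_sizes) - 1)]

for l, W_l in enumerate(W):
    print(f"W({l}) shape: {W_l.shape}")   # (3, 2), then (1, 3)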

3 How a neural network performs inference

Suppose the parameters of a neural network have been determined; how do we use the network as an ordinary function? The feed-forward algorithm (Algorithm 1) lets the network take an input data point and compute the corresponding output data point.

In Algorithm 1, the argument $x^{(0)}$ is the network's input vector. On line 3, $W^{(l-1)} x^{(l-1)}$ denotes the product of the matrix $W^{(l-1)}$ and the vector $x^{(l-1)}$. On line 4, $f : \mathbb{R} \to \mathbb{R}$ is an activation function, which we will study in Section 4. $f$ takes a real number and returns a real number; in Algorithm 1, however, we apply the notation $f$ to a vector, which is understood as applying $f$ to each component of the vector. That is:

$$f\left( \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_{l-1} \end{bmatrix} \right) = \begin{bmatrix} f(v_0) \\ f(v_1) \\ \vdots \\ f(v_{l-1}) \end{bmatrix} \qquad (1)$$

Besides the value of the approximated function, $x^{(L)}$, the feed-forward algorithm also returns the value of the loss function Loss, which reflects how good the current parameter set is. This function is discussed in Section 6.1.
Note: you will find that other materials introduce an extra vector $b^{(l)}$, called the bias, for each layer, and write line 3 of Algorithm 1 as:

$$z^{(l)} \leftarrow W^{(l-1)} x^{(l-1)} + b^{(l-1)} \qquad (2)$$

However, the right-hand side of Equation 2 can also be rewritten as a single matrix multiplication (by appending a constant entry of 1 to $x^{(l-1)}$ and a corresponding column to the weight matrix). We therefore omit the bias vector for now so that the formulas stay concise and easy to follow.
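As a concrete illustration, here is a minimal NumPy sketch of Algorithm 1. The sigmoid activation from Section 4 is an assumed choice, and the loss computation is omitted since the loss function is only introduced in Section 6.1:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x0, W, f=sigmoid):
    # W is the list of weight matrices W^(0), ..., W^(L-1).
    x = x0
    for W_l in W:        # W_l plays the role of W^(l-1)
        z = W_l @ x      # line 3: z^(l) <- W^(l-1) x^(l-1)
        x = f(z)         # line 4: x^(l) <- f(z^(l)), applied element-wise
    return x             # x^(L), the network's output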

4 Activation functions

The function $f$ in Algorithm 1 is called the activation function. Activation functions play an extremely important role in neural networks. In fact, some of the most recent advances in neural network research [5, 11] are precisely new formulas for $f$ that increase the expressive power of networks and simplify their training. This section explains the role of activation functions and introduces some commonly used ones.
Activation functions are used to prevent the network from collapsing into a linear map. To see this clearly, try removing the function $f$ from Algorithm 1; note that this is equivalent to using the identity activation $f(x) = x$. The network's inference result is then merely a linear map of the input data. Indeed, we have:

$$x^{(L)} = W^{(L-1)} x^{(L-1)} = \cdots = \underbrace{W^{(L-1)} \cdots W^{(0)}}_{\bar{W}}\, x^{(0)} = \bar{W} x^{(0)} \qquad (3)$$

In this case, learning all $L$ matrices $W^{(l)}$ is unnecessary, because the set of functions the network can represent consists only of linear maps, each of which can be expressed through a single matrix multiplication with:

$$\bar{W} = W^{(L-1)} W^{(L-2)} \cdots W^{(0)} \qquad (4)$$

This greatly reduces the modeling power of the network. To represent a richer class of functions, we must make the network nonlinear by passing the result of each matrix-vector product $W^{(l-1)} x^{(l-1)}$ through a nonlinear function $f$. Some commonly used activation functions are (see the sketch after the list):
1. Sigmoid: $f(x) = \mathrm{sigm}(x) = \dfrac{1}{1 + \exp(-x)}$;
2. Tanh: $f(x) = \tanh(x)$;
3. Rectified linear unit (ReLU) [2]: $f(x) = \max(0, x)$;
4. Leaky rectified linear unit (leaky ReLU) [10]: $f(x) = x$ if $x > 0$ and $f(x) = kx$ if $x \le 0$, where $k$ is a pre-chosen constant, typically $k \approx 0.01$;
5. Maxout [3]: $f(x_1, \dots, x_n) = \max_{1 \le i \le n} x_i$.
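The sketch below implements these functions in vectorized NumPy form, matching the element-wise convention of Equation (1); the function names are ours:

import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, k=0.01):      # k is the pre-chosen constant
    return np.where(x > 0, x, k * x)

def maxout(xs):                 # xs stacks the inputs x_1, ..., x_n
    return np.max(xs, axis=0)

# tanh is available directly as np.tanh.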

5 Modeling probability distributions and classifiers

Neural networks are widely applied to classification problems, i.e., determining which category an input belongs to among a predefined set of choices. To solve such problems, we use the network to model a probability distribution over the set of choices. For example, suppose we want to use a neural network for face verification. The set of choices has only two elements: given an arbitrary pair of portrait photos, we ask the network to answer "yes" or "no" to the question of whether the two photos show the same person. The network produces its answer by computing the probability of each option and choosing the one with the higher probability. In this case, assuming the two probabilities sum to 1, we only need to compute the probability of one option and derive the probability of the other. A neural network that uses the sigmoid activation at its last layer is well suited to this, since the sigmoid takes a real number in $(-\infty, +\infty)$ and returns a real number in $(0, 1)$.
More generally, when the set of choices has more than two elements, we need to turn the network's output into a probability distribution $P(x)$ satisfying the following two conditions:

1. $P(x) \ge 0 \;\; \forall x \in \Omega$ (where $\Omega$ is the set of choices);
2. $\sum_{x \in \Omega} P(x) = 1$.

 
Consider the pre-activation vector at the last layer, $z^{(L)} = \left(z^{(L)}_0, z^{(L)}_1, \dots, z^{(L)}_{|L_L|-1}\right)$. Instead of the sigmoid, we use the softmax function to turn this vector into a probability distribution. The softmax function has the form

$$\mathrm{softmax}(z^{(L)}) = \left(p_0, p_1, \dots, p_{|L_L|-1}\right) \qquad (5)$$

where

$$p_i = \frac{\exp\left(z^{(L)}_i\right)}{\sum_{j=0}^{|L_L|-1} \exp\left(z^{(L)}_j\right)}$$

with $\exp(\cdot)$ the exponential function in the natural base $e$ and $0 \le i \le |L_L| - 1$. Note that the number of neurons in the last layer, $|L_L|$, must equal the number of choices.
It is easy to verify that the output of softmax satisfies the two conditions of a probability distribution, and that the sigmoid is a special case of softmax (a two-choice softmax with one pre-activation fixed at 0 reduces to the sigmoid).
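A small sketch of Equation (5) in NumPy follows. Subtracting $\max(z)$ before exponentiating is an addition of ours: it does not change the result (the factor cancels in the ratio) but avoids numerical overflow:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # stabilized; the shift cancels in the ratio
    return e / np.sum(e)

z = np.array([1.0, 2.0, 3.0])
p = softmax(z)
print(p, p.sum())               # entries are >= 0 and sum to 1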

6 Estimating the parameters of a neural network

When performing inference with a neural network, we assumed that the parameters (the matrices $W^{(l)}$) were given. This is of course unrealistic; we must find parameter values that make the network's inference as accurate as possible. As mentioned above, this task is called parameter estimation, and is also known as training or learning the network.

Let $h(x; W)$ and $g(x)$ denote, respectively, the function represented by the network (with parameter set $W$) and the target function to be approximated. Finding a formula that directly computes the optimal parameter values is very hard. We take a different approach instead, gradually reducing the gap between $h(x; W)$ and $g(x)$ by repeating the following two steps:

1. Measure the error of the network's inference on a set of sample data points $\{(x_d, g(x_d))\}$, called the training set.
2. Update the network's parameters $W$ to reduce this error.

How the error is measured is covered in Section 6.1. How new parameter values are computed at each update is described in Sections 6.2 and 6.3.

6.1 The loss function

The sum of the discrepancies between the network's outputs, $h(x_d; W)$, and the desired outputs, $g(x_d)$, reflects how good the current parameter set is. If the training set is fixed, this sum is essentially a function of the parameter set $W$ alone, commonly known as the loss function:

$$\mathrm{Loss}(W) \triangleq \sum_{d \in D} \mathrm{dist}\left(h(x_d; W),\, g(x_d)\right) \qquad (6)$$

where $D$ is the training set and dist is a function measuring the discrepancy between two output data points³.
³ There are many ways to define the discrepancy. One usually chooses functions that are continuous, differentiable (almost) everywhere, and easy to compute.

Of course, for an optimal neural network the loss value would be 0. In practice, we want to find parameter values that make the loss as small as possible. Parameter estimation for a neural network is therefore, in essence, the problem of finding the value of the variable $W$ that minimizes the function $\mathrm{Loss}(W)$.

[Figure 2: An illustration of optimizing a function. The yellow line is the derivative at the point $x_0$. The arrow shows the direction in which $x_0$ should move to get closer to $x_{opt}$.]
To make this concrete, consider the following example: using a neural network to simulate the XOR function of two binary bits⁴. We design a simple network with only an input layer and an output layer (no hidden layers), using the sigmoid activation. The input layer has 2 neurons and the output layer has 1 neuron; the connection between the two layers can therefore be represented by a single parameter vector $w = (w_1, w_2)$. Since the sigmoid returns a real number in $(0, 1)$, to simulate XOR we need a simple transformation that converts a real number into a binary bit: if the value of the last layer is greater than 0.5, output 1; otherwise output 0. We now have a neural network whose input and output formats match those of the XOR function exactly.

What remains is to set appropriate parameter values $w$ so that our network behaves like a true XOR function. In this example, we define the discrepancy between the network's output $p$⁵ and the sample output $y$ as half the squared difference between them:

$$\mathrm{dist}(p, y) \triangleq \frac{1}{2}(p - y)^2 \qquad (7)$$

The loss function on the training set is defined as:

$$\mathrm{Loss}(w) \triangleq \frac{1}{2} \sum_{d \in D} (p_d - y_d)^2 = \frac{1}{2} \sum_{d \in D} \left(\mathrm{sigm}(w^\top x_d) - y_d\right)^2 \qquad (8)$$

where $D = \{(x_d, y_d)\} = \{((0, 0), 0),\, ((0, 1), 1),\, ((1, 0), 1),\, ((1, 1), 0)\}$.
⁴ A binary bit is a variable that can only take the value 0 or 1. The function XOR$(x, y)$ takes two binary bits $x$ and $y$, returns 1 if $x$ differs from $y$, and returns 0 if $x$ equals $y$.
⁵ Here $p$ is a real number in $(0, 1)$ returned by the sigmoid. Converting this real number into a binary bit only happens when we use the network for inference; here we are discussing parameter estimation.
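The following sketch computes the loss of Equation (8) on the training set $D$ above; NumPy is an assumed dependency:

import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

# The training set D: rows of X are the x_d, entries of y are the y_d.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def loss(w):
    p = sigm(X @ w)                      # p_d = sigm(w^T x_d) for all d
    return 0.5 * np.sum((p - y) ** 2)    # Equation (8)

print(loss(np.array([0.5, -0.5])))       # loss at an arbitrary w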

6.2 The gradient descent algorithm

Next, we need a concrete algorithm to minimize the loss function. Recalling what we learned in high school about minimizing functions, we are familiar with the observation that the derivative of a function equals 0 at its local extrema. But at points that are not extrema, what does this value tell us?

Algorithm 2 The gradient descent algorithm.
1: function GRADDESC($\nabla_x f$, $\epsilon$)
2:  Initialize $x_0$ arbitrarily.
3:  while $\|\nabla_x f(x_0)\| > \epsilon$ do
4:   $x_0 \leftarrow x_0 - \alpha \nabla_x f(x_0)$
5:   Optionally: update $\alpha$.
6:  end while
7:  return $x_0$
8: end function
Consider minimizing a function $f(x)$ (Figure 2). Our task is to move the current value of the variable, $x_0$ (the blue point), to the value at the minimum of $f$, $x_{opt}$. It is easy to see that if $x_0 < x_{opt}$, the derivative at $x_0$ is negative; conversely, the derivative at points $x_0 > x_{opt}$ is positive. In both cases, we must move against the sign of the derivative to get closer to $x_{opt}$. More concretely, if $f'(x_0) < 0$ we must increase $x_0$; if $f'(x_0) > 0$ we must decrease $x_0$.
When the input $x$ is no longer a real number but an $l$-dimensional vector $x = (x_0, x_1, \dots, x_{l-1})$, $f(x)$ becomes a multivariate function. The partial derivative of $f$ along dimension $x_i$ is denoted $\frac{\partial f}{\partial x_i}$. The gradient of $f$ with respect to the vector $x$, denoted $\nabla_x f$, is the vector of partial derivatives along every dimension of $x$:

$$\nabla_x f = \left(\frac{\partial f}{\partial x_0}, \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_{l-1}}\right)$$

The gradient plays the role that the sign of the derivative plays in the one-dimensional case. At any point, the gradient vector points in the direction in which the function increases fastest, and the opposite direction is the one in which it decreases fastest, when moving an infinitesimally small step away from that point. The gradient descent algorithm seeks a minimum by moving the parameters against the gradient vector by a suitably small amount at each update.
Algorithm 2 describes the steps of gradient descent. At each iteration of the while loop, the algorithm moves $x_0$ by an amount that depends on the gradient of $f$ at $x_0$, $\nabla_x f(x_0)$. The minus sign reflects moving against the direction of the gradient vector. $\alpha$ controls the size of each step taken by $x_0$ and is called the step size (or learning rate). The step size is important because it affects how fast $x_0$ converges to $x_{opt}$: if it is too large, we may overshoot the optimum and never converge; conversely, if it is too small, we need many more steps to reach the destination, slowing convergence.
A few points deserve attention when implementing Algorithm 2. First, on line 4, the operation $x_0 - \alpha \nabla_x f(x_0)$ is a subtraction of two vectors of the same dimension. Second, the product $\alpha \nabla_x f(x_0)$ is interpreted according to context: in practice, we may use the same step size for all dimensions of $\nabla_x f(x_0)$ ($\alpha$ is a scalar), or a different step size for each dimension ($\alpha$ is a vector). Finally, the stopping condition of the algorithm (line 3) is reaching a local extremum, i.e., when the norm of the gradient vector, denoted $\|\nabla_x f(x_0)\|$, is approximately 0 up to a tolerance $\epsilon$⁶.
⁶ There are several kinds of norms; one usually uses the 2-norm, i.e., the length of the vector.
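Here is a minimal Python sketch of Algorithm 2. The cap on the number of iterations is our added safeguard, not part of the algorithm, since the loop may otherwise never terminate:

import numpy as np

def grad_desc(grad_f, x0, alpha=0.1, eps=1e-6, max_steps=100000):
    x = np.array(x0, dtype=float)
    for _ in range(max_steps):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:   # stopping condition (line 3)
            break
        x = x - alpha * g              # line 4: step against the gradient
    return x

# Example: minimize f(x) = ||x||^2, whose gradient is 2x; the minimum is at 0.
print(grad_desc(lambda x: 2 * x, x0=[3.0, -4.0]))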

Algorithm 3 The back-propagation algorithm.
1: function BACKPROP($x^{(0)} \in \mathbb{R}^{|L_0|}$, $\{W^{(l)}\}$)
2:  Apply Algorithm 1 to $x^{(0)}$ to compute the values $z^{(1)}, \dots, z^{(L)}$, $x^{(1)}, \dots, x^{(L)}$, and the loss $\mathrm{Loss}(z^{(L)})$.
3:  $\delta^{(L)} \leftarrow \nabla_{z^{(L)}} \mathrm{Loss}$
4:  for $l = L - 1$ down to $0$ do
5:   $\nabla_{z^{(l)}} \mathrm{Loss} \leftarrow f'(z^{(l)}) \odot \left(W^{(l)\top} \delta^{(l+1)}\right)$
6:   $\nabla_{W^{(l)}} \mathrm{Loss} \leftarrow \delta^{(l+1)}\, x^{(l)\top}$
7:  end for
8:  return $\nabla_{W^{(0)}} \mathrm{Loss}, \dots, \nabla_{W^{(L-1)}} \mathrm{Loss}$
9: end function

We illustrate gradient descent on the XOR example from Section 6.1. The gradient of the loss function with respect to $w$ is:

\begin{align*}
\nabla_w \mathrm{Loss}(w) &= \nabla_w \frac{1}{2} \sum_{d \in D} (p_d - y_d)^2 \\
&= \frac{1}{2} \sum_{d \in D} \nabla_w \left(\mathrm{sigm}(w^\top x_d) - y_d\right)^2 \\
&= \sum_{d \in D} \left(\mathrm{sigm}(w^\top x_d) - y_d\right) \nabla_w\, \mathrm{sigm}(w^\top x_d) \qquad (9) \\
&= \sum_{d \in D} \left(\mathrm{sigm}(w^\top x_d) - y_d\right)\, \mathrm{sigm}(w^\top x_d) \left(1 - \mathrm{sigm}(w^\top x_d)\right) x_d \\
&= \sum_{d \in D} (p_d - y_d)\, p_d\, (1 - p_d)\, x_d
\end{align*}

We then simply apply Algorithm 2 with $\nabla_x f(x) \equiv \nabla_w \mathrm{Loss}(w)$ to train the network.
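A sketch of Equation (9) in NumPy, reusing the XOR training set from Section 6.1; the plain update loop below is a minimal stand-in for Algorithm 2, with a step size we chose for illustration:

import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def grad_loss(w):
    p = sigm(X @ w)
    return ((p - y) * p * (1 - p)) @ X   # sum_d (p_d - y_d) p_d (1 - p_d) x_d

w = np.array([0.5, -0.5])
for _ in range(10000):                   # gradient descent with alpha = 0.5
    w -= 0.5 * grad_loss(w)
print(w, sigm(X @ w))                    # learned w and outputs on D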

6.3 Computing derivatives on a neural network: the back-propagation algorithm

When applying gradient descent to a multi-layer neural network, we need the derivative of the loss function with respect to the parameters of every layer. Computing the derivative with respect to each individual parameter becomes complicated and hard to manage as the network gains more layers and more intricate connections between neurons. However, we can exploit the repeated structure across layers and only need to work out how to compute the derivative at one arbitrary layer. The back-propagation algorithm (Algorithm 3) generalizes how the derivative is computed at one layer and how it is passed on to the preceding layer.
We now explain lines 5 and 6 of Algorithm 3. For line 5, note that the gradient of the loss with respect to $z^{(l)}$, $\nabla_{z^{(l)}} \mathrm{Loss}$, is a vector with $|L^{(l)}|$ rows, whose $i$-th row is:

\begin{align*}
\frac{\partial \mathrm{Loss}}{\partial z^{(l)}_i}
&= \sum_{j=1}^{|L^{(l+1)}|} \frac{\partial \mathrm{Loss}}{\partial z^{(l+1)}_j} \frac{\partial z^{(l+1)}_j}{\partial z^{(l)}_i} \\
&= \sum_{j=1}^{|L^{(l+1)}|} \delta^{(l+1)}_j \frac{\partial z^{(l+1)}_j}{\partial z^{(l)}_i} && \text{(definition of $\delta^{(l+1)}_j$)} \\
&= \sum_{j=1}^{|L^{(l+1)}|} \delta^{(l+1)}_j \frac{\partial}{\partial z^{(l)}_i} \sum_{k=1}^{|L^{(l)}|} W^{(l)}_{jk} f(z^{(l)}_k) && (10) \\
&= \sum_{j=1}^{|L^{(l+1)}|} \delta^{(l+1)}_j W^{(l)}_{ji} f'(z^{(l)}_i) \\
&= f'(z^{(l)}_i) \sum_{j=1}^{|L^{(l+1)}|} \left(W^{(l)\top}\right)_{ij} \delta^{(l+1)}_j
\end{align*}

Observe that $\sum_{j=1}^{|L^{(l+1)}|} (W^{(l)\top})_{ij}\, \delta^{(l+1)}_j$ is exactly the $i$-th row of the vector obtained from the product $W^{(l)\top} \delta^{(l+1)}$⁷. Therefore, $\nabla_{z^{(l)}} \mathrm{Loss} = f'(z^{(l)}) \odot \left(W^{(l)\top} \delta^{(l+1)}\right)$, where $\odot$ denotes element-wise multiplication of two vectors⁸.
⁷ $W^{(l)\top}$ is the transpose of the matrix $W^{(l)}$.
⁸ For example, $(1, 2, 3) \odot (2, 4, 6) = (1 \cdot 2,\, 2 \cdot 4,\, 3 \cdot 6) = (2, 8, 18)$.


For line 6, the gradient of the loss with respect to $W^{(l)}$, $\nabla_{W^{(l)}} \mathrm{Loss}$, is a matrix of the same size as $W^{(l)}$, i.e., $|L^{(l+1)}| \times |L^{(l)}|$. The entry in row $i$ and column $j$ of this matrix is:

\begin{align*}
\frac{\partial \mathrm{Loss}}{\partial W^{(l)}_{ij}}
&= \sum_{k=1}^{|L^{(l+1)}|} \frac{\partial \mathrm{Loss}}{\partial z^{(l+1)}_k} \frac{\partial z^{(l+1)}_k}{\partial W^{(l)}_{ij}} \\
&= \sum_{k=1}^{|L^{(l+1)}|} \delta^{(l+1)}_k \frac{\partial}{\partial W^{(l)}_{ij}} \sum_{t=1}^{|L^{(l)}|} W^{(l)}_{kt} x^{(l)}_t && (11) \\
&= \delta^{(l+1)}_i \frac{\partial}{\partial W^{(l)}_{ij}} \left(W^{(l)}_{ij} x^{(l)}_j\right) && \left(\tfrac{\partial W^{(l)}_{kt}}{\partial W^{(l)}_{ij}} = 0 \text{ whenever } k \ne i \text{ or } t \ne j\right) \\
&= \delta^{(l+1)}_i x^{(l)}_j
\end{align*}

Therefore, $\nabla_{W^{(l)}} \mathrm{Loss} = \delta^{(l+1)}\, x^{(l)\top}$.
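The following NumPy sketch implements Algorithm 3; the sigmoid activation and taking $\delta^{(L)}$ (delta_L) as a given input are assumptions made for illustration:

import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigm_prime(z):
    s = sigm(z)
    return s * (1 - s)

def backprop(x0, W, delta_L):
    # Forward pass (Algorithm 1), caching the z^(l) and x^(l).
    xs, zs = [x0], []
    for W_l in W:
        zs.append(W_l @ xs[-1])
        xs.append(sigm(zs[-1]))
    # Backward pass: delta_L is the gradient of the loss w.r.t. z^(L).
    grads = [None] * len(W)
    delta = delta_L
    for l in range(len(W) - 1, -1, -1):
        grads[l] = np.outer(delta, xs[l])    # line 6: delta^(l+1) x^(l)T
        if l > 0:                            # line 5: propagate to z^(l)
            delta = sigm_prime(zs[l - 1]) * (W[l].T @ delta)
    return grads                             # gradients w.r.t. each W^(l)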

7 Overfitting

In practical applications, we often use neural networks to approximate functions whose structure has not been determined. In such cases, we can only collect samples of input/output pairs generated by the function, but cannot describe the process that generates them. A classic example is how the human brain takes in information from an image of handwriting and infers the written characters. The mechanism by which the brain represents images and infers information from them remains a mystery to science. Nevertheless, we can use images together with their correct labels to train a neural network that approximates the brain's image-processing. Even though the brain and the network (may) differ in structure, with a good training algorithm they will reach the same conclusion for the same input data point.

[Figure 3: An example of overfitting. The higher-degree polynomial (blue) focuses so heavily on passing through every point in the training set (black) that it takes on a complicated, "abnormal" shape. The lower-degree polynomial yields a higher loss value on the training set but fits the true data distribution better, as shown by the fact that it estimates a point not in the training set more accurately than the high-degree polynomial does.]
For prediction problems, since our ultimate goal is to approximate a hidden function, we should not fully minimize the loss on the training set. Doing so leads to overfitting: the network learns a complicated function that fits the training set perfectly. However, precisely because of its complicated structure, this function generalizes poorly, i.e., it easily errs on data points not in the training set (Figure 3). The network then resembles a person who only memorizes for exams without knowing how to apply knowledge to problems never seen before. Overfitting is a serious issue for neural networks because their modeling capacity is so high that they easily learn complicated functions. We now examine some common methods to diagnose and prevent overfitting in neural networks.

7.1 Using an unseen test set

To assess the network's error realistically, we cannot use the loss value on the training set, because it is usually more optimistic (lower) than the true error. For objectivity, we typically use a test set, independent of the training set, and compute the error there. The test set must remain unseen by the network throughout training, and is used only once the network's parameters have been fixed.

7.2 Early stopping

We know that minimizing the loss on the training set can cause overfitting if the training set is too small. A simple way to avoid this is to stop training before the training loss reaches a local minimum. Ideally, we should stop when the error on the held-out set starts to rise, even though the training error is still decreasing (Figure 4); this is precisely when overfitting begins to appear.

[Figure 4: Stop early as soon as further training reduces the network's accuracy on the development set.]

However, as discussed in the previous section, the test set must not, as a matter of principle, interfere with the training process. We therefore need a separate held-out set for early stopping, called the development set. The training process then proceeds as follows (a code sketch follows the list):

1. Monitor the error on the training set and on the development set.
2. When the error on the development set starts to rise, stop training.
3. Fix the network with the parameter set that achieved the best development error before the stopping point. Use the test set to evaluate this network one final time and report the result.
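The following Python sketch captures this loop. The callables train_one_epoch, dev_error, and get_params are hypothetical stand-ins for your own training code, and the patience counter, which tolerates a few noisy epochs before stopping, is an addition of ours:

def train_with_early_stopping(train_one_epoch, dev_error, get_params,
                              patience=5):
    best_err, best_params, bad_epochs = float("inf"), None, 0
    while bad_epochs < patience:
        train_one_epoch()          # a round of gradient descent updates
        err = dev_error()          # step 1: monitor the development error
        if err < best_err:         # still improving: remember these params
            best_err, best_params = err, get_params()
            bad_epochs = 0
        else:                      # step 2: development error is rising
            bad_epochs += 1
    return best_params             # step 3: evaluate these once on the test set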

7.3 Parameter regularization

Another way to prevent overfitting is to restrict the space of functions the network can represent. This is done by shrinking the range of values the parameters can take. Concretely, instead of training the network with the loss function alone, we add a term expressing the "magnitude" of the parameter set:

$$\mathrm{Loss}(w) + \lambda \|w\| \qquad (12)$$

where $\lambda$ is a pre-chosen constant and $\|w\|$ is called the norm of the vector $w$.

When we minimize expression 12, both $\mathrm{Loss}(w)$ and $\|w\|$ are driven down. As a result, the resulting network has parameters with smaller values than usual, preventing it from becoming overly flexible and complicated.
In practice, one often uses the squared 2-norm, i.e., the squared length of the vector⁹:

$$\|w\|_2^2 = \sum_i w_i^2 \qquad (13)$$

Training with the 1-norm, on the other hand, yields a sparse parameter set (one with many zero entries), which speeds up computation and saves memory; however, the 1-norm does not have a continuous derivative, making optimization harder:

$$\|w\|_1 = \sum_i |w_i| \qquad (14)$$

For the weight matrices of a neural network, we can flatten them into vectors and apply the norm formulas above as usual (since the order of the elements does not matter).
⁹ We use the square rather than the square root to make the derivative easier to compute.
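A small sketch of the two penalties in NumPy, with the matrices flattened as described above; lam stands for the constant $\lambda$, and the function names are ours:

import numpy as np

def l2_penalty(W, lam):
    # lam * ||w||_2^2 from Equation (13), summed over all flattened matrices.
    return lam * sum(np.sum(W_l ** 2) for W_l in W)

def l2_penalty_grad(W_l, lam):
    # Gradient of the penalty w.r.t. one matrix: 2 * lam * W^(l);
    # add this to the loss gradient of that layer during training.
    return 2.0 * lam * W_l

def l1_penalty(W, lam):
    # lam * ||w||_1 from Equation (14); encourages sparse parameters.
    return lam * sum(np.sum(np.abs(W_l)) for W_l in W)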

8 What's next?

In this article, we have only covered the most basic aspects of neural networks. To help you apply what you have just learned right away, the authors provide, for your reference, two programs that train neural networks to simulate the XOR function with two different architectures (https://gitlab.com/khanhptnk/neuralnet_tutorial/tree/master). Note that these programs are written in simple Python and are meant only for illustration. To build neural networks for large-scale problems, the authors suggest exploring Theano, Pylearn2, Caffe, Torch, or TensorFlow. Moreover, since materials on neural networks in Vietnamese are still quite scarce, the authors also hope that you will dig deeper into materials written in other languages. The book Deep Learning, written by leading researchers in the field, is an excellent choice.

Thank you for reading!

References
[1] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014.
[2] X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. In International Conference on Artificial Intelligence and Statistics, pages 315–323, 2011.
[3] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.
[4] A. Y. Hannun, C. Case, J. Casper, B. C. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, and A. Y. Ng. Deep Speech: Scaling up end-to-end speech recognition. CoRR, abs/1412.5567, 2014.
[5] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[6] K. Hornik, M. B. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
[7] A. Karpathy, A. Joulin, and F. F. F. Li. Deep fragment embeddings for bidirectional image sentence mapping. In Advances in Neural Information Processing Systems 27, pages 1889–1897. Curran Associates, Inc., 2014.
[8] Q. V. Le, R. Monga, M. Devin, K. Chen, G. S. Corrado, J. Dean, and A. Y. Ng. Building high-level features using large scale unsupervised learning. In International Conference on Machine Learning, 2012.
[9] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, Portland, Oregon, USA, June 2011.
[10] A. L. Maas, A. Y. Hannun, and A. Y. Ng. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, volume 30, page 1, 2013.
[11] R. Jozefowicz, W. Zaremba, and I. Sutskever. An empirical exploration of recurrent network architectures. In International Conference on Machine Learning, 2015.
[12] R. Socher, D. Chen, C. D. Manning, and A. Ng. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems 26, pages 926–934. Curran Associates, Inc., 2013.
[13] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Stroudsburg, PA, October 2013.
[14] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.

