Fundamentals of Neural Networks
Preface

1 Introduction
[Figure 1: Illustration of the connections between layers in a neural network. Note the notational convention: the weight between neuron i in layer L_{l+1} and neuron j in layer L_l is denoted W_{ij}^{(l)}.]
Information propagates through a fully-connected neural network along two paths. In the feed-forward step, information travels from the input layer, through the hidden layers, and on to the output layer. The output layer is the network's result: it represents the value of the function the network is simulating, evaluated at the data point supplied to the input layer. Of course, the network may produce inaccurate results, giving rise to errors. In the back-propagation step, these errors are propagated through the layers of the network in the reverse order of the feed-forward step, allowing the network to compute the derivatives of the error with respect to its parameters, and thereby to adjust those parameters with a numerical optimization algorithm.
This article will help you understand the concepts above and walk you through the basics of training a neural network. In what follows, Section 2 introduces the architecture of fully-connected neural networks. Section 3 describes the feed-forward algorithm used to perform inference on the network. Section 4 introduces activation functions. Section 5 discusses how to apply neural networks to classification problems. Section 6 describes how to find the optimal parameter configuration for a neural network. Section 7 discusses a common problem encountered when using neural networks, overfitting, and ways to remedy it. Finally, Section 8 suggests directions for exploring advanced topics beyond the scope of this article.
Algorithm 1 The feed-forward algorithm.
1: function FEEDFORWARD(x^{(0)} ∈ R^{|L_0|})
2:   for l = 1 to L do
3:     z^{(l)} ← W^{(l−1)} x^{(l−1)}
4:     x^{(l)} ← f(z^{(l)})
5:   end for
6:   return x^{(L)}, Loss(z^{(L)})
7: end function
Between two consecutive layers L_l and L_{l+1} of a fully-connected network, we set up a weight matrix W^{(l)} of size |L_{l+1}| × |L_l| (the notation |S| denotes the number of elements of the set S). The element W_{ij}^{(l)} of this matrix represents the influence of neuron j in layer l on neuron i in layer l + 1. The collection of weight matrices W = {W^{(0)}, W^{(1)}, …, W^{(L−1)}} is called the parameter set of the neural network. Determining the values of this parameter set is known as learning or training the neural network.
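To make the shapes concrete, here is a minimal numpy sketch of Algorithm 1 (the function name `feed_forward` and the 3-4-2 layer sizes are illustrative choices, not from the original text; the loss computation is omitted):

```python
import numpy as np

def feed_forward(x, weights, f):
    """Algorithm 1: propagate the input x through the network.

    weights[l] holds W^(l) with shape (|L_{l+1}|, |L_l|); f is the activation."""
    for W in weights:
        z = W @ x   # pre-activation: z^(l) = W^(l-1) x^(l-1)
        x = f(z)    # activation:     x^(l) = f(z^(l))
    return x

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]  # a toy 3-4-2 network
print(feed_forward(np.ones(3), weights, np.tanh))  # the network's output x^(L)
```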
However, the right-hand side of equation 2 can also be rewritten as a single matrix multiplication. For this reason, the author temporarily drops the bias vector so that the formulas become shorter and easier to follow.
4 Activation functions
The function f in Algorithm 1 is called the activation function. The activation function plays an extremely important role in neural networks. In fact, the most recent advances in neural network research [5, 11] are precisely new formulas for f, which increase the network's modeling capacity and simplify the training process. This section explains the role of the activation function and introduces some commonly used activation functions.
The activation function is used to eliminate the linearity of the neural network. To understand this clearly, let us try removing the function f from Algorithm 1. Note that this is equivalent to using the activation function f(x) = x. We then find that the network's inference result is merely a linear map of the input data. Indeed, we have:

x^{(L)} = W^{(L-1)} x^{(L-1)} = W^{(L-1)} W^{(L-2)} x^{(L-2)} = \cdots = W^{(L-1)} W^{(L-2)} \cdots W^{(0)} x^{(0)}

In this case, learning all L matrices W^{(l)} is unnecessary, because the set of all functions the network can represent consists only of linear maps, and each can be represented by a single matrix multiplication with:

W^{*} = W^{(L-1)} W^{(L-2)} \cdots W^{(0)}
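To see this collapse numerically, here is a quick numpy check (a sketch; the layer sizes and random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
# Three linear layers with no activation between them.
W0, W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

deep = W2 @ (W1 @ (W0 @ x))       # layer-by-layer feed-forward with f(x) = x
collapsed = (W2 @ W1 @ W0) @ x    # the single matrix W* = W^(2) W^(1) W^(0)
print(np.allclose(deep, collapsed))  # True: the extra layers added nothing
```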
This greatly reduces the modeling capacity of the neural network. To represent a richer class of functions, we must make the network non-linear by passing the result of each matrix-vector multiplication W^{(l−1)} x^{(l−1)} through a non-linear function f. Some commonly used activation functions are:
1. The sigmoid function: f(x) = sigm(x) = 1 / (1 + exp(−x));
2. The tanh function: f(x) = tanh(x);
3. The rectified linear unit (ReLU) [2]: f(x) = max(0, x);
4. The leaky rectified linear unit (leaky ReLU) [10]: f(x) = x if x > 0 and f(x) = kx if x ≤ 0, where k is a constant chosen in advance, typically k ≈ 0.01;
5. The maxout function [3]: f(x_1, …, x_n) = max_{1 ≤ i ≤ n} x_i.
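For reference, here is a minimal numpy sketch of these five activations (the vectorized signatures are my own choice, not from the text):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, k=0.01):
    # f(x) = x for x > 0, kx otherwise
    return np.where(x > 0, x, k * x)

def maxout(xs):
    # xs stacks the n pre-activations x_1, ..., x_n along axis 0
    return np.max(xs, axis=0)
```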
5 Classification

Neural networks are widely applied to classification problems, that is, determining which category the input data belongs to among a given set of choices. To solve such a problem, we use the neural network to simulate a probability distribution over the set of choices. For example, suppose we want to use a neural network for face verification. The set of choices has only two elements: given an arbitrary pair of portrait photos, we ask the network to answer "yes" or "no" to the question of whether the two photos show the same person. The network produces its answer by computing the probability of each option and choosing the answer with the higher probability. In this case, assuming that the probabilities of the two answers sum to 1, we only need to compute the probability of one answer and infer the probability of the other. A neural network that uses the sigmoid function to activate its last layer is well suited to this task, because the sigmoid takes a real number in (−∞, +∞) and returns a real number in (0, 1).
More generally, when the set of choices has more than two elements, we need to turn the neural network into a probability distribution P(x) satisfying two conditions:

1. P(x) ≥ 0 for all x ∈ Ω (Ω is the set of choices);
2. \sum_{x \in Ω} P(x) = 1.
Consider the pre-activation vector at the last layer, z^{(L)} = (z_0^{(L)}, z_1^{(L)}, …, z_{|L_L|−1}^{(L)}). Instead of the sigmoid, we apply the softmax function to turn this vector into a probability distribution. The softmax function has the form:

P(i) = \frac{\exp(z_i^{(L)})}{\sum_j \exp(z_j^{(L)})}
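A minimal, numerically stable softmax in numpy (subtracting the maximum is a standard trick not mentioned in the text; it does not change the result):

```python
import numpy as np

def softmax(z):
    # Shift by max(z) for numerical stability; exp(z - c) / sum is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())  # a valid distribution: non-negative entries that sum to 1
```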
6 Parameter estimation

When performing inference on a neural network, we assume that the parameters (the matrices W^{(l)}) are all given. This is of course unrealistic; we need to find parameter values that make the network's inference as accurate as possible. As mentioned above, this task is called parameter estimation, and is also known as training or learning the neural network.
Let h(x; W) and g(x) denote, respectively, the function represented by the neural network (with parameter set W) and the target function to be simulated. Finding a formula that directly computes the values of the parameter set is very difficult. We take a different approach, gradually reducing the distance between h(x; W) and g(x) by repeating two steps:

1. Measure the error of the network's inference on a set of sample data points {(x_d, g(x_d))}, called the training set.
2. Update the network's parameters W to reduce this error.

How the error is measured is covered in Section 6.1. How the new parameter values are computed after each update is discussed in Sections 6.2 and 6.3.
6.1 The loss function

As a running example, suppose we use a single neuron with sigmoid activation to simulate the XOR function⁴, and measure its error with the squared loss

\mathrm{Loss}(w) = \frac{1}{2} \sum_{d \in D} (p_d - y_d)^2, \qquad p_d = \mathrm{sigm}(w^\top x_d)⁵,

with D = {(x_d, y_d)} = {((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)}.
Next, we need an algorithm that can minimize the loss function. Recalling what we learned about minimizing functions in high school, we are familiar with the observation that the minimum of a function is attained at a point where its derivative equals zero.
⁴ A binary bit is a variable that can take only the value 0 or 1. The function XOR(x, y) takes two binary bits x and y, returns 1 if x differs from y, and returns 0 if x equals y.
⁵ Here p is a real number in the interval (0, 1) returned by the sigmoid function. Converting this real number into a binary bit happens only when we use the network for inference; here we are discussing parameter estimation.
Algorithm 2 The gradient descent algorithm.
1: function GRADDESC(∇_x f, η, ε)
2:   Initialize x_0 arbitrarily.
3:   while ‖∇_x f(x_0)‖⁶ > ε do
4:     x_0 ← x_0 − η ∇_x f(x_0)
5:     Optionally: update η.
6:   end while
7:   return x_0
8: end function
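A direct transcription of Algorithm 2 into numpy (a sketch; the constant learning rate, the stopping threshold, and the iteration cap are illustrative choices):

```python
import numpy as np

def grad_desc(grad_f, x0, eta=0.1, eps=1e-6, max_iter=100_000):
    """Minimize f by following -grad_f; stop when the gradient's 2-norm <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):          # a safety cap the pseudocode leaves implicit
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:   # convergence test: ||grad f(x)|| <= eps
            break
        x = x - eta * g                # x <- x - eta * grad f(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
print(grad_desc(lambda x: 2 * (x - 3), x0=[0.0]))  # ~[3.]
```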
⁶ There are many kinds of norm; most often one uses the 2-norm, i.e., the length of the vector.
Algorithm 3 The back-propagation algorithm.
1: function BACKPROP(x^{(0)} ∈ R^{|L_0|}, {W^{(l)}})
2:   Apply Algorithm 1 to x^{(0)} to compute the values z^{(1)}, …, z^{(L)}, x^{(1)}, …, x^{(L)} and the loss Loss(z^{(L)}).
3:   δ^{(L)} ← ∇_{z^{(L)}} Loss
4:   for l = L − 1 to 0 do
5:     ∇_{z^{(l)}} Loss ← f′(z^{(l)}) ⊙ (W^{(l)⊤} δ^{(l+1)})
6:     ∇_{W^{(l)}} Loss ← δ^{(l+1)} x^{(l)⊤}
7:   end for
8:   return ∇_{W^{(0)}} Loss, …, ∇_{W^{(L−1)}} Loss
9: end function
We illustrate the gradient descent algorithm with the XOR simulation example of Section 6.1. The gradient of the loss function with respect to w is:

\begin{aligned}
\nabla_w \mathrm{Loss}(w) &= \nabla_w \frac{1}{2} \sum_{d \in D} (p_d - y_d)^2 \\
&= \nabla_w \frac{1}{2} \sum_{d \in D} \bigl( \mathrm{sigm}(w^\top x_d) - y_d \bigr)^2 \\
&= \sum_{d \in D} \bigl( \mathrm{sigm}(w^\top x_d) - y_d \bigr) \nabla_w \mathrm{sigm}(w^\top x_d) \qquad (9) \\
&= \sum_{d \in D} \bigl( \mathrm{sigm}(w^\top x_d) - y_d \bigr) \, \mathrm{sigm}(w^\top x_d) \bigl( 1 - \mathrm{sigm}(w^\top x_d) \bigr) x_d \\
&= \sum_{d \in D} (p_d - y_d) \, p_d (1 - p_d) \, x_d
\end{aligned}
Then we simply apply Algorithm 2 with ∇_x f(x) ≡ ∇_w Loss(w) to train the neural network.
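Combining the derived gradient (9) with Algorithm 2 gives a complete training loop for this example. A minimal numpy sketch (the learning rate, iteration count, and random initialization are illustrative choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The training set D for XOR: inputs x_d and targets y_d.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def grad_loss(w):
    # grad_w Loss(w) = sum_d (p_d - y_d) p_d (1 - p_d) x_d, as in equation (9)
    p = sigmoid(X @ w)
    return ((p - y) * p * (1 - p)) @ X

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)  # small random start; w = 0 is a critical point
for _ in range(10_000):            # Algorithm 2 with a fixed learning rate
    w -= 0.5 * grad_loss(w)
print(sigmoid(X @ w))              # the p_d values after training
```

(Since XOR is not linearly separable, and the bias is omitted here as in the text, this single neuron cannot drive the loss to zero; the sketch only illustrates the mechanics of the update.)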
When applying gradient descent to a neural network with many layers, we need to compute the derivative of the loss function with respect to the parameters of each individual layer. Computing the derivative for each parameter becomes complicated and hard to manage as the network gains more layers and more intricate connections between neurons. However, we can exploit the repeated structure across layers and only need to work out how to compute the derivative at one arbitrary layer. The back-propagation algorithm (Algorithm 3) generalizes how to compute the derivative at one layer of neurons and how to pass the derivative on to the preceding layer.
We now explain lines 5 and 6 of Algorithm 3. For line 5, note that the gradient of the loss with respect to z^{(l)}, ∇_{z^{(l)}} Loss, is a vector with |L_l| rows, in which the i-th row is:

\begin{aligned}
\frac{\partial \mathrm{Loss}}{\partial z_i^{(l)}}
&= \sum_{j=1}^{|L_{l+1}|} \frac{\partial \mathrm{Loss}}{\partial z_j^{(l+1)}} \frac{\partial z_j^{(l+1)}}{\partial z_i^{(l)}} \\
&= \sum_{j=1}^{|L_{l+1}|} \delta_j^{(l+1)} \frac{\partial z_j^{(l+1)}}{\partial z_i^{(l)}} \qquad \text{(definition of } \delta_j^{(l+1)}\text{)} \\
&= \sum_{j=1}^{|L_{l+1}|} \delta_j^{(l+1)} \frac{\partial}{\partial z_i^{(l)}} \sum_{k=1}^{|L_l|} W_{jk}^{(l)} f(z_k^{(l)}) \qquad (10) \\
&= \sum_{j=1}^{|L_{l+1}|} \delta_j^{(l+1)} W_{ji}^{(l)} f'(z_i^{(l)}) \\
&= f'(z_i^{(l)}) \sum_{j=1}^{|L_{l+1}|} (W^{(l)\top})_{ij} \, \delta_j^{(l+1)}
\end{aligned}

We see that \sum_{j=1}^{|L_{l+1}|} (W^{(l)\top})_{ij} \delta_j^{(l+1)} is exactly the i-th row of the vector obtained from the multiplication W^{(l)\top} δ^{(l+1)}. Therefore ∇_{z^{(l)}} Loss = f′(z^{(l)}) ⊙ (W^{(l)⊤} δ^{(l+1)}), where ⊙ denotes element-wise multiplication. For line 6, a similar calculation gives ∇_{W^{(l)}} Loss = δ^{(l+1)} x^{(l)⊤}.
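These two formulas translate almost line-for-line into numpy. Below is a minimal sketch of Algorithm 3 (the function names and the squared-loss example in the final comment are my own choices; the text itself fixes no particular loss):

```python
import numpy as np

def backprop(x0, weights, f, f_prime, dloss_dzL):
    """Algorithm 3: return [grad of Loss w.r.t. W^(0), ..., W^(L-1)]."""
    # Feed-forward pass (Algorithm 1), storing every z^(l) and x^(l).
    xs, zs = [x0], []
    for W in weights:
        zs.append(W @ xs[-1])       # zs[l-1] holds z^(l)
        xs.append(f(zs[-1]))        # xs[l]   holds x^(l)
    delta = dloss_dzL(zs[-1])       # delta^(L) = grad_{z^(L)} Loss
    grads = [None] * len(weights)
    for l in reversed(range(len(weights))):
        grads[l] = np.outer(delta, xs[l])   # grad_{W^(l)} Loss = delta^(l+1) x^(l)^T
        if l > 0:
            # delta^(l) = f'(z^(l)) (element-wise) (W^(l)^T delta^(l+1))
            delta = f_prime(zs[l - 1]) * (weights[l].T @ delta)
    return grads

# Example: for the squared loss 0.5*||f(z^(L)) - t||^2 with f = tanh, one would
# pass dloss_dzL = lambda zL: (np.tanh(zL) - t) * (1 - np.tanh(zL)**2).
```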
7 Overfitting
[Figure 3: An example of overfitting. The higher-degree polynomial (blue) focuses too much on passing through every point of the training set (black), so it takes on a complicated, "abnormal" shape. The lower-degree polynomial yields a higher loss value on the training set but matches the real data distribution better. This shows in the fact that the lower-degree polynomial estimates a point not in the training set (green) more accurately than the higher-degree polynomial.]
Although training may produce (possibly) different neural networks, with a good training algorithm they will reach the same conclusion for the same input data point.
For prediction problems, since our ultimate goal is to simulate an unknown function, we should not simply minimize the loss on the training set. Doing so leads to overfitting: the network learns a complicated function that simulates the training set as perfectly as possible. However, precisely because of its complicated structure, this function does not generalize well: it is very likely to be wrong on a data point that does not appear in the training set (Figure 3). The network is then like a person who only learns by rote and does not know how to apply knowledge to situations never encountered before. Overfitting is a serious problem for neural networks because their modeling capacity is so high that they easily learn complicated functions. We will look at some common methods for diagnosing and preventing overfitting in neural networks.
[Figure 4: Stop early as soon as further training starts to reduce the network's accuracy on the development set.]
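A minimal sketch of this early-stopping rule (the `train_one_epoch` and `dev_accuracy` callbacks are hypothetical placeholders, not functions from the text):

```python
def train_with_early_stopping(train_one_epoch, dev_accuracy, max_epochs=1000):
    """Stop as soon as an epoch fails to improve accuracy on the development set."""
    best_acc = dev_accuracy()
    for _ in range(max_epochs):
        train_one_epoch()
        acc = dev_accuracy()
        if acc < best_acc:      # training has started to hurt dev accuracy: stop
            break
        best_acc = acc
    return best_acc
```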
Another way to prevent overfitting is to limit the space of functions the neural network can represent. This is done by narrowing the range of values of the parameter set. Concretely, instead of training the network with the loss function alone, we add a term expressing the "size" of the parameter set:

\mathrm{Loss}(w) + \lambda \|w\| \qquad (12)

where λ is a constant chosen in advance and ‖w‖ is called the norm of the vector w.

When we minimize expression 12, both Loss(w) and ‖w‖ are minimized. As a result, the resulting network has parameters of smaller magnitude than usual, which prevents it from becoming too flexible and complicated.
In practice, one often uses the squared 2-norm, that is, the squared length of the vector:

\|w\|_2^2 = \sum_i w_i^2 \qquad (13)

Training with the 1-norm instead yields a sparse parameter set (one with many zero elements), which speeds up computation and saves memory, but the 1-norm does not have a continuous derivative, so optimization becomes harder:

\|w\|_1 = \sum_i |w_i| \qquad (14)
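In gradient-based training, adding the squared 2-norm simply adds 2λw to the gradient; a sketch (the value of λ and the base gradient are placeholders):

```python
import numpy as np

def regularized_grad(grad_loss, w, lam=1e-3):
    # Gradient of Loss(w) + lam * ||w||_2^2 is grad_loss(w) + 2 * lam * w.
    return grad_loss(w) + 2 * lam * w

w = np.array([0.5, -2.0, 3.0])
print(regularized_grad(lambda w: np.zeros_like(w), w))  # pure penalty term: 2*lam*w
```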
8 What next?

References
[1] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014.
[2] X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. In International Conference on Artificial Intelligence and Statistics, pages 315–323, 2011.
[3] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.
[4] A. Y. Hannun, C. Case, J. Casper, B. C. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, and A. Y. Ng. Deep speech: Scaling up end-to-end speech recognition. CoRR, abs/1412.5567, 2014.
[5] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[6] K. Hornik, M. B. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
[7] A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 1889–1897. Curran Associates, Inc., 2014.
[8] Q. V. Le, R. Monga, M. Devin, K. Chen, G. S. Corrado, J. Dean, and A. Y. Ng. Building high-level features using large scale unsupervised learning. In International Conference on Machine Learning, 2012.
[9] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, Portland, Oregon, USA, June 2011. Association for Computational Linguistics.
[10] A. L. Maas, A. Y. Hannun, and A. Y. Ng. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, volume 30, page 1, 2013.
[11] R. Jozefowicz, W. Zaremba, and I. Sutskever. An empirical exploration of recurrent network architectures. In International Conference on Machine Learning, 2015.
[12] R. Socher, D. Chen, C. D. Manning, and A. Ng. Reasoning with neural tensor networks for knowledge base completion. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 926–934. Curran Associates, Inc., 2013.
[13] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Stroudsburg, PA, October 2013. Association for Computational Linguistics.
[14] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.