UNDERGRADUATE GRADUATION THESIS
MAJOR: INFORMATION TECHNOLOGY
L Quang Thng
Class: Software Engineering K51
HANOI, 05-2011
TABLE OF CONTENTS
SUMMARY OF THE GRADUATION THESIS
ACKNOWLEDGEMENTS
LIST OF TABLES IN THE THESIS
LIST OF FIGURES IN THE THESIS
LIST OF TABLES IN THE THESIS
PREFACE
CHAPTER 1.
2.1.5. Conjunctions - C
2.1.5.1. Subordinating conjunctions - Cm
2.1.5.2. Coordinating conjunctions - Cp
2.1.6. Pronouns - P
2.1.6.1. Pronouns for things
2.1.6.2. Pronouns for actions and properties - Pl
2.1.6.3. Interrogative pronouns - Pi
2.1.7. Particles - M
2.2. Vietnamese phrases
2.2.1. Noun phrases - NP
2.2.1.1. Concept
2.2.1.2. Structure
2.2.1.3. Grammatical functions of the noun phrase
2.2.2. Verb phrases - VP
2.2.2.1. Concept
2.2.2.2. Structure
2.2.2.3. Grammatical functions of the verb phrase
2.2.3. Adjective phrases - AP
2.2.3.1. Concept
2.2.3.2. Structure
2.2.3.3. Grammatical functions
2.3. Sentence types in Vietnamese
2.4. The CFG model and the CYK parsing algorithm
2.4.1. The CFG grammar model
2.4.2. The CYK algorithm
2.4.2.1. The CYK table-construction algorithm [7]
2.4.2.2. Illustrative example for the CYK algorithm
2.4.2.3. The improved CYK algorithm
2.4.2.4. Difficulties of the CYK algorithm
2.5. The probabilistic PCFG model
2.5.1. Basic definition of PCFG
2.5.2. Types of probability in PCFG
2.5.2.1. Inside probability
2.5.2.2. Outside probability
2.5.3. Resolving syntactic ambiguity with PCFG
2.6. Chapter summary
CHAPTER 3.
4.1. Analysis and design of the syntactic parser for Vietnamese speech synthesis
4.1.1. Overall model
4.1.2. Description of the parsing functions
4.1.2.1. Text preprocessing
4.1.2.1.1. Word segmentation
4.1.2.1.2. Tagging
4.1.2.2. Rule management (Rule Manager)
4.1.2.2.1. Loading the data file containing the rule set
4.1.2.2.2. Computing the PCFG probability of each rule
4.1.2.3. Syntactic parsing
4.1.3. Data storage organization
4.1.3.1. Normalized input text data
4.1.3.2. Data handed over by the text preprocessor
4.1.3.3. The syntax rule set
4.1.3.4. The VietTreeBank corpus
4.1.3.5. Output data of the system
4.2. Building the system
4.2.1. Tools chosen
4.2.2. Class diagram
4.2.3. Detailed class design
4.2.3.1. The RuleManager package
4.2.3.1.1. Class Rule
4.2.3.1.2. Class RuleSet
4.2.3.2. The parsing Element package
4.2.3.2.1. Class CYK Element
4.2.3.2.2. Class Cell
4.2.3.2.3. Class AstarElement
4.2.3.3. The Common package (shared processing)
4.2.3.3.1. Class PartOfSpeech
4.2.3.3.2. Class Functions
4.2.3.4. The Analysis package
4.2.3.4.1. Class CYKBeamSearch
4.2.3.4.2. Class AStar
4.2.3.4.3. Class LeLightWin
4.2.3.4.4. Class Sentence
4.3. Experiments and evaluation
4.3.1. Program interface
4.3.2. Experimental results
4.3.2.1. Test data set
4.3.2.2. Parsing results
4.3.3. System evaluation
4.4. Chapter summary
CONCLUSION AND FUTURE WORK
References
APPENDIX
L Quang Thng
5. Supervisor's confirmation of the degree of completion of the graduation thesis and permission to defend it:
Hanoi, 28 May 2011
Supervisor
SUMMARY OF THE GRADUATION THESIS
Speech synthesis answers the human wish to communicate with computers by voice: people want the computer to read a chosen text aloud. The field has been studied since quite early on and has by now achieved important results. With the aim of understanding and developing a speech synthesizer for Vietnamese, this thesis takes speech synthesis as its research direction and focuses on the syntactic parsing component, in order to improve the performance of the synthesis system.
The thesis first studies the theoretical basis of syntactic parsing and of Vietnamese grammar. It then concentrates on improving the parsing algorithm and proposes an algorithm that improves both the quality and the speed of the parser used in Vietnamese speech synthesis. The thesis also implements the system and evaluates its effectiveness, and from that evaluation proposes directions for further development.
ACKNOWLEDGEMENTS
First of all, I thank my parents, who raised, encouraged and helped me until the moment I could type these lines with my own hands. I thank my silly, mischievous little sister, who encouraged me and teased me all through the thesis work.
I sincerely thank the teachers of Hanoi University of Science and Technology and of the School of Information and Communication Technology, who passed on valuable knowledge and experience to me during the past five years of study.
I thank Dr. Cao Tun Dng, Dr. Trn t and MSc. Nguyn Th Thu Trang for guiding me throughout the thesis. I thank you once more: you are the most dedicated teachers I have ever known.
I especially thank my teacher B Lm. Although he was not my supervisor, without him I could not have completed the thesis this well.
I thank my friend T Hong Long of the Vietnam-Japan class, K51. You have been a close friend at my side from high school until now. Through university, and up to the day I can be proud of finishing this thesis, you have always helped me find the strength to get past moments of discouragement and exhaustion.
I thank the Q4T group of friends in the software engineering class. We may not be brothers, but we are a wonderful team, aren't we? Let us all finish the thesis of our lives well, my comrades.
I thank my friends at the MICA center. You are the inspiration that keeps me striving to better myself. We have shared many unforgettable memories, and you will always be my good friends.
No.  Abbreviation  Meaning
1    CFG           Context-Free Grammar
2    PCFG
3    LPCFG
4    CYK           Cocke-Younger-Kasami
5    Earley
6    TreeBank
PREFACE
The computer: the word has become thoroughly familiar in today's information society. Computers affect every area of our lives and help people a great deal in this hurried and demanding world. Thanks to computers our work has become much easier, and people around the world can draw closer together. There is always something interesting left to discover about computers, and they are the source of many scientific inventions and innovations. It is fair to say that computers have become an indispensable part of our lives.
Have we ever thought of the computer as a friend? We might answer no, but the real answer is only "not yet". Imagine how wonderful it would be to have beside us a computer that can both help with all kinds of work and talk and confide with us like a friend. Hoping that one day computers will truly be able to converse with people, scientists around the world have worked tirelessly on speech synthesis systems. Speech synthesis lets a machine imitate the human voice as accurately and naturally as possible. To date, many research products on speech synthesis around the world have given very promising results. In Vietnam several synthesizers have also been developed, such as Sao Mai of the Sao Mai center, Hoa Sng of the MICA research center at Hanoi University of Science and Technology, and Ting ni phng Nam of Vietnam National University, Ho Chi Minh City. However, the quality of the voices these synthesizers produce is still very limited. Wishing to improve the quality of speech synthesis, this thesis studies the syntactic parsing system in Vietnamese speech synthesis. Syntactic parsing is a stage within the text analysis step of speech synthesis and strongly influences the other stages.
In Vietnam, research results on syntactic parsing are still very limited, because the problem is genuinely hard. Its complexity shows in several issues that must be solved, such as semantic ambiguity, combinatorial explosion, and coverage of the cases that occur in the language.
Building on existing research on Vietnamese syntactic parsing, the thesis develops it further and proposes improvements that optimize the performance of the parser in Vietnamese speech synthesis.
Structure of the thesis:
Chapter 1: Syntactic parsing in speech synthesis.
This chapter introduces Vietnamese speech synthesis and shows the role of the parser within it. From there it states the purpose and tasks of the thesis.
Tasks of the graduation thesis.
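[Figure: block diagram of the speech synthesis system. Block labels: high-level synthesis; text analysis; structure analysis; text normalization; linguistic analysis; phonetic analysis; prosody analysis; fundamental frequency; duration; intensity; acoustic unit selection; acoustic unit concatenation.]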
1.1.1. Text normalization
Text normalization detects the passages that the synthesis system cannot process and converts them into a form it can process. It is the first stage of a speech synthesis system and is important for ensuring that the text is read correctly. For example, the text "tôi bảo vệ đồ án vào ngày 08/06/2011 tại trường ĐHBKHN" is converted into the readable form "tôi bảo vệ đồ án vào ngày mùng tám tháng sáu năm hai nghìn không trăm mười một tại trường Đại học Bách Khoa Hà Nội".
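The normalization step can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the normalizer built in the thesis: the abbreviation table, the regular expression and the function name `normalize` are hypothetical, and the date is only rewritten into "day tháng month năm year" form. Spelling the numbers out as words would need a separate number-reading step.

```python
import re

# Hypothetical abbreviation table; a real system would need a much larger one.
ABBREVIATIONS = {"ĐHBKHN": "Đại học Bách Khoa Hà Nội"}

# Matches dates written as dd/mm/yyyy.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def normalize(text: str) -> str:
    """Expand known abbreviations and rewrite dd/mm/yyyy dates."""
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)

    def expand_date(match):
        day, month, year = match.groups()
        return f"{int(day)} tháng {int(month)} năm {year}"

    return DATE_PATTERN.sub(expand_date, text)


print(normalize("tôi bảo vệ đồ án vào ngày 08/06/2011 tại trường ĐHBKHN"))
```

A table-driven design keeps the rules easy to extend: new abbreviations or patterns are added as data rather than as code.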
There are currently three approaches to speech synthesis. The simplest way to generate synthetic speech is to play back samples recorded from natural speech (such as words or sentences). This approach gives relatively good quality but is limited by the very large vocabulary that the database must hold. Speech can also be produced by simulating the human articulatory system. This approach gives very good quality but is rather complex to implement. A third approach is formant synthesis.
Student: L Quang Thng - 20062965 - Cohort K51 - Software Engineering class B
considerably in terms of the algorithm, but the complexity of the parsing process is greatly reduced.
1.3.3.3. Remarks
Earley and CYK can be regarded as the two best-known parsing algorithms, representing the top-down and bottom-up parsing strategies respectively. Both have many advantages over the basic strategies, but each still carries the weakness of the strategy it is based on:
- The Earley algorithm is certain to return a parse tree, but does not guarantee that the tree covers the whole sentence.
- The CYK algorithm guarantees that the tree can cover the whole sentence, but does not guarantee that the tree reaches the goal symbol.
In speech synthesis, the most important requirement is that the parse tree covers the entire input sentence (otherwise the synthesizer will not read the input sentence in full). The thesis therefore studies and proposes a parsing solution for Vietnamese that follows the CYK approach.
1.5. Chapter summary
In this first chapter we have:
- Established that syntactic parsing plays a very important role in speech synthesis and affects all of its other stages.
- Reviewed several models (CFG, PCFG, ...) and the Earley and CYK algorithms used for parsing.
- Defined the concrete task of the thesis: modelling Vietnamese, and studying and improving CYK in combination with a model suited to Vietnamese parsing.
In the next chapter we look at the Vietnamese language and examine the theoretical basis of syntactic parsing in depth.
2.1.2. Verbs - V
Verbs can be divided into the following subtypes:
2.1.2.1. Transitive verbs - Vt
These are verbs such as ăn (eat), viết (write), đọc (read)... They usually require a complement naming the object affected by the action. Examples: ăn bánh, viết thư, may áo.
2.1.2.2. Intransitive verbs - Vi
When such a verb serves as the predicate in the core of a sentence, its meaning is already complete: no object complement is needed after the head of the phrase. Examples: em bé đang ngủ, con chim đang bay...
2.1.2.3. Existential verbs - Ve
Things can exist (có), remain (còn), run out (hết) or be lost (mất). When a verb of this type is the head, it is followed by a complement naming the thing that exists. Examples: có tiền, còn gạo, hết đạn.
2.1.2.4. Verbs of transformation - Vf
These denote changes of state of a thing; they must be used with a complement naming the result of the change. Example: nên người.
2.1.2.5. Verbs of volition - Vv
The volitional states are: muốn, quyết, dám, toan, định... When such a verb is the head, it must have a complement stating the content of the volition. Examples: dám nghĩ, toan nói.
2.1.2.6. Verbs of reception - Va
These express passive states. There are two main ones: bị (or phải) and được. They must be followed by a complement naming what is received. Examples: bị mắng, được khen.
2.1.2.7. Verbs of comparison - Vc
Things can be evaluated by comparison with other things in some respect. There are three comparative states: bằng (equal), hơn (more) and kém (less). The verbs expressing these states are called verbs of comparison. Like most of the verb types above, when used as the head they usually take a complement naming the object of comparison. Examples: bằng nhau, hơn người.
2.1.2.8. A special verb: the verb là - Vz
The verb là carries great grammatical weight in Vietnamese. For an ordinary simple sentence, the distinction between a descriptive sentence and a judgment sentence depends on whether là appears. Example: tôi là người lính có công.
2.1.3. Adjectives - A
Adjectives can be divided into two main subtypes: qualitative adjectives and quantitative adjectives.
2.1.3.1. Qualitative adjectives - Ai
When such an adjective is the head of a phrase, degree adverbs may precede it. Note that what follows the adjective in this case is a complement indicating the scope in which the quality holds. Words of this type: tốt, đẹp, xấu, thông minh, ngoan, ngu xuẩn... Example: rất giỏi toán.
2.1.3.2. Quantitative adjectives - An
These are qualities such as cao, thấp, ngắn, dài, rộng, hẹp, nông, sâu, xa, gần... Such adjectives usually take a quantifying complement, or one naming a reference point that serves to quantify. Examples: cao hai thước, dài một nghìn km.
2.1.4. Adverbs - R
2.1.4.1. Adverbs of time - Rt
These adverbs express the grammatical meaning of time: đã, sẽ, đang, vừa, mới, sắp, từng, liền, bèn, rồi...
2.1.4.2. Adverbs of degree - Rd
These adverbs express the grammatical meaning of degree: rất, khá, hơi, quá, lắm...
2.1.4.3. Adverbs of comparison - Rc
These adverbs indicate that an action, state or quality holds by comparison within given conditions of time and space. They are: cũng, đều, vẫn, cứ, còn, liên tục, liên tiếp, không ngừng... Example: Mai và Lan đều học giỏi.
2.1.4.4. Adverbs of affirmation (RfY) and negation (RfN)
These adverbs express negation or affirmation. Negation: không, chẳng, chưa. Affirmation: có. Examples: tôi không có tiền, nó có nói dối...
2.1.4.5. Imperative adverbs - Ri
These adverbs express commands, advice, invitations and dissuasion. Examples: em đừng đi về muộn, anh nên đi học đúng giờ.
2.1.5. Conjunctions - C
Conjunctions have two subtypes: subordinating conjunctions (Cm) and coordinating conjunctions (Cp).
2.1.5.1. Subordinating conjunctions - Cm
These conjunctions express a head-modifier relation. They include: do, của, để, bởi, bởi vì... Example: chúng tôi chiến đấu anh dũng như vậy để giành chiến thắng.
2.1.5.2. Coordinating conjunctions - Cp
These conjunctions express a coordinating relation. They include và, với, hay, hoặc, cùng, and pairs such as nếu ... thì, tuy ... nhưng. Examples: nếu trời mưa thì chúng tôi sẽ ở nhà; nó không những ngoan mà còn học giỏi.
2.1.6. Pronouns - P
2.1.6.1. Pronouns for things
These pronouns refer to things and can be used like nouns. There are three kinds: personal pronouns (Pp: tôi, tao, mày, chúng mày, chúng nó); pronouns of space and time (Pd: đây, đấy, đó, kia, ấy); pronouns of quantity (Pn: bấy nhiêu). Example: chúng tôi đang đến trường.
2.1.6.2. Pronouns for actions and properties - Pl
These pronouns refer to actions and properties: thế, vậy... Example: vậy là hết!
2.1.6.3. Interrogative pronouns - Pi
These pronouns are used in questions: ai, gì, chi, đâu, bao nhiêu, sao, thế nào...
2.1.7. Particles - M
Interjections (E): ôi chà, dạ, vâng, ôi chao...
Classifiers (Nl): cái, con, cây, người, tấm...
Numerals (Nq): một, hai, ba, vài, dăm, mươi...
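The tag inventory of Section 2.1 can be held in a small lookup structure. The sketch below is only an illustration: the dictionary layout and the function name `main_category` are assumptions, not the thesis's PartOfSpeech class. The tags E, Nl and Nq are left ungrouped because the text does not say which main category they belong to.

```python
# Sub-category tags listed in the text, grouped by main category.
TAGSET = {
    "V": ["Vt", "Vi", "Ve", "Vf", "Vv", "Va", "Vc", "Vz"],
    "A": ["Ai", "An"],
    "R": ["Rt", "Rd", "Rc", "RfY", "RfN", "Ri"],
    "C": ["Cm", "Cp"],
    "P": ["Pp", "Pd", "Pn", "Pl", "Pi"],
}

# Reverse index: sub-category tag -> main category tag.
SUBTAG_TO_MAIN = {sub: main for main, subs in TAGSET.items() for sub in subs}


def main_category(tag: str) -> str:
    """Return the main category of a tag; unknown tags are returned unchanged."""
    return SUBTAG_TO_MAIN.get(tag, tag)


print(main_category("Vt"))   # V
print(main_category("RfN"))  # R
```

Backing off from a fine-grained tag to its main category is one way a parser can reuse a rule written for V when the tagger outputs Vt.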
2.2.1. Noun phrases - NP
2.2.1.1. Concept
A noun phrase is a word group with a noun as its head; the modifiers standing before and after it modify the head. Example: Những bông hoa này.
2.2.1.2. Structure
a) Head:
2.2.2. Verb phrases - VP
2.2.2.1. Concept
A verb phrase is a word group with a verb as its head; the modifiers standing before and after it modify the head. Example: đang đọc sách.
2.2.2.2. Structure
a) Head:
Usually a single verb. When two verbs stand next to each other (a dependent verb and an independent verb), the first verb can be taken as the head of the verb phrase.
Examples:
- đang học bài
- toan về quê
c) Post-modifiers
Structure: the post-modifier may be a word, a phrase, or a subject-predicate clause.
Examples:
- học bài (one word)
- ăn một cái bánh (one phrase)
- Mọi người / biết anh ấy rất tích cực (a subject-predicate clause)
Meaning: post-modifiers usually modify the head verb.
2.2.2.3. Grammatical functions of the verb phrase
A verb phrase can act as subject, predicate, adverbial, complement or attribute.
- Subject: Bảo vệ tổ quốc / là nghĩa vụ của mọi người.
- Predicate: Mặt trời / lên cao.
- Adverbial: Tan buổi họp, mọi người đều ra về.
- Complement: Bộ đội / đi đánh giặc.
- Attribute: Quyển sách mượn trên thư viện / rất hay.
2.2.3. Adjective phrases - AP
2.2.3.1. Concept
An adjective phrase is a word group with an adjective as its head; the modifiers standing before and after it modify the head. Example: vẫn đẹp mãi.
2.2.3.2. Structure
a) Head: usually a gradable adjective.
Example: rất xinh đẹp
b) Pre-modifiers:
As in verb phrases, the pre-modifiers of an adjective phrase can be adverbs of time, continuation, affirmation or negation, and above all adverbs of degree (rất, hơi, khá, quá), but not imperative adverbs (hãy, đừng, chớ).
Examples: vẫn tốt, còn đẹp, rất hiền
c) Post-modifiers:
Structure: the post-modifier may be a word, a phrase or a subject-predicate clause.
Examples:
- ngoan lắm (one word)
- rộng ba trăm mét (one phrase)
- đẹp như trăng mới mọc (a subject-predicate clause)
Meaning: the post-modifiers of an adjective phrase usually add meaning to the head adjective.
2.2.3.3. Grammatical functions
An adjective phrase can also act as subject, predicate, adverbial, attribute or complement.
Examples:
- Lợi cho tập thể tức là lợi cho cá nhân. (subject)
- Nó / nhanh như sóc. (predicate)
[Figure: CYK example with the symbols S, NP, AP, N, R, P and A]
$$\sum_{j} P(N^i \to \zeta^j) = 1 \quad \forall i$$

$$\beta_j(p,q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)$$
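$$\alpha_j(p,q) = \sum_{f,\, g \neq j} \sum_{e=q+1}^{m} \alpha_f(p,e)\, P(N^f \to N^j N^g)\, \beta_g(q+1,e) + \sum_{f,\, g} \sum_{e=1}^{p-1} \alpha_f(e,q)\, P(N^f \to N^g N^j)\, \beta_g(e,p-1)$$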
Rule              Probability    Rule                 Probability
S  → NP VP        1.0            NP → NP PP           0.4
PP → P NP         1.0            NP → Gia đình        0.1
VP → V NP         0.7            NP → sổ liên lạc     0.18
VP → VP PP        0.3            NP → theo dõi        0.04
P  → bằng         1.0            NP → con             0.18
V  → theo dõi     1.0            NP → điểm            0.1
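The inside probability of Section 2.5.2.1 can be sketched in Python on this example grammar. The sketch rests on assumptions: the left-hand sides S, PP, VP, P and V are those of the grammar as restored here, the example sentence "Gia đình theo dõi con bằng sổ liên lạc" is assumed rather than taken from the source, the words are already segmented, and positions are 0-based. The function name `inside_table` is hypothetical.

```python
from collections import defaultdict

# Binary rules: (parent, left child, right child) -> probability.
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("PP", "P", "NP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.4,
}

# Lexical rules: (label, word) -> probability.
LEXICAL = {
    ("P", "bằng"): 1.0,
    ("V", "theo dõi"): 1.0,
    ("NP", "Gia đình"): 0.1,
    ("NP", "sổ liên lạc"): 0.18,
    ("NP", "theo dõi"): 0.04,
    ("NP", "con"): 0.18,
    ("NP", "điểm"): 0.1,
}


def inside_table(words):
    """Return beta[(label, p, q)]: inside probability of label over words p..q."""
    n = len(words)
    beta = defaultdict(float)
    # Spans of length 1 come from the lexical rules.
    for i, word in enumerate(words):
        for (label, rule_word), prob in LEXICAL.items():
            if rule_word == word:
                beta[(label, i, i)] += prob
    # Longer spans sum over all rules and all split points d (bottom-up).
    for length in range(2, n + 1):
        for p in range(n - length + 1):
            q = p + length - 1
            for d in range(p, q):
                for (parent, left, right), prob in BINARY.items():
                    left_prob = beta.get((left, p, d), 0.0)
                    right_prob = beta.get((right, d + 1, q), 0.0)
                    if left_prob and right_prob:
                        beta[(parent, p, q)] += prob * left_prob * right_prob
    return beta


sentence = ["Gia đình", "theo dõi", "con", "bằng", "sổ liên lạc"]
beta = inside_table(sentence)
print(beta[("S", 0, len(sentence) - 1)])  # sum over both parses, about 0.0015876
```

The total is the sum of the two competing parses (PP attached to the noun phrase, 0.0009072, or to the verb phrase, 0.0006804), which is exactly the ambiguity that PCFG probabilities are used to resolve.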
2.6. Chapter summary
This chapter presented the Vietnamese language and the theoretical basis of Vietnamese syntactic parsing with the improved CYK algorithm and the PCFG model. The next chapter presents the improvements, ideas and proposals applied to Vietnamese syntactic parsing.
3.1.3. Procedure for applying and running the beam search algorithm
As presented above, beam search depends heavily on the estimated score of each node. A badly computed estimate can cause the nodes that lead to the most probable parse tree to be pruned, which leads to
Here we use logarithms instead of the multiplication in the general formula, because multiplying probabilities produces very small results, close to 0, which makes comparison and computation difficult.
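The numerical problem can be seen in a few lines of Python. The numbers are hypothetical (400 rule applications, each with probability 0.001), and base-2 logarithms are used on the assumption that "lg" in the formulas of this chapter denotes log base 2.

```python
import math

# Hypothetical rule probabilities along one long derivation.
probabilities = [1e-3] * 400

# Plain multiplication underflows: the true value 1e-1200 is far below
# the smallest positive double, so the product collapses to 0.0.
product = 1.0
for p in probabilities:
    product *= p

# Summing logarithms keeps the score in a comfortable numeric range.
log_score = sum(math.log2(p) for p in probabilities)

print(product)    # 0.0
print(log_score)  # a finite negative number
```

Because the logarithm is monotonic, comparing two log scores gives the same ranking as comparing the original probabilities.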
This formula is exactly the inside formula of a complete node. The approach proposed for the pseudo-node case is therefore acceptable.
3.1.3.2. Computing the outside probability in beam search
The inside computation for a node described above may sound complicated but is in fact straightforward, because the inside algorithm follows the same bottom-up strategy as CYK. The outside value is difficult at an entirely different level: the outside algorithm is computed top-down, meaning that the upper rows of the CYK table must be processed before the lower rows can be computed, the exact opposite of inside.
The illustration for this section shows a complete parse tree in which node 4 is the node under consideration. To compute the outside value of node 4 we must use the entire shaded part of the tree. Specifically:
Outside(4) = lg(2 → 4 5) + Outside(2) + Inside(5).
Outside(2) is computed recursively in the same way as Outside(4):
Outside(2) = lg(1 → 2 3) + Outside(1) + Inside(3).
Here lg(2 → 4 5) is the log probability of the rule that expands node 2 into nodes 4 and 5. At the root, the outside probability of node 1 is 1 if node 1 is S and 0 otherwise; in the log domain this is Outside(1) = lg 1 = 0 in the first case and lg 0 = −∞ in the second.
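The recursion can be written as a short Python sketch on the five-node tree of the example (1 → 2 3, 2 → 4 5). All numbers are hypothetical and chosen as powers of two so that the base-2 logarithms are exact. The dictionary names and the function `outside` are assumptions made for illustration, not the thesis's implementation.

```python
import math

# Tree from the text: node 1 -> 2 3 and node 2 -> 4 5.
PARENT = {2: 1, 3: 1, 4: 2, 5: 2}
SIBLING = {2: 3, 3: 2, 4: 5, 5: 4}

# Hypothetical log2 rule probabilities, keyed by the parent node they expand.
RULE_LOGPROB = {1: math.log2(0.5), 2: math.log2(0.25)}
# Hypothetical log2 inside scores of the sibling nodes.
INSIDE = {3: math.log2(0.5), 5: math.log2(0.125)}
ROOT_IS_S = True


def outside(node: int) -> float:
    """Log-domain outside score, computed top-down from the root."""
    if node not in PARENT:  # the root node
        return 0.0 if ROOT_IS_S else -math.inf  # lg 1 or lg 0
    parent = PARENT[node]
    return RULE_LOGPROB[parent] + outside(parent) + INSIDE[SIBLING[node]]


print(outside(2))  # -2.0
print(outside(4))  # -7.0
```

The recursion makes the top-down dependency visible: Outside(4) cannot be evaluated until Outside(2) is known, which in turn needs Outside(1).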
Using beam search can improve the speed of the parser considerably. When CYK is combined with beam search, even for a sentence as complex as 40 words (not counting syllables), the parser needs only a few minutes to produce a result. With plain exhaustive CYK over the whole table, the parser can take up to several hours.
Disadvantage: although it is faster, beam search is not an optimal algorithm. If the threshold is set too small, beam search may prune away the optimal solution; and the only way to guarantee that it never does so is to raise the threshold very high, which comes close to running the exhaustive algorithm. Nothing therefore guarantees that beam search will let the parser return the optimal output.
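The trade-off described above can be illustrated with a threshold-based pruning step on a single CYK cell. This is a minimal sketch under assumptions: the cell is modelled as a dictionary from label to log score, the scores are hypothetical, and `prune` is not the pruning rule of the thesis's CYKBeamSearch class.

```python
def prune(cell: dict, beam: float) -> dict:
    """Keep only the entries whose log score is within `beam` of the best one."""
    if not cell:
        return {}
    best = max(cell.values())
    return {label: score for label, score in cell.items() if score >= best - beam}


# Hypothetical log scores of three candidate labels in one cell.
cell = {"NP": -2.0, "VP": -3.5, "AP": -9.0}

print(prune(cell, beam=2.0))   # {'NP': -2.0, 'VP': -3.5}
print(prune(cell, beam=0.5))   # {'NP': -2.0}
```

With the narrow beam only NP survives. If the globally best parse needed VP in this cell, it could no longer be found, and widening the beam until nothing is pruned gives back the exhaustive CYK table.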
function A*(start, goal)
    var closed := empty set
    var q := make_queue(make_path(start))
    while q is not empty
        var p := remove_first(q)
        var x := last node of p
        if x in closed
            continue
        if x = goal
            return p
        add x to closed
        foreach y in successor_paths(p)
            enqueue(q, y)
    return failure
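The pseudocode can be turned into a runnable Python sketch. It assumes that the queue is a priority queue ordered by the cost so far plus a heuristic estimate, which the pseudocode leaves implicit; the graph, the zero heuristic and all names are hypothetical.

```python
import heapq


def a_star(start, goal, neighbors, heuristic):
    """Return the cheapest path from start to goal, or None if there is none."""
    closed = set()
    # Queue entries: (estimated total cost, cost so far, path).
    queue = [(heuristic(start), 0, [start])]
    while queue:
        _, cost, path = heapq.heappop(queue)
        node = path[-1]
        if node in closed:
            continue
        if node == goal:
            return path
        closed.add(node)
        for next_node, step_cost in neighbors(node):
            if next_node not in closed:
                new_cost = cost + step_cost
                estimate = new_cost + heuristic(next_node)
                heapq.heappush(queue, (estimate, new_cost, path + [next_node]))
    return None


# Hypothetical weighted graph: node -> list of (neighbor, step cost).
GRAPH = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}

path = a_star("A", "D", lambda n: GRAPH[n], lambda n: 0)
print(path)  # ['A', 'B', 'C', 'D']
```

With the zero heuristic the search reduces to uniform-cost search; any heuristic that never overestimates the remaining cost keeps the result optimal while expanding fewer nodes.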
X1(1-8)
X2(6-16)
X3(15-35)
X4(5-20)
X5(2-7)
X6(10-11)
X7(8-27)
X8(2-21)
X9(9-11)
X10(2-13)
X11(6-14)
X12(15-26)
X13(14-23)
X14(5-18)
X15(1-7)
X16(9-16)
X17(12-17)
X18(7-18)
X19(6-25)
X20(13-26)
X21(11-16)
X22(9-24)
X23(11-20)
X24(8-18)
X25(7-16)
X26(14-16)
X27(4-6)
X28(13-21)
X29(4-8)
X30(11-13)
All the chains that can be combined for candidate X[7-10] are:
Table 3-2. Chains combining X with the CHART
Position                                   Symbols
[2-7] [7-10]                               X5 X
[1-7] [7-10]                               X15 X
[1-7] [7-10] [10-11] [11-13]               X15 X X6 X30
[2-7] [7-10] [10-11] [11-13] [13-26]       X5 X X6 X30 X20
[7-10] [10-11] [11-20]                     X X6 X23
[7-10] [10-11] [11-13]                     X X6 X30
[7-10] [10-11] [11-13] [13-26]             X X6 X30 X20
[7-10] [10-11] [11-13] [13-21]             X X6 X30 X28
[2-7] [7-10] [10-11]                       X5 X X6
[2-7] [7-10] [10-11] [11-16]               X5 X X6 X21
[2-7] [7-10] [10-11] [11-20]               X5 X X6 X23
[2-7] [7-10] [10-11] [11-13]               X5 X X6 X30
[7-10] [10-11] [11-16]                     X X6 X21
[1-7] [7-10] [10-11]                       X15 X X6
[2-7] [7-10] [10-11] [11-13] [13-21]       X5 X X6 X30 X28
[1-7] [7-10] [10-11] [11-16]               X15 X X6 X21
[1-7] [7-10] [10-11] [11-20]               X15 X X6 X23
[7-10] [10-11]                             X X6
[1-7] [7-10] [10-11] [11-13] [13-26]       X15 X X6 X30 X20
[1-7] [7-10] [10-11] [11-13] [13-21]       X15 X X6 X30 X28
These chains are then checked: if a chain is the right-hand side of some rule in the grammar, for example Z → X15 X X6 X30 X20, then Z is added to the AGENDA for processing. The algorithm repeats until the answer S(1-n) is found in the CHART or the AGENDA has no element left to examine.
This procedure increases the time spent in each step of the A* algorithm, but the number of steps is much smaller and no combinatorial explosion occurs. The remaining problem is how to produce the chains in the shortest possible time. To address it, the next part describes in detail how the lelightwin algorithm works.
3.3.2.2. The basic lelightwin algorithm model
The lelightwin algorithm, described in the diagram below, has two phases: element classification and chain generation.
The blocks in the lelightwin algorithm are of two kinds: blocks of elements to the left of X and blocks of elements to the right of X.
Elements to the left of X: elements whose end position is <= the start position of X. Elements with the same end position are added to the block labelled with that end position. For example, block 2 contains the elements to the left of X whose end position is 2. All these blocks belong to a larger block called the left block.
Elements to the right of X: elements whose start position is >= the end position of X. Elements with the same start position are added to the block labelled with that start position. All these blocks likewise belong to a block called the right block.
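The classification into left and right blocks, and one way of using the blocks to enumerate chains, can be sketched in Python on the CHART elements listed earlier. This is a sketch under assumptions: the recursive enumeration is a straightforward reading of the example table, not the lelightwin chain generator itself, and the names `classify`, `extend_left`, `extend_right` and `chains` are hypothetical.

```python
from collections import defaultdict

# CHART elements from the example: name -> (start, end).
CHART = {
    "X1": (1, 8), "X2": (6, 16), "X3": (15, 35), "X4": (5, 20), "X5": (2, 7),
    "X6": (10, 11), "X7": (8, 27), "X8": (2, 21), "X9": (9, 11), "X10": (2, 13),
    "X11": (6, 14), "X12": (15, 26), "X13": (14, 23), "X14": (5, 18), "X15": (1, 7),
    "X16": (9, 16), "X17": (12, 17), "X18": (7, 18), "X19": (6, 25), "X20": (13, 26),
    "X21": (11, 16), "X22": (9, 24), "X23": (11, 20), "X24": (8, 18), "X25": (7, 16),
    "X26": (14, 16), "X27": (4, 6), "X28": (13, 21), "X29": (4, 8), "X30": (11, 13),
}


def classify(chart, x_span):
    """Group elements into left blocks (keyed by end) and right blocks (keyed by start)."""
    x_start, x_end = x_span
    left, right = defaultdict(list), defaultdict(list)
    for name, (start, end) in chart.items():
        if end <= x_start:
            left[end].append(name)
        if start >= x_end:
            right[start].append(name)
    return left, right


def extend_left(chart, left, pos):
    """All element sequences that end exactly at pos (including the empty one)."""
    result = [[]]
    for name in left.get(pos, []):
        for prefix in extend_left(chart, left, chart[name][0]):
            result.append(prefix + [name])
    return result


def extend_right(chart, right, pos):
    """All element sequences that start exactly at pos (including the empty one)."""
    result = [[]]
    for name in right.get(pos, []):
        for suffix in extend_right(chart, right, chart[name][1]):
            result.append([name] + suffix)
    return result


def chains(chart, x_span, x_name="X"):
    """Contiguous chains that contain X and at least one CHART element."""
    left, right = classify(chart, x_span)
    return [l + [x_name] + r
            for l in extend_left(chart, left, x_span[0])
            for r in extend_right(chart, right, x_span[1])
            if l or r]


result = chains(CHART, (7, 10))
print(len(result))  # 20
```

For X[7-10] the sketch yields 20 chains, the same set as Table 3-2. Indexing the blocks by end and start position means each extension step is a single dictionary lookup instead of a scan over the whole CHART.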
3.3.2.2.2. Generating chains from the classified elements
With the carefully classified CHART elements as input, we begin the chain generation stage. The lelightwin chain generation algorithm consists of
3.5. Chapter summary
Thus, to resolve the difficulties of Vietnamese syntactic parsing, the thesis has carried out the following work
4.1. Analysis and design of the syntactic parser for Vietnamese speech synthesis
4.1.1. Overall model
The parsing system applied to Vietnamese speech synthesis can be described in general terms as follows:
$$\hat{P}(N^j \to \zeta) = \frac{C(N^j \to \zeta)}{\sum_{\gamma} C(N^j \to \gamma)}$$
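The relative-frequency estimate can be sketched in Python as follows. The rule occurrences are hypothetical, a rule is modelled as a pair (left-hand side, right-hand side tuple), and `estimate_probabilities` is an illustrative name rather than a method of the thesis's RuleSet class.

```python
from collections import Counter


def estimate_probabilities(rule_occurrences):
    """Relative-frequency PCFG estimate: count(rule) / count(rules with the same left side)."""
    rule_counts = Counter(rule_occurrences)
    lhs_counts = Counter()
    for (lhs, _rhs), count in rule_counts.items():
        lhs_counts[lhs] += count
    return {rule: count / lhs_counts[rule[0]] for rule, count in rule_counts.items()}


# Hypothetical rule occurrences collected from a treebank.
observed = [
    ("NP", ("N",)),
    ("NP", ("N",)),
    ("NP", ("N", "A")),
    ("VP", ("V", "NP")),
]

probs = estimate_probabilities(observed)
print(probs[("NP", ("N",))])  # 2 of the 3 NP expansions
```

By construction, the probabilities of all rules sharing a left-hand side sum to 1, which is the normalization condition a PCFG requires.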
Input:
- The input sentence after word segmentation and tagging.
- The label hierarchy trees stored in files.
- A hash table containing the Vietnamese syntax rule set, with probabilities computed from VietTreeBank.
Output: a parse tree of the input sentence.
Method: this function can be carried out in two ways:
- Using the A* parsing algorithm.
- Using the CYK beam search algorithm.
<w pos="V">đứng</w>
<w pos="N">trước</w>
<w pos="N">mặt</w>
<w pos="P">tôi</w>
<w pos="V">là</w>
<w pos="M">một</w>
<w pos="P">gã</w>
<w pos="N">đàn ông</w>
<w pos="A">cao</w>
<w pos="A">lêu đêu</w>
4.1.3.3. Grammar rule set
The grammar rule set is the most important data and largely determines the success of the system. Its storage must therefore be organized sensibly, so that reading and writing are convenient and processing is fast.
As introduced in the previous sections, the system's grammar consists of 938 rules built from the VietTreeBank sample data and adjusted to suit the parser. The rules are stored in an XML file of the following form:
<VietParserRuleSet>
<Rule id="{sequence number of the rule}"
probability="{PCFG probability of the rule}">
<left>{left-hand side of the rule}</left>
<right>{right-hand side of the rule}</right>
</Rule>
</VietParserRuleSet>
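A file in this format could be read with the standard Java DOM API, as in the minimal sketch below. The class names GrammarRule and RuleXmlLoader, the sample rule values, and the choice to separate right-hand-side symbols with spaces are assumptions made for illustration and are not taken from the thesis.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical value object for one <Rule> entry.
class GrammarRule {
    final int id;
    final double probability;
    final String left;
    final String right;

    GrammarRule(int id, double probability, String left, String right) {
        this.id = id;
        this.probability = probability;
        this.left = left;
        this.right = right;
    }
}

// Hypothetical loader: parses the XML text and indexes rules by right-hand side.
class RuleXmlLoader {
    static Map<String, List<GrammarRule>> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList rules = doc.getElementsByTagName("Rule");
            Map<String, List<GrammarRule>> byRight = new HashMap<>();
            for (int i = 0; i < rules.getLength(); i++) {
                Element r = (Element) rules.item(i);
                String left = r.getElementsByTagName("left").item(0).getTextContent().trim();
                String right = r.getElementsByTagName("right").item(0).getTextContent().trim();
                GrammarRule rule = new GrammarRule(
                        Integer.parseInt(r.getAttribute("id")),
                        Double.parseDouble(r.getAttribute("probability")),
                        left, right);
                byRight.computeIfAbsent(right, k -> new ArrayList<>()).add(rule);
            }
            return byRight;
        } catch (Exception ex) {
            // Wrap parser and I/O failures so callers need no checked exceptions.
            throw new IllegalArgumentException("invalid rule set XML", ex);
        }
    }

    public static void main(String[] args) {
        // Illustrative rule values only; they are not rules from the thesis.
        String xml = "<VietParserRuleSet>"
                + "<Rule id=\"1\" probability=\"0.4\"><left>NP</left><right>M NP</right></Rule>"
                + "</VietParserRuleSet>";
        System.out.println(load(xml).get("M NP").get(0).left);
    }
}
```

Indexing by the right-hand side suits bottom-up parsing, where the parser holds a sequence of child labels and asks which parent labels can be built from it.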
4.1.3.4. VietTreeBank corpus
The VietTreeBank sample corpus is a collection of many Vietnamese sentences whose syntax has been accurately analyzed by hand. TreeBank data is widely used in natural language processing applications and is especially useful for building high-quality parsers. A parsing system that aims to improve the quality of its output therefore needs a TreeBank of its own. This system also uses a VietTreeBank data set stored in a tree structure. Each parse tree sits inside an opening and closing tag pair <s></s> and uses parentheses () to separate the individual constituent labels. Child labels sit inside the () of their parent label.
Each sentence is stored in a <s></s> tag pair.
Each <s></s> tag pair contains several <parse></parse> tag pairs, one for each way of parsing the sentence in that <s></s> pair.
Each <parse></parse> tag pair holds one analysis of the sentence, stored as a tree. The tags for child labels are nested inside the tags for their parent labels.
4.2.2. Class diagram
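Attributes:
- left: the left-hand side of the rule.
- count: the number of occurrences of the rule in VietTreeBank.
- right: the right-hand side of the rule.
- prob: the PCFG probability of the rule.
Methods:
- Returns the value of left.
- Sets the value of left.
- Returns the value of count.
- Sets the value of count.
- Returns the value of prob.
- Sets the value of prob.
- Returns the value of right.
- Sets the value of right.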
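To illustrate how an occurrence count and a PCFG probability could be related, the sketch below assumes the usual relative-frequency estimate: a rule's count divided by the total count of all rules sharing its left-hand side. The class PcfgEstimator, the "LEFT -> RIGHT" key format, and the sample counts are hypothetical and are not taken from the thesis.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns rule counts into relative-frequency probabilities.
class PcfgEstimator {
    // counts maps "LEFT -> RIGHT" to the number of times the rule was observed.
    static Map<String, Double> estimate(Map<String, Integer> counts) {
        // First pass: total count per left-hand side.
        Map<String, Integer> totalByLeft = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            totalByLeft.merge(leftOf(e.getKey()), e.getValue(), Integer::sum);
        }
        // Second pass: probability = rule count / total for its left-hand side.
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int total = totalByLeft.get(leftOf(e.getKey()));
            probs.put(e.getKey(), e.getValue() / (double) total);
        }
        return probs;
    }

    private static String leftOf(String rule) {
        return rule.substring(0, rule.indexOf("->")).trim();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("NP -> M NP", 3); // illustrative counts only
        counts.put("NP -> N", 1);
        counts.put("VP -> V NP", 2);
        System.out.println(estimate(counts));
    }
}
```

Under this estimate, the probabilities of all rules with the same left-hand side sum to 1.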
4.2.3.1.2. RuleSet class
This class manages the whole rule set and provides the methods for retrieving rules.
Table 4-6. Description of the RuleSet class
Attributes:
- Name: arlRule. Array storing the rules.
- Scope: private. Type: HashMap<String, ArrayList<AStarRule>>. Name: htRule. Hash table storing the rules, keyed by the right-hand side of the rule.
- Scope: private. Type: ArrayList<Integer>. Name: arlTotal. Array storing the total number of rules for each phrase type.
Methods:
- Scope: public. Type: boolean. Name and parameters: readRule(). Accesses and reads the grammar rules stored in the database.
- Scope: public. Type: ArrayList<AStarElement>. Name and parameters: getCombine(ArrayList<AStarElement> as). Returns the left-hand sides of the rules whose right-hand side is the sequence as.
Attributes:
- Stores the part-of-speech label in CYK parsing.
- Stores the wait variable corresponding to the element in CYK parsing.
- Stores the position of the first cell that produced the element.
- Stores the indices of the two components that produced the element.
- The inside probability of the element.
- The outside probability of the element.
- Name: sLast. Stores the last component in the parsing rule.
Methods:
- All get and set methods for the attributes above.
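4.2.3.2.2. Cell class
This class manages the information about one cell during CYK-Beam search parsing.
Table 4-8. Description of the Cell class
Attributes:
- Stores the set of elements contained in the cell.
Methods:
- Returns the cell's arlElem set.
- Returns the i-th element in the cell.
- Removes the i-th element from the cell.
- Returns the number of elements contained in the cell.
- Returns the number of elements in the cell that have not been pruned by beam search.
- Adds the element elem to the cell.
- Adds an element containing the word w to the cell.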
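The distinction between "all elements" and "elements not yet pruned" can be sketched as below. This section does not state the pruning criterion, so the sketch assumes a common relative-threshold beam: an element is pruned when its probability falls below a fixed ratio of the best probability in the cell. The class BeamCell and its method names are hypothetical, not the thesis's Cell class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified cell: stores only element probabilities and pruned flags.
class BeamCell {
    private final List<Double> probs = new ArrayList<>();
    private final List<Boolean> pruned = new ArrayList<>();

    void add(double probability) {
        probs.add(probability);
        pruned.add(false);
    }

    // Assumed criterion: prune elements whose probability is below best * ratio.
    void prune(double ratio) {
        double best = 0.0;
        for (double p : probs) {
            best = Math.max(best, p);
        }
        for (int i = 0; i < probs.size(); i++) {
            pruned.set(i, probs.get(i) < best * ratio);
        }
    }

    // Number of elements stored in the cell, pruned or not.
    int size() {
        return probs.size();
    }

    // Number of elements that survived pruning.
    int unprunedSize() {
        int n = 0;
        for (boolean flag : pruned) {
            if (!flag) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        BeamCell cell = new BeamCell();
        cell.add(0.5);
        cell.add(0.2);
        cell.add(0.01);
        cell.prune(0.1);
        System.out.println(cell.unprunedSize() + " of " + cell.size() + " elements kept");
    }
}
```

Flagging pruned elements instead of deleting them keeps indices stable, which matters if other elements refer to their components by index.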
4.2.3.2.3. AStarElement class
As its name suggests, this class manages the information about an element used during parsing with the A* algorithm.
Table 4-9. Description of the AStarElement class
Attributes:
- Stores the part-of-speech label of the element.
- Stores the start position of the element in the sentence.
- Stores the end position of the element in the sentence.
- Stores the component elements that produced the element during parsing.
- The inside probability of the element.
- The outside probability of the element.
- Stores the values used to compute the outside probability.
- Stores the probability of the rule that the element uses to produce its component elements.
Methods:
- All get and set methods for the attributes above.
- Scope: public. Type: void. Name and parameters: contract(AStarRule rule, ArrayList<AStarElement> as). Builds the list of constituent elements of the element under consideration for a given grammar rule.
- Scope: public. Type: String. Name and parameters: getWord(). Returns the word class corresponding to the label when the element is a leaf node.
- Scope: public. Type: boolean. Name and parameters: isWord(). Checks whether the element under consideration is a part-of-speech leaf node.
4.2.3.3. Common: the general processing package
An auxiliary package in the system, used for routine processing.
4.2.3.3.1. PartOfSpeech class
The PartOfSpeech class maps the English symbols to Vietnamese and is used when building the tree.
Table 4-10. Description of the PartOfSpeech class
Attributes
4.2.3.3.2. Functions class
This class contains the checking functions, including checks for whether an input word is a proper noun or a numeral.
Table 4-11. Description of the Functions class
Attributes
4.2.3.4. Analysis package
4.2.3.4.1. CYKBeamSearch class
This class performs parsing with the CYK algorithm combined with beam search pruning.
Table 4-12. Description of the CYKBeamSearch class
- The input sentence.
- The elimination ratio used in beam search pruning.
- The grammar rule set.
- The variable that receives the result returned by the parser; its value is the root node of the parse tree.
- Array storing the special words, namely proper nouns and numerals.
- Array storing the positions of punctuation marks, used when adding labels.
- Array storing the strings read from the parse tree.
- Array storing the words in the sentence.
- Array storing the sizes of the rows in the CYK table.
- List of the cells in CYK parsing.
- Performs CYK parsing with the threshold step.
- The main function that parses the input sentence.
- Builds the parse tree from the CYK table.
4.2.3.4.2. AStar class
This class performs parsing with the A* algorithm, using the basic lelightwin algorithm.
- The grammar rule set.
- Array storing the words in the sentence.
- Variable storing the positions of punctuation marks, used when adding phrase labels.
- The CHART set used in the A* algorithm.
- The AGENDA set used in the A* algorithm.
- As in the CYK case, the variable that holds the result after parsing.
- The main function of the class, which performs A* parsing.
- Returns the set of elements created by combining element e with the elements in CHART.
- Computes the outside value for the element cand.
- Returns the candidate element with the highest estimate.
- Returns the subtree rooted at a in the output parse tree.
- Returns the variable node.
- Returns the key of element e in AGENDA.
4.2.3.4.3. LeLightWin class
This class implements the basic lelightwin algorithm: it processes a candidate element and returns the set of combination chains between that candidate and the elements in CHART.
Table 4-14. Description of the LeLightWin class
- Variable storing the result of the algorithm, that is, the combination chains.
- The left combination sequence of the candidate element.
- The right combination sequence of the candidate element.
- The set of elements to the left of the candidate after classification.
- The set of elements to the right of the candidate after classification.
- Function that generates the leftChain of the candidate, starting at the position of element elem.
- Function that generates the rightChain of the candidate, starting at the position of element elem.
- Function that classifies the elements in CHART and stores them in the two sets left and right.
- Function that creates the combination sub-chains of the candidate element from the two sets left and right.
- Name and parameters: getLLWChain(AStarElement e, HashMap<String, AStarElement> chart). The final and main function of the class, which performs the most important task: generating all combination chains of the candidate e and storing them in chain.
4.2.3.4.4. Sentence class
This can be regarded as the main class of the analysis package. It stores the information about an input sentence and provides the parsing methods for the sentence, built on all of the classes above.
Table 4-15. Description of the Sentence class
- Variable storing the input sentence.
- Variable storing the special words: proper nouns and numerals.
- Variable storing the positions of punctuation marks, used when adding phrase labels.
- Array storing the words in the sentence.
- Performs word segmentation for the input sentence. The result is written to the file text.txt.
- Reads the data from the file text.txt and performs part-of-speech tagging. The result is stored in the file wordTagged.txt.
- Reads the data from the file wordTagged.txt.
- Performs CYK parsing of the input sentence, combined with beam search.
- Performs A* parsing of the input sentence, combined with the basic lelightwin algorithm.
- Returns the parsing result of AStar.
- Returns the parsing result of CYK.
4.3. Experiments and evaluation
4.3.1. Program interface
The interface of the Vietnamese parsing program consists of the following main components:
- The button that triggers parsing with the CYK-Beam search algorithm; the result is shown in the CYK-Beam search result area.
- The button that triggers parsing with the A* algorithm; the result is shown in the AStar result area.
- The table of sentences to parse: the input text is split into sentences and shown in this table, one sentence per row.
- The input text area.
- The result areas: results are shown as trees and updated tree by tree as parsing proceeds; whenever a parse tree becomes available, the display is updated.
4.3.2. Experimental results
4.3.2.1. Test data sets
The data used to test the performance of the parsing system for Vietnamese speech synthesis consists of two sets. The first contains 630 sentences taken from vnspeechcorpus. These sentences are all very long and syntactically convoluted. The second set, used to test the accuracy of the parser, contains 200 sentences taken from VietTreeBank. This second set is an acceptable test set because the thesis's parser has not been trained on VietTreeBank and only uses it to collect PCFG probability statistics, so the experiment does not introduce bias.
4.3.2.2. Parsing results
Below are some results obtained by the parser with the A* algorithm on sentences ranging from simple to complex.
Tôi là sinh viên
XML output:
<?xml version="1.0" ?>
<BKLightWinParser>
<sentence id="1">tôi là sinh viên
<parse id="1">
XML output:
<?xml version="1.0" ?>
<BKLightWinParser>
<sentence id="1">tôi là một sinh viên học rất giỏi môn toán
<parse id="1">
<NP level="1" explain="cụm danh từ">tôi
<P level="2" explain="đại từ">tôi</P>
</NP>
<VP level="1" explain="cụm động từ">là một sinh viên học rất giỏi môn toán
<V level="2" explain="động từ">là</V>
<NP level="2" explain="cụm danh từ">một sinh viên học rất giỏi môn toán
<M level="3" explain="số từ">một</M>
<NP level="3" explain="cụm danh từ">sinh viên học rất giỏi môn toán
<N level="4" explain="danh từ">sinh viên</N>
Algorithm            Processing time
A*                   15 minutes
CYK-Beam search      45 minutes
4.3.3. System evaluation
Within the scope of a graduation thesis, the results achieved by the system are fairly promising. However, the experiments show that the accuracy of the parser is still low, for the following reasons:
- The parser has no training algorithm over the TreeBank and relies only on statistics, so the program's performance has not been improved.
- The grammar rule set still needs to be completed.
- The word segmenter and the tagger produce wrong results, which makes the parser's output wrong as well.
- The sentences in VietTreeBank are very hard and long; most are 50-60 words long and have very complex structure.
In terms of speed, the system keeps a fairly stable parsing speed even on long and hard sentences, which shows the advantage of the A* algorithm. Moreover, this is only a first step: if the lelightwin algorithm with pruning is added, the speed of the system could improve by tens of times.
4.4. Chapter summary
This chapter presented the test results and the performance evaluation of the Vietnamese parsing program.
- The A* parsing algorithm gives very promising results, parsing 630 written sentences in 15 minutes, an average of about 3 s per sentence. These sentences are all very long and hard.
- Compared with CYK-Beam search, the A* algorithm is clearly superior in speed. As for accuracy, the program has not yet been tested for lack of time. In the future the experiments will be completed so that the system's performance can be evaluated more rigorously.
- The accuracy on the sample sentences from the TreeBank is still not high.
APPENDIX
Symbols used in the grammar rule set
A adjective
AE pre-adjective modifier
AH head adjective
AP adjective phrase
AR post-adjective modifier
AV adverbial
Ac collective adjective
Al quantitative adjective
Ao onomatopoeic adjective
Ap qualitative adjective
C preposition
Cm subordinating preposition
Co ,
Cp coordinating preposition
D resultative adverb
E interjection
H head
I quantity noun
M numeral
N noun
NE pre-noun modifier
NH head noun
NP noun phrase
NPC noun phrase
NR post-noun modifier
Na abstract noun
Nc individual noun
Ng collective noun
Np proper name
Nq numeral
Ns classifier
Nu unit-of-measure noun
P pronoun
PP prepositional phrase
PRD predicate
Pd demonstrative pronoun
Pi interrogative pronoun
Pl pronoun of action or quality
Pn quantity pronoun
Pp personal pronoun
Pu personal pronoun
QP quantity phrase
R adverb
RP adverb phrase
Rc comparative adverb
Rd degree adverb
Ri imperative adverb
Root sentences that could be parsed
Rt time adverb
Rv position adverb
S sentence
SBAR subordinate clause
SBARS subordinate clause
SBJ subject
SC imperative sentence
SE exclamatory sentence
SF topic-comment sentence
SN declarative sentence
SQ question
V verb
VE pre-verb modifier
VH head verb
VP verb phrase
VPC verb phrase
VR post-verb modifier
Vc collective verb
Vit intransitive verb
Vt transitive verb
Vz the verb "là"
WHAP interrogative adjective phrase
WHNP interrogative noun phrase
WHPP interrogative prepositional phrase
WHRP interrogative adverb phrase
X unidentified word
Y abbreviation
</VP>
</NP>
</PP>
</NP>
</VP>
</VP>
</VP>
</parse>
</sentence>
<sentence id="4">tôi ba mươi bốn tuổi , là bác sĩ
<parse id="1">
<NP level="1" explain="cụm danh từ">tôi ba mươi bốn tuổi
<NP level="2" explain="cụm danh từ">tôi
<P level="3" explain="đại từ">tôi</P>
</NP>
<NP level="2" explain="cụm danh từ">ba mươi bốn tuổi
<N level="3" explain="danh từ">ba mươi</N>
<M level="3" explain="số từ">bốn</M>
<N level="3" explain="danh từ">tuổi</N>
</NP>
</NP>
<punc level="1">,</punc>
<VP level="1" explain="cụm động từ">là bác sĩ
<V level="2" explain="động từ">là</V>
<N level="2" explain="danh từ">bác sĩ</N>
</VP>
</parse>
</sentence>
<sentence id="5">đã lập gia đình mười năm và có một bé trai , một bé gái
<parse id="1">
<VP level="1" explain="cụm động từ">đã lập gia đình mười năm
<R level="2" explain="phụ từ">đã</R>
<V level="2" explain="động từ">lập</V>
<NP level="2" explain="cụm danh từ">gia đình mười năm
<N level="3" explain="danh từ">gia đình</N>
<M level="3" explain="số từ">mười</M>
<N level="3" explain="danh từ">năm</N>
</NP>
</VP>
<C level="1" explain="giới từ">và</C>
<VP level="1" explain="cụm động từ">có một bé trai , một bé gái
<V level="2" explain="động từ">có</V>
<NP level="2" explain="cụm danh từ">một bé trai , một bé gái
<M level="3" explain="số từ">một</M>
<NP level="3" explain="cụm danh từ">bé trai , một bé gái
<N level="4" explain="danh từ">bé</N>
<N level="4" explain="danh từ">trai</N>
<punc level="4">,</punc>
<NP level="4" explain="cụm danh từ">một bé gái
<M level="5" explain="số từ">một</M>
<NP level="5" explain="cụm danh từ">bé gái
<N level="6" explain="danh từ">bé</N>
<N level="6" explain="danh từ">gái</N>
</NP>
</NP>
</NP>
</NP>
</VP>
</parse>
</sentence>
<sentence id="6">nhìn chung vợ chồng tôi sống hoà thuận , nhường nhịn nhau
<parse id="1">
<VP level="1" explain="cụm động từ">nhìn chung vợ chồng tôi
<V level="2" explain="động từ">nhìn chung
<U level="3" explain="null">nhìn chung</U>
</V>
<NP level="2" explain="cụm danh từ">vợ chồng tôi
<N level="3" explain="danh từ">vợ chồng</N>
<P level="3" explain="đại từ">tôi</P>
</NP>
</VP>
<VP level="1" explain="cụm động từ">sống hoà thuận , nhường nhịn nhau
<VP level="2" explain="cụm động từ">sống hoà thuận
<V level="3" explain="động từ">sống</V>
<A level="3" explain="tính từ">hoà thuận</A>
</VP>
<punc level="2">,</punc>
<VP level="2" explain="cụm động từ">nhường nhịn nhau
<V level="3" explain="động từ">nhường nhịn</V>
<N level="3" explain="danh từ">nhau</N>
</VP>
</VP>
</parse>
</sentence>
<sentence id="7">chồng tôi là một người tốt , thương yêu vợ con
<parse id="1">
<NP level="1" explain="cụm danh từ">chồng tôi
<N level="2" explain="danh từ">chồng</N>
<P level="2" explain="đại từ">tôi</P>
</NP>
<VP level="1" explain="cụm động từ">là một người tốt , thương yêu vợ con
<VP level="2" explain="cụm động từ">là một người tốt
<V level="3" explain="động từ">là</V>
<NP level="3" explain="cụm danh từ">một người tốt
<M level="4" explain="số từ">một</M>
<NP level="4" explain="cụm danh từ">người tốt
<N level="5" explain="danh từ">người</N>
<A level="5" explain="tính từ">tốt</A>
</NP>
</NP>
</VP>
<punc level="2">,</punc>
<VP level="2" explain="cụm động từ">thương yêu vợ con
<V level="3" explain="động từ">thương yêu</V>
<N level="3" explain="danh từ">vợ con</N>
</VP>
</VP>
</parse>
</sentence>
<sentence id="8">nhìn bề ngoài , ai cũng bảo tôi là người vợ hạnh phúc
<parse id="1">
<VP level="1" explain="cụm động từ">nhìn bề ngoài
<V level="2" explain="động từ">nhìn</V>
<N level="2" explain="danh từ">bề ngoài</N>
</VP>
<punc level="1">,</punc>
<NP level="1" explain="cụm danh từ">ai
<P level="2" explain="đại từ">ai</P>
</NP>
<VP level="1" explain="cụm động từ">cũng bảo tôi là người vợ hạnh phúc
<R level="2" explain="phụ từ">cũng</R>
<V level="2" explain="động từ">bảo</V>
<NP level="2" explain="cụm danh từ">tôi là người vợ hạnh phúc
<P level="3" explain="đại từ">tôi</P>
<VP level="3" explain="cụm động từ">là người vợ hạnh phúc
<V level="4" explain="động từ">là</V>
<NP level="4" explain="cụm danh từ">người vợ hạnh phúc
<N level="5" explain="danh từ">người</N>
<NP level="5" explain="cụm danh từ">vợ hạnh phúc
<N level="6" explain="danh từ">vợ</N>
<A level="6" explain="tính từ">hạnh phúc</A>
</NP>
</NP>
</VP>
</NP>
</VP>
</parse>
</sentence>
<sentence id="9">nhưng có những điều khổ sở mà chẳng biết tâm sự cùng ai
<parse id="1">
<C level="1" explain="giới từ">nhưng</C>
<VP level="1" explain="cụm động từ">có những điều khổ sở mà chẳng biết tâm sự cùng ai
<V level="2" explain="động từ">có</V>
<NP level="2" explain="cụm danh từ">những điều khổ sở mà chẳng biết tâm sự cùng ai
<N level="3" explain="danh từ">những</N>
<N level="3" explain="danh từ">điều</N>
<AP level="3" explain="cụm tính từ">khổ sở mà chẳng biết tâm sự cùng ai
<A level="4" explain="tính từ">khổ sở</A>
<PP level="4" explain="cụm giới từ">mà chẳng biết tâm sự cùng ai
<C level="5" explain="giới từ">mà</C>
<VP level="5" explain="cụm động từ">chẳng biết tâm sự cùng ai
<R level="6" explain="phụ từ">chẳng</R>
<V level="6" explain="động từ">biết</V>
<VP level="6" explain="cụm động từ">tâm sự cùng ai
<V level="7" explain="động từ">tâm sự</V>
<C level="7" explain="giới từ">cùng</C>
<P level="7" explain="đại từ">ai</P>
</VP>
</VP>
</PP>
</AP>
</NP>
</VP>
</parse>
</sentence>
<sentence id="10">đó là thói quen sinh hoạt bẩn thỉu và luộm thuộm của chồng tôi
<parse id="1">
<NP level="1" explain="cụm danh từ">đó
<P level="2" explain="đại từ">đó</P>
</NP>
<VP level="1" explain="cụm động từ">là thói quen sinh hoạt bẩn thỉu và luộm thuộm của chồng tôi
<V level="2" explain="động từ">là</V>
<NP level="2" explain="cụm danh từ">thói quen sinh hoạt bẩn thỉu và luộm thuộm của chồng tôi
<N level="3" explain="danh từ">thói quen</N>
<VP level="3" explain="cụm động từ">sinh hoạt bẩn thỉu và luộm thuộm của chồng tôi
<V level="4" explain="động từ">sinh hoạt</V>
<AP level="4" explain="cụm tính từ">bẩn thỉu và luộm thuộm
<A level="5" explain="tính từ">bẩn thỉu</A>
<C level="5" explain="giới từ">và</C>
<A level="5" explain="tính từ">luộm thuộm</A>
</AP>
<PP level="4" explain="cụm giới từ">của chồng tôi
<C level="5" explain="giới từ">của</C>
<NP level="5" explain="cụm danh từ">chồng tôi
<N level="6" explain="danh từ">chồng</N>
<P level="6" explain="đại từ">tôi</P>
</NP>
</PP>
</VP>
</NP>
</VP>
</parse>
</sentence>
<sentence id="11"> mỗi bữa ăn anh ấy uống năm đến sáu chén rượu , trừ buổi sáng , mỗi ngày hút hai
gói thuốc lá , nhưng hầu như không đánh răng , rửa tay , tắm rửa
<parse id="1">
<S level="1" explain="Câu">mỗi bữa ăn anh ấy uống năm đến sáu chén rượu , trừ buổi sáng
<NP level="2" explain="cụm danh từ">mỗi bữa ăn anh ấy
<N level="3" explain="danh từ">mỗi</N>
<NP level="3" explain="cụm danh từ">bữa ăn anh ấy
<N level="4" explain="danh từ">bữa</N>
<V level="4" explain="động từ">ăn</V>
<P level="4" explain="đại từ">anh ấy</P>
</NP>
</NP>
<VP level="2" explain="cụm động từ">uống năm đến sáu chén rượu , trừ buổi sáng
<V level="3" explain="động từ">uống</V>
<NP level="3" explain="cụm danh từ">năm đến sáu chén rượu , trừ buổi sáng
<N level="4" explain="danh từ">năm</N>
<VP level="4" explain="cụm động từ">đến sáu chén rượu
<V level="5" explain="động từ">đến</V>
<NP level="5" explain="cụm danh từ">sáu chén rượu
</NP>
</NP>
<V level="2" explain="động từ">tắm rửa</V>
</S>
</parse>
</sentence>
</BKLightWinParser>