
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY


*

UNDERGRADUATE GRADUATION THESIS
MAJOR: INFORMATION TECHNOLOGY

SYNTACTIC PARSING IN VIETNAMESE SPEECH SYNTHESIS

Student:

Lê Quang Thắng
Class: Software Engineering K51

Supervisors:

Dr. Cao Tuấn Dũng

MSc. Nguyễn Thị Thu Trang

HANOI, 05-2011

TABLE OF CONTENTS
ABSTRACT OF THE GRADUATION THESIS
ACKNOWLEDGEMENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
PREFACE
CHAPTER 1. SYNTACTIC PARSING IN SPEECH SYNTHESIS
1.1. Overview of speech synthesis
1.1.1. Text normalization
1.1.2. Structural and linguistic analysis
1.1.3. Prosody analysis
1.1.4. Low-level synthesis
1.2. The role of syntactic parsing in speech synthesis
1.3. International research on syntactic parsing
1.3.1. Parsing models used worldwide
1.3.2. Parsing strategies
1.3.2.1. The top-down approach
1.3.2.2. The bottom-up approach
1.3.3. Some well-known parsing algorithms
1.3.3.1. The Earley algorithm
1.3.3.2. The CYK algorithm
1.3.3.3. Remarks
1.4. Tasks of the graduation thesis
1.5. Chapter summary
CHAPTER 2. VIETNAMESE AND THE THEORETICAL BASIS FOR SYNTACTIC PARSING
2.1. Vietnamese words
2.1.1. Nouns - N
2.1.1.1. Individual nouns - Ns
2.1.1.2. Collective nouns - Nc
2.1.1.3. Unit nouns - Nu
2.1.1.4. Abstract nouns - Na
2.1.1.5. Proper nouns - Np
2.1.2. Verbs - V
2.1.2.1. Transitive verbs - Vt
2.1.2.2. Intransitive verbs - Vi
2.1.2.3. Existential verbs - Ve
2.1.2.4. Verbs of transformation - Vf
2.1.2.5. Volitional verbs - Vv
2.1.2.6. Receptive verbs - Va
2.1.2.7. Comparative verbs - Vc
2.1.2.8. Special verb: the verb "là" - Vz
2.1.3. Adjectives - A
2.1.3.1. Qualitative adjectives - Ai
2.1.3.2. Quantitative adjectives - An
2.1.4. Adverbs - R
2.1.4.1. Temporal adverbs - Rt
2.1.4.2. Degree adverbs - Rd
2.1.4.3. Comparative adverbs - Rc
2.1.4.4. Affirmative (RfY) and negative (RfN) adverbs
2.1.4.5. Imperative adverbs - Ri
2.1.5. Conjunctions - C
2.1.5.1. Subordinating conjunctions - Cm
2.1.5.2. Coordinating conjunctions - Cp
2.1.6. Pronouns - P
2.1.6.1. Pronouns for things
2.1.6.2. Pronouns for actions and properties - Pl
2.1.6.3. Interrogative pronouns - Pi
2.1.7. Particles - M
2.2. Vietnamese phrases
2.2.1. Noun phrases - NP
2.2.1.1. Definition
2.2.1.2. Structure
2.2.1.3. Grammatical functions of noun phrases
2.2.2. Verb phrases - VP
2.2.2.1. Definition
2.2.2.2. Structure
2.2.2.3. Grammatical functions of verb phrases
2.2.3. Adjective phrases - AP
2.2.3.1. Definition
2.2.3.2. Structure
2.2.3.3. Grammatical functions
2.3. Sentence types in Vietnamese
2.4. The CFG model and the CYK parsing algorithm
2.4.1. The CFG grammar model
2.4.2. The CYK algorithm
2.4.2.1. The CYK table-construction algorithm [7]
2.4.2.2. An illustrative example of the CYK algorithm
2.4.2.3. The improved CYK algorithm
2.4.2.4. Difficulties of the CYK algorithm
2.5. The probabilistic model PCFG
2.5.1. Basic definition of PCFG
2.5.2. Types of probability in PCFG
2.5.2.1. Inside probability
2.5.2.2. Outside probability
2.5.3. Resolving syntactic ambiguity with PCFG
2.6. Chapter summary
CHAPTER 3. THE THESIS'S PROPOSALS FOR VIETNAMESE SYNTACTIC PARSING
3.1. The CYK beam search algorithm
3.1.1. Remarks on the PCFG model and the improved CYK algorithm
3.1.2. The beam search algorithm
3.1.3. Procedure for applying and running beam search
3.1.3.1. Computing the inside probability in beam search
3.1.3.2. Computing the outside probability in beam search
3.1.3.3. Execution procedure of the beam search algorithm
3.1.4. Remarks on the beam search algorithm
3.2. Proposal to use the A* algorithm for syntactic parsing
3.2.1. The A* algorithm
3.2.2. A* in syntactic parsing
3.2.2.1. The A* algorithm applied to syntactic parsing
3.2.2.2. The priority function in A*
3.2.2.2.1. Using the inside probability only
3.2.2.2.2. Computing a one-level outside
3.2.2.2.3. Computing the outside by reducing the rule set
3.3. Proposing the lelightwin algorithm to improve A* in Vietnamese syntactic parsing
3.3.1. Problem statement
3.3.2. The lelightwin algorithm
3.3.2.1. The idea behind the lelightwin algorithm
3.3.2.2. The basic lelightwin algorithm model
3.3.2.2.1. Element classification
3.3.2.2.2. Generating combination sequences from the classified elements
3.3.2.3. The lelightwin pruning algorithm
3.3.2.3.1. Training the lelightwin pruner
3.3.2.3.2. Performing pruning in the lelightwin algorithm
3.3.3. A* combined with lelightwin pruning in Vietnamese syntactic parsing
3.4. Remarks on the A* - lelightwin pruning algorithm
3.5. Chapter summary
CHAPTER 4. SYSTEM DEVELOPMENT, TESTING, AND EVALUATION
4.1. Analysis and design of the syntactic parser for Vietnamese speech synthesis
4.1.1. Overall model
4.1.2. Description of the parsing functions
4.1.2.1. Text preprocessing
4.1.2.1.1. Word segmentation
4.1.2.1.2. Tagging
4.1.2.2. Rule management (Rule Manager)
4.1.2.2.1. Loading the data file containing the rule set
4.1.2.2.2. Computing the PCFG probability of each rule
4.1.2.3. Syntactic parsing
4.1.3. Data storage organization
4.1.3.1. Normalized input text data
4.1.3.2. Data handed over by the text preprocessor
4.1.3.3. The syntactic rule set
4.1.3.4. The VietTreeBank corpus
4.1.3.5. Output data of the system
4.2. Building the system
4.2.1. Selected tools
4.2.2. Class diagram
4.2.3. Detailed class design
4.2.3.1. The RuleManager package
4.2.3.1.1. The Rule class
4.2.3.1.2. The RuleSet class
4.2.3.2. The parsing Element package
4.2.3.2.1. The CYK Element class
4.2.3.2.2. The Cell class
4.2.3.2.3. The AstarElement class
4.2.3.3. The Common (shared processing) package
4.2.3.3.1. The PartOfSpeech class
4.2.3.3.2. The Functions class
4.2.3.4. The Analysis package
4.2.3.4.1. The CYKBeamSearch class
4.2.3.4.2. The AStar class
4.2.3.4.3. The LeLightWin class
4.2.3.4.4. The Sentence class
4.3. Testing and evaluation
4.3.1. Program interface
4.3.2. Experimental results
4.3.2.1. Test data set
4.3.2.2. Parsing results
4.3.3. System evaluation
4.4. Chapter summary
CONCLUSION AND FUTURE WORK
References
APPENDIX


GRADUATION THESIS ASSIGNMENT FORM


1. Student information
Full name: Lê Quang Thắng
Contact phone: 01675888962
Email: lelightwin@gmail.com
Class: Software Engineering K51
Program: Full-time undergraduate
Thesis carried out at: Mica Research Center, Hanoi University of Science and
Technology.
Thesis period: from 21/02/2011 to 28/05/2011
2. Purpose and content of the thesis
Study methods for syntactic parsing of Vietnamese to serve a Vietnamese speech
synthesizer.
3. Specific tasks of the thesis
Study the Vietnamese language and survey methods for parsing Vietnamese developed
in Vietnam and abroad.
Propose solutions that optimize the performance of the syntactic parser.
Connect the parser's data with the other stages of speech synthesis.
4. Student's declaration:
I, Lê Quang Thắng, declare that this thesis is my own research, carried out under
the supervision of Dr. Cao Tuấn Dũng and MSc. Nguyễn Thị Thu Trang.
The results presented in the thesis are truthful and are not copied wholesale from
any other work.
Hanoi, 20 May 2011
Thesis author

Lê Quang Thắng
5. Supervisors' confirmation of the degree of completion of the thesis and permission
to defend:
Hanoi, 28 May 2011
Supervisors

Dr. Cao Tuấn Dũng

MSc. Nguyễn Thị Thu Trang


ABSTRACT OF THE GRADUATION THESIS
Speech synthesis answers the human desire to communicate with computers by
voice: people want the computer to read a given text aloud. The field has been
studied and developed since quite early on and has by now achieved important
results. Wishing to study and develop a speech synthesizer for Vietnamese, this
thesis takes speech synthesis as its research direction and focuses on the
syntactic parsing stage, with the aim of improving the performance of the speech
synthesis system.
The thesis first studies the theoretical basis of syntactic parsing and of
Vietnamese grammar. It then concentrates on improving the parsing algorithm and
proposes an algorithm that improves the quality and speed of the parser used in
Vietnamese speech synthesis. The thesis also implements the system and evaluates
its effectiveness, and from that outlines directions for further development.

ACKNOWLEDGEMENTS
First of all, I would like to thank my parents, who raised, encouraged,
and supported me up to the moment I could type these lines with my own
hands. I also thank my silly, mischievous younger sister, who encouraged
and teased me throughout the time I worked on this thesis.
I would like to express my sincere thanks to the teachers of Hanoi
University of Science and Technology, and to the teachers of the School
of Information and Communication Technology, who passed on to me
valuable knowledge and experience during my five years of study.
I would like to thank Dr. Cao Tuấn Dũng, Dr. Trần Đỗ Đạt, and
MSc. Nguyễn Thị Thu Trang, who guided me throughout the thesis work.
Thank you once again; you are the most dedicated teachers I have ever
known.
I would especially like to thank my teacher Đỗ Bá Lâm. Although he was
not my supervisor, without him I could not have completed the thesis as
well as I did.
I would like to thank my friend T Hoàng Long of the Việt Nhật K51
class. He is the close friend who has stood by my side since high school.
Through university, and right up to the moment I could take pride in
finishing this thesis, he has always been the one who gave me the
strength to get through discouragement and exhaustion.
I would like to thank the Q4T group of friends from the Software
Engineering class. We may not be brothers, but we are a truly wonderful
team, aren't we? Let us all do well in the thesis of our lives, my
comrades.
I would like to thank my friends at the Mica center. You are the source
of inspiration that keeps me striving to better myself. We have shared
many unforgettable memories, and you will always be my good friends.


I am also grateful to the Mica Research Center for providing the
facilities I needed while carrying out this graduation thesis.
I would also like to thank the whole Software Engineering K51 class for
creating a healthy, competitive learning environment that supported the
growth of every member of the class.
Hanoi, 25 May 2011
Lê Quang Thắng
Software Engineering Class K51
School of Information and Communication Technology, Hanoi University of Science and Technology


LIST OF ABBREVIATIONS USED IN THE THESIS

No.   Abbreviation   Meaning
1.    CFG            Context-Free Grammar
2.    PCFG           Probabilistic Context-Free Grammar
3.    LPCFG          Lexicalized Probabilistic Context-Free Grammar
4.    CYK            Cocke-Younger-Kasami
5.    Earley         The Earley algorithm
6.    TreeBank       A corpus storing parsed syntax trees


LIST OF FIGURES IN THE THESIS


Figure 1-1. Model of a speech synthesis system.
Figure 1-2. Parse tree of the sentence "tôi đang làm đồ án".
Figure 2-1. An element of the CYK table.
Figure 2-2. Improved CYK on the sentence "anh ma kim v vt cn".
Figure 2-3. Parse tree t1.
Figure 2-4. Parse tree t2.
Figure 3-1. A playful illustration of beam search pruning.
Figure 3-2. A complete parse node.
Figure 3-3. A parse node that uses the wait variable.
Figure 3-4. Illustration of the outside of a node.
Figure 3-5. Outside in the case of a complete node.
Figure 3-6. Outside in the case of an incomplete node.
Figure 3-7. Model of the lelightwin algorithm.
Figure 3-8. The element classification stage in CHART.
Figure 3-9. Stages of the sequence generation algorithm.
Figure 3-10. Example of the left-subsequence generation algorithm.
Figure 3-11. The main data hierarchy tree.
Figure 3-12. The sub data hierarchy tree.
Figure 4-1. Overall model of the system.
Figure 4-2. Structure of the normalized text data.
Figure 4-3. Example of normalized text data.
Figure 4-4. Structure of the VietTreeBank data.
Figure 4-5. Illustration of the system's output data.
Figure 4-6. Main interface of the program.
Figure 4-7. Chart of the 630-sentence test data set.
Figure 4-8. Parse tree of "tôi là sinh viên".
Figure 4-9. Parse tree of "tôi là một sinh viên học rất giỏi môn toán".
Figure 4-10. Parse of a very difficult and long sentence.


LIST OF TABLES IN THE THESIS


Table 2-1. CYK analysis of the sentence "anh ấy rất ngu"
Table 3-1. Elements in CHART
Table 3-2. Combination sequences of X with CHART
Table 4-3. Description of the data loading function
Table 4-4. Description of the PCFG probability computation function
Table 4-5. Description of the syntactic parsing function
Table 4-6. Example output data of the text preprocessor
Table 4-7. Description of the Rule class
Table 4-8. Description of the RuleSet class
Table 4-9. Description of the Element class
Table 4-10. Description of the Cell class
Table 4-11. Description of the AstarElement class
Table 4-12. Description of the PartOfSpeech class
Table 4-13. Description of the Functions class
Table 4-14. Description of the CYKBeamSearch class
Table 4-15. Description of the AStar class
Table 4-16. Description of the LeLightWin class
Table 4-17. Description of the Sentence class
Table 4-18. Summary of the experiment with 630 prose sentences

PREFACE
The computer? The word has become all too familiar to us in today's
information society. Computers affect every area of our lives. They help
people a great deal in this hurried and demanding life. Thanks to computers,
our work has become much easier, and people around the world can come closer
together. There are always countless interesting things to discover about
computers, and they are the source of many scientific inventions and
creations. It is fair to say that computers have become an indispensable
part of our lives.
Have we ever thought of the computer as a friend? We might answer no, but
the real answer is only "not yet". Imagine how wonderful it would be to have
beside us a computer that can both help with every task and chat and confide
in us like a friend. Hoping that one day computers will truly be able to
converse with people, scientists around the world have worked tirelessly on
speech synthesis systems. A speech synthesis system lets a machine imitate
the human voice as accurately and naturally as possible. To date, many
research products in speech synthesis worldwide have produced very promising
results. In Vietnam, too, a number of speech synthesizers have been
developed, such as Sao Mai from the Sao Mai center, Hoa Súng from the Mica
Research Center at Hanoi University of Science and Technology, and Tiếng nói
Phương Nam from Vietnam National University, Ho Chi Minh City. However,
these synthesizers are still very limited in the quality of the synthesized
voice. With the aim of improving the quality of speech synthesis, this thesis
studies the syntactic parsing system used in Vietnamese speech synthesis.
Syntactic parsing is a stage within the text analysis phase of speech
synthesis, and it strongly influences the other stages of the synthesis
process.
Ti Vit Nam, cc kt qu nghin cu ca phn tch c php vn cn rt hn
ch v y thc s l mt bi ton khng d. S phc tp ca bi ton ny th hin
mt s c im m cn phi c gii quyt nh nhp nhng ng ngha, bng n
t hp, v kh nng bao qut cc trng hp ca ngn ng.
Trn c s nhng nghin cu c sn v phn tch c php ting Vit, n
s tip tc pht trin v xut nhng gii php ci tin gip ti u ha hiu nng
ca b phn tch c php trong tng hp ting Vit.

Thesis structure:
Chapter 1: Syntactic parsing in speech synthesis.
This chapter introduces Vietnamese speech synthesis and points out the role of the syntactic parser within it. From there it states the goals and tasks of the thesis.
Chapter 2: Vietnamese and some basic parsing strategies.
This chapter introduces the Vietnamese language and the basic theory of parsing methods. It is an important chapter: it provides the background that leads to the model and parsing method proposed in the thesis.
Chapter 3: Proposals of the thesis for Vietnamese syntactic parsing.
The first two chapters cover only the theoretical basis. Chapter 3 presents the models and methods that the thesis applies to Vietnamese syntactic parsing, and describes in detail several improvements made during the work.
Chapter 4: Building and evaluating the program.
This chapter describes the implementation of the program following the methods presented in Chapter 3, then tests it and evaluates its performance based on the results obtained.


CHAPTER 1. SYNTACTIC PARSING IN SPEECH SYNTHESIS
This chapter presents:
- An overview of speech synthesis.
- The position and role of syntactic parsing in speech synthesis.
- Research on syntactic parsing around the world.
- The difficulties of syntactic parsing for Vietnamese.
- The tasks of the graduation thesis.

1.1. Overview of speech synthesis

Speech synthesis (TTS, text to speech) is the process of generating artificial human speech from input that is text or phonetic codes, most commonly text. For a TTS system, producing some human-like voice from text is not hard; the hard part is producing a voice of really good quality. The two properties used to judge the quality of a speech synthesis system are naturalness and intelligibility. Naturalness is how closely the machine's voice resembles a human voice; intelligibility is how easily the spoken sentence can be understood. An ideal speech synthesis system satisfies both properties as far as possible.

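Figure 1-1. Model of a speech synthesis system. [Diagram: high-level synthesis consists of text analysis (text normalization, structure analysis, linguistic analysis), phonetic analysis and prosody analysis (fundamental frequency, duration, intensity); low-level synthesis consists of sound unit selection and sound unit concatenation.]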

A speech synthesizer is divided into two main parts: high-level synthesis and low-level synthesis. The high-level part normalizes the text, analyzes its structure and language, and generates phonetic and prosodic information. The low-level part uses this information to search for and select sound units, concatenate them and smooth the signal, producing the speech to be synthesized. Each task is described below.

Student: Lê Quang Thắng - 20062965 - Class K51 - Software Engineering B

1.1.1. Text normalization
Text normalization detects segments that the synthesizer cannot process and converts them into a form it can process. It is the first stage of a speech synthesis system and is important for ensuring that the text is read correctly. For example, the text "tôi bảo vệ đồ án vào ngày 08/06/2011 tại trường ĐHBKHN" ("I defend my thesis on 08/06/2011 at HUST") is converted into the readable form "tôi bảo vệ đồ án vào ngày mùng tám tháng sáu năm hai nghìn không trăm mười một tại trường Đại học Bách Khoa Hà Nội", in which the date and the abbreviation are spelled out in words.

1.1.2. Structure and linguistic analysis

Structure and linguistic analysis determines how the components of the text are linked to one another. Its key stage is syntactic parsing, whose result can directly affect many stages of speech synthesis. As an example, consider the parse tree of the sentence "tôi đang làm đồ án" ("I am doing my thesis").

Figure 1-2. Parse tree of the sentence "tôi đang làm đồ án".

1.1.3. Prosody analysis

The prosody analysis component models the prosody of speech and outputs prosodic information as numeric data, which serves as input to the low-level synthesizer. Prosody analysis has a large effect on the naturalness of the synthesized speech.

1.1.4. Low-level synthesis

Low-level synthesis combines signal segments (for example diphones). These segments have already been analyzed and processed at the high level (phonetic analysis, prosody analysis).

There are currently three speech synthesis methods. The simplest is to play back speech samples recorded from natural speech (such as words or sentences). This method gives fairly good quality, but it requires a very large vocabulary in the database. Speech can also be produced by simulating the human articulatory system; this method gives very good quality but is quite complex to implement. The third method is formant synthesis.

1.2. The role of syntactic parsing in speech synthesis

Syntactic parsing plays a very important role in speech synthesis. A system that aims for the best speech quality must take the grammatical structure of the text into account: the pauses between subject and predicate, the phrases from high level to low level, and the head words of a sentence or passage, which help place stress correctly. Syntactic parsing brings many such benefits.
These are only individual benefits. More broadly, syntactic parsing gives the speech synthesis system a complete view of the grammatical structure of the text, which is a basis for many further ideas for improving the quality of the synthesizer.
In short, syntactic parsing makes a significant contribution to building a high-quality speech synthesizer.

1.3. Research on syntactic parsing around the world

1.3.1. Parsing models
Syntactic parsing has been studied and deployed for a long time. Many methods and models have been proposed with considerable success and are being refined steadily.
The context-free grammar (CFG) model was the first to be applied to syntactic parsing. It is used to represent the set of syntactic rules, the lexical rules, and the labels for word classes and phrase classes. However, this model is still rudimentary and runs into a great deal of ambiguity during parsing.
The PCFG model is developed from CFG and inherits all of its characteristics. In addition, PCFG attaches a parameter (a probability) to each syntactic rule, which helps the parser resolve syntactic ambiguity.
The LPCFG model is the most advanced of the three and gives the best results. LPCFG keeps all the advantages of PCFG and adds lexical parameters, so the parser can avoid not only grammatical ambiguity but also ambiguity at the lexical level.


1.3.2. Parsing strategies

1.3.2.1. The top-down approach
Top-down parsing starts with the symbol S (sentence). This is the highest structure of a sentence and forms the initial state. Next, each symbol in the current state is rewritten into lower-level structures according to the available rules, producing a list of symbols.
Example: the sentence starts with the symbol S, and the rule S → NP VP is applied, so the symbol list becomes (NP VP). Then the symbol NP is considered; it matches the rule NP → NP AP, so NP and AP are added to the list, which becomes (NP AP VP).
The process repeats recursively until the state consists only of terminal symbols.
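The rewriting steps of this example can be traced in a few lines of Python. This is only an illustrative sketch: the function name `rewrite_leftmost` and the use of a plain list for the symbol list are assumptions made here, not something taken from the thesis.

```python
def rewrite_leftmost(symbols, lhs, rhs):
    """Replace the leftmost occurrence of `lhs` in `symbols` with the symbols of `rhs`."""
    i = symbols.index(lhs)  # raises ValueError if the rule cannot be applied
    return symbols[:i] + list(rhs) + symbols[i + 1:]


state = ["S"]                                        # initial state
step1 = rewrite_leftmost(state, "S", ["NP", "VP"])   # apply S -> NP VP
step2 = rewrite_leftmost(step1, "NP", ["NP", "AP"])  # apply NP -> NP AP
print(step1, step2)
```

A full top-down parser would also choose among competing rules and backtrack; the sketch only reproduces the two rewriting steps described in the text.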
1.3.2.2. The bottom-up approach
As the name suggests, this method builds the parse tree from the low level up to the high level, that is, from the leaves to the root. The difference from the top-down approach lies in how a grammar rule is used. Consider the rule:
NP → ART ADJ N
In a top-down system, the rule is used to find an NP by searching for the sequence ART ADJ N. In a bottom-up system, the result of an earlier step is a sequence ART ADJ N, and this sequence is labeled NP. The process is repeated until the node S is found, at which point parsing ends successfully.

1.3.3. Well-known parsing algorithms

1.3.3.1. The Earley algorithm
The Earley parsing algorithm follows the top-down strategy, going from the root toward the leaves. However, to avoid re-examining the same word class for the same word, it follows all directions at once (one for each candidate rule that is satisfied so far). This is what Earley shares with the bottom-up approach. It therefore exploits the advantages of both methods and removes some of the weaknesses of each.
1.3.3.2. The CYK algorithm
Like the Earley algorithm, the CYK algorithm (Cocke - Younger - Kasami) improves on the two basic strategies, top-down and bottom-up. CYK and Earley are both much more complex as algorithms than plain top-down and bottom-up parsing, but the complexity of the parsing process itself is greatly reduced.
1.3.3.3. Remarks
Earley and CYK can be considered the two best-known parsing algorithms, representing the top-down and bottom-up strategies respectively. Both have many advantages over the basic strategies, but each carries the weaknesses of the strategy it is based on:
- The Earley algorithm always returns a parse tree, but does not guarantee that the tree covers the whole sentence.
- The CYK algorithm guarantees that the tree can cover the whole sentence, but does not guarantee that the tree reaches the goal.
In speech synthesis, the most important requirement is that the parse tree covers the entire input sentence (otherwise the synthesizer will not read out the full sentence). For this reason the thesis studies and proposes a parsing solution for Vietnamese based on the CYK algorithm.

1.4. Tasks of the graduation thesis

With the development of natural language processing, Vietnamese syntactic parsing has achieved some results. However, these systems are still being refined and their results are only average. The reason lies in the difficulties of parsing Vietnamese:
- Word-class ambiguity: Vietnamese word classes are diverse, and the same word can carry a different meaning and a different class in different contexts.
- Syntactic ambiguity: Vietnamese grammar is complex and ambiguous.
For example, the sentence "ông già đi nhanh quá" has two analyses:
Analysis 1: ông già // đi // nhanh quá ("the old man walks very fast").
Analysis 2: ông // già đi // nhanh quá ("he has aged very quickly").
Given these issues, the tasks of the thesis are:
- Study Vietnamese grammar and parsing algorithms, then propose a way of modeling Vietnamese and an approach to the difficulties above.
- Study the strengths and weaknesses of the CYK algorithm when parsing Vietnamese, then propose algorithmic improvements for Vietnamese.
- Develop, test and evaluate the performance of the parsing system.

1.5. Chapter summary
This first chapter has established the following:
- Syntactic parsing plays a very important role in speech synthesis and affects all of its stages.
- It has surveyed the CFG and PCFG models and the Earley and CYK algorithms as applied to syntactic parsing.
- The specific task of the thesis is to model Vietnamese and to study and improve CYK, combined with that model, for Vietnamese syntactic parsing.
The next chapter looks at the Vietnamese language and goes deeper into the theoretical basis of syntactic parsing.


CHAPTER 2. VIETNAMESE AND THE THEORETICAL BASIS OF SYNTACTIC PARSING
This chapter focuses on the following:
- The characteristics of Vietnamese, including word classes, phrase classes and sentence types.
- The CYK parsing algorithm based on the CFG model.
- Disambiguation with the PCFG model.

2.1. Vietnamese word classes

2.1.1. Noun - N
Nouns are words that denote things in general, for example: xe (vehicle), người (person). Nouns can be divided into the following kinds.
2.1.1.1. Individual noun - Ns
Individual things are easy to recognize because they exist as separate units, such as: nhà (house), người (person), xe (vehicle), máy tính (computer).
2.1.1.2. Collective noun - Nc
These nouns denote things that do not exist individually but form a whole made up of many units, for example: nhân dân (the people), quân đội (army), bàn ghế (furniture).
2.1.1.3. Unit noun - Nu
Things that are materials or substances, such as nước (water), đất (earth), rượu (wine), thịt (meat), sắt (iron), thép (steel), can exist as individual quantities only through units of measurement such as lít (liter), mẫu (a unit of land area), cân (kilogram). Examples: hai lít rượu (two liters of wine), một mẫu ruộng (one mẫu of rice field).
2.1.1.4. Abstract noun - Na
Abstract things are concepts such as: tư tưởng (ideology), quan điểm (viewpoint), lập trường (standpoint), ý nghĩ (thought), trí tuệ (intellect). As the head of a phrase, these nouns generally behave like other nouns: they can take a classifier noun or a quantity noun as a modifier. Examples: một nền tư tưởng (an ideology), những tâm tư (feelings).
2.1.1.5. Proper noun - Np
Proper nouns are the names of individual people and things. In present-day Vietnamese, proper names must be capitalized; this is the basic sign that distinguishes proper nouns from common nouns. Examples: Nguyễn Văn Tuấn, sông Hồng (the Red River).

2.1.2. Verb - V
Verbs can be divided into the following kinds.
2.1.2.1. Transitive verb - Vt
These are verbs such as ăn (eat), viết (write), đọc (read). They usually require a modifier denoting the object affected by the action, for example: ăn bánh (eat cake), viết thư (write a letter), may áo (sew a shirt).
2.1.2.2. Intransitive verb - Vi
When used as the predicate of a sentence core, these verbs are complete in meaning: the head needs no following modifier denoting the object of the action. Examples: em bé đang ngủ (the baby is sleeping), con chim đang bay (the bird is flying).
2.1.2.3. Existential verb - Ve
A thing can exist, remain, run out or be lost (có, còn, hết, mất). When such a verb is the head, it is followed by a modifier denoting the thing that exists. Examples: có tiền (have money), còn gạo (still have rice).
2.1.2.4. Verb of transformation - Vf
These verbs denote a change of state and require a modifier denoting the result of the change. Example: nên người (become a decent person).
2.1.2.5. Verb of volition - Vv
The states of volition are: muốn (want), quyết (resolve), dám (dare), toan (intend), định (plan). As the head, these verbs require a modifier denoting the content of the volition. Examples: dám nghĩ (dare to think), toan nói (be about to speak).
2.1.2.6. Receptive (passive) verb - Va
This is a passive state. The two main cases are bị or phải (suffer) and được (receive). These verbs must be followed by a modifier denoting what is received. Examples: bị mắng (be scolded), được khen (be praised).
2.1.2.7. Comparative verb - Vc
Things can be evaluated by comparison with other things in some respect. There are three states of comparison: bằng (equal), hơn (more) and kém (less). The verbs expressing them are called comparative verbs. Like most of the verbs above, when used as the head they usually take a modifier denoting the object of comparison. Examples: bằng nhau (equal to each other), hơn người (surpass others).
2.1.2.8. Special verb: the verb "là" - Vz
The verb "là" (to be) has great grammatical significance in Vietnamese. For ordinary simple sentences, the division into descriptive sentences and argumentative sentences depends on the presence of "là". Example: tôi là người lính đã có công (I am a soldier who has rendered service).

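2.1.3. Adjective - A
Adjectives can be divided into two main subclasses: qualitative adjectives and quantitative adjectives.
2.1.3.1. Qualitative adjective - Ai
When such an adjective is the head of a phrase, it can be preceded by modifiers of degree. It can also be followed by a modifier denoting the scope in which the quality holds. Words of this class: tốt (good), đẹp (beautiful), xấu (ugly), thông minh (intelligent), ngoan (well-behaved), ngu xuẩn (foolish). Example: rất giỏi toán (very good at mathematics).
2.1.3.2. Quantitative adjective - An
These are qualities such as: cao (tall), thấp (short), ngắn (short in length), dài (long), rộng (wide), hẹp (narrow), nặng (heavy), sâu (deep), xa (far), gần (near). They usually take a modifier that expresses a quantity or a level that quantifies them. Examples: cao hai thước (two thước tall), dài một nghìn km (one thousand km long).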

2.1.4. Adverb - R
2.1.4.1. Adverb of time - Rt
These adverbs express the grammatical meaning of time: đã, sẽ, đang, vừa, mới, sắp, từng, liền, bèn, rồi.
2.1.4.2. Adverb of degree - Rd
These adverbs express the grammatical meaning of degree: rất, khá, hơi, quá, lắm.
2.1.4.3. Comparative adverb - Rc
These adverbs express that an action, state or quality takes place in comparison, under particular conditions of time and space. They are: cũng, đều, vẫn, cứ, còn, liên tục, liên tiếp, không ngừng. Example: Mai và Lan đều học giỏi (Mai and Lan both study well).
2.1.4.4. Adverb of affirmation (RfY) and negation (RfN)
These adverbs express negation or affirmation. Negation: không, chẳng, chưa. Affirmation: có. Examples: tôi không có tiền (I have no money), nó có nói dối (he did lie).
2.1.4.5. Imperative adverb - Ri
These adverbs express commands, advice, invitations and dissuasion. Examples: em đừng đi về muộn (don't come home late), anh nên đi học đúng giờ (you should go to class on time).

2.1.5. Conjunction - C
Conjunctions have two subclasses: subordinating conjunctions (Cm) and coordinating conjunctions (Cp).

2.1.5.1. Subordinating conjunction - Cm
These conjunctions express a subordinate relation, for example: do, của, để, bởi, bởi vì. Example: chúng tôi chiến đấu anh dũng như vậy để giành chiến thắng (we fought so bravely in order to win).
2.1.5.2. Coordinating conjunction - Cp
These conjunctions express a coordinate relation. They include words such as và, với, hay, hoặc, cùng, and pairs such as nếu ... thì, tuy ... nhưng. Examples: nếu trời mưa thì chúng tôi sẽ ở nhà (if it rains we will stay at home), nó không những ngoan mà còn học giỏi (he is not only well-behaved but also studies well).

2.1.6. Pronoun - P
2.1.6.1. Pronoun for things
These pronouns refer to things and can be used like nouns. There are three kinds: personal pronouns (Pp: tôi, tao, mày, chúng mày, chúng nó); pronouns of space and time (Pd: đây, đấy, đó, kia, ấy); quantity pronouns (Pn: bấy nhiêu). Example: chúng tôi đang đến trường (we are going to school).
2.1.6.2. Pronoun for actions and qualities - Pl
These pronouns refer to actions and qualities: thế, vậy. Example: vậy là hết! (so it is over!)
2.1.6.3. Interrogative pronoun - Pi
These pronouns are used in questions: ai, gì, chi, đâu, bao nhiêu, sao, thế nào.

2.1.7. Particle - M
Interjections (E): dạ, vâng and similar exclamations.
Classifiers (Nl): cái, con, cây, người, tấm.
Numerals (Nq): một, hai, ba, vài, dăm, mươi.

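2.2. Vietnamese phrases

A phrase is a grammatical unit at the level between the word and the sentence [7].
Studying the structure and types of phrases is necessary for understanding the structure of the sentence. The structure of phrases also makes the grammatical characteristics of word classes and their subclasses clearer.

2.2.1. Noun phrase - NP
2.2.1.1. Definition
A noun phrase is a group of words with a noun as its head; the modifiers before and after it add meaning to the head. Example: Những bông hoa này (these flowers).
2.2.1.2. Structure
a) Head:
- Usually a common noun.
- A common noun combined with a classifier noun or a unit noun.
Examples: Những học sinh này (these students), hai quyển sách (two books), năm cân đường (five kilograms of sugar).
b) Pre-modifiers, consisting of:
- Words of totality: tất cả, hết thảy
- Words of quantity: một, hai, vài, dăm, những
- Classifiers: cái, con, chiếc, quyển
- Units: cân, mét, thước
c) Post-modifiers: these are very diverse. Structurally they can be:
- One word:
  o A demonstrative: ấy, kia, này, nọ. Example: Cái ghế này (this chair)
  o A word of quality or characteristic (usually an adjective). Examples: đồng hồ vàng (gold watch); học sinh chuyên (specialized-school student)
- One phrase: Thơ của các em thiếu nhi (poems by the children)
- One subject-predicate clause: Ngôi nhà cha tôi vừa mới mua (the house my father has just bought)
2.2.1.3. Grammatical function of the noun phrase
Like a noun, a noun phrase can serve as subject, predicate, adverbial, complement or attribute. Examples:
Lan / đang đọc truyện Đô-rê-mon (Lan is reading a Doraemon comic).
Học sinh trường Chu Văn An / rất ngoan (the students of Chu Văn An school are very well-behaved).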

2.2.2. Verb phrase - VP
2.2.2.1. Definition
A verb phrase is a group of words with a verb as its head; the modifiers before and after it add meaning to the head. Example: đang đọc sách (is reading a book).
2.2.2.2. Structure
a) Head:
Usually a verb. When two verbs occur together (a dependent verb and an independent verb), the first verb can be taken as the head of the verb phrase.
Examples:
- đang học bài (is studying)
- toan về quê (intends to return to the home village)

b) Pre-modifiers, usually adverbs:
- Words of time: đã, sẽ, đang, sắp, vừa
- Words of continuation: đều, cứ, vẫn, còn, lại
- Words of affirmation and negation: không, chưa, chẳng
- Imperative words: hãy, đừng, chớ
One or several adverbs can serve as pre-modifiers.
Example: cũng vẫn cứ đến lớp (still keeps coming to class all the same)

c) Post-modifiers
Structure: the post-modifier can be a word, a phrase or a subject-predicate clause.
Examples:
- học bài (one word)
- ăn một cái bánh (one phrase)
- Mọi người / biết anh ấy rất tích cực (a subject-predicate clause)
Meaning: the post-modifier usually adds meaning to the head verb.
2.2.2.3. Grammatical function of the verb phrase
A verb phrase can serve as subject, predicate, adverbial, complement or attribute.
Subject: Bảo vệ tổ quốc / là nghĩa vụ của mọi người.
Predicate: Mặt trời / lên cao.
Adverbial: Tan buổi họp, mọi người đều ra về.
Complement: Bộ đội / đi đánh giặc.
Attribute: Quyển sách mượn trên thư viện / rất hay.

2.2.3. Adjective phrase - AP
2.2.3.1. Definition
An adjective phrase is a group of words with an adjective as its head; the modifiers before and after it add meaning to the head. Example: vẫn đẹp mãi (still beautiful forever).
2.2.3.2. Structure
a) Head: usually a gradable adjective.
Example: rất xinh đẹp (very pretty)
b) Pre-modifiers:
As in the verb phrase, the pre-modifiers of an adjective phrase can be adverbs of time, continuation, affirmation or negation, and above all adverbs of degree (rất, hơi, khá, quá), but not imperative adverbs (hãy, đừng, chớ).
Examples: vẫn tốt, còn đẹp, rất hiền
c) Post-modifiers:
Structure: the post-modifier can be a word, a phrase or a subject-predicate clause.
Examples:
- ngoan lắm (one word)
- rộng ba trăm mét (one phrase)
- đẹp như trăng mới mọc (a subject-predicate clause)
Meaning: the post-modifiers usually add meaning to the head adjective.
2.2.3.3. Grammatical function
An adjective phrase can also serve as subject, predicate, adverbial, attribute or complement.
Examples:
- Lợi cho tập thể tức là lợi cho cá nhân. (subject)
- Nó / nhanh như sóc. (predicate)
- Nhanh vùn vụt, đoàn tàu chạy về hướng Nam. (adverbial)
- Màu xanh mơn mởn của lá / làm dịu cả trưa hè. (attribute)
- Cám / rất muốn xinh như Tấm. (complement)

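2.3. Vietnamese sentence types

Declarative sentence (S): the most common sentence type, used to describe or state something about an event. Example: Tôi đang làm đồ án tốt nghiệp tại nhà (I am doing my graduation thesis at home).
Interrogative sentence (SQ): a sentence that raises a question to be answered. Example: Cậu làm xong đồ án chưa (have you finished your thesis yet).
Imperative sentence (SC): a sentence that demands an action or a change. Example: Phải nộp đồ án vào sáng nay (the thesis must be submitted this morning).
Exclamatory sentence (SE): a sentence that expresses feelings and emotions. Example: Ôi ông trời ơi! (Oh heavens!)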

2.4. The CFG model and the CYK parsing algorithm

2.4.1. The CFG grammar model
The CFG model uses a context-free grammar to represent the set of syntactic rules. For natural language, and Vietnamese in particular, whose grammar is highly complex, the freedom and small number of constraints of a context-free grammar make it a reasonable choice. [7]
A grammar is a system G = (N, T, S, P) in which:
- N is a finite set of symbols, called nonterminal symbols or variables.
- T is a finite set of symbols, called terminal symbols.
- S ∈ N is the start symbol.
- P is a finite set of rules of the form X → Y, where:
$X \in V^* N V^*$ with $V = N \cup T$,
$Y \in V^*$.
In a context-free grammar, the left-hand side X of every rule is a single nonterminal ($X \in N$).

2.4.2. The CYK algorithm

The idea of the algorithm is to build the parse tree by filling in a triangular table of size (n-1)*(n-1), where n is the number of input words.
Each cell of the triangular table holds three values: the start position of the generated string, the generating symbol, and the end position of the generated string.

Figure 2-1. An element of the CYK table.

2.4.2.1. The CYK table-building algorithm [7]

Input: a grammar G = (N, T, S, P) in Chomsky normal form with no empty productions, and an input string $\omega = a_1 a_2 \dots a_n \in T^+$.
Output: a parse table T for $\omega$ such that $t_{ij}$ contains A if and only if $A \Rightarrow^+ a_i a_{i+1} \dots a_{i+j-1}$.
a) Set $t_{i1} = \{A \mid A \to a_i \in P\}$ for $i = 1..n$. After this step, if $t_{i1}$ contains A then clearly $A \Rightarrow^+ a_i$.
b) Suppose $t_{ij'}$ has been computed for all $i$ ($1 \le i \le n$) and all $j'$ ($1 \le j' < j$). Consider a nonterminal A. If there is a production
$A \to BC$
such that $B \in t_{ik}$ and $C \in t_{i+k,\,j-k}$ for some $k$ with $1 \le k < j$, then $A \in t_{ij}$.
c) Repeat the previous step until $t_{ij}$ has been computed for all $1 \le i \le n$, $1 \le j \le n-i+1$.
2.4.2.2. An example of the CYK algorithm
Parse the sentence "anh ấy rất ngu" ("he is very stupid") with the rule set:
S → NP AP;    S → N AP;
S → P AP;     NP → N P;
NP → N;       AP → R A;

Table 2-1. CYK analysis of the sentence "anh ấy rất ngu".

Step 1: in row i=1, assign a tag to each word. The word "anh" alone has two tags, A or N.
Step 2: in row i=2, because of the rule NP → N P, the N at position (1,2) combines with the P at position (2,3), so N and P are merged into NP(1,3). In the same way, the rule AP → R A gives AP(3,5).
Step 3: S(2,5) → P(2,3) AP(3,5)
Step 4: S(1,5) → NP(1,3) AP(3,5)
The start symbol S has been reached, so the parse succeeds.
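The table-filling procedure of Section 2.4.2.1 can be sketched in Python as follows. This is a minimal recognizer written for illustration, not the thesis implementation: the function name `cyk_table`, the data layout, and the ASCII spellings of the words are assumptions, and the unit rule NP → N is left out so that every rule is in Chomsky normal form. Cell `(i, j)` follows the $t_{ij}$ convention (the j words starting at word i), not the (start, end) positions used in the worked example.

```python
from collections import defaultdict


def cyk_table(words, lexicon, binary_rules):
    """Fill the CYK table: table[(i, j)] holds every nonterminal that derives
    the j consecutive words starting at word i (1-based)."""
    n = len(words)
    table = defaultdict(set)
    # Step a): cells of length 1 come from the lexical rules A -> a_i.
    for i, word in enumerate(words, start=1):
        table[(i, 1)] = set(lexicon.get(word, ()))
    # Steps b) and c): longer spans are built from two shorter adjacent spans.
    for j in range(2, n + 1):              # span length
        for i in range(1, n - j + 2):      # start position
            for k in range(1, j):          # length of the left part
                for parent, left, right in binary_rules:
                    if left in table[(i, k)] and right in table[(i + k, j - k)]:
                        table[(i, j)].add(parent)
    return table


# "anh ay rat ngu" stands for the example sentence, written without diacritics.
words = ["anh", "ay", "rat", "ngu"]
lexicon = {"anh": {"N", "A"}, "ay": {"P"}, "rat": {"R"}, "ngu": {"A"}}
rules = [("S", "NP", "AP"), ("S", "P", "AP"), ("S", "N", "AP"),
         ("NP", "N", "P"), ("AP", "R", "A")]
table = cyk_table(words, lexicon, rules)
accepted = "S" in table[(1, len(words))]
```

The sentence is accepted when the start symbol appears in the cell that spans all n words.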
2.4.2.3. The improved CYK algorithm
A clear weakness of the CYK algorithm is that it only applies to rule sets in Chomsky normal form, meaning that the right-hand side of a rule has at most 2 symbols. Our rule set, however, contains many rules whose right-hand side has 3 or more symbols, for example:
S → A B C D.
The proposed solution is to add a fourth parameter, wait, to the three parameters of each table cell. It holds the part of the right-hand side that is still missing before the rule can be completed (some sources call it the "borrowing" parameter).
For example, with the rule
S → A B C D
A and B are merged into S (wait = CD), as in the illustration below. The value CD in parentheses shows that the two elements C and D are still missing before the rule is complete.
The missing part is used to combine symbols as follows: if a symbol B is the first symbol of the wait list of A, and the string positions of A and B match, then A and B are merged into a symbol A with wait = {wait(A)/B}, that is, wait(A) with B removed. Merging continues in this way until the start symbol (S, wait = ∅) is reached at the top of the CYK table, at which point the algorithm succeeds.
As an example, consider parsing the sentence "anh ma kim v vt cn" with the rule set:
S → N VP;  S → NP VP;  VP → V N;  VP → VP C VP
The CYK parse table for this example is as follows:

Figure 2-2. Improved CYK for the sentence "anh ma kim v vt cn".
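The merging rule with the wait parameter can be sketched as below. This is one possible reading of the description above, written only for illustration: the `Item` type and the helper names `start_rule` and `extend` are hypothetical, and positions follow the (start, end) convention of the worked examples.

```python
from typing import NamedTuple, Optional, Tuple


class Item(NamedTuple):
    symbol: str                   # generating symbol
    start: int                    # start position of the generated string
    end: int                      # end position of the generated string
    wait: Tuple[str, ...] = ()    # right-hand-side symbols still missing


def start_rule(lhs: str, rhs: Tuple[str, ...], child: Item) -> Optional[Item]:
    """Open a rule lhs -> rhs from a complete child that matches rhs[0]."""
    if not child.wait and child.symbol == rhs[0]:
        return Item(lhs, child.start, child.end, tuple(rhs[1:]))
    return None


def extend(item: Item, child: Item) -> Optional[Item]:
    """Merge `item` with a complete, adjacent child that is first in its wait list."""
    if (item.wait and not child.wait
            and item.wait[0] == child.symbol and item.end == child.start):
        return Item(item.symbol, item.start, child.end, item.wait[1:])
    return None


# Rule S -> A B C D over four adjacent complete symbols.
a, b, c, d = Item("A", 1, 2), Item("B", 2, 3), Item("C", 3, 4), Item("D", 4, 5)
s = start_rule("S", ("A", "B", "C", "D"), a)   # S(1,2), wait = (B, C, D)
s_ab = extend(s, b)                            # S(1,3), wait = (C, D)
s_done = extend(extend(s_ab, c), d)            # S(1,5), wait = ()
```

An item with an empty wait tuple corresponds to a complete node. The intermediate item `s_ab` is the "S (wait = CD)" of the text.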



2.4.2.4. Difficulties of the CYK algorithm

Because CYK considers every possible combination allowed by the set of syntactic rules (about 1000 rules), it easily leads to combinatorial explosion, which overloads the system or makes it run very slowly.
For example, if two symbols labeled N (noun) at suitable positions, N(2,7) and N(7,8), are combined, the number of symbols created by their combination is 64, which is far too many. Moreover, even when parsing finishes, a large number of parse trees are produced, so the parser cannot tell which tree is the correct output. This creates a great deal of ambiguity for the parser.

2.5. The probabilistic model PCFG

2.5.1. Basic definition of PCFG
A probabilistic context-free grammar (PCFG) is simply a CFG in which each production $N^i \to \zeta^j$ is assigned a probability $P(N^i \to \zeta^j)$. These probabilities must satisfy the condition:
$$\forall i: \quad \sum_j P(N^i \to \zeta^j) = 1.$$
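The normalization condition can be checked mechanically. The sketch below is illustrative only: the function name `check_normalized` and the two small rule tables are hypothetical examples, not data from the thesis.

```python
import math
from collections import defaultdict


def check_normalized(rules, tol=1e-9):
    """Return {lhs: True/False} telling whether the probabilities of all rules
    that share the left-hand side `lhs` sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in rules.items():
        totals[lhs] += prob
    return {lhs: math.isclose(total, 1.0, abs_tol=tol) for lhs, total in totals.items()}


# Hypothetical rule tables: the first is properly normalized, the second is not.
good = {("S", ("NP", "VP")): 1.0,
        ("VP", ("V", "NP")): 0.7,
        ("VP", ("VP", "PP")): 0.3}
bad = {("VP", ("V", "NP")): 0.7,
       ("VP", ("VP", "PP")): 0.2}

good_report = check_normalized(good)
bad_report = check_normalized(bad)
```

Running such a check on a hand-written or estimated rule set is a cheap way to catch probability tables that violate the definition.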

2.5.2. Types of probability in PCFG

2.5.2.1. Inside probability
The inside probability of a node $N^j_{pq}$ (the node $N^j$ that generates the words from position p to position q of the sentence) is computed by induction from the probabilities of its child nodes. Informally, it is the current value of the node, a measure of what lies inside it, and it is written $\beta_j(p,q)$. [1]
Base case: we need $\beta_j(k,k)$, the probability of the rule $N^j \to w_k$:
$$\beta_j(k,k) = P(w_k \mid N^j_{kk}, G) = P(N^j \to w_k \mid G)$$
Inductive case: we want $\beta_j(p,q)$ with $p < q$. Since the grammar is in Chomsky normal form, the first rule applied must have the form $N^j \to N^r N^s$. We can therefore proceed by induction, splitting the input string into smaller parts and summing the results. For all $j$ and $1 \le p < q \le m$:
$$\beta_j(p,q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)$$
Here the probability is first decomposed into a product of probabilities over smaller spans, and the context-free assumptions of PCFG then allow the final expression to be written as a recursive definition of the inside probabilities. Because the probability of a constituent is expressed through the probabilities of smaller constituents, inside probabilities can be computed efficiently with a bottom-up algorithm.
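The recursion for the inside probabilities translates directly into a bottom-up loop. The following is a minimal sketch under stated assumptions: the function name `inside_probabilities` and the dictionary layout are choices made here, the rule probabilities are those of the example grammar in Section 2.5.3, and the words are written as ASCII tokens without diacritics.

```python
import math
from collections import defaultdict

# Lexical rules (A -> word) and binary rules (A -> B C) with their probabilities.
LEXICAL = {("NP", "gia_dinh"): 0.1, ("NP", "so_lien_lac"): 0.18,
           ("NP", "theo_doi"): 0.04, ("NP", "con"): 0.18, ("NP", "diem"): 0.1,
           ("V", "theo_doi"): 1.0, ("P", "bang"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0, ("VP", "V", "NP"): 0.7,
          ("VP", "VP", "PP"): 0.3, ("NP", "NP", "PP"): 0.4}


def inside_probabilities(words, lexical, binary):
    """Return beta[(label, p, q)], the inside probability of `label` over words p..q."""
    m = len(words)
    beta = defaultdict(float)
    # Base case: beta_j(k, k) = P(N^j -> w_k).
    for k, word in enumerate(words, start=1):
        for (label, w), prob in lexical.items():
            if w == word:
                beta[(label, k, k)] = prob
    # Inductive case: sum over all rules and all split points d.
    for length in range(2, m + 1):
        for p in range(1, m - length + 2):
            q = p + length - 1
            for (parent, left, right), prob in binary.items():
                for d in range(p, q):
                    b_left = beta.get((left, p, d), 0.0)
                    b_right = beta.get((right, d + 1, q), 0.0)
                    beta[(parent, p, q)] += prob * b_left * b_right
    return beta


sentence = ["gia_dinh", "theo_doi", "con", "bang", "so_lien_lac"]
beta = inside_probabilities(sentence, LEXICAL, BINARY)
sentence_probability = beta[("S", 1, len(sentence))]
```

For this grammar the inside probability of S over the whole sentence is the total probability of all its parse trees, that is, the sum of the probabilities of the two trees discussed in Section 2.5.3.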
2.5.2.2. Outside probability
The outside probability of a node measures how promising the node is for reaching the goal, that is, the probability that the node links up with the root. Put simply, it is the node's ability to connect with what lies outside it.
Outside probabilities are computed top-down. Their inductive step uses the inside probabilities, so the outside probabilities are computed after the inside probabilities, using the outside algorithm.
Base case: the base case is the probability that the root of the tree is the nonterminal $N^i$, where $N^1$ is the start symbol:
$$\alpha_1(1,m) = 1; \qquad \alpha_j(1,m) = 0 \text{ for } j \ne 1$$
Inductive case: when a node $N^f$ generates $N^j$ and $N^g$, the node $N^j$ can be on the left or on the right of $N^g$. We sum over both cases [2]:
$$\alpha_j(p,q) = \sum_{f,\, g \ne j} \sum_{e=q+1}^{m} \alpha_f(p,e)\, P(N^f \to N^j N^g)\, \beta_g(q+1,e) \;+\; \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e,q)\, P(N^f \to N^g N^j)\, \beta_g(e,p-1)$$

2.5.3. Resolving syntactic ambiguity with PCFG

The probability of a parse tree in the PCFG model is easy to compute: it is the product of the probabilities of all the rules used in the tree. When the parser produces several parse trees, as described above, PCFG outputs the tree with the highest probability.
Consider a concrete PCFG:

Rule               Prob.     Rule                 Prob.
S  → NP VP         1.0       NP → NP PP           0.4
PP → P NP          1.0       NP → Gia đình        0.1
VP → V NP          0.7       NP → sổ liên lạc     0.18
VP → VP PP         0.3       NP → theo dõi        0.04
P  → bằng          1.0       NP → con             0.18
V  → theo dõi      1.0       NP → điểm            0.1

The table lists the rules of the grammar with their probabilities. With this PCFG and the sentence "Gia đình theo dõi con bằng sổ liên lạc" ("The family monitors the child through the school contact book"), there are two parse trees, each with its probability:

(t1)

Figure 2-3. Parse tree t1.

The probability of parse tree t1 is:
P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072

(t2)

Figure 2-4. Parse tree t2.

The probability of parse tree t2 is:
P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 1.0 × 0.18 × 1.0 × 0.18 = 0.0006804
Since P(t1) > P(t2), the output is tree t1.
The goal of a parser that uses a PCFG is therefore to output the parse tree with the highest probability, in other words the one whose S node has the largest inside value.
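The two products can be reproduced by multiplying the rule probabilities along each tree. The sketch below is an illustration under assumptions: the figures of t1 and t2 are not available in this text, so the tree shapes are inferred from the factors of the two products (t1 attaches the PP to the noun phrase, t2 attaches it to the verb phrase). The nested-tuple encoding, the function name `tree_probability`, and the ASCII tokens are choices made here.

```python
import math

# (lhs, rhs) -> probability, taken from the rule table of the example.
RULES = {("S", ("NP", "VP")): 1.0, ("PP", ("P", "NP")): 1.0,
         ("VP", ("V", "NP")): 0.7, ("VP", ("VP", "PP")): 0.3,
         ("NP", ("NP", "PP")): 0.4,
         ("NP", ("gia_dinh",)): 0.1, ("NP", ("so_lien_lac",)): 0.18,
         ("NP", ("theo_doi",)): 0.04, ("NP", ("con",)): 0.18,
         ("NP", ("diem",)): 0.1,
         ("V", ("theo_doi",)): 1.0, ("P", ("bang",)): 1.0}


def tree_probability(tree, rules):
    """Multiply the probabilities of all rules used in a tree.
    A tree is (label, child, ...); a leaf node is (label, word)."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return rules[(label, (children[0],))]      # lexical rule
    rhs = tuple(child[0] for child in children)
    prob = rules[(label, rhs)]
    for child in children:
        prob *= tree_probability(child, rules)
    return prob


pp = ("PP", ("P", "bang"), ("NP", "so_lien_lac"))
# t1 (assumed shape): the PP modifies the noun phrase "con".
t1 = ("S", ("NP", "gia_dinh"),
      ("VP", ("V", "theo_doi"), ("NP", ("NP", "con"), pp)))
# t2 (assumed shape): the PP modifies the verb phrase "theo_doi con".
t2 = ("S", ("NP", "gia_dinh"),
      ("VP", ("VP", ("V", "theo_doi"), ("NP", "con")), pp))

p_t1 = tree_probability(t1, RULES)
p_t2 = tree_probability(t2, RULES)
best = "t1" if p_t1 > p_t2 else "t2"
```

Selecting the tree with the larger product is exactly the disambiguation step described above.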

2.6. Chapter summary
This chapter presented the Vietnamese language and the theoretical basis of Vietnamese syntactic parsing with the improved CYK algorithm and the PCFG model. The next chapter presents the improvements, ideas and proposals applied to Vietnamese syntactic parsing.


CHAPTER 3. PROPOSALS OF THE THESIS FOR VIETNAMESE SYNTACTIC PARSING
This chapter presents the solutions that the thesis uses for Vietnamese syntactic parsing based on the PCFG model:
- Improving the speed of the parsing system by applying the beam search algorithm to the CYK algorithm.
- Optimizing the output of the parsing system and overcoming the weaknesses of CYK with beam search by using the A* search algorithm.
- Improving the speed of the A* parser with an original idea of the thesis: the lelightwin algorithm.

3.1. The CYK beam search algorithm

3.1.1. Remarks on the PCFG model and the improved CYK algorithm
With the PCFG model, ambiguity among the output parse trees can be resolved by returning the tree with the highest probability. However, some problems remain:
- In speech synthesis, speed matters, and syntactic parsing is no exception. With the improved CYK algorithm, speed is a difficult problem. The combinatorial explosion during parsing makes this exhaustive table-building algorithm take a very long time to produce a result.
- For Vietnamese in particular, a language with a very complex set of syntactic rules, the number of combinations is very large. For long and difficult sentences, the CYK algorithm may run out of memory or take several hours to finish, and it still produces too many parse trees. Even with PCFG for disambiguation, this remains a hard problem.
For these reasons, the following sections present several improvements that let the parser keep the advantages of PCFG while increasing the speed of the parsing system.

3.1.2. The beam search algorithm

For a node, the inside probability reflects its strength within the region it covers, while the outside probability reflects its ability to combine with the region outside its coverage and lead to a result. Combining the two gives a criterion for judging which nodes are more valuable to the algorithm. Specifically:
P(node) = inside(node) + outside(node). (1)
The larger P(node) is, the stronger a candidate the node is and the more it deserves to be examined. Conversely, a node with a small P(node) is very likely a redundant node that cannot lead to the goal.
To reduce the number of candidates to examine, the thesis therefore uses beam search. With this method, at each step of the CYK algorithm only a fixed number n of candidates is considered, where n is called the beam width.
The idea of beam search applied to parsing is as follows: at each iteration of the algorithm, the nodes (candidates) are sorted in descending order of P(node) and only the n best candidates are kept; lower-ranked candidates are discarded. The beam width may be fixed or varied, at the discretion of the user. [8]
Suppose that at some step there are n candidates and max is the best candidate node.
If P(max) - P(node_i) > threshold, then node i is discarded, for i = 1..n.
The figure below gives an intuition for beam search: the green nodes are the nodes that are examined, and the black nodes are the nodes pruned by beam search.

Figure 3-1. Illustration of beam search pruning.

In this way, at each step of CYK a number of candidates judged unable to lead to the goal are eliminated, and the number of combinations is greatly reduced.
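The two pruning rules described above (keep the n best candidates, and drop candidates that fall too far below the best one) can be sketched as a single function. This is an illustrative sketch, not the thesis code: the name `prune`, the representation of a candidate as a `(label, score)` pair, and the sample scores are assumptions, with the score standing for P(node) in log space.

```python
def prune(candidates, beam_width, threshold=None):
    """Keep at most `beam_width` candidates with the highest score.
    If `threshold` is given, also drop candidates whose score is more than
    `threshold` below the best score."""
    ranked = sorted(candidates, key=lambda cand: cand[1], reverse=True)
    kept = ranked[:beam_width]
    if threshold is not None and kept:
        best_score = kept[0][1]
        kept = [cand for cand in kept if best_score - cand[1] <= threshold]
    return kept


# Hypothetical candidates for one CYK cell: (label, log-space score).
cell = [("NP", -2.0), ("VP", -2.5), ("AP", -9.0), ("S", -4.0)]
top3 = prune(cell, beam_width=3)
top3_close = prune(cell, beam_width=3, threshold=1.0)
```

A wider beam or a larger threshold keeps more candidates, which lowers the risk of pruning a node of the best tree at the cost of more work per cell.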

3.1.3. Applying and implementing beam search
As noted above, beam search depends heavily on the estimate used to score each node. If the estimating function is inaccurate, nodes that lead to the parse tree with the highest probability may be pruned, and the output will be wrong. Formula (1) is available, but getting from the formula to an implementation is a problem in itself.
3.1.3.1. Computing inside in beam search
A node A can be formed from more than one branch of the parse tree. To see this, suppose the set of nodes under consideration contains four nodes B, C, D, E that can be combined by position into two pairs (B, C) and (D, E), and the rule set contains the two rules {A → B C} and {A → D E}. Then node A alone already has two branches.
According to the PCFG inside formula given above, this gives:
inside(A) = P(A → B C) * inside(B) * inside(C) + P(A → D E) * inside(D) * inside(E)
However, our parsing algorithm aims at one and only one parse tree, the one with the highest probability. The thesis therefore computes the inside value of a node as the largest probability among the branches that form the node. Specifically, with
P_{A → B C} = P(A → B C) * inside(B) * inside(C),
P_{A → D E} = P(A → D E) * inside(D) * inside(E),
we take inside(A) = max(P_{A → B C}, P_{A → D E}).
The general formula for the inside value of a node is therefore:
$$\beta_j(p,q) = \max_{r,\,s,\,p \le d < q} P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)$$
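Replacing the sum by a maximum turns the inside computation into a best-tree (Viterbi-style) score. The sketch below is illustrative and rests on assumptions: the function name `best_inside` and the data layout are choices made here, the grammar is the example PCFG of Section 2.5.3 with ASCII tokens, and plain probabilities are used rather than logarithms to keep the comparison with that example direct.

```python
import math
from collections import defaultdict

LEXICAL = {("NP", "gia_dinh"): 0.1, ("NP", "so_lien_lac"): 0.18,
           ("NP", "theo_doi"): 0.04, ("NP", "con"): 0.18, ("NP", "diem"): 0.1,
           ("V", "theo_doi"): 1.0, ("P", "bang"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0, ("VP", "V", "NP"): 0.7,
          ("VP", "VP", "PP"): 0.3, ("NP", "NP", "PP"): 0.4}


def best_inside(words, lexical, binary):
    """best[(label, p, q)] = probability of the most probable subtree rooted in
    `label` that covers words p..q (max instead of sum over branches)."""
    m = len(words)
    best = defaultdict(float)
    for k, word in enumerate(words, start=1):
        for (label, w), prob in lexical.items():
            if w == word:
                best[(label, k, k)] = max(best[(label, k, k)], prob)
    for length in range(2, m + 1):
        for p in range(1, m - length + 2):
            q = p + length - 1
            for (parent, left, right), prob in binary.items():
                for d in range(p, q):
                    score = (prob * best.get((left, p, d), 0.0)
                             * best.get((right, d + 1, q), 0.0))
                    if score > best[(parent, p, q)]:
                        best[(parent, p, q)] = score
    return best


sentence = ["gia_dinh", "theo_doi", "con", "bang", "so_lien_lac"]
best = best_inside(sentence, LEXICAL, BINARY)
best_tree_probability = best[("S", 1, len(sentence))]
```

For the example sentence the result equals P(t1), the probability of the more likely of the two trees. A full parser would also store back-pointers to recover the tree itself.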

This is still only a formula, however; computing the inside value of an arbitrary node S involves a further difficulty. For clarity, suppose S is formed by the rule S → NP NP VP. Two cases must be considered.
Case 1: S is a complete node, meaning its wait variable is empty.
In this case the formula above is used unchanged, that is:
inside(S(wait="")) = lg(P(S → NP NP VP)) + inside(NP) + inside(NP) + inside(VP)

Logarithms are used here in place of the multiplication in the general formula because multiplying probabilities gives very small results, close to 0, which causes difficulties in comparison and computation.
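The numerical problem mentioned here can be shown with a small experiment. The sketch is illustrative: the probability value and the number of factors are hypothetical, and base-10 logarithms are assumed for "lg" (any base gives the same ranking of candidates).

```python
import math

# Hypothetical case: a derivation that multiplies 70 rule probabilities of 1e-5.
probabilities = [1e-5] * 70

# Direct multiplication underflows: the true value 1e-350 is below the smallest
# positive double, so the product collapses to 0.0 and comparisons become useless.
product = math.prod(probabilities)

# Summing logarithms keeps the value representable and comparable.
log_score = sum(math.log10(p) for p in probabilities)
```

Two candidates whose products both underflow to 0.0 can no longer be ranked, whereas their log scores can still be compared.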

Hnh 3-2. Hnh nh ca mt nt phn tch hon chnh.


Sinh vin thc hin: L Quang Thng-20062965-Kha K51-Lp Cng ngh Phn mm B

21

Nt S l nt cha hon chnh, hay ni cch khc n l mt nt gi, bin


wait ca S khc rng .
Trong trng hp ny, chng ta c th hnh dung y l nt S(wait= VP)
c to nn t hai phn t NP NP bng lut S VP NP NP. Vy cng thc inside
trong trng hp ny s l :
inside(S(wait="VP"))=lg(P(SNP NP VP))+ inside(NP)+inside(NP)(1)

Hnh 3-3. Hnh nh ca mt nt phn tch s dng bin wait.

Nu sau nt S(wait= VP) gp c phn t VP v tr tng ng th n


s kt hp vi VP thnh mt nt S mi vi inside l :
inside(S(wait=""))= inside(S(wait="VP"))+inside(NP)(2)
T (1) v (2) ta c th suy ra:
inside(S(wait=""))=lg(P(SNP NP VP))+ inside(NP)+inside(NP)+ inside (VP)

Cng thc chnh l cng thc tnh inside ca mt nt hon chnh. Vy cch
lm c xut trong trng hp nt gi l chp nhn c.
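The two cases above can be checked numerically. The following is a minimal sketch, assuming base-10 logarithms (the `lg` of the text) and made-up probabilities; the helper names `start_partial` and `extend` are hypothetical and are not taken from the thesis program.

```python
import math

def lg(p):
    """Base-10 logarithm, written lg in the text."""
    return math.log10(p)

def start_partial(rule_prob, child_insides):
    """Inside of a node that has consumed only some of the rule's children:
    log rule probability plus the log insides seen so far (formula (1))."""
    return lg(rule_prob) + sum(child_insides)

def extend(partial_inside, next_child_inside):
    """Absorb one more awaited child (formula (2))."""
    return partial_inside + next_child_inside

# Hypothetical numbers for the rule S -> NP NP VP.
p_rule = 0.2
np1, np2, vp = lg(0.5), lg(0.4), lg(0.1)

partial = start_partial(p_rule, [np1, np2])   # S(wait="VP")
complete = extend(partial, vp)                # S(wait="")
direct = lg(p_rule) + np1 + np2 + vp          # complete-node formula

# When several branches build the same node, only the best one is kept.
other_branch = lg(0.001)
best_inside = max(complete, other_branch)
```

Building the score incrementally through the wait variable gives the same value as the complete-node formula, which is the point of the derivation from (1) and (2).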
3.1.3.2. Computing the outside probability in beam search
Computing the inside of a node as described above may sound complicated but is in fact straightforward, because the inside algorithm works bottom-up exactly like CYK. Outside is difficult on an entirely different level: it is computed top-down, which means the upper rows of the CYK table must be known before the lower rows can be computed, the exact opposite of inside.
The figure below shows a complete parse tree in which node 4 is the node under consideration. To compute the outside of node 4 we need the entire highlighted part of the tree. Specifically:
Outside(4) = lg(P(2 → 4 5)) + Outside(2) + inside(5).
Outside(2) is obtained recursively in the same way:
Outside(2) = lg(P(1 → 2 3)) + Outside(1) + inside(3).
The recursion ends at the root: as a probability, Outside(1) = 1 if node 1 is S and 0 if it is not (that is, lg 1 = 0 and lg 0 = -∞ on the logarithmic scale used above).

Figure 3-4. Illustration of the outside of a node.

So computing the outside of a node requires the whole upper part of the parse tree, which is impossible while the CYK table is still being filled!
To get around this difficulty, the author proposes a "relative" outside. Why relative? Because we do not compute the true outside, only an approximation of it: instead of using all the nodes above, we use only the nodes one level above the node under consideration. The outside of the cells of step i is left blank during step i. Only at step i+1 are the cells created in that step used to compute the outside of the cells of step i. In short, the cells of the upper row are used to compute the outside of the cells of the row below. [8]
As with inside, the outside computation distinguishes two cases:
o The node under consideration is a complete node, such as node D below. Its outside is:
Outside(D) = lg(P(A → E D)) * 10^inside(E) + lg(P(B → D H)) * 10^inside(H)

Figure 3-5. Outside of a complete node.

o The node under consideration is an incomplete node. Take node D again, now built from the two nodes F G while the full rule is D → F G H, so the wait variable of D still holds H. The outside of D is then:
Outside(D) = lg(P(D → F G H)) * 10^inside(H)

Figure 3-6. Outside of an incomplete node.

3.1.3.3. The beam search procedure

The beam search algorithm proceeds as follows:
o Initialise the cells of row 1 (the tagged words of the sentence).
o At row i, do the following:
   - Set the threshold B used as the pruning bound (tune it as appropriate).
   - Use the elements created in the cells of row i to compute the outside of the elements in the cells of row i-1.
   - The elements of row i-1 now have both outside and inside. Compute the heuristic of each one, heuristic = inside + outside, then pick the element with the largest heuristic and call it max.
   - For every element j of row i-1, if
         heuristic(max) - heuristic(j) > B
     then remove j from the list under consideration, together with all elements of row i that were created from j.
o Repeat step 2 until the CYK parse table is completely filled.
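The pruning test of step 2 can be sketched in a few lines. This is an illustrative sketch only: rows are assumed to be plain dictionaries of log-scale scores, and the names `prune_row` and `drop_derived` are hypothetical.

```python
def prune_row(cells, beam):
    """cells maps an element name to (inside, outside), both on a log scale.
    Returns (survivors, pruned) according to the threshold test
    heuristic(max) - heuristic(j) > beam."""
    heuristic = {name: i + o for name, (i, o) in cells.items()}
    best = max(heuristic.values())
    pruned = {name for name, h in heuristic.items() if best - h > beam}
    return set(heuristic) - pruned, pruned

def drop_derived(next_row, pruned):
    """next_row maps an element of row i to the set of row i-1 elements it
    was built from. Elements built from a pruned element are removed too."""
    return {name: kids for name, kids in next_row.items() if not (kids & pruned)}

# Hypothetical scores for three elements of row i-1.
row = {"NP": (-1.0, -0.5), "VP": (-2.0, -0.7), "AP": (-6.0, -3.0)}
survivors, pruned = prune_row(row, beam=3.0)

# Hypothetical row i: S was built from NP and VP, XP from AP and VP.
upper = {"S": {"NP", "VP"}, "XP": {"AP", "VP"}}
upper_after = drop_derived(upper, pruned)
```

With a threshold of 3.0 only AP falls outside the beam, and XP disappears with it because it was derived from AP.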

3.1.4. Remarks on the beam search algorithm

Beam search is a path-finding algorithm widely used in artificial intelligence. Its estimate-driven search can give very good results and a considerable speed-up. Like any algorithm, however, it has strengths and weaknesses that need to be weighed.
Advantages: the beam search variant proposed in this thesis is fast, and its outside computation is less complicated than in some other algorithms, because its only purpose is to prune the worst candidates rather than to pick the single best one, so the outside values need not be exact. Beam search can speed up the parser very considerably. With CYK combined with beam search, even a sentence as complex as 40 words (not counting syllables) is parsed in a few minutes. With plain CYK, which fills the table exhaustively, the parser may need several hours!
Disadvantages: despite the speed-up, beam search is not an optimal algorithm. If the threshold is too small, beam search may prune away the optimal solution. The only way to guarantee that the optimal solution survives is to raise the threshold to a very large value, which approaches exhaustive search again! Nothing therefore guarantees that beam search leads the parser to the optimal output.

3.2. Proposal: using the A* algorithm for parsing

In artificial intelligence, A* is the most widely known member of the best-first search family. Widely used in search problems, A* incrementally builds all routes from the starting point until it finds the goal.
Best-first search in general finds a path with a heuristic estimate: it follows the route that appears most likely to lead to the goal. A* does the same, but unlike the other members of the best-first family it also takes into account the distance already travelled, and this is what makes A* optimal and complete. Optimal means that if a shortest path to the goal exists, A* is sure to find it. Complete means that A* always finds a solution whenever the problem has one. In other words, as long as a problem has a solution, A* is guaranteed to find the best one, provided the heuristic never overestimates the remaining cost.

3.2.1. The A* algorithm

A* maintains a set of partial solutions, that is, paths through the graph that begin at the start node. This set is stored in a priority queue. The priority assigned to a path x is determined by the function
f(x) = g(x) + h(x).
Here g(x) is the cost of the path so far, that is, the total weight of the edges traversed, and h(x) is a heuristic estimate of the minimum cost of reaching the goal from x.
The choice of h(x) is crucial: it determines how efficient A* is. The estimate must satisfy one condition, h(x) ≤ h*(x), where h*(x) is the true cost of reaching the goal from x.
For example, if the "cost" is distance travelled, the straight-line distance between two points on a map is a heuristic estimate of the distance still to be covered.
The lower the value of f(x), the higher the priority of x (so a min-heap can be used to implement the priority queue).

function A*(start, goal)
    var closed := the empty set
    var q := make_queue(make_path(start))
    while q is not empty
        var p := remove_first(q)
        var x := the last node of p
        if x in closed
            continue
        if x = goal
            return p
        add x to closed
        foreach y in successors(p)
            enqueue(q, y)
    return failure

Here successors(p) returns the set of paths obtained by extending p by one adjacent node. The queue is assumed to keep itself ordered by the value of f.
The "closed set" (closed) stores the last node of every path p that has been expanded, so that cycles are not explored again (this gives the graph-search version of the algorithm). The queue is correspondingly sometimes called the "open set". The closed set can be omitted (giving the tree-search version) if a solution is guaranteed to exist or if successors is adapted to reject cycles.
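For readers who prefer running code, the pseudocode above can be rendered roughly as follows. This is a sketch under assumptions: the graph is given as a `neighbors` function returning `(node, edge_cost)` pairs, the priority queue is Python's `heapq`, and the toy graph and the zero heuristic are made up for illustration. With a closed set, the optimality guarantee additionally relies on the heuristic being consistent, which the zero heuristic trivially is.

```python
import heapq
from itertools import count

def a_star(start, goal, neighbors, h):
    """Graph-search A*. neighbors(x) yields (y, edge_cost) pairs and
    h(x) estimates the remaining cost from x to the goal."""
    tie = count()                     # tie-breaker so paths are never compared
    queue = [(h(start), next(tie), 0.0, [start])]
    closed = set()
    while queue:
        _, _, g, path = heapq.heappop(queue)   # lowest f = g + h first
        x = path[-1]
        if x in closed:
            continue
        if x == goal:
            return g, path
        closed.add(x)
        for y, cost in neighbors(x):
            if y not in closed:
                new_g = g + cost
                heapq.heappush(queue, (new_g + h(y), next(tie), new_g, path + [y]))
    return None                       # failure: the goal is unreachable

# Hypothetical toy graph.
EDGES = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 1), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
result = a_star("A", "D", lambda x: EDGES[x], h=lambda x: 0.0)
unreachable = a_star("D", "A", lambda x: EDGES[x], h=lambda x: 0.0)
```

The counter is a design choice that keeps the heap from ever comparing two paths when their f values are equal.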

3.2.2. A* in parsing

3.2.2.1. The A* algorithm applied to parsing
As discussed above, for any search problem that has a solution, A* is guaranteed to return an optimal one. Carried over to parsing, A* can guarantee the best parse tree. Within the scope of this thesis, "best" simply means the parse tree with the highest probability.
A* for parsing runs in the opposite direction to ordinary A*. Instead of expanding the lowest priority value first, we expand the highest, because what we want is the parse tree with the highest probability. A* operates on basic items called elements, which carry the same three parameters as in CYK: {label, start, end}. The algorithm maintains two sets, AGENDA and CHART [5]. AGENDA holds the elements waiting to be examined, and CHART holds the elements that have already been examined. The algorithm consists of the following steps:
o Step 1: The words of the sentence are tagged and all of them are pushed into AGENDA. For example, the i-th word with tag ci becomes the element ci[i,i+1] and is added to AGENDA.
o Step 2: Repeat the following:
   - Take the element with the highest priority estimate out of AGENDA.
   - If this candidate is not yet in CHART, combine it with every element in CHART according to the grammar rules. All newly created elements are added to AGENDA. For example, suppose NP[0,2] is taken from AGENDA, CHART contains VP[2,8], and the grammar contains the rule S → NP VP. Then NP[0,2] and VP[2,8] combine into S[0,8]. S[0,8] is added to AGENDA, and NP[0,2] is added to CHART.
o Step 3: Step 2 is repeated until AGENDA has no element left to examine or the goal element S that spans the whole sentence appears in CHART.
The following pseudocode sketches the A* algorithm:
Procedure A* Parsing
    For wi in sentence
        wi ~ <Xi, i, i+1>;
        AGENDA.add(<<Xi, i, i+1>, 0>);    // initial inside = 0
    Endfor;
    CHART := empty;
    while ((!AGENDA.isEmpty) and (!CHART.contains(<S, 1, n+1>)))
        Remove <<Y, i, j>, w> from AGENDA with max(w + h(Y));
        if (!CHART.contains(<Y, i, j>)) then
            CHART.add(<<Y, i, j>, w>);
            For <<Z, j, k>, w'> in CHART and (X → Y Z, w'')
                AGENDA.add(<<X, i, k>, w + w' + w''>);
            Endfor;
            For <<Z, k, i>, w'> in CHART and (X → Z Y, w'')
                AGENDA.add(<<X, k, j>, w + w' + w''>);
            Endfor;
        Endif;
    Endwhile;
    If CHART.contains(<S, 1, n+1>) then successParse
    else failureParse;
EndProcedure;
Notes:
w and w' denote the inside of the accompanying node.
w'' denotes the probability of the accompanying rule.
h(Y) is the estimate of the cost of reaching the goal from Y.
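The agenda/chart loop can be sketched in runnable form as follows. This is a simplified illustration, not the thesis implementation: it assumes binary rules only (so no wait variable is needed), 0-based spans as in the NP[0,2] example, log10 weights, and a default estimate h = 0. The toy grammar and the function name `a_star_parse` are hypothetical.

```python
import heapq
import math
from itertools import count

def a_star_parse(tags, rules, goal="S", h=lambda item: 0.0):
    """tags: POS tags of the sentence. rules: {(B, C): [(A, prob), ...]} for
    binary rules A -> B C. Returns the log10 inside of the goal element
    spanning the sentence, or None if no parse exists."""
    n = len(tags)
    tie = count()
    agenda = []

    def push(item, w):
        # heapq is a min-heap, so the priority w + h is negated (max first).
        heapq.heappush(agenda, (-(w + h(item)), next(tie), w, item))

    for i, tag in enumerate(tags):
        push((tag, i, i + 1), 0.0)            # initial inside = 0
    chart = {}
    while agenda:
        _, _, w, item = heapq.heappop(agenda)
        if item in chart:
            continue
        chart[item] = w
        if item == (goal, 0, n):
            return w
        label, i, j = item
        for (other, a, b), w2 in list(chart.items()):
            if a == j:                        # candidate on the left
                for parent, p in rules.get((label, other), []):
                    push((parent, i, b), w + w2 + math.log10(p))
            if b == i:                        # candidate on the right
                for parent, p in rules.get((other, label), []):
                    push((parent, a, j), w + w2 + math.log10(p))
    return None

# Hypothetical toy grammar and tagged sentence.
RULES = {("V", "N"): [("VP", 0.5)], ("N", "VP"): [("S", 0.8)]}
score = a_star_parse(["N", "V", "N"], RULES)
no_parse = a_star_parse(["V", "V"], RULES)
```

Because log probabilities are never positive, the first time an element is taken from the agenda with h = 0 its weight is already the best one, which is why later copies of the same element can be skipped.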

3.2.2.2. The priority function in A*

As noted above, the A* priority function has two components: g(x), the cost of the path so far, and h(x), the estimated minimum cost of reaching the goal from x. For parsing, the priority function of A* is:
F(node) = inside(node) + outside(node)
Analysis: the inside of a node is the probability of the subtree generated from that node. It plays the role of g(x) and is very easy to compute. Outside is a different matter: the exact outside of a node cannot be computed at that node, because the true outside requires a parse of the whole sentence, which is not yet available. We can therefore only compute a relative outside, written a(node), as in the CYK case. The estimate must be chosen so that a(node) ≥ α(node), where α(node) is the true outside of the node. This is the reverse of the condition h(x) ≤ h*(x) of the original A*, because here we look for the solution with the largest total weight rather than the smallest. The thesis proposes three ways of computing the outside for A* parsing:
3.2.2.2.1. Using the inside probability only
The simplest option is not to use outside at all: the outside of every node is zero. The candidate is then the node with the largest inside, and the search reduces to the well-known Dijkstra algorithm. Since no outside has to be computed, each step is very fast. Without an estimate function, however, the number of steps to examine is rather large.
3.2.2.2.2. One-level outside
The second option uses the nodes one level above a node to compute its outside. Specifically, to compute the outside of a node X in AGENDA, X is combined with the nodes in CHART to create the nodes one level up, the outside is computed from each of them, and the largest value is kept. This combination is performed only once. Because the value kept is the largest outside and only one level is combined, a(X) ≥ α(X) is guaranteed. But because only one level is combined, the estimate of the remaining way from the current node to the goal is rather shallow, and computing the outside of every node in AGENDA in order to pick the candidate is very expensive. Together, the shallowness and the cost make A* reach the goal very slowly. This option has the advantage of being easy to implement, and in some cases it still reaches the goal very quickly, but in many cases it cannot reach the goal because the estimate is too shallow. In those cases one-level A* is even less efficient than Dijkstra.
3.2.2.2.3. Outside by reducing the rule set
The main difficulty in computing the outside in advance is that the grammar is too complex (about 938 rules). When a node X is combined with any node Y, the number of resulting combinations explodes, so computing the outside of even a single node becomes extremely hard, let alone of every node in AGENDA. The thesis therefore proposes to compute the outside over a reduced rule set, which makes the relative outside of a node much easier to obtain.
The idea is to split the grammar into small groups Ri and let each group Ri nominate one representative rule ri, the rule with the highest probability in the group. Collecting all the ri gives a new rule set, Max = {ri}, the set of maximal rules. Max is much simpler than the original grammar, so parsing with Max is considerably less complex. Specifically, any node X in AGENDA is combined with the nodes in CHART using Max to create a set of nodes Y, and the nodes Y are in turn combined with the nodes in CHART using Max. This is repeated until no further combination is possible or the goal node S that spans the whole sentence is found. The process yields a number of outside parse trees rooted at the position of X. The one with the highest probability gives the relative outside under the reduced rule set. Because the result is the highest-probability outside analysis of X, obtained with the maximal rules, this option also satisfies a(X) ≥ α(X).
The following pseudocode computes the outside with the reduced rule set.
outside(state)
    for y in CHART
        for (x → y state) in maxGrammar
            cost = inside(y) + outside(x) + log P(x → y state);
            score = max(score, cost);
        endfor;
        for (x → state y) in maxGrammar
            cost = inside(y) + outside(x) + log P(x → state y);
            score = max(score, cost);
        endfor;
    endfor;
end;

The more complex the Max rule set, the more accurate the estimate, but the more expensive the outside computation. In practice, once implemented and used, this option proved very useful, with notable speed and accuracy: although determining the outside takes many operations, the number of A* iterations is very small because the estimate function is good.
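Building the Max rule set amounts to keeping one best rule per group. The text does not say how the groups Ri are formed, so the sketch below leaves the grouping as a parameter. The function name `reduce_grammar`, the grouping by left-hand side and first right-hand symbol, and the toy rules are all assumptions made for illustration.

```python
def reduce_grammar(rules, group_key):
    """rules: iterable of (lhs, rhs_tuple, prob). Keeps, for every group
    identified by group_key(rule), the rule with the highest probability."""
    best = {}
    for rule in rules:
        key = group_key(rule)
        if key not in best or rule[2] > best[key][2]:
            best[key] = rule
    return list(best.values())

# Hypothetical rules and a hypothetical grouping criterion.
RULES = [
    ("NP", ("N", "N"), 0.3),
    ("NP", ("N", "A"), 0.5),
    ("VP", ("V", "NP"), 0.6),
    ("VP", ("V", "PP"), 0.2),
]
max_rules = reduce_grammar(RULES, group_key=lambda r: (r[0], r[1][0]))
```

A coarser grouping gives a smaller Max set and a cheaper but looser estimate, which is the trade-off described above.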

3.3. Proposal: the lelightwin algorithm for improving A* in Vietnamese parsing

3.3.1. Problem statement
In step 2 of the A* algorithm described above, the most promising candidate is taken from AGENDA and combined with every element in CHART, and each combination creates new elements in AGENDA. If two elements always combined into one complete element, everything would be easy. In most cases, however, the combination leaves a wait variable behind, so far too many elements are generated. For example, combining two elements N and N creates as many as 64 elements, 63 of which carry a wait variable. Short sentences of about 20 tokens or fewer cause little trouble. For long, difficult sentences of 40-50 tokens, however, the number of iterations can approach 10000!
The cause of this explosion is that our grammar is too large. When two elements X and Y are combined with the wait variable, every grammar rule of the form Z → X Y ... contributes to a huge set of new elements.
So when the grammar is very large, using the wait variable can cause a combinatorial explosion. The thesis therefore proposes an improvement to A* that addresses this problem, called the lelightwin algorithm.

3.3.2. The lelightwin algorithm

3.3.2.1. The idea behind lelightwin
The idea is to parse without wait elements (incomplete elements) and instead to combine all the elements needed to form a complete element in one go, which means that more than two elements may be combined at once. For example, combining the candidate N(2,7) taken from the agenda with the element V(7,8) yields a single element NP(2,8) rather than a whole set such as:
NP(2,8, wait=""); NP(2,8, wait=", A"); NP(2,8, wait="AP");
NP(2,8, wait="AP NP"); NP(2,8, wait="AP PP"); NP(2,8, wait="MP");
NP(2,8, wait="N"); NP(2,8, wait="NP"); NP(2,8, wait="NP PP");
NP(2,8, wait="NP VP"); NP(2,8, wait="P"); NP(2,8, wait="PP");
NP(2,8, wait="PP PP"); NP(2,8, wait="VP");
With the table-filling CYK algorithm the lelightwin improvement is impossible, because the CYK table only allows two cells to be combined. In A*, however, each element taken from AGENDA is combined with the whole CHART set, so the lelightwin improvement is entirely feasible.
As explained above, A* with lelightwin no longer uses the wait variable for incomplete elements. This means that in step 2 of A*, the candidate X taken from AGENDA is no longer simply combined with each element in CHART. Instead we must enumerate every sequence (chain) of CHART elements that can be combined with X. Naturally, the elements of one chain must occupy mutually compatible positions.
Example: suppose the candidate element is X(7-10), given by its start and end positions, and CHART contains the following elements:
Table 3-1. Elements in CHART

X1(1-8)     X2(6-16)    X3(15-35)   X4(5-20)    X5(2-7)
X6(10-11)   X7(8-27)    X8(2-21)    X9(9-11)    X10(2-13)
X11(6-14)   X12(15-26)  X13(14-23)  X14(5-18)   X15(1-7)
X16(9-16)   X17(12-17)  X18(7-18)   X19(6-25)   X20(13-26)
X21(11-16)  X22(9-24)   X23(11-20)  X24(8-18)   X25(7-16)
X26(14-16)  X27(4-6)    X28(13-21)  X29(4-8)    X30(11-13)
All the chains that can be formed with the candidate X[7-10] are:
Table 3-2. Chains combining X with CHART

Positions                                  Symbols
[2-7] [7-10]                               X5 X
[1-7] [7-10]                               X15 X
[1-7] [7-10] [10-11] [11-13]               X15 X X6 X30
[2-7] [7-10] [10-11] [11-13] [13-26]       X5 X X6 X30 X20
[7-10] [10-11] [11-20]                     X X6 X23
[7-10] [10-11] [11-13]                     X X6 X30
[7-10] [10-11] [11-13] [13-26]             X X6 X30 X20
[7-10] [10-11] [11-13] [13-21]             X X6 X30 X28
[2-7] [7-10] [10-11]                       X5 X X6
[2-7] [7-10] [10-11] [11-16]               X5 X X6 X21
[2-7] [7-10] [10-11] [11-20]               X5 X X6 X23
[2-7] [7-10] [10-11] [11-13]               X5 X X6 X30
[7-10] [10-11] [11-16]                     X X6 X21
[1-7] [7-10] [10-11]                       X15 X X6
[2-7] [7-10] [10-11] [11-13] [13-21]       X5 X X6 X30 X28
[1-7] [7-10] [10-11] [11-16]               X15 X X6 X21
[1-7] [7-10] [10-11] [11-20]               X15 X X6 X23
[7-10] [10-11]                             X X6
[1-7] [7-10] [10-11] [11-13] [13-26]       X15 X X6 X30 X20
[1-7] [7-10] [10-11] [11-13] [13-21]       X15 X X6 X30 X28
These chains are then checked. If a chain is the right-hand side of some rule in the grammar, for example Z → X15 X X6 X30 X20, then Z is added to AGENDA for processing. As before, the algorithm repeats until the goal element S that spans the whole sentence is found in CHART or AGENDA has no element left to examine.
This approach increases the time taken by each step of A*, but the number of steps is much smaller and no combinatorial explosion occurs. The remaining problem is how to generate the chains as quickly as possible. The next section describes in detail how lelightwin does this.
3.3.2.2. The basic lelightwin model
The lelightwin algorithm is outlined in the diagram below. It has two phases: element classification and chain generation.

Figure 3-7. Model of the lelightwin algorithm.

3.3.2.2.1. Element classification

The way lelightwin classifies elements is inspired by pigeonhole sort. The two algorithms may seem unrelated, but they are not: lelightwin also creates blocks (holes) into which the elements (pigeons) are put, not in order to sort them but in order to generate chains.

Figure 3-8. Classifying the elements in CHART.

The blocks fall into two kinds: blocks of elements to the left of X and blocks of elements to the right of X.
Elements to the left of X: elements whose end position is <= the start position of X. Elements with the same end position are put into the block labelled with that end position. For example, block 2 contains the elements to the left of X that end at position 2. All these blocks sit inside a larger block called the left block.
Elements to the right of X: elements whose start position is >= the end position of X. Elements with the same start position are put into the block labelled with that start position. All these blocks likewise sit inside one block called the right block.
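The classification step is essentially bucketing by position. Here is a minimal sketch, assuming CHART is a dictionary from element name to `(start, end)`; the function name `classify` and the five sample elements (taken from Table 3-1) are chosen only for illustration.

```python
from collections import defaultdict

def classify(chart, x_start, x_end):
    """Split CHART into a left block (buckets keyed by end position) and a
    right block (buckets keyed by start position) relative to X."""
    left = defaultdict(list)
    right = defaultdict(list)
    for name, (start, end) in chart.items():
        if end <= x_start:
            left[end].append(name)      # lies entirely to the left of X
        elif start >= x_end:
            right[start].append(name)   # lies entirely to the right of X
        # elements overlapping X belong to neither block
    return left, right

sample = {"X5": (2, 7), "X15": (1, 7), "X6": (10, 11), "X18": (7, 18), "X27": (4, 6)}
left_block, right_block = classify(sample, x_start=7, x_end=10)
```

As in pigeonhole sort, each element is placed with a single dictionary access, so the elements adjacent to any position can later be looked up without scanning CHART.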
3.3.2.2.2. Generating chains from the classified elements
With the carefully classified CHART elements as input, chain generation can begin. It consists of three stages: generating the left sequences, generating the right sequences, and finally assembling the chains. Each stage is described below.

Figure 3-9. The stages of chain generation.

The first stage generates all left sub-chains of X (leftChain):
- Look up the set of left neighbours of X (the elements whose end position = the start position of X) in the classified data. Each element of this set yields one left chain of X.
- Apply the previous step recursively to the left neighbours of X.
Put differently, the generation resembles a tree traversal with X as the root: on reaching a node Y, it emits the chain formed by the path from Y to X, excluding X itself.
Illustration:

Figure 3-10. Example of generating left sub-chains.

X is the root. It accesses its set of left neighbours, A and B, which in turn access their own left neighbours, and so on, giving the tree traversal shown in the figure. All chains produced are stored in the set leftChain for later processing.
The next stage generates all right sequences of X (rightChain):
- Look up the set of right neighbours of X (the elements whose start position = the end position of X). Each element of this set yields one right chain of X.
- Apply the previous step recursively to the right neighbours of X.
Mirroring the left case, the right generation also traverses a tree rooted at X, but on reaching a node Y the chain emitted is the path from X to Y, excluding X.
After these two stages we have leftChain, the left sequences of X, and rightChain, the right sequences of X. Combining leftChain, X and rightChain produces the complete set of chains, Chain (initially empty).
First, the chains that extend X to the left:
for (each chain left in leftChain)
    create a chain chain_i = [left X];
    chain.add(chain_i);
endfor;
Next, the chains that extend X to the right:
for (each chain right in rightChain)
    create a chain chain_j = [X right];
    chain.add(chain_j);
endfor;
Finally, the most involved case, chains that extend X on both sides:
for (each chain right in rightChain)
    for (each chain left in leftChain)
        create a chain chain_k = [left X right];
        chain.add(chain_k);
    endfor;
endfor;
After these three stages we have every chain that combines the candidate X with elements of CHART, and the algorithm terminates successfully.
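The three stages can be put together and run on the data of Table 3-1. The sketch below is one possible rendering, with hypothetical names (`build_chains`, `left_chains`, `right_chains`); the spans are copied from Table 3-1 and the candidate is X(7-10).

```python
from collections import defaultdict

# Spans of X1 .. X30, copied from Table 3-1.
SPANS = [(1, 8), (6, 16), (15, 35), (5, 20), (2, 7), (10, 11), (8, 27), (2, 21),
         (9, 11), (2, 13), (6, 14), (15, 26), (14, 23), (5, 18), (1, 7), (9, 16),
         (12, 17), (7, 18), (6, 25), (13, 26), (11, 16), (9, 24), (11, 20),
         (8, 18), (7, 16), (14, 16), (4, 6), (13, 21), (4, 8), (11, 13)]
CHART = {f"X{i + 1}": span for i, span in enumerate(SPANS)}

def build_chains(chart, x_name, x_span):
    by_end, by_start = defaultdict(list), defaultdict(list)
    for name, (start, end) in chart.items():
        by_end[end].append((name, start))
        by_start[start].append((name, end))

    def left_chains(pos):
        """All sequences of adjacent elements that end exactly at pos."""
        out = []
        for name, start in by_end[pos]:
            out.append([name])
            out.extend(chain + [name] for chain in left_chains(start))
        return out

    def right_chains(pos):
        """All sequences of adjacent elements that start exactly at pos."""
        out = []
        for name, end in by_start[pos]:
            out.append([name])
            out.extend([name] + chain for chain in right_chains(end))
        return out

    lefts, rights = left_chains(x_span[0]), right_chains(x_span[1])
    chains = [left + [x_name] for left in lefts]
    chains += [[x_name] + right for right in rights]
    chains += [left + [x_name] + right for right in rights for left in lefts]
    return chains

chains = build_chains(CHART, "X", (7, 10))
```

On this input the sketch yields the same 20 chains as Table 3-2: 2 left-only chains, 6 right-only chains and 12 chains that extend X on both sides.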


3.3.2.3. The lelightwin pruning algorithm

As explained, lelightwin was proposed to gain speed by reducing the combinatorial explosion at each iteration of A*. With only what has been described so far, however, lelightwin is not yet optimal in speed, because each step takes rather long: every chain involving X must be examined. Experiments show that on average only about 8% of the generated chains can actually be combined by a grammar rule into a new element. In this respect lelightwin and the wait-variable algorithm have one thing in common: waste. The wait-variable algorithm wastes effort on the number of elements generated, whereas lelightwin wastes it on the number of chains examined. As a result lelightwin is not significantly faster than the wait-variable algorithm, and is even much slower when the number of iterations reaches 400-500. In addition, storing so many chains can exhaust memory, since in some cases the number of chains reaches several billion!
To solve this problem, the thesis proposes a further improvement: lelightwin with pruning. Where basic lelightwin examines every possible chain of X, which is needlessly wasteful, pruned lelightwin removes this waste. Instead of processing all possible chains of X, lelightwin pruning examines only the chains that can be combined into a new element, and the branches leading to chains that cannot are pruned.
The lelightwin pruning algorithm has two phases:

o Statistical training phase: this phase is crucial and decides whether pruning succeeds. Thanks to it, lelightwin can decide during the tree traversal whether or not to prune a branch.
o Execution phase.

3.3.2.3.1. Training the lelightwin pruner

This part uses the following notation:
R[chain]: the set of grammar rules whose right-hand side contains the chain chain.
F[chain]: the set of grammar rules whose right-hand side begins with chain.
Basic lelightwin generates chains purely from the positions of the elements and ignores their labels. Given a PP element, for example, which elements may precede it in the grammar rules? Which may follow it? The basic algorithm does not consider this at all.
To address this, for every part-of-speech label X that occurs in the grammar, the program builds statistics for that label: the labels that can occur to the left and to the right of X, stored in a tree-like structure that we call a data hierarchy tree. In detail:

Figure 3-11. The main data hierarchy tree.

This tree stores the symbols that precede X in the grammar rules and is called the main data hierarchy tree. It is built as follows:

o Start with the set R[X]. The two children of X, A and B, are the symbols that immediately precede X in the right-hand sides of the rules in R[X].
o The two children of A, C and D, are likewise all the symbols that immediately precede A in the right-hand sides of the rules in R[A X]. Similarly, H and I are all the symbols that immediately precede B in the right-hand sides of the rules in R[B X].
o Apply the procedure recursively to the nodes of the lower levels until none of the lowest nodes has a preceding symbol in the corresponding rule set.
Every node of the tree carries a very important attribute, first. first = true at node C means that the grammar contains at least one rule whose right-hand side begins with C.
For every node with first = true in the tree above there is a companion tree that stores the elements that can occur to the right of X, called the sub data hierarchy tree. Take the white node D in the figure above. Its position in the tree tells us that it covers the grammar rules whose right-hand side begins with (D A X). We call this set of rules F[D A X]. A tree of the elements to the right of X is then built over F[D A X] in the same way. Suppose the tree of node D looks as follows:

Figure 3-12. The sub data hierarchy tree.

X has two children, K and L, the symbols that immediately follow X in F[D A X]. The construction proceeds as for the left tree: M and N are the symbols that immediately follow L in F[D A X L].
Like the left tree, the right tree of X has a boolean attribute for every node, last, which tells whether the node is the final symbol of the right-hand side of at least one grammar rule. If so, the chain corresponding to that node is the right-hand side of at least one rule. For example, last = true at node M means that the chain of M, [D A X L M], is the right-hand side of a rule in the grammar.
For such nodes we therefore scan the rule set for the left-hand-side labels that go with the node's chain, and store these labels in a set GIFT attached to the node.
Example:
The node labelled VP has last = true, and its chain is [NP VP].
The grammar contains two rules with this right-hand side:
S → NP VP
SQ → NP VP
So the GIFT set of VP consists of S and SQ.
Our 938-rule grammar has about 60-70 part-of-speech labels, so we obtain 60-70 data hierarchy trees.
This completes the preparation of the data for pruning. Next comes the main stage of lelightwin pruning.
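One way to build such trees is sketched below. It is an illustration under assumptions rather than the thesis data structure: nodes are plain dictionaries, the main tree is indexed by the symbols to the left of the label (read right to left), each node with first = true owns a sub tree indexed by the symbols to the right, and the names `build_tree`, `children`, `sub` and `gift` are hypothetical. The three rules are a toy grammar that contains the S and SQ example above.

```python
def new_node():
    return {"children": {}, "first": False, "sub": None}

def new_sub():
    return {"children": {}, "last": False, "gift": set()}

def build_tree(label, rules):
    """rules: iterable of (lhs, rhs_tuple). Builds the main tree of left
    contexts for `label`; nodes with first = True carry a sub tree of right
    contexts whose `last` nodes hold the GIFT set of left-hand sides."""
    root = new_node()
    for lhs, rhs in rules:
        for k, symbol in enumerate(rhs):
            if symbol != label:
                continue
            node = root
            for left in reversed(rhs[:k]):           # walk leftwards from X
                node = node["children"].setdefault(left, new_node())
            node["first"] = True                      # a rule's RHS starts here
            if node["sub"] is None:
                node["sub"] = new_sub()
            sub = node["sub"]
            for right in rhs[k + 1:]:                 # walk rightwards from X
                sub = sub["children"].setdefault(right, new_sub())
            sub["last"] = True                        # a rule's RHS ends here
            sub["gift"].add(lhs)
    return root

RULES = [("S", ("NP", "VP")), ("SQ", ("NP", "VP")), ("VP", ("V", "NP"))]
vp_tree = build_tree("VP", RULES)
np_tree = build_tree("NP", RULES)
```

For the label VP the only left context is NP, and the chain [NP VP] ends with GIFT = {S, SQ}, matching the example in the text.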
3.3.2.3.2. Pruning during the lelightwin algorithm
The input is still the classified CHART data, and lelightwin runs as usual, except that nodes that do not fit are pruned. Let the candidate be X' (to distinguish it from the X of the previous part), to be combined with the elements in CHART. The algorithm is:
Main function
    Call lelightwin(X', Tree(X')).
Function Tree(element E)
    Return the data hierarchy tree for the label of E.
Function lelightwin(element E, tree T)
    Check whether first of E is true. If so, call lelightwin_sub(E, subTree(E, T)).
    Look up the left neighbours of E in the classified CHART. Every element that is not a child of E in T is pruned.
    For each element Z that was not pruned, call lelightwin(Z, T) recursively.
Function subTree(element E, tree T)
    Return the sub data hierarchy tree of E in the data hierarchy tree T.
Function lelightwin_sub(element E, tree T)
    Check whether E has last = true in T. If so, add the GIFT set of E to AGENDA with the appropriate start and end positions.
    Look up the right neighbours of E in the classified CHART. Every element that is not a child of E in T is pruned.
    For each element Y that was not pruned, call lelightwin_sub(Y, T) recursively.
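The two recursive functions can be sketched as follows. This is a simplified illustration: elements are `(label, start, end)` tuples, the tree is a hand-written dictionary for the label VP (with keys `first`, `sub`, `children`, `last` and `gift` chosen for this sketch), and instead of pushing to AGENDA the function returns the set of new elements. None of these names come from the thesis program.

```python
from collections import defaultdict

def pruned_combine(cand, chart, tree):
    """Returns the new elements (lhs, start, end) that the candidate can
    form with CHART elements, following only branches present in the tree."""
    _, c_start, c_end = cand
    by_end, by_start = defaultdict(list), defaultdict(list)
    for el in chart:
        by_end[el[2]].append(el)
        by_start[el[1]].append(el)
    found = set()

    def go_right(sub, start, end):                # plays the role of lelightwin_sub
        if sub["last"]:
            found.update((lhs, start, end) for lhs in sub["gift"])
        for label, _, e in by_start[end]:
            child = sub["children"].get(label)
            if child is not None:                 # otherwise the branch is pruned
                go_right(child, start, e)

    def go_left(node, start):                     # plays the role of lelightwin
        if node["first"]:
            go_right(node["sub"], start, c_end)
        for label, s, _ in by_end[start]:
            child = node["children"].get(label)
            if child is not None:                 # otherwise the branch is pruned
                go_left(child, s)

    go_left(tree, c_start)
    return found

# Hand-written tree for the label VP, encoding S -> NP VP and SQ -> NP VP.
VP_TREE = {
    "first": False, "sub": None,
    "children": {
        "NP": {"first": True, "children": {},
               "sub": {"last": True, "gift": {"S", "SQ"}, "children": {}}},
    },
}
chart = [("NP", 0, 2), ("V", 0, 2), ("PP", 7, 9)]
new_elements = pruned_combine(("VP", 2, 7), chart, VP_TREE)
```

In this run V(0,2) and PP(7,9) are never expanded because the tree has no branch for them, so only the two elements licensed by the grammar are produced.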

3.3.3. A* combined with lelightwin pruning for Vietnamese parsing
The previous section described the lelightwin algorithm with pruning. The next step is to integrate it into the A* parsing algorithm.
Steps of the algorithm:

o Step 1: Initialise the parser input. The input sentence is split into n words, and each word wi is given a label Li and the span it covers in the sentence: start = i, end = i+1. This gives a set of n elements <Li,i,i+1>, which is stored in AGENDA for the next step.
o Step 2: Compute the estimate of every element and take out the element with the highest priority. Call it candidate.
o Step 3: If candidate is not yet in CHART:
   - Call lelightwin(candidate, Tree(candidate)).
   - Add candidate to CHART.
o Step 4: Repeat steps 2 and 3 until AGENDA has no element left to examine or the goal element <S,1,n+1> is found in CHART.

3.4. Remarks on A* with lelightwin pruning

Advantages:

o Speed (roughly 3 to 4 times faster than plain A* lelightwin when applied to parsing), because few steps are needed and no combinatorial explosion occurs at any step, and above all because the pruning handles chain generation efficiently.
o No incomplete elements (elements with a wait variable) are used, so the priorities of the nodes are compared more accurately, since every node being ranked is a genuinely complete node. Computing inside and outside also becomes simpler, because the pseudo-node case no longer has to be handled.
Disadvantages:

o It uses a lot of memory, because all chains generated during parsing have to be stored.
o It needs a very large amount of auxiliary data to be stored.
o The more combinations there are to join, the longer each step takes.
o In addition, the algorithm is very complex to implement and test.

3.5. Chapter summary
To address the difficulties of Vietnamese parsing, the thesis has done the following:

o Proposed the CYK-beam search algorithm based on the PCFG model to speed up the parser.
o Proposed the A* algorithm to overcome the difficulties of CYK-beam search, so that the parser output is as good as possible while the parser remains fast.
o Presented three ways of computing the priority function for A*:
   - No outside, which amounts to Dijkstra's algorithm.
   - An estimate based on one-level outside.
   - An estimate based on a reduced rule set.
o In particular, the second half of Chapter 3 presented an idea original to this thesis: the lelightwin algorithm with pruning, which aims to raise the speed of the parser further.
Because of the limited time available for the thesis, the program currently implements only basic lelightwin, without pruning. The thesis has nevertheless described every step of lelightwin with pruning clearly, so implementing it in the future is certainly feasible.

The implementation is covered next, in Chapter 4, Development, testing and evaluation of the Vietnamese parsing system.


CHAPTER 4. DEVELOPMENT, TESTING AND EVALUATION OF THE SYSTEM
This chapter covers the construction of the Vietnamese parsing program, the problems encountered and how they were solved. It comprises:
o Program analysis and design.
o Data storage organisation.
o Program implementation.
o Testing and evaluation.

4.1. Analysis and design of the parser for Vietnamese speech synthesis
4.1.1. Overall model
The parsing system for Vietnamese speech synthesis can be described in general terms as follows:

Figure 4-1. Overall model of the system.


System analysis:

o Input: any normalised sentence that has gone through text preprocessing.
o Output: a parse tree.
o System functions
   - Rule management: lets the parser access the grammar rule set easily and also computes the PCFG probability of every rule.
   - Parsing: the central and most important function. It takes the results of all the functions above as input and applies the parsing algorithm to produce the parse tree.
o Databases
   - RuleSet: the set of Vietnamese grammar rules.
   - VietTreeBank: a sample corpus of more than 2000 sentences carefully parsed by hand.

4.1.2. Functions of the parser

4.1.2.1. Text preprocessing
4.1.2.1.1. Word segmentation
Word segmentation is a very important stage of parsing and receives the input of the whole system directly. Given an input sentence, the segmenter must split it into Vietnamese words (the notion of a word is described in Chapter 2). Given the richness and variety of Vietnamese words, this task is far from simple.
For example, the sentence "hôm nay tôi đến trường bằng xe máy" is segmented into "hôm nay", "tôi", "đến", "trường", "bằng", "xe máy".
Vietnam already has a good deal of research on word segmentation for Vietnamese with very promising results. The most prominent is the VNTokenizer segmenter, from the Vietnamese text processing project branch led by Professor Hồ Tú Bảo. VNTokenizer combines a Vietnamese dictionary with an n-gram language model trained on a very large corpus (70,000 hand-segmented sentences). Its accuracy reaches 97%. After testing it on many kinds of complex text, the thesis uses VNTokenizer in the text preprocessor instead of building its own segmenter.
4.1.2.1.2. Part-of-speech tagging
Like word segmentation, part-of-speech tagging of Vietnamese is an equally hard problem for the text preprocessing function of the parser. The tagger receives the segmented sentence and assigns a part-of-speech tag to each word. Vietnamese has a rich tag set, and the tag depends on context: the same word may be a noun in case A and a verb in case B. Tagging is therefore no simpler than word segmentation.
Input: hôm_nay tôi đến trường bằng xe_máy
Output: hôm_nay N (noun), tôi P (pronoun), đến V (verb), trường N (noun), bằng C (preposition), xe_máy N (noun).
As with word segmentation, however, part-of-speech tagging of Vietnamese has been studied extensively in Vietnam. The thesis examined two taggers that give very promising results: VnTagger, also from the Vietnamese text processing project branch chaired by Professor Hồ Tú Bảo, and vnqtag by Lê Hồng Phương. In experiments on essays collected from the web, VnTagger gave better results, but its tag set differs somewhat from the tag set used by the grammar rules. Since there was no time to remap the tag set, the thesis uses vnqtag by Lê Hồng Phương for part-of-speech tagging in the parser. vnqtag itself is highly accurate, above 90%, so this choice is entirely acceptable.
4.1.2.2. Rule management (Rule Manager)
The rule manager has two main tasks: loading the data file that contains the grammar rule set, and computing the PCFG probability of each rule from VietTreeBank.
4.1.2.2.1. Loading the rule data file
Table 4-1. Description of the data-loading function

Input: the data file containing the grammar rule set, stored on disk.

Output: the grammar rule set stored in a hash table.
Procedure:
- Open the file and read the grammar rules stored in it.
- Put each rule read into the hash table bucket whose key is the right-hand side of the rule; rules with the same right-hand side end up in the same bucket.
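
The loading procedure above can be sketched as follows. This is a minimal illustration, not the thesis code: the class `RuleIndexSketch`, its nested `Rule` type, and the method names are hypothetical, and rules are assumed to be plain strings such as "M N" on the right-hand side.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RuleIndexSketch {
    /** One grammar rule, e.g. left = "NP", right = "M N" (hypothetical example values). */
    static final class Rule {
        final String left;
        final String right;

        Rule(String left, String right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Groups rules by right-hand side so that all rules sharing it sit in one bucket. */
    static Map<String, List<Rule>> indexByRight(List<Rule> rules) {
        Map<String, List<Rule>> table = new HashMap<>();
        for (Rule r : rules) {
            table.computeIfAbsent(r.right, k -> new ArrayList<>()).add(r);
        }
        return table;
    }

    /** Returns the left-hand sides of all rules whose right-hand side equals rhs. */
    static List<String> leftSidesFor(Map<String, List<Rule>> table, String rhs) {
        List<String> result = new ArrayList<>();
        for (Rule r : table.getOrDefault(rhs, new ArrayList<>())) {
            result.add(r.left);
        }
        return result;
    }
}
```

Keying on the right-hand side suits bottom-up parsing: once a sequence of constituents has been recognized, a single lookup yields every label it can be reduced to.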
4.1.2.2.2. Computing the PCFG probability of each rule
This function estimates the parameters of each rule from the sample training data. It needs a large amount of sample data for the PCFG parameters to be reasonably accurate.
Table 4-2. Description of the PCFG probability function

Input: the VietTreeBank data set.

Output: the probability of each grammar rule of the context-free grammar.
Procedure: the detailed steps are as follows:
Step 1: Traverse the parse trees in VietTreeBank one by one, converting each tree into a structure from which the number of uses of each rule is easy to compute.
Step 2: For each tree, count how many times each rule is used in the tree and update this count in the grammar rule hash table.
Step 3: After all trees in the treebank have been traversed, we have the grammar rule set together with the number of occurrences of each rule in the treebank. Compute the probability of each rule as:

$$P(N^j \to \zeta) = \frac{C(N^j \to \zeta)}{\sum_{\gamma} C(N^j \to \gamma)}$$

where C(*) is the number of times rule * was counted.

Step 4: Once all rules have their PCFG parameters, save all the information to a file.
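
Step 3 can be illustrated with a short sketch. It assumes the counts from Step 2 are already available in a map keyed by a rule written as "LEFT -> RIGHT"; that string format and the class name `PcfgEstimatorSketch` are illustrative assumptions rather than the thesis implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class PcfgEstimatorSketch {
    /**
     * Relative-frequency estimate: P(rule) = C(rule) / sum of C over all rules
     * that share the same left-hand side.
     */
    static Map<String, Double> estimate(Map<String, Integer> counts) {
        // First pass: total count per left-hand side.
        Map<String, Integer> totalPerLeft = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            totalPerLeft.merge(leftOf(e.getKey()), e.getValue(), Integer::sum);
        }
        // Second pass: divide each rule count by the total of its left-hand side.
        Map<String, Double> probs = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int total = totalPerLeft.get(leftOf(e.getKey()));
            probs.put(e.getKey(), e.getValue() / (double) total);
        }
        return probs;
    }

    /** Extracts the left-hand side from a rule written as "LEFT -> RIGHT". */
    private static String leftOf(String rule) {
        return rule.split("->")[0].trim();
    }
}
```

With this estimate, the probabilities of all rules that expand the same label sum to 1, which is what a PCFG requires.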
4.1.2.3. Syntactic parsing
As presented in Chapter 3, the main algorithm used for the parsing function of the system is A*. The thesis also implements CYK-Beam search as a second option, which makes it convenient to compare the two algorithms.
Table 4-3. Description of the parsing function

Input:
- The input sentence after word segmentation and POS tagging.
- The hierarchy trees of the labels, stored in a file.
- The hash table containing the Vietnamese grammar rules, with probabilities computed from VietTreeBank.
Output: a parse tree of the input sentence.
Procedure: there are two ways to carry out this function:
- Using the A* parsing algorithm.
- Using the CYK beam search algorithm.

4.1.3. Data storage organization

The data of the syntactic parsing system for Vietnamese speech synthesis comprise: the normalized input text, the output of the text preprocessor, the grammar rule set, the VietTreeBank corpus, and the output data.
4.1.3.1. Normalized input text
As introduced in Chapter 1, the parser sits immediately after the text normalizer in the Vietnamese speech synthesis system. The input of the parser is therefore text that has already been normalized. Normalized data are stored as XML, with the values inside the tags corresponding to the normalized text. The XML structure of the normalized data is as follows:

Figure 4-2. Structure of the normalized text data.

An example of stored normalized data for the text: "Tổng công ty VTC có tổng giám đốc mới. (Dân trí) - Ông Nguyễn Xuân Cường, sinh ngày 10/11/1973, quê quán thành phố Vinh, Nghệ An, vừa được bổ nhiệm là Tổng Giám đốc Tổng Công ty truyền thông đa phương tiện (VTC)."

Figure 4-3. Example of normalized text data.

4.1.3.2. Data handed over by the text preprocessor

The data produced by the text preprocessor are the words segmented from the sentence and their POS tags. The structure is very simple: each tagged word is stored on one line of the file in the following form:

<w pos="(POS tag)">word</w>

Below is an example of the data stored by the text preprocessor:
Table 4-4. Example output of the text preprocessor

<w pos="V">đứng</w>
<w pos="N">trước</w>
<w pos="N">mặt</w>
<w pos="P">tôi</w>
<w pos="V">là</w>
<w pos="M">một</w>
<w pos="P">gã</w>
<w pos="N">đàn ông</w>
<w pos="A">cao</w>
<w pos="A">lêu đêu</w>
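
A reader for this one-word-per-line format might look like the sketch below. It is an illustration under stated assumptions: the class `TaggedWordReaderSketch` and its `TaggedWord` type are hypothetical, and lines that do not match the `<w pos="...">...</w>` form are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TaggedWordReaderSketch {
    /** A word together with its POS tag. */
    static final class TaggedWord {
        final String pos;
        final String word;

        TaggedWord(String pos, String word) {
            this.pos = pos;
            this.word = word;
        }
    }

    // Matches one line of the form <w pos="TAG">word</w>.
    private static final Pattern LINE =
            Pattern.compile("<w pos=\"([^\"]+)\">([^<]*)</w>");

    /** Parses the preprocessor output; lines in any other form are ignored. */
    static List<TaggedWord> parse(List<String> lines) {
        List<TaggedWord> words = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) {
                words.add(new TaggedWord(m.group(1), m.group(2)));
            }
        }
        return words;
    }
}
```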
4.1.3.3. Grammar rule set
The grammar rule set is the most important data and largely determines the success of the system. Its storage must therefore be organized sensibly so that reading, writing, and processing are convenient and fast.
As introduced in earlier sections, the grammar rule set of the system consists of 938 rules built from the VietTreeBank sample data and then fine-tuned to fit the parser. The rules are stored in an XML file of the form:
<VietParserRuleSet>
<Rule id="{index of the rule}"
probability="{PCFG probability of the rule}">
<left>{left-hand side of the rule}</left>
<right>{right-hand side of the rule}</right>
</Rule>
</VietParserRuleSet>
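
Reading such a file with the JDK's built-in DOM parser could be sketched as follows. The class name `RuleXmlReaderSketch` and the choice to return plain string triples are assumptions made for brevity; the thesis itself stores rules in its own Rule and RuleSet classes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class RuleXmlReaderSketch {
    /** Returns one {left, right, probability} triple per <Rule> element. */
    static List<String[]> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("Rule");
            List<String[]> rules = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element rule = (Element) nodes.item(i);
                String left = rule.getElementsByTagName("left").item(0).getTextContent().trim();
                String right = rule.getElementsByTagName("right").item(0).getTextContent().trim();
                rules.add(new String[] {left, right, rule.getAttribute("probability")});
            }
            return rules;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers get a single unchecked error type.
            throw new IllegalArgumentException("Malformed rule file", e);
        }
    }
}
```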
4.1.3.4. The VietTreeBank corpus
VietTreeBank is a data set of many Vietnamese sentences that have been accurately parsed by hand. Treebank data are widely used in natural language processing applications and are especially useful for building high-quality parsers. To improve the quality of its output, a parsing system therefore needs a treebank of its own. This system also uses a VietTreeBank data set, stored in a tree structure. Each parse tree lies inside a pair of opening and closing tags <s></s> and uses parentheses () to separate the individual constituent labels. Child labels lie inside the parentheses of their parent label.

Figure 4-4. Structure of the VietTreeBank data.

4.1.3.5. System output data

The last item is the organization of the parser's output data. This matters because low-level speech synthesis and other later stages consume this output. The system must therefore organize the output so that later stages can read and use it easily.
The input text may contain several sentences, and each sentence may have several parses, ordered from highest to lowest PCFG value (in the case of CYK-Beam search). The output data are therefore organized as follows:
- Each sentence is stored in a pair of <s></s> tags.
- Within each <s></s> pair there are several <parse></parse> pairs, one for each parse of the sentence held by that <s></s> pair.
- Inside each <parse></parse> pair, the parse of the sentence is stored as a tree. The tags of child labels are nested inside the tags of their parent labels.

The following illustrates the output data model for the text: "Nguyên nhân một phần, do động lực đến trường không có."

Figure 4-5. Illustration of the system output data.

In addition, every tag in the parse tree carries a level attribute that gives the depth of the node in the tree. This attribute helps later stages find a phrase at a given level quickly.
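
As an illustration of how a later stage might use the level attribute, the sketch below collects the labels of all nodes at a given depth from one `<parse>` element. The class and method names are hypothetical; only the `level` attribute and the nesting of tags come from the description above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class LevelLookupSketch {
    /** Returns, in document order, the tag names of all nodes whose level attribute equals level. */
    static List<String> labelsAtLevel(String xml, int level) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList all = doc.getElementsByTagName("*");
            String wanted = Integer.toString(level);
            List<String> labels = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                Element el = (Element) all.item(i);
                // getAttribute returns "" when the attribute is absent, so such nodes never match.
                if (wanted.equals(el.getAttribute("level"))) {
                    labels.add(el.getTagName());
                }
            }
            return labels;
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed parser output", e);
        }
    }
}
```

Because the depth is stored on each node, a consumer can filter by attribute value without tracking recursion depth itself.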

4.2. Building the system

4.2.1. Choice of tools
With the growth of the IT industry, many technologies are available for developing an application; among them Java and .NET stand out. For quickly developing an application with a friendly, easy-to-use interface, .NET has a slight edge. However, this thesis is research-oriented, so the code needs to be reused, modified, and run on many different platforms. The planned direction of the thesis even includes bringing the system to the web and to mobile phones. On these points Java does better than .NET. In addition, the system is intended to serve the community and is not commercial, so an open-source direction is appropriate.
The technology chosen for the thesis is therefore Java, with NetBeans IDE 7.0 as the development environment.

4.2.2. Class diagram

4.2.3. Detailed class design

The parser is built from many packages and fairly complex classes, so the thesis does not list every class and method used; it concentrates on the main packages of the program.
4.2.3.1. RuleManager package
This package manages and manipulates grammar rules. It has two main classes, Rule and RuleSet.
4.2.3.1.1. Class Rule
This class holds the information about one grammar rule; it models a Vietnamese grammar rule as a data type.
Table 4-5. Description of class Rule

Attributes:
- private String left: left-hand side of the rule.
- private int count: number of occurrences of the rule in VietTreeBank.
- private String right: right-hand side of the rule.
- private float prob: PCFG probability of the rule.
Methods:
- public String getLeft(): returns the value of left.
- public void setLeft(String left): sets the value of left.
- public int getCount(): returns the value of count.
- public void setCount(int count): sets the value of count.
- public float getProb(): returns the value of prob.
- public void setProb(float prob): sets the value of prob.
- public String getRight(): returns the value of right.
- public void setRight(String right): sets the value of right.

4.2.3.1.2. Class RuleSet
This class manages the whole rule set and provides the methods for retrieving rules.
Table 4-6. Description of class RuleSet

Attributes:
- private ArrayList<AStarRule> arlRule: array that stores the rules.
- private HashMap<String, ArrayList<AStarRule>> htRule: hash table that stores the rules, keyed by the right-hand side of the rule.
- private ArrayList<Integer> arlTotal: array that stores the total number of rules of each phrase.
Methods:
- public boolean readRule(): opens the database and reads the grammar rules stored in it.
- public ArrayList<AStarElement> getCombine(ArrayList<AStarElement> as): returns the left-hand sides of the rules whose right-hand side is the sequence as.

4.2.3.2. Parsing Element package

This package manages the information of the elements used during parsing. There are two kinds of elements, one for each of the two algorithms, A* and CYK-Beam search.
4.2.3.2.1. Class CYK Element
This class holds the information about an element used during parsing with the CYK Beam search algorithm.
Table 4-7. Description of class Element

Attributes:
- private String sLeft: stores the POS label in CYK parsing.
- private String sWait: stores the wait variable of the element in CYK parsing.
- private int iPos1: stores the position of the first cell that created the element.
- private int iIndex1, iIndex2: store the indices of the two components that created the element.
- private float in_side: inside probability of the element.
- private float out_side: outside probability of the element.
- private String sLast: stores the last component of the parse rule.
Methods:
- All get and set methods of the attributes above.
4.2.3.2.2. Class Cell
This class holds the information about one cell during CYK-Beam search parsing.
Table 4-8. Description of class Cell

Attributes:
- private ArrayList<Element> arlElem: stores the set of elements contained in the cell.
Methods:
- public ArrayList<Element> getArrElem(): returns the arlElem set of the cell.
- public Element getElement(int i): returns the i-th element of the cell.
- public void removeElement(int i): removes the i-th element from the cell.
- public int size(): returns the number of elements in the cell.
- public int sizeUsed(): returns the number of elements in the cell that have not been pruned by beam search.
- public void add(Element elem): adds the element elem to the cell.
- public void addWord(Word w): adds an element containing the word w to the cell.

4.2.3.2.3. Class AStarElement
As its name suggests, this class holds the information about an element used during parsing with the A* algorithm.
Table 4-9. Description of class AStarElement

Attributes:
- private String sCar: stores the POS label of the element.
- private int start: stores the start position of the element in the sentence.
- private int end: stores the end position of the element in the sentence.
- private ArrayList<AStarElement> subElement: stores the component elements that make up the element during parsing.
- private float in_side: inside probability of the element.
- private float out_side: outside probability of the element.
- private ArrayList<AStarElement> outsideElem: stores the values used to compute the outside probability.
- private float Prob: stores the probability of the rule that the element uses to generate its component elements.
Methods:
- All get and set methods of the attributes above.
- public void contract(AStarRule rule, ArrayList<AStarElement> as): sets up the list of component elements of the element under consideration for a given grammar rule.
- public String getWord(): returns the POS corresponding to the label when the element is a leaf node.
- public boolean isWord(): checks whether the element under consideration is a POS leaf node.
4.2.3.3. Common package
This is an auxiliary package of the system used for common processing tasks.
4.2.3.3.1. Class PartOfSpeech
Class PartOfSpeech maps the English symbols to Vietnamese names and is used when building the tree.
Table 4-10. Description of class PartOfSpeech

Attributes:
- private ArrayList<String> arlPos: stores the English symbols (words, phrases, sentences).
- private ArrayList<String> arlName: stores the corresponding Vietnamese names.
- private int iPos: stores the number of symbols.
Methods:
- public String getName(String sPos): returns the Vietnamese name of the input symbol.
- public String getPos(String sName): returns the English symbol of the input Vietnamese name.

4.2.3.3.2. Class Functions
This class contains checking functions, which test whether an input word is a proper noun, a special symbol, or a numeral.
Table 4-11. Description of class Functions

Attributes:
- This class has no notable attributes because it only handles auxiliary tasks.
Methods:
- public boolean isDanhTuRieng(String strParam): checks whether the input string is a proper noun.
- public boolean isSymbol(String strParam): checks whether the input string is a special symbol.
- public boolean isSotu(String strParam): checks whether the input string is a numeral.

4.2.3.4. Analysis package
4.2.3.4.1. Class CYKBeamSearch
This class performs parsing with the CYK algorithm combined with beam search pruning.
Table 4-12. Description of class CYKBeamSearch

Attributes:
- private String sentence: the input sentence.
- private float fRatio: the elimination ratio used in beam search pruning.
- private RuleSet ruleS: the grammar rule set.
- private DefaultMutableTreeNode rootNode: receives the result returned by the parser; its value is the root node of the parse tree.
- private ArrayList<String> specialWord: array of special words, which include proper nouns and numerals.
- private ArrayList<Integer> specialIndex: array of punctuation positions, used when adding labels.
- private ArrayList<String> arlTree: array of strings read from the parse tree.
- private ArrayList<Word> arlWord: array of the words in the sentence.
- private ArrayList<Integer> arlSize: array of the sizes of the rows of the CYK table.
- private ArrayList<Cell> table: list of the cells used in CYK parsing.
Methods:
- public void CYK(float step): performs CYK parsing with threshold step.
- public void run(): the main function, which parses the input sentence.
- public void buildTree(): builds the parse tree from the CYK table.
- public void joinCell(int iPos1, int iPos2, Bounds bound1, Bounds bound2): combines the two cells at positions iPos1 and iPos2 of the CYK table.
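
The pruning step inside a cell can be illustrated with a small sketch. The source does not spell out how fRatio is applied, so the rule used here, keeping an element only if its score is at least the best score in the cell multiplied by the ratio, is an assumption, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;

final class BeamPruneSketch {
    /**
     * Keeps the scores that are at least ratio * best, where best is the highest
     * score in the cell; everything below that threshold is pruned.
     */
    static List<Double> prune(List<Double> scores, double ratio) {
        double best = 0.0;
        for (double s : scores) {
            best = Math.max(best, s);
        }
        List<Double> kept = new ArrayList<>();
        for (double s : scores) {
            if (s >= best * ratio) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

A ratio close to 1 keeps only near-best elements and makes parsing faster but more likely to lose the correct analysis; a ratio close to 0 keeps almost everything.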

4.2.3.4.2. Class AStar
This class performs parsing with the A* algorithm and uses the basic lelightwin algorithm.

Table 4-13. Description of class AStar

Attributes:
- private RuleSet ruleS: the grammar rule set.
- private ArrayList<Word> arlWord: array of the words in the sentence.
- private ArrayList<Integer> specialIndex: stores the positions of punctuation marks, used when adding phrase labels.
- private HashMap<String, AStarElement> chart: the CHART set used in the A* algorithm.
- private ArrayList<AStarElement> agenda: the AGENDA set used in the A* algorithm.
- private DefaultMutableTreeNode node: as in the CYK case, holds the result after parsing.
Methods:
- public void run(): the main function of the class, which performs A* parsing.
- public ArrayList<AStarElement> getCombinedElements(AStarElement e): returns the set of elements created by combining the element e with the elements in CHART.
- public float calculateOutside(AStarElement cand): computes the outside value of the element cand.
- public AStarElement candidate(): returns the candidate element with the highest estimate.
- public DefaultMutableTreeNode getNode(AStarElement a): returns the subtree of the output parse tree rooted at a.
- public DefaultMutableTreeNode getNode(): returns the variable node.
- public String key(AStarElement e): returns the key of the element e in AGENDA.
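
The role of candidate() can be sketched with a priority queue. This is not the thesis code: the thesis stores the agenda in an ArrayList, whereas the sketch uses java.util.PriorityQueue, and it assumes the estimate of an element is the product of its inside probability and its outside estimate. The class `AgendaSketch` and its `Item` type are hypothetical.

```java
import java.util.PriorityQueue;

final class AgendaSketch {
    /** A chart item: a label over a span, with inside and outside scores. */
    static final class Item {
        final String label;
        final int start;
        final int end;
        final double inside;
        final double outside;

        Item(String label, int start, int end, double inside, double outside) {
            this.label = label;
            this.start = start;
            this.end = end;
            this.inside = inside;
            this.outside = outside;
        }

        /** Assumed A* priority: inside probability times the outside estimate. */
        double priority() {
            return inside * outside;
        }
    }

    // Highest priority first, hence the reversed comparison.
    private final PriorityQueue<Item> agenda =
            new PriorityQueue<>((a, b) -> Double.compare(b.priority(), a.priority()));

    void add(Item item) {
        agenda.add(item);
    }

    /** Removes and returns the item with the highest estimate, or null if the agenda is empty. */
    Item candidate() {
        return agenda.poll();
    }
}
```

A heap makes each extraction logarithmic in the agenda size, whereas scanning a list for the best item is linear.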

4.2.3.4.3. Class LeLightWin
This class implements the basic lelightwin algorithm. It processes the candidate element and returns the set of chains that combine the candidate with the elements in CHART.
Table 4-14. Description of class LeLightWin

Attributes:
- private ArrayList<ArrayList<AStarElement>> chain: stores the result of the algorithm, namely the combination chains.
- private ArrayList<ArrayList<AStarElement>> leftChain: the left combination sequences of the candidate element.
- private ArrayList<ArrayList<AStarElement>> rightChain: the right combination sequences of the candidate element.
- private HashMap<String, ArrayList<AStarElement>> left: the set of elements lying to the left of the candidate after classification.
- private HashMap<String, ArrayList<AStarElement>> right: the set of elements lying to the right of the candidate after classification.
Methods:
- public void leftProcess(AStarElement elem): generates the leftChain of the candidate, starting at the position of the element elem.
- public void rightProcess(AStarElement elem): generates the rightChain of the candidate, starting at the position of the element elem.
- public void classify(AStarElement e, HashMap<String, AStarElement> ea): classifies the elements in CHART and stores them in the two sets left and right.
- public void generateSubChain(AStarElement e): creates the combination sub-chains of the candidate element from the two sets left and right.
- public ArrayList<ArrayList<AStarElement>> getLLWChain(AStarElement e, HashMap<String, AStarElement> chart): the final and main function of the class, which does the most important work: it generates all combination chains of the candidate e and stores them in chain.
4.2.3.4.4. Class Sentence
This can be regarded as the main class of the analysis package. It stores the information about one input sentence and provides the methods that parse the sentence, relying on all the classes above.
Table 4-15. Description of class Sentence

Attributes:
- private String sentence: stores the input sentence.
- private ArrayList<String> specialWord: stores the special words, namely proper nouns and numerals.
- private ArrayList<Integer> specialIndex: stores the positions of punctuation marks, used when adding phrase labels.
- private ArrayList<Word> arlWord: array of the words in the sentence.
Methods:
- public void splitWord(): segments the input sentence into words. The result is written to the file text.txt.
- public void callQTag(): reads the data from text.txt and performs POS tagging. The result is stored in the file wordTagged.txt.
- public void readWordFromFile(): reads the data from the file wordTagged.txt.
- public void CYKParser(): performs CYK parsing of the input sentence, combined with beam search.
- public void astarParser(): performs A* parsing of the input sentence, combined with the basic lelightwin algorithm.
- public DefaultMutableTreeNode getAStarNode: returns the parsing result of AStar.
- public DefaultMutableTreeNode getCYKNode: returns the parsing result of CYK.

4.3. Testing and evaluation
4.3.1. Program interface
The interface of the Vietnamese parser consists of the following main components:
- A button that triggers parsing with the CYK-Beam search algorithm; the result is displayed in the CYK-Beam search result area.
- A button that triggers parsing with the A* algorithm; the result is displayed in the A* result area.
- A table listing the sentences to be parsed: the input text is split into sentences and shown in this table, one sentence per row.
- The input text area.
- The result display areas: they show the result as a tree and are updated tree by tree during parsing, so that every time a parse tree is obtained the result is refreshed.

Figure 4-6. Main interface of the program.



4.3.2. Experimental results
4.3.2.1. Test data sets
The data used to evaluate the performance of the parser for Vietnamese speech synthesis comprise two sets. The first set has 630 sentences taken from vnspeechcorpus; these sentences are all very long and syntactically ambiguous. The second set, used to test the accuracy of the parser, has 200 sentences taken from VietTreeBank. The thesis considers the second set acceptable because the parser has not been trained on VietTreeBank with a learning algorithm; it only collects PCFG probability statistics from it, so this test is not expected to introduce bias.
4.3.2.2. Parsing results
Below are some results obtained by the parser with the A* algorithm on sentences ranging from simple to complex.
Tôi là sinh viên

Figure 4-7. Parse tree of "tôi là sinh viên"

Output XML:
<?xml version="1.0" ?>
<BKLightWinParser>
<sentence id="1">tôi là sinh viên
<parse id="1">
<NP level="1" explain="cụm danh từ">tôi
<P level="2" explain="đại từ">tôi</P>
</NP>
<VP level="1" explain="cụm động từ">là sinh viên
<V level="2" explain="động từ">là</V>
<N level="2" explain="danh từ">sinh viên</N>
</VP>
</parse>
</sentence>
</BKLightWinParser>

Tôi là một sinh viên học rất giỏi môn toán.

Figure 4-8. Parse tree of "tôi là một sinh viên học rất giỏi môn toán"

Output XML:
<?xml version="1.0" ?>
<BKLightWinParser>
<sentence id="1">tôi là một sinh viên học rất giỏi môn toán
<parse id="1">
<NP level="1" explain="cụm danh từ">tôi
<P level="2" explain="đại từ">tôi</P>
</NP>
<VP level="1" explain="cụm động từ">là một sinh viên học rất giỏi môn toán
<V level="2" explain="động từ">là</V>
<NP level="2" explain="cụm danh từ">một sinh viên học rất giỏi môn toán
<M level="3" explain="số từ">một</M>
<NP level="3" explain="cụm danh từ">sinh viên học rất giỏi môn toán
<N level="4" explain="danh từ">sinh viên</N>
<VP level="4" explain="cụm động từ">học rất giỏi môn toán
<V level="5" explain="động từ">học</V>
<AP level="5" explain="cụm tính từ">rất giỏi môn toán
<R level="6" explain="phụ từ">rất</R>
<A level="6" explain="tính từ">giỏi</A>
<NP level="6" explain="cụm danh từ">môn toán
<N level="7" explain="danh từ">môn</N>
<N level="7" explain="danh từ">toán</N>
</NP>
</AP>
</VP>
</NP>
</NP>
</VP>
</parse>
</sentence>
</BKLightWinParser>

Dạo này tôi không còn thích đến trường như lúc trước nữa.

Nguyên nhân một phần, do động lực đến trường không có.
Output XML:
<?xml version="1.0" ?>
<BKLightWinParser>
<sentence id="1">dạo này tôi không còn thích đến trường như lúc trước nữa
<parse id="1">
<VP level="1" explain="cụm động từ">dạo
<V level="2" explain="động từ">dạo</V>
</VP>
<NP level="1" explain="cụm danh từ">này tôi
<P level="2" explain="đại từ">này</P>
<P level="2" explain="đại từ">tôi</P>
</NP>
<VP level="1" explain="cụm động từ">không còn thích đến trường như lúc trước nữa
<R level="2" explain="phụ từ">không</R>
<V level="2" explain="động từ">còn</V>
<VP level="2" explain="cụm động từ">thích đến trường như lúc trước nữa
<V level="3" explain="động từ">thích</V>
<PP level="3" explain="cụm giới từ">đến trường như lúc trước nữa
<C level="4" explain="giới từ">đến</C>
<NP level="4" explain="cụm danh từ">trường như lúc trước nữa
<N level="5" explain="danh từ">trường</N>
<PP level="5" explain="cụm giới từ">như lúc trước nữa
<C level="6" explain="giới từ">như</C>
<NP level="6" explain="cụm danh từ">lúc trước nữa
<N level="7" explain="danh từ">lúc</N>
<N level="7" explain="danh từ">trước</N>
<R level="7" explain="phụ từ">nữa</R>
</NP>
</PP>
</NP>
</PP>
</VP>
</VP>
</parse>
</sentence>
<sentence id="2">nguyên nhân một phần , do động lực đến trường không có
<parse id="1">
<NP level="1" explain="cụm danh từ">nguyên nhân một phần
<N level="2" explain="danh từ">nguyên nhân</N>
<M level="2" explain="số từ">một</M>
<N level="2" explain="danh từ">phần</N>
</NP>
<punc level="1">,</punc>
<C level="1" explain="giới từ">do</C>
<NP level="1" explain="cụm danh từ">động lực đến trường
<N level="2" explain="danh từ">động lực</N>
<C level="2" explain="giới từ">đến</C>
<N level="2" explain="danh từ">trường</N>
</NP>
<VP level="1" explain="cụm động từ">không có
<R level="2" explain="phụ từ">không</R>
<V level="2" explain="động từ">có</V>
</VP>
</parse>
</sentence>
</BKLightWinParser>

Next we try a sentence that is both very hard and long:


g n ng ng ca, thn th vm v ca k c nui sng bng
cht bt, ln t d di khun mt m m, khng r thin hay c
The system is still able to parse it:

Figure 4-9. Parse of a very hard and long sentence.

The table below summarizes the test of the system on 630 complex sentences (compared with the CYK-Beam search algorithm):
Table 4-16. Summary of the test on 630 prose sentences

Algorithm          Processing time   Sentences parsed
A*                 15 minutes        92%
CYK-Beam search    45 minutes        75%

Regarding accuracy, a test on about 200 sentences from the treebank gave an accuracy of about 70% (A*).

4.3.3. System evaluation
Within the scope of a graduation thesis, the results achieved by the system are fairly promising. However, the tests show that the accuracy of the parser is still low, for the following reasons:
- The parser has no training algorithm that uses the treebank; it relies only on simple statistics, so the performance of the program has not been improved.
- The grammar rule set still needs further refinement.
- Errors made by the word segmenter and the POS tagger propagate to the output of the parser.
- The sentences in VietTreeBank are very hard and long; most of them are 50-60 words long and have a very complex structure.
In terms of speed, the system maintains a fairly stable parsing speed even on long and hard sentences, which shows the advantage of the A* algorithm. Moreover, this is only a first step: the author expects that adding lelightwin pruning could improve the speed of the system by tens of times.

4.4. Chapter summary
This chapter presented the test results and the performance evaluation of the Vietnamese parser.
- The A* parsing algorithm gives promising results: it parsed 630 prose sentences in 15 minutes, an average of roughly 1.4 s per sentence (900 s over 630 sentences). These sentences are all very long and hard.
- Compared with CYK-Beam search, A* is clearly superior in speed. Regarding accuracy, for lack of time the program has not yet been evaluated thoroughly; a more rigorous evaluation of the system's performance is planned for the future.
- Accuracy on the sample sentences from the treebank is still not high.


CONCLUSION AND FUTURE WORK

Achievements of the thesis:
- Studied many parsing methods and models from around the world in order to find a new direction for this line of work.
- Studied and combined a word segmenter and a POS tagger into a preprocessor for the input of the parser.
- Designed the output data of the system so that the other stages of speech synthesis can use it easily.
- Successfully implemented the A* algorithm for Vietnamese syntactic parsing, with promising speed and accuracy.
- Proposed the idea and the algorithm of lelightwin pruning, which is intended to raise the speed of the parser to a new level.
Future work:
The thesis compared the system with several other parsers, namely the PCFG parser of Hoàng Anh Việt (K46) and the statistical machine-learning parser of VLSP. The results are as follows:
- Hoàng Anh Việt's parser is very fast because its grammar is simple, with only 180 rules (whereas the rule set of this system has 938 rules). However, on complex and ambiguous sentences its results are not very good. That system does have one feature this system should learn from: it uses the inside-outside training algorithm, which greatly improves the quality of the parser.
- The VLSP parser is very good and its results are extremely promising. On sentences without commas, the VLSP parser is clearly superior to the system of this thesis. The reason is that VLSP (Prof. Hồ Tú Bảo) has a very large treebank of 10,000 sentences and a carefully studied grammar rule set. In addition, their word segmenter and POS tagger were developed rigorously and tested to guarantee an accuracy above 90%.
Furthermore, the PCFG model used by the system is not optimal, because syntactic ambiguities at the lexical level cannot yet be resolved. For this problem, the LPCFG model has been studied for a long time and gives very good results in handling lexical ambiguity. In addition, by adding word information to the grammar rules, the LPCFG model can make full use of the information that the treebank provides, which makes it a very promising direction.
Based on the analysis above, the future directions of the thesis are:
- Complete the lelightwin pruning algorithm.
- Combine the A* algorithm with the LPCFG model to raise the accuracy of the parser.
- Build a larger VietTreeBank, or reuse the VLSP treebank, to improve the quality of the training data and of the grammar rule set.
- Reorganize the POS and phrase labels so that vnTagger can replace vnqtag.


APPENDIX
Table of symbols used in the grammar rule set
A      adjective
AE     pre-adjective adjunct
AH     head adjective
AP     adjective phrase
AR     post-adjective adjunct
AV     adverbial
Ac     collective adjective
Al     quantitative adjective
Ao     onomatopoeic adjective
Ap     qualitative adjective
C      preposition
Cm     subordinating preposition
Co     ,
Cp     coordinating preposition
D      resultative adjunct
E      interjection
H      head
I      quantity noun
M      numeral
N      noun
NE     pre-noun adjunct
NH     head noun
NP     noun phrase
NPC    noun phrase
NR     post-noun adjunct
Na     abstract noun
Nc     individual noun
Ng     collective noun
Np     proper name
Nq     numeral
Ns     classifier
Nu     unit-of-measure noun
P      pronoun
PP     prepositional phrase
PRD    predicate
Pd     demonstrative pronoun
Pi     interrogative pronoun
Pl     pronoun of action or quality
Pn     quantity pronoun
Pp     personal pronoun
Pu     personal pronoun
QP     quantity phrase
R      adjunct (adverb)
RP     adjunct phrase
Rc     comparative adjunct
Rd     degree adjunct
Ri     imperative adjunct
Root   sentences that could be parsed
Rt     time adjunct
Rv     position adjunct
S      sentence
SBAR   subordinate clause
SBARS  subordinate clause
SBJ    subject
SC     imperative sentence
SE     exclamatory sentence
SF     topic-comment sentence
SN     declarative sentence
SQ     question
V      verb
VE     pre-verb adjunct
VH     head verb
VP     verb phrase
VPC    verb phrase
VR     post-verb adjunct
Vc     collective verb
Vit    intransitive verb
Vt     transitive verb
Vz     the verb "là"
WHAP   interrogative adjective phrase
WHNP   interrogative noun phrase
WHPP   interrogative prepositional phrase
WHRP   interrogative adjunct phrase
X      unknown word
Y      abbreviation

Some parsing results for common sentences


<?xml version="1.0" ?>
<BKLightWinParser>
<sentence id="1">tôi là một độc giả thường xuyên của chuyên mục tâm sự
<parse id="1">
<NP level="1" explain="cụm danh từ">tôi
<P level="2" explain="đại từ">tôi</P>
</NP>
<VP level="1" explain="cụm động từ">là một độc giả thường xuyên của chuyên mục tâm sự
<V level="2" explain="động từ">là</V>
<NP level="2" explain="cụm danh từ">một độc giả thường xuyên của chuyên mục tâm sự
<M level="3" explain="số từ">một</M>
<NP level="3" explain="cụm danh từ">độc giả thường xuyên của chuyên mục tâm sự
<N level="4" explain="danh từ">độc giả</N>
<AP level="4" explain="cụm tính từ">thường xuyên của chuyên mục tâm sự
<A level="5" explain="tính từ">thường xuyên</A>
<PP level="5" explain="cụm giới từ">của chuyên mục tâm sự
<C level="6" explain="giới từ">của</C>
<NP level="6" explain="cụm danh từ">chuyên mục tâm sự
<N level="7" explain="danh từ">chuyên mục</N>
<V level="7" explain="động từ">tâm sự</V>
</NP>
</PP>
</AP>
</NP>
</NP>
</VP>
</parse>
</sentence>
<sentence id="2">tôi theo dõi xem có tình huống nào giống mình không để rút kinh nghiệm cho bản thân nhưng không thấy
<parse id="1">
<NP level="1" explain="cụm danh từ">tôi
<P level="2" explain="đại từ">tôi</P>
</NP>
<VP level="1" explain="cụm động từ">theo dõi xem có tình huống nào giống mình không để rút kinh nghiệm cho bản thân nhưng không thấy
<VP level="2" explain="cụm động từ">theo dõi xem có tình huống nào giống mình không
<V level="3" explain="động từ">theo dõi</V>
<VP level="3" explain="cụm động từ">xem có tình huống nào giống mình không
<V level="4" explain="động từ">xem</V>
<VP level="4" explain="cụm động từ">có tình huống nào giống mình không
<V level="5" explain="động từ">có</V>
<NP level="5" explain="cụm danh từ">tình huống nào giống mình
<N level="6" explain="danh từ">tình huống</N>
<P level="6" explain="đại từ">nào</P>
<AP level="6" explain="cụm tính từ">giống mình
<A level="7" explain="tính từ">giống</A>
<P level="7" explain="đại từ">mình</P>
</AP>
</NP>
<R level="5" explain="phụ từ">không</R>
</VP>
</VP>
</VP>
<C level="2" explain="giới từ">để</C>
<VP level="2" explain="cụm động từ">rút kinh nghiệm cho bản thân nhưng không thấy
<V level="3" explain="động từ">rút</V>
<NP level="3" explain="cụm danh từ">kinh nghiệm cho bản thân nhưng không thấy
<N level="4" explain="danh từ">kinh nghiệm</N>
<PP level="4" explain="cụm giới từ">cho bản thân nhưng không thấy
<C level="5" explain="giới từ">cho</C>
<NP level="5" explain="cụm danh từ">bản thân nhưng không thấy
<N level="6" explain="danh từ">bản thân</N>
<PP level="6" explain="cụm giới từ">nhưng không thấy
<C level="7" explain="giới từ">nhưng</C>
<VP level="7" explain="cụm động từ">không thấy
<R level="8" explain="phụ từ">không</R>
<V level="8" explain="động từ">thấy</V>
</VP>
</PP>
</NP>
</PP>
</NP>
</VP>
</VP>
</parse>
</sentence>
<sentence id="3">hôm nay tôi muốn gửi tâm sự của mình , rất mong các bạn cho tôi những lời khuyên bổ ích
<parse id="1">
<NP level="1" explain="cụm danh từ">hôm nay tôi
<N level="2" explain="danh từ">hôm nay</N>
<P level="2" explain="đại từ">tôi</P>
</NP>
<VP level="1" explain="cụm động từ">muốn gửi tâm sự của mình , rất mong các bạn cho tôi những lời khuyên bổ ích
<V level="2" explain="động từ">muốn</V>
<VP level="2" explain="cụm động từ">gửi tâm sự của mình , rất mong các bạn cho tôi những lời khuyên bổ ích
<VP level="3" explain="cụm động từ">gửi tâm sự của mình
<V level="4" explain="động từ">gửi</V>
<VP level="4" explain="cụm động từ">tâm sự của mình
<V level="5" explain="động từ">tâm sự</V>
<C level="5" explain="giới từ">của</C>
<P level="5" explain="đại từ">mình</P>
</VP>
</VP>
<punc level="3">,</punc>
<VP level="3" explain="cụm động từ">rất mong các bạn cho tôi những lời khuyên bổ ích
<R level="4" explain="phụ từ">rất</R>
<V level="4" explain="động từ">mong</V>
<NP level="4" explain="cụm danh từ">các bạn cho tôi những lời khuyên bổ ích
<N level="5" explain="danh từ">các</N>
<N level="5" explain="danh từ">bạn</N>
<PP level="5" explain="cụm giới từ">cho tôi những lời khuyên bổ ích
<C level="6" explain="giới từ">cho</C>
<NP level="6" explain="cụm danh từ">tôi những lời khuyên bổ ích
<P level="7" explain="đại từ">tôi</P>
<N level="7" explain="danh từ">những</N>
<N level="7" explain="danh từ">lời</N>
<VP level="7" explain="cụm động từ">khuyên bổ ích
<V level="8" explain="động từ">khuyên</V>
<A level="8" explain="tính từ">bổ ích</A>
</VP>
</NP>
</PP>
</NP>
</VP>
</VP>
</VP>
</parse>
</sentence>
<sentence id="4">tôi ba mươi bốn tuổi , là bác sĩ
<parse id="1">
<NP level="1" explain="cụm danh từ">tôi ba mươi bốn tuổi
<NP level="2" explain="cụm danh từ">tôi
<P level="3" explain="đại từ">tôi</P>
</NP>
<NP level="2" explain="cụm danh từ">ba mươi bốn tuổi
<N level="3" explain="danh từ">ba mươi</N>
<M level="3" explain="số từ">bốn</M>
<N level="3" explain="danh từ">tuổi</N>
</NP>
</NP>
<punc level="1">,</punc>
<VP level="1" explain="cụm động từ">là bác sĩ
<V level="2" explain="động từ">là</V>
<N level="2" explain="danh từ">bác sĩ</N>
</VP>
</parse>
</sentence>
<sentence id="5">đã lập gia đình mười năm và có một bé trai , một bé gái
<parse id="1">
<VP level="1" explain="cụm động từ">đã lập gia đình mười năm
<R level="2" explain="phụ từ">đã</R>
<V level="2" explain="động từ">lập</V>
<NP level="2" explain="cụm danh từ">gia đình mười năm
<N level="3" explain="danh từ">gia đình</N>
<M level="3" explain="số từ">mười</M>
<N level="3" explain="danh từ">năm</N>
</NP>
</VP>
<C level="1" explain="giới từ">và</C>
<VP level="1" explain="cụm động từ">có một bé trai , một bé gái
<V level="2" explain="động từ">có</V>
<NP level="2" explain="cụm danh từ">một bé trai , một bé gái
<M level="3" explain="số từ">một</M>
<NP level="3" explain="cụm danh từ">bé trai , một bé gái
<N level="4" explain="danh từ">bé</N>
<N level="4" explain="danh từ">trai</N>
<punc level="4">,</punc>
<NP level="4" explain="cụm danh từ">một bé gái
<M level="5" explain="số từ">một</M>
<NP level="5" explain="cụm danh từ">bé gái
<N level="6" explain="danh từ">bé</N>
<N level="6" explain="danh từ">gái</N>
</NP>
</NP>
</NP>
</NP>
</VP>
</parse>
</sentence>
<sentence id="6">nhìn chung vợ chồng tôi sống hoà thuận , nhường nhịn nhau
<parse id="1">
<VP level="1" explain="cụm động từ">nhìn chung vợ chồng tôi
<V level="2" explain="động từ">nhìn chung
<U level="3" explain="null">nhìn chung</U>
</V>
<NP level="2" explain="cụm danh từ">vợ chồng tôi
<N level="3" explain="danh từ">vợ chồng</N>
<P level="3" explain="đại từ">tôi</P>
</NP>
</VP>
<VP level="1" explain="cụm động từ">sống hoà thuận , nhường nhịn nhau
<VP level="2" explain="cụm động từ">sống hoà thuận
<V level="3" explain="động từ">sống</V>
<A level="3" explain="tính từ">hoà thuận</A>
</VP>
<punc level="2">,</punc>
<VP level="2" explain="cụm động từ">nhường nhịn nhau
<V level="3" explain="động từ">nhường nhịn</V>
<N level="3" explain="danh từ">nhau</N>
</VP>
</VP>
</parse>
</sentence>
<sentence id="7">chồng tôi là một người tốt , thương yêu vợ con
<parse id="1">
<NP level="1" explain="cụm danh từ">chồng tôi
<N level="2" explain="danh từ">chồng</N>
<P level="2" explain="đại từ">tôi</P>
</NP>
<VP level="1" explain="cụm động từ">là một người tốt , thương yêu vợ con
<VP level="2" explain="cụm động từ">là một người tốt
<V level="3" explain="động từ">là</V>
<NP level="3" explain="cụm danh từ">một người tốt
<M level="4" explain="số từ">một</M>
<NP level="4" explain="cụm danh từ">người tốt
<N level="5" explain="danh từ">người</N>
<A level="5" explain="tính từ">tốt</A>
</NP>
</NP>
</VP>
<punc level="2">,</punc>
<VP level="2" explain="cụm động từ">thương yêu vợ con
<V level="3" explain="động từ">thương yêu</V>
<N level="3" explain="danh từ">vợ con</N>
</VP>
</VP>
</parse>
</sentence>
<sentence id="8">nhìn bề ngoài , ai cũng bảo tôi là người vợ hạnh phúc
<parse id="1">
<VP level="1" explain="cụm động từ">nhìn bề ngoài
<V level="2" explain="động từ">nhìn</V>
<N level="2" explain="danh từ">bề ngoài</N>
</VP>
<punc level="1">,</punc>
<NP level="1" explain="cụm danh từ">ai
<P level="2" explain="đại từ">ai</P>
</NP>
<VP level="1" explain="cụm động từ">cũng bảo tôi là người vợ hạnh phúc
<R level="2" explain="phụ từ">cũng</R>
<V level="2" explain="động từ">bảo</V>
<NP level="2" explain="cụm danh từ">tôi là người vợ hạnh phúc
<P level="3" explain="đại từ">tôi</P>
<VP level="3" explain="cụm động từ">là người vợ hạnh phúc
<V level="4" explain="động từ">là</V>
<NP level="4" explain="cụm danh từ">người vợ hạnh phúc
<N level="5" explain="danh từ">người</N>
<NP level="5" explain="cụm danh từ">vợ hạnh phúc
<N level="6" explain="danh từ">vợ</N>
<A level="6" explain="tính từ">hạnh phúc</A>
</NP>
</NP>
</VP>
</NP>
</VP>
</parse>
</sentence>
<sentence id="9">nhưng có những điều khổ sở mà chẳng biết tâm sự cùng ai
<parse id="1">
<C level="1" explain="giới từ">nhưng</C>
<VP level="1" explain="cụm động từ">có những điều khổ sở mà chẳng biết tâm sự cùng ai
<V level="2" explain="động từ">có</V>
<NP level="2" explain="cụm danh từ">những điều khổ sở mà chẳng biết tâm sự cùng ai
<N level="3" explain="danh từ">những</N>
<N level="3" explain="danh từ">điều</N>
<AP level="3" explain="cụm tính từ">khổ sở mà chẳng biết tâm sự cùng ai
<A level="4" explain="tính từ">khổ sở</A>
<PP level="4" explain="cụm giới từ">mà chẳng biết tâm sự cùng ai
<C level="5" explain="giới từ">mà</C>
<VP level="5" explain="cụm động từ">chẳng biết tâm sự cùng ai
<R level="6" explain="phụ từ">chẳng</R>
<V level="6" explain="động từ">biết</V>
<VP level="6" explain="cụm động từ">tâm sự cùng ai
<V level="7" explain="động từ">tâm sự</V>
<C level="7" explain="giới từ">cùng</C>
<P level="7" explain="đại từ">ai</P>
</VP>
</VP>
</PP>
</AP>
</NP>
</VP>
</parse>
</sentence>
<sentence id="10">đó là thói quen sinh hoạt bẩn thỉu và luộm thuộm của chồng tôi
<parse id="1">
<NP level="1" explain="cụm danh từ">đó
<P level="2" explain="đại từ">đó</P>
</NP>
<VP level="1" explain="cụm động từ">là thói quen sinh hoạt bẩn thỉu và luộm thuộm của chồng tôi
<V level="2" explain="động từ">là</V>
<NP level="2" explain="cụm danh từ">thói quen sinh hoạt bẩn thỉu và luộm thuộm của chồng tôi
<N level="3" explain="danh từ">thói quen</N>
<VP level="3" explain="cụm động từ">sinh hoạt bẩn thỉu và luộm thuộm của chồng tôi
<V level="4" explain="động từ">sinh hoạt</V>
<AP level="4" explain="cụm tính từ">bẩn thỉu và luộm thuộm
<A level="5" explain="tính từ">bẩn thỉu</A>
<C level="5" explain="giới từ">và</C>
<A level="5" explain="tính từ">luộm thuộm</A>
</AP>
<PP level="4" explain="cụm giới từ">của chồng tôi
<C level="5" explain="giới từ">của</C>
<NP level="5" explain="cụm danh từ">chồng tôi
<N level="6" explain="danh từ">chồng</N>
<P level="6" explain="đại từ">tôi</P>
</NP>
</PP>
</VP>
</NP>
</VP>
</parse>
</sentence>
<sentence id="11">mỗi bữa ăn anh ấy uống năm đến sáu chén rượu , trừ buổi sáng , mỗi ngày hút hai gói thuốc lá , nhưng hầu như không đánh răng , rửa tay , tắm rửa
<parse id="1">
<S level="1" explain="Câu">mỗi bữa ăn anh ấy uống năm đến sáu chén rượu , trừ buổi sáng
<NP level="2" explain="cụm danh từ">mỗi bữa ăn anh ấy
<N level="3" explain="danh từ">mỗi</N>
<NP level="3" explain="cụm danh từ">bữa ăn anh ấy
<N level="4" explain="danh từ">bữa</N>
<V level="4" explain="động từ">ăn</V>
<P level="4" explain="đại từ">anh ấy</P>
</NP>
</NP>
<VP level="2" explain="cụm động từ">uống năm đến sáu chén rượu , trừ buổi sáng
<V level="3" explain="động từ">uống</V>
<NP level="3" explain="cụm danh từ">năm đến sáu chén rượu , trừ buổi sáng
<N level="4" explain="danh từ">năm</N>
<VP level="4" explain="cụm động từ">đến sáu chén rượu
<V level="5" explain="động từ">đến</V>
<NP level="5" explain="cụm danh từ">sáu chén rượu
<M level="6" explain="số từ">sáu</M>
<NP level="6" explain="cụm danh từ">chén rượu
<N level="7" explain="danh từ">chén</N>
<N level="7" explain="danh từ">rượu</N>
</NP>
</NP>
</VP>
<punc level="4">,</punc>
<VP level="4" explain="cụm động từ">trừ buổi sáng
<V level="5" explain="động từ">trừ</V>
<N level="5" explain="danh từ">buổi sáng</N>
</VP>
</NP>
</VP>
</S>
<punc level="1">,</punc>
<S level="1" explain="Câu">mỗi ngày hút hai gói thuốc lá , nhưng hầu như không đánh răng , rửa tay , tắm rửa
<NP level="2" explain="cụm danh từ">mỗi ngày hút hai gói thuốc lá , nhưng hầu như không đánh răng , rửa tay ,
<N level="3" explain="danh từ">mỗi</N>
<NP level="3" explain="cụm danh từ">ngày hút hai gói thuốc lá , nhưng hầu như không đánh răng , rửa tay ,
<N level="4" explain="danh từ">ngày</N>
<VP level="4" explain="cụm động từ">hút hai gói thuốc lá , nhưng hầu như không đánh răng
<VP level="5" explain="cụm động từ">hút hai gói thuốc lá
<V level="6" explain="động từ">hút</V>
<NP level="6" explain="cụm danh từ">hai gói thuốc lá
<M level="7" explain="số từ">hai</M>
<NP level="7" explain="cụm danh từ">gói thuốc lá
<N level="8" explain="danh từ">gói</N>
<N level="8" explain="danh từ">thuốc lá</N>
</NP>
</NP>
</VP>
<punc level="5">,</punc>
<C level="5" explain="giới từ">nhưng</C>
<VP level="5" explain="cụm động từ">hầu như không đánh răng
<RP level="6" explain="cụm phụ từ">hầu như không
<R level="7" explain="phụ từ">hầu như</R>
<R level="7" explain="phụ từ">không</R>
</RP>
<VP level="6" explain="cụm động từ">đánh răng
<V level="7" explain="động từ">đánh</V>
<N level="7" explain="danh từ">răng</N>
</VP>
</VP>
</VP>
<punc level="4">,</punc>
<VP level="4" explain="cụm động từ">rửa tay
<V level="5" explain="động từ">rửa</V>
<N level="5" explain="danh từ">tay</N>
</VP>
<punc level="4">,</punc>
</NP>
</NP>
<V level="2" explain="động từ">tắm rửa</V>
</S>
</parse>
</sentence>
</BKLightWinParser>
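
The listing above nests each constituent as an XML element whose tag is the grammar symbol, and each phrase element repeats the text it spans before listing its children. As a minimal sketch of how such output could be read back, the Python snippet below loads a copy of sentence 4 (diacritics written out by hand) with the standard `xml.etree.ElementTree` module and prints the tree in bracketed form. The helper names `leaves` and `bracketed` are hypothetical and not part of the parser described in this thesis.

```python
import xml.etree.ElementTree as ET

# Copy of sentence 4 in the output format shown above (diacritics written out by hand).
SAMPLE = """<?xml version="1.0" ?>
<BKLightWinParser>
<sentence id="4">tôi ba mươi bốn tuổi , là bác sĩ
<parse id="1">
<NP level="1" explain="cụm danh từ">tôi ba mươi bốn tuổi
<NP level="2" explain="cụm danh từ">tôi
<P level="3" explain="đại từ">tôi</P>
</NP>
<NP level="2" explain="cụm danh từ">ba mươi bốn tuổi
<N level="3" explain="danh từ">ba mươi</N>
<M level="3" explain="số từ">bốn</M>
<N level="3" explain="danh từ">tuổi</N>
</NP>
</NP>
<punc level="1">,</punc>
<VP level="1" explain="cụm động từ">là bác sĩ
<V level="2" explain="động từ">là</V>
<N level="2" explain="danh từ">bác sĩ</N>
</VP>
</parse>
</sentence>
</BKLightWinParser>"""


def leaves(node):
    """Yield (tag, word) for every leaf element under node, left to right."""
    children = list(node)
    if not children:
        yield node.tag, (node.text or "").strip()
        return
    for child in children:
        yield from leaves(child)


def bracketed(node):
    """Render a node as a bracketed string; the repeated phrase text is ignored."""
    children = list(node)
    if not children:
        return f"({node.tag} {(node.text or '').strip()})"
    return f"({node.tag} " + " ".join(bracketed(c) for c in children) + ")"


root = ET.fromstring(SAMPLE)
for sentence in root.iter("sentence"):
    parse = sentence.find("parse")
    print(sentence.get("id"), bracketed(parse))
    print(list(leaves(parse)))
```

Only leaf elements carry a single word or compound, so the leaf sequence reproduces the tagged sentence, while the `level` attribute is redundant with the nesting depth and is not needed to rebuild the tree.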
