Ict08 VLSP SP84 1
Hoàng Thị Điệp
University of Engineering and Technology, Vietnam National University, Hanoi
Trần Mạnh K
University of Engineering and Technology, Vietnam National University, Hanoi
Abstract
Syntactic parsing plays an important role in text processing and is an intermediate step in many larger tasks, such as text summarization, machine translation and automatic question answering. Dependency parsing has recently attracted the attention of many natural language processing research groups around the world. One reason is that dependency relations between pairs of words can help resolve syntactic ambiguity. Another is that this formalism can model languages with free word order. In this report, we present the Maximum Spanning Tree (MST) method for dependency parsing of Vietnamese sentences, and we use a rule-based tree reviser to improve the output of MST. Finally, we report experimental results on a corpus of 450 Vietnamese sentences and propose directions for developing the MST method for this task.
1 Introduction
1.1 Research on automatic dependency parsing of Vietnamese
Figure: Dependency syntax → MST output → Revision module → Final output
Model M1 is produced by the MIRA learning method [11], trained on the training data. Model M2 is produced by a multiclass Perceptron [11], trained on the combination of the MST output and the training data.
1.5 Outline of the report
Feature | Vietnamese
Analytic |
Isolating |
Word order | SVO
Projectivity condition | Holds for most, but not all, sentences
Part of speech of the predicate | Verbs, adjectives, nouns, some function words
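The notion of an isolating language is not the same as that of an analytic language. An isolating language is one in which most morphemes are free morphemes, with one morpheme per word as the norm. The degree of isolation is measured by the ratio of the number of morphemes to the number of words. Isolating languages are common in Southeast Asia, including Vietnam, and Classical Chinese is another example.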
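The morpheme-to-word ratio mentioned above can be computed with the minimal Python sketch below. The function name `morphemes_per_word` is hypothetical, and so are the hand-segmented inputs. Deciding where morpheme boundaries fall is a linguistic judgement that the sketch takes as given.

```python
from typing import Sequence

def morphemes_per_word(segmented_words: Sequence[Sequence[str]]) -> float:
    """Ratio of morphemes to words; values close to 1.0 suggest an isolating language.

    Each word is given as the list of its morphemes (hypothetical, hand-made segmentation).
    """
    if not segmented_words:
        raise ValueError("need at least one word")
    total_morphemes = sum(len(word) for word in segmented_words)
    return total_morphemes / len(segmented_words)

# Assumption: each Vietnamese syllable here counts as one free morpheme.
print(morphemes_per_word([["tôi"], ["đi"], ["học"]]))             # 1.0
# Assumption: an illustrative English segmentation.
print(morphemes_per_word([["un", "break", "able"], ["glass"]]))  # 2.0
```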
2.3 Word order [4]
In linguistics, word-order typology studies how languages arrange the constituents of a sentence relative to one another, and how these arrangements relate to each other. For most languages that have a dominant order, a basic word order can be defined in terms of the verb (V) and its arguments, the subject (S) and the object (O). This gives six basic orders: SVO, SOV, VSO, VOS, OSV and OVS. Vietnamese is an SVO language.
Besides these orders, there is a notable class of languages called free word order languages, such as Latin, Czech, Hungarian, Polish and Russian. These languages call for more sophisticated methods in automatic dependency parsing.
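The figure of six basic orders is simply the number of orderings of three items, 3! = 6. A one-line Python check, included for illustration only:

```python
from itertools import permutations

# Every ordering of subject (S), verb (V) and object (O) is a candidate basic order.
basic_orders = ["".join(p) for p in permutations("SVO")]
print(basic_orders)  # ['SVO', 'SOV', 'VSO', 'VOS', 'OSV', 'OVS']
```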
2.4
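The projectivity condition for dependency graphs is stated formally in the lecture notes [5] as follows. A dependency graph is projective if
$$i \rightarrow j \;\Rightarrow\; i \rightarrow^{*} i' \quad \text{for every } i' \text{ such that } i < i' < j \text{ or } j < i' < i,$$
where $\rightarrow^{*}$ is the reflexive transitive closure of the dependency relation, so $i \rightarrow^{*} i'$ means that $i'$ is a descendant of $i$.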
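The condition above translates directly into a check over a list of heads, sketched below in Python. The conventions are our own assumptions: `heads[j - 1]` is the head of token j and 0 is the artificial root. The function names are hypothetical. The example reuses the HEAD column of the 16-token sentence given in Section 5.1. That sentence is non-projective, because the arc from token 3 to token 9 spans token 6, and token 6 depends on token 7 rather than on token 3.

```python
from typing import List

def dominates(heads: List[int], ancestor: int, node: int) -> bool:
    """True if ancestor ->* node, i.e. ancestor lies on the path from node up to the root."""
    for _ in range(len(heads) + 1):  # a path has at most n tokens plus the root
        if node == ancestor:
            return True
        if node == 0:
            return False
        node = heads[node - 1]
    raise ValueError("heads contain a cycle")

def is_projective(heads: List[int]) -> bool:
    """If i -> j, then i ->* i' must hold for every i' strictly between i and j."""
    for j in range(1, len(heads) + 1):
        i = heads[j - 1]
        for k in range(min(i, j) + 1, max(i, j)):
            if not dominates(heads, i, k):
                return False
    return True

example_heads = [3, 3, 7, 3, 3, 7, 0, 7, 3, 9, 13, 13, 10, 13, 13, 7]
print(is_projective(example_heads))  # False
print(is_projective([2, 0, 2]))      # True
```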
In dependency parsing, the key word of a sentence (the word that depends on the root node) corresponds to the predicate in linguistics. In English the predicate is always a verb, but in Vietnamese the predicate can belong to many different parts of speech. The examples below are taken from Chapter 1, Section 2.2, "Basic sentence types of Vietnamese", of the book Ngữ pháp Việt Nam [6]. The predicates are the words or phrases in bold.
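Part of speech of the predicate | Example
Verb | Giáp đưa cho Tý tờ báo.
Adjective |
Noun |
Function word "là" |
Function word "bằng" |
Function words "tại", "do", "bởi" | Việc này tại nó. Hàng này do họ làm.
Function word | Bn y ung nc.
Locative function word | Ông tôi ngoài vườn.
Function word "như" | nh hoa vng.
Function word "của" | Xe này của Giáp. Hàng này của họ làm.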
Ryan McDonald [7] proposed a graph-based approach that reduces dependency parsing to finding the maximum spanning tree of a weighted directed graph.
Reduction to the MST problem:
$$s(\mathbf{x},\mathbf{y}) = \sum_{(i,j)\in\mathbf{y}} s(i,j) = \sum_{(i,j)\in\mathbf{y}} \mathbf{w}\cdot\mathbf{f}(i,j)$$
Feature templates:
a) Basic unigram features: xi-word, xi-pos; xi-word; xi-pos; xj-word, xj-pos; xj-word; xj-pos
b) Basic bigram features
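The arc-factored score can be illustrated with the minimal Python sketch below. It uses binary indicator features built from the unigram templates above. As in McDonald's formulation, we assume that x_i is the head and x_j the dependent. The function names, the feature-string format and the `<root>` token are illustrative only; they do not reproduce MSTParser's actual implementation.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS); sent[0] is an artificial root token

def arc_features(sent: List[Token], i: int, j: int) -> List[str]:
    """Basic unigram templates for head x_i and dependent x_j, as binary feature strings."""
    hw, hp = sent[i]
    dw, dp = sent[j]
    return [f"hw,hp={hw},{hp}", f"hw={hw}", f"hp={hp}",
            f"dw,dp={dw},{dp}", f"dw={dw}", f"dp={dp}"]

def arc_score(w: Dict[str, float], sent: List[Token], i: int, j: int) -> float:
    # s(i, j) = w . f(i, j); with binary features this is the sum of the active weights
    return sum(w.get(f, 0.0) for f in arc_features(sent, i, j))

def tree_score(w: Dict[str, float], sent: List[Token], heads: List[int]) -> float:
    # s(x, y) = sum of s(i, j) over the arcs of y; heads[j - 1] is the head of token j
    return sum(arc_score(w, sent, heads[j - 1], j) for j in range(1, len(sent)))

sent = [("<root>", "ROOT"), ("Tôi", "P"), ("nói", "V")]
w = {"hp=V": 1.0, "dp=P": 0.5, "hp=ROOT": 0.25}
print(tree_score(w, sent, heads=[2, 0]))  # 1.75
```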
3.2 Features examined
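indicates whether a subtree is complete (c = 1: no further dependents can be attached) or incomplete (c = 0: it still has to be completed).
The line marked (*) means that, to find the best score for an incomplete left subtree, we only need to find the index s ≤ r < t that gives the highest score when two complete subtrees are combined.
Under the constraint that there is a single root located at the left end of the sentence, the score of the best tree for the whole sentence is C[1][n][k][1].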
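The dynamic program described above is Eisner's algorithm for projective trees, sketched below as a first-order version in Python. It rests on our own assumptions. `score[h][d]` holds s(h, d). Node 0 is the artificial root at the left end of the sentence, and the root may take more than one dependent. `C` and `I` hold complete and incomplete spans, and the last index gives the direction. Indexing is 0-based, so the whole-sentence score is read from `C[0][n-1][1]`.

```python
from typing import List, Tuple

NEG_INF = float("-inf")

def eisner(score: List[List[float]]) -> Tuple[float, List[int]]:
    """Best projective tree; returns (score, heads) with heads[d - 1] = head of token d."""
    n = len(score)  # number of nodes, root included
    # [s][t][0]: head is t (left-facing); [s][t][1]: head is s (right-facing)
    C = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]          # complete spans
    I = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]  # incomplete spans
    C_bp = [[[0, 0] for _ in range(n)] for _ in range(n)]           # back-pointers
    I_bp = [[[0, 0] for _ in range(n)] for _ in range(n)]
    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # (*) incomplete: join two complete halves at s <= r < t, then add an arc s-t
            r = max(range(s, t), key=lambda k: C[s][k][1] + C[k + 1][t][0])
            base = C[s][r][1] + C[r + 1][t][0]
            I[s][t][0], I[s][t][1] = base + score[t][s], base + score[s][t]
            I_bp[s][t][0] = I_bp[s][t][1] = r
            # complete with head t: complete left part + incomplete part ending at t
            r = max(range(s, t), key=lambda k: C[s][k][0] + I[k][t][0])
            C[s][t][0], C_bp[s][t][0] = C[s][r][0] + I[r][t][0], r
            # complete with head s: incomplete part starting at s + complete right part
            r = max(range(s + 1, t + 1), key=lambda k: I[s][k][1] + C[k][t][1])
            C[s][t][1], C_bp[s][t][1] = I[s][r][1] + C[r][t][1], r

    heads = [0] * n

    def backtrack(s: int, t: int, d: int, complete: bool) -> None:
        if s == t:
            return
        if complete:
            r = C_bp[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = I_bp[s][t][d]
            if d == 0:
                heads[s] = t  # arc t -> s
            else:
                heads[t] = s  # arc s -> t
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return C[0][n - 1][1], heads[1:]

S = [[0.0] * 4 for _ in range(4)]
S[0][2] = S[2][1] = S[2][3] = 10.0
print(eisner(S))  # (30.0, [2, 0, 2])
```

Because only adjacent spans are ever combined, the returned tree is always projective.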
3.3.2 The Chu-Liu-Edmonds algorithm for the non-projective case
Figure 6 shows Georgiadis's sketch of the Chu-Liu-Edmonds algorithm. In words: for each vertex in the graph, the algorithm greedily selects the incoming edge with the highest weight. If the selected edges form a tree, that tree is the maximum spanning tree. Otherwise, they must contain a cycle. The procedure in the figure detects a cycle, contracts it into a single vertex, and recomputes the weights of the edges entering and leaving the cycle.
The author also proves that a maximum spanning tree of the contracted graph corresponds to a maximum spanning tree of the original graph, so the algorithm can call itself recursively on the new graph. In its simplest form, the algorithm runs in O(n^3) time. MST uses Tarjan's improved version, which runs in O(n^2) time on dense graphs [7].
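Below is a compact recursive Python version of Chu-Liu-Edmonds. It follows the verbal description above: greedy best incoming arcs, cycle detection, contraction, recursion and expansion. This is the simple variant, not Tarjan's O(n^2) version used by MST. The dict-of-dicts graph format and the function names are our own assumptions. The sketch also assumes that every non-root node has at least one incoming arc. The small example graph uses made-up weights.

```python
from typing import Dict, List, Optional

Graph = Dict[int, Dict[int, float]]  # graph[h][d] = weight of arc h -> d

def find_cycle(head: Dict[int, int]) -> Optional[List[int]]:
    """Return the nodes of one cycle formed by the chosen arcs, or None."""
    for start in head:
        path, node = [], start
        while node in head and node not in path:
            path.append(node)
            node = head[node]
        if node in path:
            return path[path.index(node):]
    return None

def chu_liu_edmonds(graph: Graph, root: int = 0) -> Dict[int, int]:
    """Maximum spanning arborescence rooted at `root`; returns {dependent: head}."""
    nodes = set(graph) | {d for h in graph for d in graph[h]}
    # 1. greedily pick the highest-weight incoming arc of every non-root node
    head = {}
    for d in nodes - {root}:
        head[d] = max((graph[h][d], h) for h in graph if d in graph[h] and h != d)[1]
    cycle = find_cycle(head)
    if cycle is None:
        return head  # the chosen arcs already form a tree
    # 2. contract the cycle into a new node c and re-weight arcs entering/leaving it
    in_cycle = set(cycle)
    c = max(nodes) + 1
    cycle_weight = sum(graph[head[v]][v] for v in in_cycle)
    contracted: Graph = {}
    enters: Dict[int, int] = {}  # outside head -> cycle node entered by its best arc
    leaves: Dict[int, int] = {}  # outside dependent -> cycle node its best arc leaves
    for h, arcs in graph.items():
        for d, w in arcs.items():
            if h in in_cycle and d in in_cycle:
                continue
            if d in in_cycle:  # entering arc: it would replace d's arc inside the cycle
                w = w - graph[head[d]][d] + cycle_weight
                if w > contracted.setdefault(h, {}).get(c, NEG):
                    contracted[h][c], enters[h] = w, d
            elif h in in_cycle:  # leaving arc
                if w > contracted.setdefault(c, {}).get(d, NEG):
                    contracted[c][d], leaves[d] = w, h
            else:
                contracted.setdefault(h, {})[d] = w
    # 3. solve the contracted graph recursively, then expand c back into the cycle
    result = {}
    for d, h in chu_liu_edmonds(contracted, root).items():
        if d == c:
            result[enters[h]] = h
        elif h == c:
            result[d] = leaves[d]
        else:
            result[d] = h
    for v in in_cycle:
        result.setdefault(v, head[v])  # keep every cycle arc except the broken one
    return result

NEG = float("-inf")

graph = {0: {1: 9.0, 2: 10.0, 3: 9.0}, 1: {2: 20.0, 3: 3.0},
         2: {1: 30.0, 3: 30.0}, 3: {1: 11.0, 2: 0.0}}
print(sorted(chu_liu_edmonds(graph).items()))  # [(1, 2), (2, 0), (3, 2)]
```

In the example, the greedy step first picks the cycle 1 → 2 → 1. Contraction and expansion then break this cycle by attaching node 2 to the root.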
3.4 Labeling dependency relations
$$s(\mathbf{x},\mathbf{y}) = \sum_{(i,j,t)\in\mathbf{y}} \mathbf{w}\cdot\mathbf{f}(i,j,t)$$
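Because s(x, y) is a sum over labeled arcs, the best label for each arc can be chosen independently once the arcs are fixed. The minimal Python sketch below does this. The two feature templates in `labeled_arc_features` and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

def labeled_arc_features(head_pos: str, dep_pos: str, label: str) -> List[str]:
    # hypothetical templates for f(i, j, t): the label alone, and the label with both POS tags
    return [f"t={label}", f"hp,dp,t={head_pos},{dep_pos},{label}"]

def best_label(w: Dict[str, float], head_pos: str, dep_pos: str,
               labels: List[str]) -> Tuple[str, float]:
    """argmax over t of w . f(i, j, t) for a single arc; returns (label, score)."""
    scored = [(sum(w.get(f, 0.0) for f in labeled_arc_features(head_pos, dep_pos, t)), t)
              for t in labels]
    score, label = max(scored)
    return label, score

w = {"t=DEP": 0.1, "hp,dp,t=V,P,NP-SBJ": 2.0, "t=NP-OBJ": 0.5}
print(best_label(w, "V", "P", ["DEP", "NP-SBJ", "NP-OBJ"]))  # ('NP-SBJ', 2.0)
```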
Figure 7. Comparison of MIRA and SVMs. MIRA: each time w is updated, the new weight vector is chosen to be as close as possible to the current one. SVMs: find min ||w|| subject to constraints for y ∈ parses(x_t).
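The MIRA idea can be illustrated with the single-constraint (1-best) update, sketched below in Python. This is a simplification of the MIRA update [11]. It finds the smallest change to w that makes the gold tree outscore the predicted tree by at least `loss`. Taking `loss` to be the number of words with a wrong head is our assumption. Sparse vectors are plain dicts, and the cap `C` is optional.

```python
from typing import Dict

Vec = Dict[str, float]

def dot(a: Vec, b: Vec) -> float:
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def mira_update(w: Vec, gold_feats: Vec, pred_feats: Vec, loss: float,
                C: float = float("inf")) -> None:
    """Smallest change to w (in place) such that w.f(gold) - w.f(pred) >= loss."""
    delta = dict(gold_feats)  # delta = f(gold) - f(pred)
    for k, v in pred_feats.items():
        delta[k] = delta.get(k, 0.0) - v
    norm_sq = sum(v * v for v in delta.values())
    margin = dot(w, delta)
    if norm_sq == 0.0 or margin >= loss:
        return  # constraint already satisfied, or both trees have identical features
    tau = min(C, (loss - margin) / norm_sq)  # closed-form step size
    for k, v in delta.items():
        w[k] = w.get(k, 0.0) + tau * v

w: Vec = {}
mira_update(w, {"a": 1.0}, {"b": 1.0}, loss=1.0)
print(w)  # {'a': 0.5, 'b': -0.5}
```

After the update, the gold tree outscores the prediction by exactly the loss. This is the "closest new weight vector" behaviour described in Figure 7.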
To improve the accuracy of the dependency parser, we apply tree revision rules to the output of MST. We follow the approach proposed by Giuseppe Attardi in [9], which treats revision rules as class labels. This reduces the problem of revising the output of a dependency parser to a classification problem.
4.1 Reduction to a classification problem
Symbols: r, u, -n, +n, [, ], >, <, d-d++, d-1, d+1, dP
e ∈ E}. Treating each revision rule as a label reduces tree revision to finding a sequence of labels for the words of sentence x, with one label per word.
There are two options for applying E. Option 1 applies all rules simultaneously. Option 2 applies the rules one at a time, producing an intermediate tree, and then searches for further revision rules on that intermediate tree. Option 2 can produce intermediate structures that are not trees, so this study uses only option 1.
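Applying all rules simultaneously (option 1) means computing every new head from the original tree, as in the Python sketch below. The rule encoding is hypothetical and much simpler than Attardi's rule set. `"keep"` leaves the head unchanged. `"up"` attaches the word to its current head's head. A signed integer n attaches the word to the token n positions away (negative values point left). The function names `apply_rules` and `is_tree` are also illustrative. The sketch rejects a result that is not a tree.

```python
from typing import List, Optional, Union

Rule = Union[str, int]  # hypothetical: "keep", "up", or a signed offset n

def is_tree(heads: List[int]) -> bool:
    """Every token must reach the root (0) without running into a cycle."""
    n = len(heads)
    for j in range(1, n + 1):
        node, steps = j, 0
        while node != 0:
            node = heads[node - 1]
            steps += 1
            if steps > n:
                return False  # cycle
    return True

def apply_rules(heads: List[int], rules: List[Rule]) -> Optional[List[int]]:
    """Apply one rule per token, all computed from the ORIGINAL heads (heads[j-1] = head of j).
    Returns the revised heads, or None if the result is not a tree."""
    n = len(heads)
    new_heads = []
    for j, (h, rule) in enumerate(zip(heads, rules), start=1):
        if rule == "keep":
            new_heads.append(h)
        elif rule == "up":
            new_heads.append(heads[h - 1] if h != 0 else 0)
        else:
            target = j + rule
            if not 0 <= target <= n or target == j:
                return None
            new_heads.append(target)
    return new_heads if is_tree(new_heads) else None

print(apply_rules([2, 0, 2], ["keep", "keep", "up"]))  # [2, 0, 0]
print(apply_rules([2, 0, 2], ["keep", 1, "keep"]))     # None (2 -> 3 -> 2 is a cycle)
```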
4.2
4.2.1 Idea
The training data are pairs (f_i, r_i) of a feature vector and a rule. Here f_i can be generated from y_i, and r_i can be derived from the hand-annotated tree (y_i^M) together with the MST output tree (y_i). Note that each pair (f, r) corresponds to a single word in the tree.
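Each word's rule label can be derived by comparing the MST tree with the hand-annotated tree, as in the Python sketch below. It uses a hypothetical encoding: `"keep"`, `"up"` (the gold head is the MST head's head), or a signed offset from the word to its gold head. This encoding is our assumption, not the rule set of [9]. The function name is illustrative, and feature extraction for f_i is left out.

```python
from typing import List, Union

Rule = Union[str, int]  # hypothetical: "keep", "up", or a signed offset to the gold head

def revision_labels(mst_heads: List[int], gold_heads: List[int]) -> List[Rule]:
    """One label per word: the rule that turns its MST head into its gold head.
    heads[j - 1] is the head of token j; 0 is the root."""
    labels: List[Rule] = []
    for j, (h, g) in enumerate(zip(mst_heads, gold_heads), start=1):
        if h == g:
            labels.append("keep")
        elif h != 0 and mst_heads[h - 1] == g:
            labels.append("up")
        else:
            labels.append(g - j)  # attach to the token g - j positions away
    return labels

print(revision_labels([0, 1, 2], [2, 0, 2]))  # [1, 'up', 'keep']
```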
Our goal is to learn a function C: F → R, where R is the space of revision rules and r1 denotes the rule that leaves the tree unchanged.
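We use a multiclass Perceptron for this task. Each rule r has its own weight vector w_r, and the problem reduces to learning the averaged weight vectors
$$\mathbf{w}_r = \frac{1}{T}\sum_{t=1}^{T}\mathbf{w}_r^{t},$$
where T is the number of training examples and $\mathbf{w}_r^{t}$ is the value of $\mathbf{w}_r$ after the t-th example.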
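A minimal Python sketch of a multiclass Perceptron that returns the averaged vectors w_r defined above. It uses the standard update: on a mistake, add the features to the gold rule's vector and subtract them from the predicted rule's vector. The function names are hypothetical, and the averaging is deliberately naive (it sums every vector after every step). With `epochs=1`, T equals the number of training examples, as in the formula.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Weights = Dict[Hashable, Dict[str, float]]

def train_averaged_perceptron(data: Sequence[Tuple[List[str], Hashable]],
                              rules: List[Hashable], epochs: int = 1) -> Weights:
    """One weight vector w_r per rule r; returns w_r = (1/T) * sum over t of w_r^t."""
    w = {r: defaultdict(float) for r in rules}
    total = {r: defaultdict(float) for r in rules}  # running sum of w_r^t
    T = 0
    for _ in range(epochs):
        for feats, gold in data:
            pred = max(rules, key=lambda r: sum(w[r][f] for f in feats))
            if pred != gold:  # multiclass perceptron update
                for f in feats:
                    w[gold][f] += 1.0
                    w[pred][f] -= 1.0
            for r in rules:  # naive averaging: accumulate the current w_r after each step
                for f, v in w[r].items():
                    total[r][f] += v
            T += 1
    return {r: {f: v / T for f, v in total[r].items()} for r in rules}

def predict(avg: Weights, feats: List[str]) -> Hashable:
    return max(avg, key=lambda r: sum(avg[r].get(f, 0.0) for f in feats))

avg = train_averaged_perceptron([(["a"], "keep"), (["b"], "up")], ["keep", "up"], epochs=3)
print(predict(avg, ["b"]))  # up
```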
5
5.1
The corpus used in the experiments consists of 450 Vietnamese sentences drawn at random from articles in various sections of the online newspaper Vietnamnet.
The data were preprocessed (spelling errors were corrected), hand-annotated with part-of-speech and dependency relation information, and formatted according to the standard of the CoNLL-X 2006 shared task [8].
The part-of-speech tag set and the dependency label set are described in more detail in the documentation accompanying the corpus.
ID  FORM      LEMMA     CPOSTAG  POSTAG  FEATS  HEAD  DEPREL  PHEAD  PDEPREL
1   Tôi       tôi       P        P       _      3     NP-SBJ  _      _
2   sẽ        sẽ        R        R       _      3     ADVP    _      _
3   tiếp_tục  tiếp_tục  MD       MD      _      7     S       _      _
4   giúp      giúp      V        V       _      3     VP      _      _
5   ,         ,         SYM      SYM     _      3     DEP     _      _
6   Tâm       tâm       NP       NP      _      7     NP-SBJ  _      _
7   nói       nói       V        V       _      0     ROOT    _      _
8   ,         ,         SYM      SYM     _      7     DEP     _      _
9   tới       tới       IN       IN      _      3     PP      _      _
10  khi       khi       NN       NN      _      9     SBAR    _      _
11  họ        họ        P        P       _      13    NP-SBJ  _      _
12  không     không     R        R       _      13    ADVP    _      _
13  cần       cần       V        V       _      10    S       _      _
14  tôi       tôi       P        P       _      13    NP-OBJ  _      _
15  nữa       nữa       R        R       _      13    ADVP    _      _
16  .         .         SYM      SYM     _      7     DEP     _      _
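A minimal Python reader for data in the CoNLL-X layout shown above. CoNLL-X files have ten tab-separated columns per token and a blank line between sentences. The `Token` record keeps only the columns the later steps need. The record and function names are hypothetical.

```python
from typing import List, NamedTuple

class Token(NamedTuple):
    id: int
    form: str
    postag: str
    head: int
    deprel: str

def read_conllx(text: str) -> List[List[Token]]:
    """Parse CoNLL-X text into sentences (lists of tokens)."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        current.append(Token(int(cols[0]), cols[1], cols[4], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences

sample = "\n".join("\t".join(cols) for cols in [
    ["1", "Tôi", "tôi", "P", "P", "_", "3", "NP-SBJ", "_", "_"],
    ["2", "sẽ", "sẽ", "R", "R", "_", "3", "ADVP", "_", "_"],
    ["3", "tiếp_tục", "tiếp_tục", "MD", "MD", "_", "7", "S", "_", "_"],
])
sent = read_conllx(sample)[0]
print([(t.form, t.head, t.deprel) for t in sent])
```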
5.2
5.2.1 Evaluation method
Because the data require laborious manual annotation, we have not yet been able to build a large corpus. To make the evaluation as reliable as possible with the 450 sentences available, we apply cross-validation in a flexible way.
a) Evaluating MST
Split the data into 10 parts and perform 10-fold cross-validation.
b) Evaluating MST with revision
Split the data into 10 parts, denoted T1, ..., T10. To test the revision module on T1, we rotate MST over the remaining 9 parts, training MST on 8 parts and testing it on the remaining one each time. We then merge the test outputs to form the training data for the revision module. We repeat this for each of the other 9 parts and average the accuracy.
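The nested rotation in (b) can be made concrete with the Python sketch below, which lists which parts each MST run trains and tests on. The function names are hypothetical. The report does not say how the 10 parts were formed, so the round-robin split is an assumption.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

Item = TypeVar("Item")

def split_into_parts(items: Sequence[Item], k: int = 10) -> List[List[Item]]:
    """Round-robin split into k parts (an assumption; any fixed partition would do)."""
    return [list(items[i::k]) for i in range(k)]

def revision_cv_plan(k: int = 10) -> Iterator[Tuple[int, List[Tuple[List[int], int]]]]:
    """For each outer test part, list the inner MST runs (train parts, test part) over
    the other k - 1 parts; the inner test outputs form the reviser's training data."""
    parts = list(range(1, k + 1))
    for outer in parts:
        rest = [p for p in parts if p != outer]
        yield outer, [([p for p in rest if p != held], held) for held in rest]

outer, inner = next(revision_cv_plan())
print(outer, inner[0])  # 1 ([3, 4, 5, 6, 7, 8, 9, 10], 2)
```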
5.2.2 Metrics
We use the two standard metrics for dependency parsing. UAS (Unlabeled Attachment Score) is the percentage of words assigned the correct head, ignoring dependency labels. LAS (Labeled Attachment Score) is the percentage of words assigned both the correct head and the correct dependency label.
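Both metrics are easy to compute per token, as in the minimal Python sketch below. The function name is hypothetical. The report does not say whether punctuation tokens were excluded, so the sketch counts every token.

```python
from typing import List, Tuple

def attachment_scores(gold: List[Tuple[int, str]],
                      pred: List[Tuple[int, str]]) -> Tuple[float, float]:
    """gold/pred hold one (head, label) pair per token; returns (UAS, LAS) in percent."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty token lists of equal length")
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))  # correct head only
    both_ok = sum(g == p for g, p in zip(gold, pred))        # correct head and label
    return 100.0 * head_ok / len(gold), 100.0 * both_ok / len(gold)

gold = [(3, "NP-SBJ"), (3, "ADVP"), (7, "S"), (3, "VP")]
pred = [(3, "NP-SBJ"), (3, "DEP"), (4, "S"), (3, "VP")]
print(attachment_scores(gold, pred))  # (75.0, 50.0)
```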
5.3 Experimental results
Table 3. MST results before and after revision
Method | UAS | LAS
First-order MST | 67.70% | 63.11%
First-order MST + revision | 66.49% | 61.76%
Conclusion
[10]
[11] K. Crammer and Y. Singer (2003). Ultraconservative Online Algorithms for Multiclass Problems. Journal of Machine Learning Research 3: pp. 951-991.
[12] J. Nivre, J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel, and D. Yuret (2007). The CoNLL 2007 Shared Task on Dependency Parsing. Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.