I.
ABSTRACT
The hidden Markov model (HMM) is a widely used statistical model for sequences
of data that vary over time. In natural language processing (NLP), HMMs have
been applied with great success to problems such as part-of-speech tagging and
phrase chunking.
II.
INTRODUCTION
The hidden Markov model is a powerful statistical tool for modeling generative
sequences, that is, sequences that can be characterized by an underlying
sequence of states that generates the observed sequence.
Hidden Markov models are applied in many areas of signal processing in general,
and speech processing in particular. They have also been applied successfully
in NLP (Natural Language Processing) to tasks such as part-of-speech tagging,
phrase chunking, and extracting target information from documents.
- S is the set of all states: $S = \{s_1, s_2, \dots, s_N\}$
- V is the set of all observations: $V = \{v_1, v_2, \dots, v_M\}$
III.
EVALUATION
Given a hidden Markov model $\lambda$ and an observation sequence O, we can
compute $P(O \mid \lambda)$, the probability of the observation sequence under
the model. This lets us evaluate how well a model predicts a given sequence O
and choose the most suitable model.
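The probability of the observation sequence O given a state sequence Q is:
$P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$
where $b_j(o)$ is the probability of observing o in state $s_j$.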
We could compute the probability of O directly by summing over every possible
state sequence, but the number of operations required is very large.
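To make the cost of the direct computation concrete, the following minimal sketch enumerates every state sequence and sums the joint probabilities. The function name `brute_force_likelihood`, the list-of-lists layout of `A` (transitions) and `B` (observation probabilities), and all numeric values are illustrative assumptions, not taken from the text.

```python
from itertools import product

def brute_force_likelihood(obs, pi, A, B):
    """P(O | lambda), summed over every possible state sequence."""
    n_states = len(pi)
    total = 0.0
    # There are N**T state sequences of length T.
    for path in product(range(n_states), repeat=len(obs)):
        # Joint probability of this state path and the observations.
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

# Hypothetical two-state, two-symbol model (illustrative values only).
pi = [0.6, 0.4]                  # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]     # A[i][j]: transition from state i to j
B = [[0.9, 0.1], [0.2, 0.8]]     # B[j][k]: state j emits symbol k

print(brute_force_likelihood([0, 1, 0], pi, A, B))
```

The outer loop visits all N^T state sequences and does work proportional to T for each one, which is why this approach quickly becomes impractical as T grows.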
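A better approach is therefore to recognize the redundant calculations and
cache them, which reduces the computational complexity. We implement the cache
as a trellis with one column per time step, computing the cached value
(denoted $\alpha$) at each state as a sum over all states at the previous time
step. $\alpha_t(i)$ is then the probability of the partial observation
sequence $o_1 o_2 \cdots o_t$ ending in state $s_i$ at time t.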
The figure above shows the computation for one step, from time t to time t+1.
By caching in this way, we reduce the amount of computation from $2TN^T$ to
$N^2T$ operations.
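The caching idea described above can be sketched as follows. This is a minimal illustration of the Forward algorithm under assumptions of my own: the function name `forward`, the list-of-lists layout of `A` and `B`, the standard initialisation alpha_1(i) = pi_i * b_i(o_1), and the numeric values are not taken from the text.

```python
def forward(obs, pi, A, B):
    """Return P(O | lambda) and the trellis of alpha values."""
    n = len(pi)
    # First column: alpha_1(i) = pi_i * b_i(o_1).
    trellis = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = trellis[-1]
        # alpha_{t+1}(j): sum over all states at the previous time step,
        # then multiply by the probability of emitting the new symbol.
        column = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                  for j in range(n)]
        trellis.append(column)
    # P(O | lambda) is the sum of the last column.
    return sum(trellis[-1]), trellis

# Hypothetical two-state, two-symbol model (illustrative values only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]

likelihood, trellis = forward([0, 1, 0], pi, A, B)
print(likelihood)
```

Each of the T columns holds N values and each value needs a sum over N predecessors, which gives the N^2 T operation count.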
IV.
DECODING
The goal of decoding is to find the state sequence that is most likely to have
produced a given observation sequence. One solution to this problem is the
Viterbi algorithm.
The Viterbi algorithm is another trellis algorithm, similar to the Forward
algorithm, except that it selects the maximum transition probability at each
step instead of summing them.
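First, we define:
$\delta_t(i) = \max_{q_1, \dots, q_{t-1}} P(q_1 q_2 \cdots q_t = s_i,\; o_1 o_2 \cdots o_t \mid \lambda)$
- The Viterbi algorithm proceeds as follows:
1. Initialisation:
$\delta_1(i) = \pi_i\, b_i(o_1), \quad \psi_1(i) = 0, \quad 1 \le i \le N$
2. Recursion:
$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]\, b_j(o_t), \quad \psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}], \quad 2 \le t \le T$
3. Termination:
$p^* = \max_{1 \le i \le N} \delta_T(i), \quad q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)$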
or not
Find the most probable weather sequence using the Viterbi algorithm (assume
that the initial probabilities are equal on day 1).
1. Initialisation:
n=3:
3. Termination:
The most likely path is determined, starting by finding the final state of the
most likely sequence.
4. Backtracking:
The best state sequence can be read from the vector $\psi$. See Figure 7.
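The four steps above (initialisation, recursion, termination, backtracking) can be sketched as follows. This is a minimal illustration, not the worked example from the text: the function name `viterbi`, the data layout, and the numeric values are assumptions, with the initial probabilities set equal as the exercise suggests.

```python
def viterbi(obs, pi, A, B):
    """Return the probability and states of the most likely path."""
    n = len(pi)
    # 1. Initialisation: best score for each state after the first symbol.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []  # one list per later step: best predecessor of j
    # 2. Recursion: keep the maximum over predecessors instead of the sum.
    for o in obs[1:]:
        best_prev = [max(range(n), key=lambda i, j=j: delta[i] * A[i][j])
                     for j in range(n)]
        delta = [delta[best_prev[j]] * A[best_prev[j]][j] * B[j][o]
                 for j in range(n)]
        backpointers.append(best_prev)
    # 3. Termination: pick the best final state.
    last = max(range(n), key=lambda i: delta[i])
    best_prob = delta[last]
    # 4. Backtracking: follow the stored predecessors back to the start.
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return best_prob, path

# Hypothetical two-state, two-symbol model (illustrative values only).
pi = [0.5, 0.5]                  # equal initial probabilities
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]

best_prob, path = viterbi([0, 1], pi, A, B)
print(best_prob, path)
```

Storing only the best predecessor per state at each step is what allows the path to be recovered in a single backward pass.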
V.
LEARNING
Given a set of examples from a process, we can estimate the parameters of the
hidden Markov model $\lambda = (A, B, \pi)$ so that they describe the process
as well as possible. There are two common approaches, depending on the form of
the given examples: supervised training and unsupervised training.
If the examples contain both inputs and outputs, we perform supervised
training, where the input is the observation sequence and the output is the
state sequence. If the examples contain only inputs, we can only perform
unsupervised training, estimating the model parameters so that they account
for the given observation sequences.
In this article we discuss only supervised training; unsupervised training
with the Baum-Welch algorithm is presented in [6].
The simplest way to build a hidden Markov model with parameters $\lambda$ is to
use a large set of training examples. The classic example of this approach is
PoS tagging.
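We define two sets:
- $t_1, \dots, t_N$ is the set of tags, corresponding to the state set $s_1, \dots, s_N$ of the HMM
- $w_1, \dots, w_M$ is the set of words, corresponding to the observation set of the HMM
The transition probabilities are estimated as
$a_{ij} = \frac{\mathrm{Count}(t_i, t_j)}{\mathrm{Count}(t_i)}$
where $\mathrm{Count}(t_i, t_j)$ is the number of transitions from $t_i$ to $t_j$ and $\mathrm{Count}(t_i)$ is the number of times $t_i$ appears.
The observation probabilities are estimated as
$b_j(k) = \frac{\mathrm{Count}(w_k, t_j)}{\mathrm{Count}(t_j)}$
where $\mathrm{Count}(w_k, t_j)$ is the number of times the observation $w_k$ appears with tag $t_j$ and $\mathrm{Count}(t_j)$ is the number of times $t_j$ appears.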
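The count-based estimates can be sketched on a toy tagged corpus as follows. The function name `estimate_hmm`, the corpus, and the tag names are hypothetical. One deliberate deviation is an assumption of mine: for the transition denominator the sketch counts only occurrences of a tag that have a successor, so that each tag's outgoing probabilities sum to one, rather than every occurrence of the tag.

```python
from collections import Counter

def estimate_hmm(tagged_sentences):
    """Count-based estimates of transition and observation probabilities."""
    transition_counts = Counter()  # Count(t_i, t_j)
    emission_counts = Counter()    # Count(w_k, t_j)
    tag_counts = Counter()         # Count(t_j): every occurrence of a tag
    from_counts = Counter()        # occurrences of t_i that have a successor
    for sentence in tagged_sentences:
        for idx, (word, tag) in enumerate(sentence):
            tag_counts[tag] += 1
            emission_counts[(word, tag)] += 1
            if idx + 1 < len(sentence):
                next_tag = sentence[idx + 1][1]
                transition_counts[(tag, next_tag)] += 1
                from_counts[tag] += 1
    a = {(ti, tj): c / from_counts[ti]
         for (ti, tj), c in transition_counts.items()}
    b = {(w, t): c / tag_counts[t]
         for (w, t), c in emission_counts.items()}
    return a, b

# Hypothetical toy corpus of (word, tag) pairs.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
a, b = estimate_hmm(corpus)
print(a[("DET", "NOUN")], b[("dog", "NOUN")])
```

Pairs that never occur in the corpus are simply absent from the dictionaries, which corresponds to an estimated probability of zero; a real tagger would usually smooth these counts.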