7 Information presentation
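Centroid of a document set $D$, where $\vec{v}(d)$ is the vector representing document $d$:
$$\vec{\mu}(D) = \frac{1}{|D|} \sum_{d \in D} \vec{v}(d)$$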

Example: centroid
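
As a small worked illustration of the centroid definition, the sketch below averages three made-up two-dimensional document vectors component by component. The function name `centroid` and the sample vectors are assumptions chosen for illustration; they do not come from the slides.

```python
def centroid(vectors):
    """Return the component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("centroid of an empty set is undefined")
    n = len(vectors)
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dims)]


# Hypothetical document vectors (e.g., weights for two terms).
docs = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
mu = centroid(docs)  # (1 + 3 + 2) / 3 = 2.0 and (2 + 0 + 4) / 3 = 2.0
```

The centroid need not coincide with any actual document; it is simply the center of mass of the set.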
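
Rocchio algorithm
The Rocchio algorithm implements relevance feedback in the vector space model.
Rocchio chooses the query $\vec{q}_{opt}$ that maximizes
$$\vec{q}_{opt} = \arg\max_{\vec{q}} \, [\cos(\vec{q}, \vec{\mu}(D_r)) - \cos(\vec{q}, \vec{\mu}(D_{nr}))]$$
$D_r$: set of relevant documents; $D_{nr}$: set of nonrelevant documents; $\vec{\mu}(\cdot)$: centroid
Intent: $\vec{q}_{opt}$ is the vector that maximally separates relevant from nonrelevant documents.
Under some additional assumptions, $\vec{q}_{opt}$ can be rewritten in closed form.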

Rocchio algorithm
The optimal query vector is:
$$\vec{q}_{opt} = \vec{\mu}(D_r) + [\vec{\mu}(D_r) - \vec{\mu}(D_{nr})]$$
We move the centroid of the relevant documents by the difference between the two centroids.
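
To make the "move the centroid by the difference of the two centroids" step concrete, here is a minimal sketch on made-up two-dimensional vectors. The helper names `centroid` and `rocchio_optimal` and all sample data are assumptions for illustration only.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def rocchio_optimal(relevant, nonrelevant):
    """q_opt = mu(D_r) + (mu(D_r) - mu(D_nr))."""
    mu_r = centroid(relevant)
    mu_nr = centroid(nonrelevant)
    return [r + (r - nr) for r, nr in zip(mu_r, mu_nr)]


relevant = [[4.0, 2.0], [2.0, 4.0]]      # mu(D_r)  = [3.0, 3.0]
nonrelevant = [[1.0, 0.0], [1.0, 2.0]]   # mu(D_nr) = [1.0, 1.0]
q_opt = rocchio_optimal(relevant, nonrelevant)  # [3 + 2, 3 + 2]
```

In this toy case the result lies beyond the relevant centroid, on the side facing away from the nonrelevant documents.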

Exercise: compute the Rocchio vector
Circles: relevant documents; Xs: nonrelevant documents

Rocchio illustrated
$\vec{\mu}(D_r)$: centroid of the relevant documents.
$\vec{\mu}(D_r)$ alone does not separate relevant from nonrelevant documents.
$\vec{\mu}(D_{nr})$: centroid of the nonrelevant documents.
$\vec{\mu}(D_r) - \vec{\mu}(D_{nr})$: difference vector.
Add the difference vector to $\vec{\mu}(D_r)$ ...
... to get $\vec{q}_{opt}$.
$\vec{q}_{opt}$ separates relevant from nonrelevant documents very effectively.

Terminology
We use the name Rocchio' (with a prime) for the theoretically better-motivated version of Rocchio.
The implementation actually used in most cases is the one in SMART; we use the name Rocchio (without the prime) for that algorithm.
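
Rocchio 1971 algorithm (SMART)
Used in practice:
$$\vec{q}_m = \alpha \vec{q}_0 + \beta \frac{1}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \frac{1}{|D_{nr}|} \sum_{\vec{d}_j \in D_{nr}} \vec{d}_j$$
$\vec{q}_m$: modified query vector; $\vec{q}_0$: original query vector;
$D_r$ and $D_{nr}$: sets of known relevant and nonrelevant documents;
$\alpha$, $\beta$, and $\gamma$: weights
The new query moves toward the relevant documents and away from the nonrelevant documents.
Tradeoff between $\alpha$ and $\beta/\gamma$: if we have many judged documents, we want a higher $\beta/\gamma$.
Set negative term weights to 0.
A negative weight for a term does not make sense in the vector space model.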
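
The SMART-style update can be sketched in a few lines. This is a minimal illustration, not the SMART code itself: the function names, the default weights (in particular $\alpha = 1$), and the three-term toy vectors are all assumptions. The final step clips negative term weights to 0, as the slide prescribes.

```python
def centroid(vectors, dims):
    """Component-wise mean; an empty set contributes a zero vector."""
    if not vectors:
        return [0.0] * dims
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def rocchio_smart(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """q_m = alpha*q0 + beta*mu(D_r) - gamma*mu(D_nr), negatives clipped to 0."""
    dims = len(q0)
    mu_r = centroid(relevant, dims)
    mu_nr = centroid(nonrelevant, dims)
    q_m = [alpha * q + beta * r - gamma * nr
           for q, r, nr in zip(q0, mu_r, mu_nr)]
    return [max(0.0, w) for w in q_m]


q0 = [1.0, 0.0, 0.0]                            # original query: first term only
relevant = [[1.0, 2.0, 0.0], [1.0, 2.0, 0.0]]   # centroid [1, 2, 0]
nonrelevant = [[0.0, 0.0, 4.0]]                 # centroid [0, 0, 4]
q_m = rocchio_smart(q0, relevant, nonrelevant)
# Before clipping: [1 + 0.75, 0 + 1.5, 0 - 1.0]; the last weight becomes 0.
```

Passing an empty `nonrelevant` list gives the positive-feedback-only variant, since the zero centroid removes the $\gamma$ term.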

Positive vs. negative relevance feedback
Positive feedback is more valuable than negative feedback.
For example, set $\beta = 0.75$, $\gamma = 0.25$ to give more weight to positive feedback.
Many systems only support positive feedback.

Relevance feedback: assumptions
When can relevance feedback improve recall?
Assumption A1: The user knows the terms in the collection well enough to formulate an initial query.
Assumption A2: Relevant documents contain similar terms (so we can hope to reach other relevant documents by using the feedback information).

Violation of A1
Assumption A1: The user knows the terms in the collection well enough to formulate an initial query.
Violation: mismatch between the user's vocabulary and that of the search engine.
Example: automobile / car

Violation of A2
Assumption A2: Relevant documents contain similar terms.
Example of a violation: [government policies]
Several unrelated prototypes:
Taxing tobacco manufacturers vs. anti-smoking campaigns
Subsidies for the poor vs. loans to the poor
Relevance feedback on documents about tobacco will not help in finding documents about the poor.

Relevance feedback: evaluation
Pick an evaluation measure from the previous lecture, e.g., precision in the top 10: P@10
Compute P@10 for the original query $q_0$.
Compute P@10 for the modified query after feedback, $q_1$.
In most cases, $q_1$ is much better than $q_0$!
Is this a fair evaluation?

Relevance feedback: evaluating the results
A fair evaluation must be done on the residual collection: the documents the user has not yet judged.
Studies show that relevance feedback is still successful when evaluated this way.
Empirically, one round of relevance feedback is often very useful. Two rounds are marginally useful.
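
The difference between the naive and the fair evaluation can be seen on a toy ranking. The sketch below is only an illustration: the document ids, the judged set, and the helper names `precision_at_k` and `residual` are assumptions, and k = 4 is used instead of 10 to keep the example short.

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k ranked document ids that are relevant (divides by k)."""
    top = ranking[:k]
    return sum(1 for d in top if d in relevant) / k


def residual(ranking, judged):
    """Drop the documents the user already judged during the feedback round."""
    return [d for d in ranking if d not in judged]


relevant = {"d1", "d2", "d3", "d4"}
judged = {"d1", "d2", "d9"}   # documents the user saw and judged
# Hypothetical ranking returned for the modified query q1.
ranking_q1 = ["d1", "d2", "d3", "d5", "d4", "d6", "d7", "d8"]

naive = precision_at_k(ranking_q1, relevant, k=4)                    # counts d1, d2, d3
fair = precision_at_k(residual(ranking_q1, judged), relevant, k=4)   # counts d3, d4
```

The naive score rewards the system for returning documents the user already marked as relevant, which is why it overstates the benefit of feedback.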

Evaluation: caveat
A true evaluation of usefulness must compare against other methods that take the same amount of time.
Alternative to relevance feedback: the user revises and resubmits the query.
Users may prefer revising and resubmitting the query to judging the relevance of documents.
There is no clear evidence that relevance feedback is the best use of the user's time.

Problems with relevance feedback
Relevance feedback is expensive.
Relevance feedback produces long queries.
Long queries are more expensive to process.
Users are reluctant to provide explicit feedback.
It is often hard to understand why a particular document was retrieved after applying relevance feedback.
The search engine Excite offered full relevance feedback at one point but later abandoned it.

Pseudo-relevance feedback
Pseudo-relevance feedback automates the manual part of relevance feedback.
Pseudo-relevance feedback algorithm:
Retrieve a ranked list of documents matching the query.
Assume that the top k documents are relevant.
Do relevance feedback (e.g., Rocchio).
Works very well on average.
But it can go badly wrong for some queries.
Several iterations of feedback can cause the query to drift completely.
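
The three steps of the algorithm above can be sketched as follows. This is a minimal illustration under stated assumptions: documents and queries are small dense vectors over a hypothetical three-term vocabulary, ranking uses cosine similarity, and the update is a Rocchio step with positive feedback only (there are no judged nonrelevant documents in this setting). All names and data are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs):
    """Return document ids sorted by decreasing cosine similarity to the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)


def pseudo_relevance_feedback(query, docs, k=2, alpha=1.0, beta=0.75):
    """One round: assume the top-k hits are relevant, then apply a Rocchio update."""
    top = rank(query, docs)[:k]
    dims = len(query)
    mu = [sum(docs[d][i] for d in top) / len(top) for i in range(dims)]
    return [alpha * q + beta * m for q, m in zip(query, mu)]


# Hypothetical vocabulary: [car, automobile, banana]
docs = {
    "d1": [1.0, 1.0, 0.0],
    "d2": [2.0, 1.0, 0.0],
    "d3": [0.0, 1.0, 0.0],
    "d4": [0.0, 0.0, 1.0],
}
q0 = [1.0, 0.0, 0.0]                       # the user typed only "car"
q1 = pseudo_relevance_feedback(q0, docs)   # gains weight on "automobile"
```

With the expanded query, `d3` (which mentions only "automobile") receives a nonzero score, while the unrelated `d4` still scores zero. If the top k hits had been nonrelevant, the same mechanism would have pulled the query in the wrong direction, which is the failure mode the slide warns about.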

Pseudo-relevance feedback at TREC4
Cornell SMART system
Results show the number of relevant documents among the top 100 for 50 queries (so the total number of documents is 5000):
Method          Number of relevant documents
lnc.ltc         3210
lnc.ltc-PsRF    3634
Lnu.ltu         3709
Lnu.ltu-PsRF    4350
The results contrast two length normalization schemes (L vs. l) and pseudo-relevance feedback (PsRF).
The pseudo-relevance feedback method used here added only 20 terms to the query. (Rocchio will add many more.)
This shows that pseudo-relevance feedback is effective on average.
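
To read the table as relative improvements, the short calculation below compares each weighting scheme with and without PsRF. The counts are taken from the table; the helper `relative_gain` is a name introduced here for illustration.

```python
# Relevant documents found in the top 100 over 50 queries (from the table above).
results = {
    "lnc.ltc": 3210,
    "lnc.ltc-PsRF": 3634,
    "Lnu.ltu": 3709,
    "Lnu.ltu-PsRF": 4350,
}


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


gain_lnc = relative_gain(results["lnc.ltc"], results["lnc.ltc-PsRF"])
gain_lnu = relative_gain(results["Lnu.ltu"], results["Lnu.ltu-PsRF"])
print(f"lnc.ltc: +{gain_lnc:.1f}%  Lnu.ltu: +{gain_lnu:.1f}%")
```

Pseudo-relevance feedback adds roughly 13% more relevant documents for lnc.ltc and roughly 17% for Lnu.ltu.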

Outline
Basic principles of search interface design
Relevance feedback
  Motivation
  Basics
  Details
Query expansion

Query expansion
Query expansion is another method for increasing recall.
We use the term "global query expansion" to refer to global methods for query reformulation.
In global query expansion, the query is modified based on a global resource, i.e., a resource that does not depend on the query.
Main information used: (near-)synonymy
A publication or database that collects (near-)synonyms is called a thesaurus.
We will look at two types of thesauri: manually created and automatically created.

Query expansion: example

Types of user feedback
The user gives feedback on documents.
More common in relevance feedback.
The user gives feedback on words or phrases.
More common in query expansion.

Types of query expansion
Manual thesaurus (maintained by editors, e.g., PubMed)
Automatically derived thesaurus (e.g., based on co-occurrence statistics)
Query equivalence determined by query log mining (common on the web)

Thesaurus-based query expansion
For each term t in the query, expand the query with the words that the thesaurus lists as semantically related to t.
Example from earlier: HOSPITAL → MEDICAL
Generally increases recall.
May significantly decrease precision, particularly with ambiguous terms:
INTEREST RATE → INTEREST RATE FASCINATE
Widely used in specialized search engines for science and engineering.
Creating, using, and maintaining a manual thesaurus is very expensive.
A manual thesaurus has an effect roughly equivalent to annotation with a controlled vocabulary.
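
The expansion step itself is a simple lookup. The sketch below uses a tiny hand-made dictionary as a stand-in for a real thesaurus; the entries, the name `THESAURUS`, and the function `expand_query` are assumptions for illustration.

```python
# Hypothetical hand-made thesaurus: term -> semantically related terms.
THESAURUS = {
    "hospital": ["medical"],
    "interest": ["fascinate"],
}


def expand_query(terms, thesaurus):
    """Append thesaurus entries for each query term, skipping terms already present."""
    expanded = list(terms)
    for t in terms:
        for related in thesaurus.get(t, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


q1 = expand_query(["hospital"], THESAURUS)
q2 = expand_query(["interest", "rate"], THESAURUS)
```

The second query shows the precision problem from the slide: the lookup has no notion of word sense, so the financial query [interest rate] picks up "fascinate".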

Example of a manual thesaurus: PubMed

Automatic thesaurus generation
Attempt to generate a thesaurus automatically by analyzing the distribution of words in documents.
Fundamental notion: semantic similarity between two words (synonyms or near-synonyms)
Definition 1: Two words are similar if they co-occur with similar words.
"car" and "motorcycle" both co-occur with "road", "gas", and "license", so the two words must be similar.
Definition 2: Two words are similar if they occur in a given grammatical relation with the same words.
You can pick, peel, eat, etc. apples and pears, so apples and pears must be similar in meaning.
Co-occurrence is faster to compute, but grammatical relations are more accurate.
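
Definition 1 can be tried out on a toy corpus. The sketch below is one possible reading, under stated assumptions: two words co-occur when they appear in the same document, each word is represented by its co-occurrence counts, and similarity is the cosine between those count vectors. The corpus and all function names are made up for illustration.

```python
import math
from collections import defaultdict
from itertools import permutations


def cooccurrence_vectors(docs):
    """Map each word to counts of the other words that share a document with it."""
    vectors = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for w, c in permutations(set(doc), 2):
            vectors[w][c] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_neighbors(word, vectors, n=3):
    """Return the n words whose co-occurrence vectors are closest to `word`."""
    others = [w for w in vectors if w != word]
    return sorted(others, key=lambda w: cosine(vectors[word], vectors[w]),
                  reverse=True)[:n]


# Hypothetical corpus: each inner list is one tokenized document.
corpus = [
    ["car", "road", "gas"],
    ["motorcycle", "road", "gas"],
    ["car", "license"],
    ["motorcycle", "license"],
    ["apple", "pear", "eat"],
]
vecs = cooccurrence_vectors(corpus)
top = nearest_neighbors("car", vecs, n=1)
```

Here "car" and "motorcycle" never appear in the same document, yet they come out as nearest neighbors because they share the same contexts ("road", "gas", "license").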

Co-occurrence-based thesaurus: examples
Word          Nearest neighbors
absolutely    absurd whatsoever totally exactly nothing
bottomed      dip copper drops topped slide trimmed
captivating   shimmer stunningly superbly plucky witty
doghouse      dog porch crawling beside downstairs
makeup        repellent lotion glossy sunscreen skin gel
mediating     reconciliation negotiate case conciliation
keeping       hoping bring wiping could some would
lithographs   drawings Picasso Dali sculptures Gauguin
pathogens     toxins bacteria organisms bacterial parasite
senses        grasp psyche truly clumsy naive innate

Query expansion at search engines
Main source for query expansion at search engines: query logs
Example 1: After issuing the query [herbs], users frequently search for [herbal remedies].
"herbal remedies" is a potential expansion of "herb".
Example 2: Users searching for [flower pix] frequently click on the URL photobucket.com/flower. Users searching for [flower clipart] also frequently click on that URL.
"flower clipart" and "flower pix" are potential expansions of each other.
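
Example 2 suggests a simple mining rule: queries that lead to clicks on the same URL are candidate expansions of each other. The sketch below illustrates that rule on a made-up click log; the log entries (including `example.org/herbs`), the `min_clicks` threshold, and the function name are assumptions, not a description of how any particular search engine works.

```python
from collections import defaultdict

# Hypothetical click log: (query, clicked URL) pairs.
CLICK_LOG = [
    ("flower pix", "photobucket.com/flower"),
    ("flower clipart", "photobucket.com/flower"),
    ("flower pix", "photobucket.com/flower"),
    ("herbs", "example.org/herbs"),
]


def expansion_candidates(click_log, min_clicks=1):
    """Group queries by clicked URL; queries sharing a URL are candidate expansions."""
    clicks = defaultdict(lambda: defaultdict(int))
    for query, url in click_log:
        clicks[url][query] += 1
    candidates = defaultdict(set)
    for url, per_query in clicks.items():
        # Keep only queries that clicked this URL often enough.
        frequent = [q for q, n in per_query.items() if n >= min_clicks]
        for q in frequent:
            candidates[q].update(o for o in frequent if o != q)
    return candidates


cands = expansion_candidates(CLICK_LOG)
```

Raising `min_clicks` is one way to respect the word "frequently" in the slide: a single shared click is weak evidence that two queries mean the same thing.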

Take-away today
Interactive relevance feedback: improve the initial retrieval results by telling the system which documents are relevant / nonrelevant.
The best-known relevance feedback method is Rocchio feedback.
Query expansion: improve retrieval results by adding synonyms / related terms to the query.
Sources for related terms: manual thesauri, automatic thesauri, query logs

References
Chapter 9 of IIR
Resources at http://ifnlp.org/ir
Salton and Buckley 1990 (original relevance feedback paper)
Spink, Jansen, Ozmultu 2000: Relevance feedback at Excite
Schütze 1998: Automatic word sense discrimination (describes a simple method for automatic thesaurus generation)