
IT4853 Information Retrieval and Presentation

Chapter 7: Information Presentation



Nguyễn Bá Ngọc

2
Main topics of this lecture
Basic principles of search interface design
Interactive relevance feedback: the user interacts with the system to improve the returned result set by indicating which documents are relevant / nonrelevant
The most common relevance feedback method: Rocchio feedback
Query expansion: improve search results by adding synonyms / related terms of the query terms
Sources for identifying related terms: manually or automatically built thesauri, query logs
3
Outline
Basic principles of search interface design
Relevance feedback
Motivation
Basics
Details
Query expansion
5
The classic information retrieval model
6
The berry-picking model
Users learn during the search process
which leads to necessary changes to the query
which leads to changes in the information need
satisfying one need leads to another
The information need is not satisfied by a single set of documents
but by the pieces of information found along the way
7
Information seeking operations
Scanning
skim at a high level of generality
select information to view or to use as a query
Query formulation
supplies new data
Navigation
follow a chain of links
a sequence of skim-and-select operations
Browsing
move around randomly, without a fixed direction
8
The user's journey ... moves through many operations toward the overall goal of satisfying an information need (after Bates 89)
9
Starting points for search
How do users begin a search?
not with a long, specific query
usually with a short query, followed by skimming the results and revising the query
becoming familiar with the collection, the query language, etc.
The system should guide users to the right starting point
10
Lists of collections
Traditional archival systems start with a choice from a list of collections
users have to learn how to recognize the collections they need
On the web, a portal can suggest a list of search engines
Overview information is needed
11
Directory tree overview
Provides a hierarchical structure for the collection
Very common on the web
Users can lose their bearings while browsing the directories
D.4 OPERATING SYSTEM (C)
D.4.0 General
D.4.2 Storage Management
Allocation/deallocation strategies
Distributed memory
Garbage collection (NEW)
Main memory
Secondary storage
Segmentation**
Storage hierarchies
Swapping
Virtual memory
12
Example: MeSHBrowse
13
Search interfaces
Users' information needs are imprecise
Users may not know how to find the information they need
The human-computer interface helps users understand and express their information need
formulate queries
select relevant documents
analyze the returned results
track the progress of the search
etc.
14
Design principles
Offer informative feedback
Support control over the interaction
Users can control how and when feedback is given
Permit easy reversal of actions
Do not require users to remember much
Provide specialized interfaces for novice and experienced users
15
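Offer informative feedback
The search system should provide feedback about
the relationship between the query and the returned results
the relationships among the returned documents
the relationship between documents and the metadata describing the collection
Users should be able to adjust the amount of feedback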

16
User control
Users want to control the system's behavior
Users should be the initiators of actions rather than responders
The system should avoid
surprising behavior
situations the user cannot get out of
the inability to carry out a desired action
Example: compare modal and non-modal interfaces
17
Easy reversal of actions
Every action should be reversible
The ability to go back helps users make decisions faster
It encourages users to explore new functions
Define the unit of reversal clearly
a single action, or a group of actions
18
Do not require users to remember much
Do not overload the user's memory
an average person can hold 7 ± 2 items in memory.
Help users keep track of their search choices
let users change their search strategy
save the context and related information of search sessions
Provide browsable context
suggest related search terms / metadata
starting points include lists of sources and topics
19
Specialized interfaces
A trade-off between simplicity and power
Simple interface: easy to use, but hard to express the information need
Complex interface: harder to use, but lets users describe the information need in more detail and search faster
Specialized interfaces
For novice users: a simple, easy-to-learn interface with basic functions
Experts can go deeper and get more control, more functions and more options.
20
Outline
Basic principles of search interface design
Relevance feedback
Motivation
Basics
Details
Query expansion
21
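How can we increase search recall?
Two methods for improving recall: relevance feedback and query expansion
Example: consider the query q: [ô tô] (Vietnamese for "car") . . .
. . . and a document d that contains "xế hộp" (a colloquial word for car) but not the term "ô tô"
A simple retrieval system will not return d for the query q,
even if d is the most relevant document for q.
We want to handle this situation:
return relevant documents even when they contain none of the (original) query terms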
22
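Recall
In this lecture, the main goal is to increase the number of relevant documents returned to the user
This can actually decrease recall, e.g., when expanding "ô tô" (car) with "xăng dầu" (petrol)
. . . which may eliminate some relevant documents, but increases the number of relevant documents returned on the top pages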
23
Options for improving recall
Local: perform an on-demand, local analysis for a user query
Main local method: relevance feedback
Part 1
Global: perform a global analysis once (e.g., of the whole collection) to build a thesaurus
Use the thesaurus for query expansion
Part 2
24
Relevance feedback: basic idea
The user issues a (short, simple) query.
The search engine returns a set of documents.
The user marks some documents as relevant or nonrelevant.
The search engine computes a new representation of the information need. Hope: it is better than the current query.
The search engine runs the new query and returns new results.
The new results are expected to have higher recall.
25
Relevance feedback
We can iterate this process: several rounds of feedback.
We will use the term ad hoc retrieval to mean regular retrieval without relevance feedback.
We will look at three relevance feedback examples that highlight different aspects of the process.
26
Outline
Basic principles of search interface design
Relevance feedback
Motivation
Basics
Details
Query expansion
27
Relevance feedback: Example 1
Results returned for the query: bike
28
User feedback: select the relevant results
29
Results after applying the feedback
30
Vector space example: query canine (1)
Source: Fernando Díaz
31
Similarity of documents to the query canine
Source: Fernando Díaz
32
Feedback: the user selects the relevant documents
Source: Fernando Díaz
33
Results after applying the feedback
Source: Fernando Díaz
34
Example 3: a text retrieval example
Initial query: [new space satellite applications]
Results for the initial query (r = rank):
    r  score  title
+   1  0.539  NASA Hasn't Scrapped Imaging Spectrometer
+   2  0.533  NASA Scratches Environment Gear From Satellite Plan
    3  0.528  Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
    4  0.526  A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
    5  0.525  Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
    6  0.524  Report Provides Support for the Critics Of Using Big Satellites to Study Climate
    7  0.516  Arianespace Receives Satellite Launch Pact From Telesat Canada
+   8  0.509  Telecommunications Tale of Two Companies

The user then marks the relevant documents with "+".
35
Query expansion after feedback
Query: [new space satellite applications]
Term weights of the expanded query:
 2.074  new            15.106  space
30.816  satellite       5.660  application
 5.991  nasa            5.196  eos
 4.196  launch          3.972  aster
 3.516  instrument      3.446  arianespace
 3.004  bundespost      2.806  ss
 2.790  rocket          2.053  scientist
 2.003  broadcast       1.172  earth
 0.836  oil             0.646  measure
Compare with the original query.
36
Results for the expanded query
(first column: rank in the initial results; * marks a document the user judged relevant)
old   r  score  title
2*    1  0.513  NASA Scratches Environment Gear From Satellite Plan
1*    2  0.500  NASA Hasn't Scrapped Imaging Spectrometer
      3  0.493  When the Pentagon Launches a Secret Satellite, Space Sleuths Do Some Spy Work of Their Own
      4  0.493  NASA Uses Warm Superconductors For Fast Circuit
8*    5  0.492  Telecommunications Tale of Two Companies
      6  0.491  Soviets May Adapt Parts of SS-20 Missile For Commercial Use
      7  0.490  Gaping Gap: Pentagon Lags in Race To Match the Soviets In Rocket Launchers
      8  0.490  Rescue of Satellite By Space Agency To Cost $90 Million
37
Outline
Basic principles of search interface design
Relevance feedback
Motivation
Basics
Details
Query expansion
38
The centroid concept in relevance feedback
The centroid of a set of points is defined like the center of mass of a rigid body.
Recall: we represent documents as points in a high-dimensional space.
So we can compute the centroid of a set of documents.
Definition:

\[ \vec{\mu}(D) = \frac{1}{|D|} \sum_{d \in D} \vec{v}(d) \]

where D is a set of documents and \vec{v}(d) is the vector representation of document d.

39
Example: centroid
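As a small worked example of the centroid definition, the following sketch averages three made-up document vectors component by component. The helper name `centroid` and the vectors are illustrative assumptions, not taken from the lecture.

```python
def centroid(docs):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not docs:
        raise ValueError("centroid of an empty set is undefined")
    dim = len(docs[0])
    return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]


# Three hypothetical documents in a 3-term vector space.
docs = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0], [2.0, 1.0, 1.0]]
print(centroid(docs))  # [2.0, 1.0, 1.0]
```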
40
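Rocchio algorithm
The Rocchio algorithm implements relevance feedback in the vector space model.
Rocchio chooses the query $\vec{q}_{opt}$ that maximizes

\[ \vec{q}_{opt} = \arg\max_{\vec{q}} \, [\cos(\vec{q}, \vec{\mu}(D_r)) - \cos(\vec{q}, \vec{\mu}(D_{nr}))] \]

$D_r$: set of relevant documents; $D_{nr}$: set of nonrelevant documents
Intent: $\vec{q}_{opt}$ is the vector that separates relevant and nonrelevant documents maximally.
Under some additional assumptions, we can rewrite $\vec{q}_{opt}$ as:

\[ \vec{q}_{opt} = \vec{\mu}(D_r) + [\vec{\mu}(D_r) - \vec{\mu}(D_{nr})] \]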
41
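Rocchio algorithm
The optimal query vector is:

\[ \vec{q}_{opt} = \vec{\mu}(D_r) + [\vec{\mu}(D_r) - \vec{\mu}(D_{nr})]
 = \frac{1}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j + \Big[ \frac{1}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \frac{1}{|D_{nr}|} \sum_{\vec{d}_j \in D_{nr}} \vec{d}_j \Big] \]

We move the centroid of the relevant documents by the difference between the two centroids.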
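The shift described here can be checked numerically. The sketch below assumes two small, invented sets of 2-dimensional document vectors and computes the relevant centroid plus the difference of the two centroids; the function names are hypothetical.

```python
def centroid(docs):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(d[i] for d in docs) / len(docs) for i in range(len(docs[0]))]


def rocchio_optimal(relevant, nonrelevant):
    """Relevant centroid shifted by (relevant centroid - nonrelevant centroid)."""
    mu_r = centroid(relevant)
    mu_nr = centroid(nonrelevant)
    return [r + (r - n) for r, n in zip(mu_r, mu_nr)]


relevant = [[1.0, 3.0], [3.0, 5.0]]     # centroid: [2.0, 4.0]
nonrelevant = [[4.0, 1.0], [6.0, 1.0]]  # centroid: [5.0, 1.0]
print(rocchio_optimal(relevant, nonrelevant))  # [-1.0, 7.0]
```

The negative first component shows why practical variants clip negative term weights: the idealized vector is not restricted to the non-negative space in which documents live.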
42
Exercise: compute the Rocchio vector
Circles: relevant documents; Xs: nonrelevant documents

43
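Rocchio illustrated
$\vec{\mu}(D_r)$: centroid of the relevant documents
44
Rocchio illustrated

$\vec{\mu}(D_r)$ alone does not separate relevant / nonrelevant documents.

45
Rocchio illustrated

$\vec{\mu}(D_{nr})$: centroid of the nonrelevant documents.

46
Rocchio illustrated
47
Rocchio illustrated

$\vec{\mu}(D_r) - \vec{\mu}(D_{nr})$: difference vector
48
Rocchio illustrated

add the difference vector to $\vec{\mu}(D_r)$ ...
49
Rocchio illustrated

... to obtain $\vec{q}_{opt}$
50
Rocchio illustrated

$\vec{q}_{opt}$ distinguishes relevant / nonrelevant documents very effectively.
51
Rocchio illustrated

$\vec{q}_{opt}$ separates relevant / nonrelevant documents very effectively.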
52
Terminology
So far we have used the name Rocchio for the theoretically better-motivated version of the algorithm.
The implementation actually used in most cases is the one in SMART; from here on we use the name Rocchio (unqualified) for that algorithm.
53
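Rocchio 1971 algorithm (SMART)
Used in practice:

\[ \vec{q}_m = \alpha \vec{q}_0 + \beta \frac{1}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \frac{1}{|D_{nr}|} \sum_{\vec{d}_j \in D_{nr}} \vec{d}_j \]

$\vec{q}_m$: modified query vector; $\vec{q}_0$: original query vector;
$D_r$ and $D_{nr}$: sets of known relevant and nonrelevant documents;
$\alpha$, $\beta$ and $\gamma$: weights
The new query moves toward the relevant documents and away from the nonrelevant documents.
Trade-off between $\alpha$ and $\beta$/$\gamma$: if we have many judged documents, we want higher $\beta$/$\gamma$.
Set negative term weights to 0.
A negative term weight has no meaning in the vector space model.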
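A minimal sketch of this update in plain Python follows. The function name `rocchio`, the default weights and the toy vectors are assumptions chosen for illustration; the last step clips negative term weights to 0 as the slide prescribes.

```python
def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio (SMART) update: alpha*q0 + beta*mean(D_r) - gamma*mean(D_nr)."""
    dim = len(q0)

    def mean(docs):
        # An empty feedback set contributes nothing to the update.
        if not docs:
            return [0.0] * dim
        return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]

    mu_r, mu_nr = mean(relevant), mean(nonrelevant)
    q_m = [alpha * q + beta * r - gamma * n for q, r, n in zip(q0, mu_r, mu_nr)]
    return [max(0.0, w) for w in q_m]  # negative term weights are set to 0


q0 = [1.0, 0.0, 0.0]                        # hypothetical original query
relevant = [[0.0, 2.0, 0.0], [0.0, 4.0, 0.0]]
nonrelevant = [[2.0, 0.0, 4.0]]
print(rocchio(q0, relevant, nonrelevant))   # [0.5, 2.25, 0.0]
```

The second term gains weight from the relevant documents, while the third term, which only occurs in the nonrelevant document, would become negative and is clipped.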
54
Positive vs. negative relevance feedback
Positive feedback is more valuable than negative feedback.
For example, set $\beta = 0.75$, $\gamma = 0.25$ to give more weight to positive feedback.
Many systems only support positive feedback.
55
Relevance feedback: assumptions
When can relevance feedback improve recall?
Assumption GT1: the user knows the terms in the collection well enough to formulate the initial query.
Assumption GT2: relevant documents contain similar terms (so we can hope to retrieve further relevant documents by using the feedback).
56
Violation of GT1
Assumption GT1: the user knows the terms in the collection well enough to formulate the initial query.
Violation: mismatch between the user's vocabulary and the search engine's vocabulary
Example: ô tô / xe hơi (two Vietnamese words for "car")
57
Violation of GT2
GT2: relevant documents contain similar terms.
Example of a violation: [government policies]
Several unrelated prototypes
taxing tobacco manufacturers vs. anti-smoking campaigns
subsidies for the poor vs. loans to the poor
Relevance feedback on documents about tobacco will not help in finding documents about the poor.
58
Relevance feedback: evaluation
Pick an evaluation measure from the previous lecture, e.g., precision in the top 10: P@10
Compute P@10 for the original query $q_0$
Compute P@10 for the query modified after feedback, $q_1$
In most cases, $q_1$ is much better than $q_0$!
Is this evaluation fair?
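The comparison described on this slide can be sketched as follows. The document identifiers, the two rankings and the relevance judgments are invented for illustration, and k = 5 is used instead of 10 only to keep the lists short.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked documents that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


relevant = {"d1", "d3", "d5", "d8"}            # hypothetical judgments
ranking_q0 = ["d2", "d3", "d4", "d6", "d1"]    # results for the original query
ranking_q1 = ["d3", "d1", "d5", "d2", "d8"]    # results after feedback

print(precision_at_k(ranking_q0, relevant, k=5))  # 0.4
print(precision_at_k(ranking_q1, relevant, k=5))  # 0.8
```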
59
Evaluating relevance feedback
A fair evaluation must be carried out on the documents the user has not yet examined
Studies show that relevance feedback is also successful when evaluated this way.
Empirically, one round of relevance feedback is often very useful. A second round may still be somewhat useful.
60
Evaluation: a caveat
A true evaluation of usefulness must compare against other methods that take the same amount of time.
Alternative to relevance feedback: the user revises and resubmits the query.
Users may prefer revising and resubmitting the query to judging the relevance of documents.
There is no clear evidence that relevance feedback is the best use of the user's time.
61
Problems with relevance feedback
Relevance feedback is expensive
Relevance feedback produces long queries.
Long queries are more expensive to process.
Users find it burdensome to provide explicit feedback.
It is often hard to understand why a particular document was returned after relevance feedback was applied.
The Excite search engine once offered full relevance feedback, but later dropped it.
62
Pseudo-relevance feedback
Pseudo-relevance feedback automates the manual part of relevance feedback.
Pseudo-relevance feedback algorithm:
Retrieve a ranked list of documents matching the query
Assume that the top k documents are relevant.
Apply relevance feedback (e.g., Rocchio)
Works very well on average
But can go badly wrong for some queries.
Several rounds of feedback can cause the query to drift completely.
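The three steps of the algorithm can be sketched as below. This is only an illustration under stated assumptions: documents are tiny hand-made vectors over a hypothetical vocabulary, ranking uses cosine similarity, and the feedback step is a positive-only Rocchio update with assumed weights.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Indices of docs sorted by decreasing cosine similarity to the query."""
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)


def pseudo_relevance_feedback(query, docs, k=2, alpha=1.0, beta=0.75):
    """One round: treat the top-k results as relevant, then apply Rocchio."""
    top_k = [docs[i] for i in rank(query, docs)[:k]]
    centroid = [sum(d[j] for d in top_k) / len(top_k) for j in range(len(query))]
    return [alpha * q + beta * c for q, c in zip(query, centroid)]


# Toy vocabulary (hypothetical): [car, automobile, banana]
docs = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.0]

new_query = pseudo_relevance_feedback(query, docs)
print(new_query)                        # [1.75, 0.375, 0.0]
print(cosine(query, docs[2]))           # 0.0: docs[2] lacks the original query term
print(cosine(new_query, docs[2]) > 0)   # True: it now matches the expanded query
```

If the top-k documents are not actually relevant, the same mechanism pulls the query toward the wrong terms, which is the drift risk noted on the slide.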
63
Pseudo-relevance feedback at TREC4
Cornell SMART system
The results show the number of relevant documents in the top 100 for 50 queries (so the total number of documents is 5000):

Method          Relevant documents
lnc.ltc         3210
lnc.ltc-PsRF    3634
Lnu.ltu         3709
Lnu.ltu-PsRF    4350

The results compare two length normalization schemes (L vs. l) and pseudo-relevance feedback (PsRF).
The pseudo-relevance feedback method used here adds only 20 terms to the query. (Rocchio would add many more.)
These results show that pseudo-relevance feedback is effective on average.
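To make the size of the improvement concrete, the short sketch below computes the relative gain of each PsRF run over its baseline, using only the four counts reported above. The helper name `relative_gain` is an assumption.

```python
# Relevant documents in the top 100 over 50 queries, as reported on the slide.
results = {
    "lnc.ltc": 3210,
    "lnc.ltc-PsRF": 3634,
    "Lnu.ltu": 3709,
    "Lnu.ltu-PsRF": 4350,
}


def relative_gain(base, improved):
    """Relative improvement of `improved` over `base`, as a fraction."""
    return (improved - base) / base


for base in ("lnc.ltc", "Lnu.ltu"):
    gain = relative_gain(results[base], results[base + "-PsRF"])
    print(f"{base}: +{100 * gain:.1f}% with PsRF")
# lnc.ltc: +13.2% with PsRF
# Lnu.ltu: +17.3% with PsRF
```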
64
Outline
Basic principles of search interface design
Relevance feedback
Motivation
Basics
Details
Query expansion
65
Query expansion
Query expansion is another method for increasing recall.
We use the term global query expansion to refer to global methods for query reformulation.
In global query expansion, the query is modified based on global data, i.e., data that does not depend on the query.
The main information used: (near-)synonymy
A publication or database that collects (near-)synonyms is called a thesaurus.
We will look at two types of thesauri: manually built and automatically built.
66
Query expansion: example
67
Types of user feedback
The user gives feedback on documents
More common in relevance feedback
The user gives feedback on words or phrases
More common in query expansion
68
Types of query expansion
Manual thesaurus (maintained by human editors, e.g., PubMed)
Automatically derived thesaurus (e.g., based on co-occurrence statistics)
Query equivalence based on query log mining (common on the web)
69
Thesaurus-based query expansion
For each term t in the query, expand the query with the words the thesaurus lists for t (words semantically related to t).
Example mentioned earlier: HOSPITAL → MEDICAL
Generally increases recall
May significantly decrease precision, particularly with ambiguous terms
INTEREST RATE → INTEREST RATE FASCINATE
Widely used in specialized search engines for science and engineering
It is very expensive to create a manual thesaurus and to use and maintain it.
A manual thesaurus has an effect roughly equivalent to annotation with a controlled vocabulary.
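The expansion step can be sketched with a dictionary standing in for the thesaurus. The two entries below mirror the slide's examples; the dictionary itself and the function name `expand_query` are hypothetical and not taken from any real thesaurus.

```python
# Hypothetical toy thesaurus: term -> list of related terms.
THESAURUS = {
    "hospital": ["medical"],
    "interest": ["fascinate"],
}


def expand_query(terms, thesaurus):
    """Append the thesaurus entries of each query term, without duplicates."""
    expanded = list(terms)
    for term in terms:
        for related in thesaurus.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


print(expand_query(["hospital"], THESAURUS))          # ['hospital', 'medical']
print(expand_query(["interest", "rate"], THESAURUS))  # ['interest', 'rate', 'fascinate']
```

The second call reproduces the precision problem with ambiguous terms: the thesaurus has no way of knowing that "interest" is used in its financial sense.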
70
Example of a manual thesaurus: PubMed
71
Automatic thesaurus generation
Attempt to generate a thesaurus automatically by analyzing the distribution of words in documents
Fundamental notion: semantic similarity between two words (synonyms or near-synonyms)
Definition 1: two words are similar if they co-occur with similar words.
"car" and "motorbike" both co-occur with "road", "petrol" and "license", so the two words must be similar.
Definition 2: two words are similar if they occur in a given grammatical relation with the same words.
You can pick, juice, eat, etc. apples and pears, so "apple" and "pear" must be semantically similar.
Co-occurrence is faster to compute, but grammatical relations are more accurate.
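Definition 1 can be illustrated with a small sketch: each word gets a vector counting the words it shares a document with, and two words are compared by the cosine of those vectors. The three toy documents and all function names are assumptions for illustration; a real system would work on a large collection and weight the counts.

```python
import math
from collections import defaultdict
from itertools import permutations


def cooccurrence_vectors(docs):
    """Map each word to counts of the other words it shares a document with."""
    vectors = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for a, b in permutations(set(doc.split()), 2):
            vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(word, vectors, n=3):
    """The n words whose co-occurrence vectors are most similar to `word`'s."""
    target = vectors.get(word, {})
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:n]]


docs = [
    "car road petrol license",
    "motorbike road petrol license",
    "apple pear eat",
]
vectors = cooccurrence_vectors(docs)
print(nearest_neighbors("car", vectors, n=1))  # ['motorbike']
```

Note that "car" and "motorbike" never appear in the same document here; they come out as neighbors only because they share the same context words.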
72
Co-occurrence-based thesaurus: examples
Word          Nearest neighbors
absolutely    absurd whatsoever totally exactly nothing
bottomed      dip copper drops topped slide trimmed
captivating   shimmer stunningly superbly plucky witty
doghouse      dog porch crawling beside downstairs
makeup        repellent lotion glossy sunscreen skin gel
mediating     reconciliation negotiate case conciliation
keeping       hoping bring wiping could some would
lithographs   drawings Picasso Dali sculptures Gauguin
pathogens     toxins bacteria organisms bacterial parasite
senses        grasp psyche truly clumsy naive innate
73
Query expansion in search engines
Main source for query expansion in search engines: query logs
Example 1: after issuing the query [herbs], users frequently search for [herbal remedies].
→ "herbal remedies" is a potential expansion of "herb".
Example 2: users searching for [flower pix] frequently click on the URL photobucket.com/flower. Users searching for [flower clipart] also frequently click on that URL.
→ "flower clipart" and "flower pix" are potential expansions of each other.
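Example 2 suggests a simple way to mine a click log: treat two queries as related when users of both frequently click the same URL. The sketch below assumes a log of (query, clicked URL) pairs; the log entries, the `example.org` URL and the threshold `min_clicks` are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def related_queries(click_log, min_clicks=2):
    """Relate queries that each led to at least `min_clicks` clicks on the same URL."""
    clicks = defaultdict(lambda: defaultdict(int))  # url -> query -> click count
    for query, url in click_log:
        clicks[url][query] += 1

    related = defaultdict(set)
    for per_query in clicks.values():
        frequent = sorted(q for q, c in per_query.items() if c >= min_clicks)
        for a, b in combinations(frequent, 2):
            related[a].add(b)
            related[b].add(a)
    return dict(related)


# Hypothetical click log: (query, clicked URL).
log = [
    ("flower pix", "photobucket.com/flower"),
    ("flower pix", "photobucket.com/flower"),
    ("flower clipart", "photobucket.com/flower"),
    ("flower clipart", "photobucket.com/flower"),
    ("herbs", "example.org/herbs"),
]
print(related_queries(log))
# {'flower clipart': {'flower pix'}, 'flower pix': {'flower clipart'}}
```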
74
Main topics covered today
Interactive relevance feedback: improve the initial result set by telling the system which documents are relevant / nonrelevant
The best-known relevance feedback method is Rocchio feedback
Query expansion: improve search results by adding synonyms / terms related to the query
Sources for looking up related terms: manually or automatically built thesauri, query logs
75
References
Chapter 9 of IIR
Resources at http://ifnlp.org/ir
Salton and Buckley 1990 (the original paper on relevance feedback)
Spink, Jansen, Ozmutlu 2000: Relevance feedback at Excite
Schütze 1998: Automatic word sense discrimination (describes a simple method for automatic thesaurus generation)
