Professional Documents
Culture Documents
Hash PDF
Hash PDF
1. Gii thiu:
a. Hon cnh:
Mt lp nhng bi ton rt c quan tm trong khoa hc my tnh ni chung v lp
trnh thi c ni ring, l x l xu chui. Trong lp bi ton ny, ngi ta thng rt hay
phi i mt vi mt bi ton: tm kim xu chui.
b. Pht biu bi ton:
i. Cho mt on vn bn, gm k t.
ii. Cho mt on mu, gm k t.
iii. My tnh cn tr li cu hi: on mu xut hin bao nhiu ln trong
on vn bn v ch ra cc v tr xut hin .
c. Thut ton:
C rt nhiu thut ton c th gii quyt bi ton ny. Ngi vit xin tm
tt 2 thut ton ph bin c dng trong cc k thi lp trnh:
i. Brute-force approach:
Vi mt cch tip cn trc tip, chng ta c th thu c thut ton
gii. Tuy nhin phc tp ca n l rt ln trong trng hp xu nht.
Thut ton brute-force so khp tt c cc v tr xut hin ca on mu
trong on vn bn. C th phc tp cho thut ton ny l ().
ii. Knuth-Morris-Pratt algorithm
Hay cn c vit tt l KMP, c pht minh vo nm 1974, bi
Donald Knuth, Vaughan Pratt v James H. Morris. Thut ton ny s
dng mt correction-array, l mt thut ton rt hiu qu, c phc
tp l ( + ).
d. Mc ch bi vit:
Trong bi vit ny, ngi vit ch tp trung vo mt thut ton. Tc gi xin gi thut ton
ny l Hash. Theo nh bn thn ngi vit nh gi, y l thut ton rt hiu qu c bit
l trong thi c. N hiu qu bi 3 yu t: tc thc thi, linh ng trong vic s dng (ng
dng hiu qu) v s n gin trong ci t.
u tin, ngi vit xin c trnh by v thut ton ny. Sau , ngi vit s trnh by
mt vi ng dng, cch s dng v pht trin thut ton Hash trong cc bi ton tin hc.
2. Thut ton Hash:
a. K hiu:
i. Tp hp cc ch ci c s dng: .
1
on vn bn: [1. . ],
on mu: [1. . ],
on con t n ca mt xu : [. . ]
Chng ta cn tm ra tt c cc v tr (1 + 1) tha mn:
[. . + 1] = .
b. M t thut ton:
ii.
iii.
iv.
v.
c. M chng trnh:
Chng trnh sau, ti vit bng ngn ng C++, l li gii cho bi trn h
thng chm bi trc tuyn VOJ.
#include <iostream>
#include <cstdio>
#include <cstring>
#define FOR(i,a,b) for(int i=a;i<=b;i++)
#define base 1000000003LL
#define ll long long
#define maxn 1000111
using namespace std;
ll POW[maxn],hashT[maxn];
ll getHashT(int i,int j) {
return (hashT[j]-hashT[i-1]*POW[j-i+1]+base*base)%base;
}
int main() {
string T,P;
cin >> T >> P;
int m=T.size(),n=P.size();
T=" "+T;P=" "+P;
POW[0]=1;
FOR(i,1,m) POW[i]=(POW[i-1]*26) % base;
FOR(i,1,m) hashT[i]=(hashT[i-1]*26+T[i]-'a') % base;
ll hashP=0;
FOR(i,1,n) hashP=(hashP*26+P[i]-'a') % base;
FOR(i,1,m-n+1) if(hashP==getHashT(i,i+n-1)) printf("%d ",i);
}
d. nh gi:
phc tp ca thut ton l ( + ). Nhng iu quan trng l: chng ta c th kim
tra 2 xu c ging nhau hay khng trong (1). y l iu to nn s linh ng cho thut
ton Hash. Ngoi s linh ng v tc thc thi, chng ta c th thy ci t thut ton ny
thc s rt n gin nu so vi cc thut ton x l xu khc.
3. ng dng:
Nh cp trn, thut ton ny s c trng hp chy sai. Tt nhin, bn cnh vic
s dng Hash, cn c nhiu thut ton x l xu chui khc, mang li s chnh xc tuyt i.
Ti tm gi nhng thut ton l thut ton chun. Vic ci t thut ton chun c th
mang li mt tc chy chng trnh cao hn, chnh xc ca chng trnh ln hn. Tuy
nhin, ngi lm bi s phi tr gi l s phc tp khi ci t cc thut ton chun .
S dng Hash khng ch gip ngi lm bi d dng ci t hn m quan trng ch:
Hash c th lm c nhng vic m thut ton chun khng lm c. Sau y, ti s xt
mt vi v d chng minh iu ny.
a. Longest palindrome substring
Bi ton t ra nh sau: Bn c cho mt xu di ( 50 000). Bn cn tm
di ca xu i xng di nht gm cc k t lin tip trong . (Xu i xng l xu c t 2
chiu ging nhau).
Mt thut ton chun khng th p dng vo bi ton ny l thut ton KMP. Ngoi
KMP ra, c 2 thut ton chun c th p dng c. Thut ton th nht l s dng
thut ton Manachar tnh bn knh i xng ti tt c v tr trong xu. Thut ton th 2
l s dng Suffix Array v LCP (Longest Common Prefix) cho xu c ni bi v xu
+
.
2
cch lm ny l ( log()).
b. k-th alphabetical cyclic
Bi ton t ra nh sau: Bn c cho mt dy 1 , 2 , , ( 50 000). Sp xp hon
v vng quanh ca dy ny theo th t t in. C th, cc hon v vng quanh ca dy ny
l (1 , 2 , , ), (2 , 3 , , , 1 ), (3 , 4 , , , 1 , 2 ),... Dy ny c th t t in nh hn
dy kia nu s u tin khc nhau ca dy ny nh hn dy kia. Yu cu bi ton l: In ra
dy c th t t in ln th .
Nu tip cn mt cch trc tip, chng ta s sinh ra tt c cc dy hon v vng quanh,
ri sau dng mt thut ton sp xp sp xp li chng theo th t t in, cui cng
ch vic in ra dy th sau khi sp xp. Tuy nhin phc tp ca thut ton ny l rt ln
v khng th p ng c yu cu v thi gian. C th, cch ny c phc tp l (2
log()), y l tch ca phc tp ca sp xp v phc tp ca mi php so snh dy.
Vn gi t tng l sp xp li tt c cc dy hon v vng quanh ri in ra dy ng v
tr th , chng ta c gng ci tin phc tp ca vic so snh th t t in ca 2 dy.
Nhc li nh ngha v th t t in ca 2 dy: Xt 2 dy v c cng s phn t. Gi
v tr th l v tr u tin t tri sang m . < < . Nh vy, ta phi tm
on tin t ging nhau di nht ca v , ri so snh k t tip theo. tm c on
tin t ging nhau di nht, ta c th s dng Hash kt hp vi chia nh phn.
5