
Proceedings of the 6th Student Scientific Research Conference

University of Da Nang - 2008

A STUDY OF DATA COMPRESSION ALGORITHMS: THE LZW ALGORITHM

Student: PHẠM TUẤN ANH
Class: 05CCT2, Faculty of Informatics, University of Education

Supervisor: ĐOÀN DUY BÌNH
Faculty of Informatics, University of Education
ABSTRACT
There are many data compression techniques, such as symbol codes, packed codes, length codes, compression with a source model, and dictionary techniques. Among these, dictionary techniques are the most flexible and effective, in particular LZ codes with a dynamic dictionary, of which LZW is the most widely used. This report introduces several data compression algorithms and presents the LZW compression method.

1. Introduction:

In information technology and telecommunications today, transmitting information is a routine task. The volume of data to be transmitted is usually very large, however, which makes transmission difficult: it consumes network resources and wastes system capacity. Compression algorithms were developed to address this problem.

The earliest and simplest method is run-length coding (RLC), which detects runs of repeated bits. Its basic principle is to find a character whose number of consecutive occurrences exceeds a fixed threshold. Such a run is replaced by three characters:
+ The first is a special character announcing that an encoded run follows.
+ The second gives the repeat count.
+ The third gives the repeated character.
The idea is thus to replace a run with a shorter sequence whenever the run passes the threshold, which is usually 4.

Next came the Huffman method, which relies on a statistical model: it computes the frequency of each character, then assigns short codewords to frequent characters and long codewords to rare ones. This method must store the code table together with the compressed data.

A completely different approach is dictionary-based compression, which comes in two kinds:
+ Static dictionary coding
+ Dynamic dictionary coding
Many algorithms use this technique, such as LZ77, LZK, LZSS and LZH, but this report covers only two main ones:
+ The LZ78 algorithm.
+ The LZW algorithm.
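The run-length scheme described in this introduction (marker, repeat count, repeated character) can be illustrated with a minimal Python sketch. It is not taken from the report: the marker byte `MARKER`, the function names, the 255-byte cap on a run, and the choice to encode runs of at least `threshold` bytes are all assumptions made for the example.

```python
MARKER = 0x00  # assumed escape byte; this sketch requires that it never occurs in the input


def rle_encode(data: bytes, threshold: int = 4) -> bytes:
    """Replace each run of at least `threshold` equal bytes by (MARKER, count, byte)."""
    if MARKER in data:
        raise ValueError("input contains the marker byte")
    out = bytearray()
    i = 0
    while i < len(data):
        # Measure the run starting at i, capped at 255 so the count fits in one byte.
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        if run >= threshold:
            out += bytes([MARKER, run, data[i]])  # marker, repeat count, repeated byte
        else:
            out += data[i:i + run]                # short runs are copied unchanged
        i += run
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    """Invert rle_encode by expanding every (MARKER, count, byte) triple."""
    out = bytearray()
    i = 0
    while i < len(packed):
        if packed[i] == MARKER:
            count, value = packed[i + 1], packed[i + 2]
            out += bytes([value]) * count
            i += 3
        else:
            out.append(packed[i])
            i += 1
    return bytes(out)
```

With a threshold of 4, a run of three bytes would not shrink when replaced by a three-byte triple, which is why shorter runs are left as they are.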


Jacob Ziv and Abraham Lempel described the dictionary-based technique in the LZ77 and LZ78 codes. The idea is to replace a group of characters with a pointer to an earlier occurrence of that group. LZW is a member of the LZ family that refines LZ77 and LZ78 and is in widespread use today. Because of limited scope, this report only presents a few data compression algorithms, notes some of their strengths and weaknesses, and compares them in order to highlight LZW compression.

2. Content:

2.1. Huffman coding

2.1.1. Principle:
The Huffman method encodes the bytes of the source file as binary codewords. It produces variable-length codes, each a sequence of bits. It is a statistical compression method: characters that occur more often receive shorter codes.

2.1.2. Algorithm:
Compression algorithm:
Step 1: Find the two characters with the smallest weights and merge them into one; the weight of the new character is the sum of the weights of the two merged characters.
Step 2: While more than one character remains in the list, repeat Step 1; otherwise go to Step 3.
Step 3: Expand the last remaining character and build the binary tree, labelling left branches 0 and right branches 1.

Decompression algorithm:
Step 1: Read the compressed file bit by bit, walking the binary tree built earlier until a leaf is reached. Write the character at that leaf to the decompressed file.
Step 2: While the compressed file is not exhausted, repeat Step 1; otherwise go to Step 3.
Step 3: End of algorithm.

Limitations of Huffman coding:
+ Huffman coding can only be applied once the character frequencies are known.
+ Huffman coding only removes redundancy in the character distribution.
+ Static Huffman coding requires a binary tree covering all possibilities to be built in advance. This takes considerable time because the kind of data to be compressed is not known beforehand.
+ Decompression is complicated because the length of a code is not known until the first character has been found.

2.2. LZ78 coding
Instead of reporting the position of an earlier repeated passage, LZ78 numbers all phrases, so that each phrase records the number of an earlier phrase together with one character that makes it differ from that earlier phrase. Each new phrase is thus an earlier phrase plus one further character, which is why it differs from every phrase seen before.

Example: Suppose we have the following text:

    aaabbabaabaaabab

LZ78 splits it into phrases as follows:

    Input    a    aa   b    ba   baa  baaa bab
    Phrase   1    2    3    4    5    6    7
    Output   0+a  1+a  0+b  3+a  4+a  5+a  4+b
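The phrase table for aaabbabaabaaabab can be reproduced with a short Python sketch of the textbook LZ78 parse. This is an illustration, not the report's program: the function names and the convention of emitting an empty character when the input ends inside a known phrase are assumptions.

```python
def lz78_encode(text: str) -> list[tuple[int, str]]:
    """Split text into LZ78 phrases, returning (previous phrase number, new character) pairs."""
    dictionary: dict[str, int] = {}  # phrase -> phrase number, starting at 1
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # still a known phrase, keep extending it
        else:
            # Number 0 stands for the empty phrase.
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Assumed convention: input ended inside a known phrase, so no new character follows.
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs: list[tuple[int, str]]) -> str:
    """Rebuild the text by replaying the phrase numbers."""
    phrases = [""]  # phrase number 0 is the empty phrase
    for index, ch in pairs:
        phrases.append(phrases[index] + ch)
    return "".join(phrases[1:])


pairs = lz78_encode("aaabbabaabaaabab")
print(pairs)
```

For the example text, the pairs come out as (0,a), (1,a), (0,b), (3,a), (4,a), (5,a), (4,b), matching the Output row of the table.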


The compressed output is therefore: (0,a); (1,a); (0,b); (3,a); (4,a); (5,a); (4,b)

Compression algorithm:
Step 1: Read one character -> ch; set the phrase number to 1; add the character to the dictionary; w := ch;
Step 2:
    While not eof(f) do
    Begin
        Read the next character -> ch;
        w := ww + ch;
        If w is in the dictionary then
            ww := w
        Else
        Begin
            Code(w, j);    { j: index of the phrase, as in the pairs above }
            Write j and ch to the compressed file;
            Add w to the dictionary;
        End;
    End;
Step 3: Stop the program.

Decompression algorithm:
Step 1: Read the dictionary information stored in the compressed file; tl := false;
Step 2:
    While not eof(f) do
    Begin
        Read the next byte -> b;
        Decode(b, s, t);
        If tl = false then
            w := w + s
        Else
            w := ww + s;
        TIMCHU(w, t);    { look w up in the dictionary; t reports whether it was found }
        If t = false then
        Begin
            Write s to the decompressed file;
            Add s to the dictionary
        End
        Else
        Begin
            ww := s;
        End;
    End;
Step 3: Stop the program.

Evaluation: In general, LZ78 is a fairly good text compression algorithm with a relatively fast running time; however, its potential for saving space has not been fully exploited.

2.3. LZW coding
This algorithm is a successor of LZ78. As seen with LZ78, storing the character that follows each phrase tends to waste memory, so compression efficiency is not high. LZW handles this by dropping the character after each phrase, so that the output for each phrase contains only a pointer. To do so, it prepares a list of phrases made up of many characters of the input over some alphabet; in effect it extends the alphabet, that is, it uses additional symbols to represent strings of ordinary characters. To apply LZW to 8-bit ASCII, the alphabet is extended by using codes of 9 bits or more; the 256 additional symbols that a 9-bit code provides are used to store strings determined from the strings in the source.

The algorithm will not achieve a high compression ratio under the following conditions:


+ The source is not homogeneous and its redundancy characteristics change over the course of the file.
+ The source is considerably longer than the capacity of the string table.

Compression algorithm:
Step 1: Build the dictionary by scanning the input and write it to the compressed file; t := false; read the first character -> w
Step 2:
    While not eof(f) do
    Begin
        Read one character -> ch;
        If t = false then
            w := w + ch
        Else
        Begin
            w := ww + ch;
            t := false;
        End;
        TIMCHU(w, tl);    { look w up in the dictionary; tl reports whether it was found }
        If tl = false then
        Begin
            Code(w, j);
            Write j to the compressed file;
            Add w to the dictionary;
            w := ch
        End
        Else
        Begin
            ww := w;
            t := true;
        End;
    End;
Step 3: Code(ch); stop the program.

Decompression algorithm:
Step 1: Read the dictionary information from the compressed file; read the next byte, decode it and assign the result to w; t := false;
Step 2:
    While not eof(f) do
    Begin
        Read the next byte -> b;
        Decode(b, s, t);
        If t = true then
        Begin
            For i := 1 to length(s) do
            Begin
                If t = false then
                    w := w + s(i)
                Else
                Begin
                    w := ww + s(i);
                    t := false;
                End;
                TIMCHU(w, t);
                If t = false then
                Begin
                    Add to the dictionary;
                    Write to the decompressed file;
                    w := s(i)
                End;
            End;
        End
        Else
        Begin


            Write to the decompressed file;
            w := w + w(i);
            Add w to the dictionary
        End;
    End;
Step 3: Decode(b, s, t): write s to the decompressed file. Stop the program.

3. Conclusion:

With the other methods, the encoder returns a pair <i,S>, where i is an integer index pointer and S is a string. This output is fairly redundant and inefficient. LZW overcomes this: its encoder returns only the integer index pointer and drops the trailing string S.

LZW removes the memory waste that earlier algorithms could not avoid. It also makes the compression scheme less rigid and more flexible, and hence more attractive to users.

This report has presented several commonly used compression algorithms, giving a general overview of data compression, and has covered the two algorithms LZ78 and LZW. These can be regarded as representative algorithms of the LZ family and as the foundation for better data compression algorithms developed later.

My application so far only compresses data from *TXT files. I intend to develop it further so that it can be used in practice on data from more kinds of sources, and to keep learning so that more advanced and more efficient algorithms can be used to make the program compress faster and at a higher ratio.
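As a compact illustration of the LZW scheme discussed in this report, the following Python sketch implements the textbook form of the algorithm. It is not the report's program: it assumes a table pre-loaded with the 256 single-byte strings instead of a dictionary stored in the compressed file, it returns codes as a list of integers and leaves out packing them into 9-bit or wider fields, and the function names are chosen for the example.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: emit one integer code per phrase, with no trailing character."""
    table = {bytes([i]): i for i in range(256)}  # codes 0..255 are the single bytes
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                     # keep extending the current phrase
        else:
            codes.append(table[w])     # output only the pointer for w
            table[wc] = len(table)     # new phrase receives the next free code (256, 257, ...)
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same table while decoding, so no dictionary needs to be transmitted."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = bytearray(w)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]          # code refers to the phrase currently being defined
        else:
            raise ValueError("invalid LZW code")
        out += entry
        table[len(table)] = w + entry[:1]
        w = entry
    return bytes(out)


codes = lzw_encode(b"aaabbabaabaaabab")
assert lzw_decode(codes) == b"aaabbabaabaaabab"
```

Each output element is a single integer, whereas LZ78 emits an (index, character) pair per phrase; this is the difference the conclusion highlights. Codes below 256 stand for single bytes, and codes from 256 upward stand for learned strings, which is why a code width of 9 bits or more is needed for 8-bit input.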
