
LZW Compression

LZW compression is named after its developers, A. Lempel and J. Ziv, with later modifications by Terry A. Welch. It is the foremost technique for general purpose data compression because of its simplicity and versatility. Typically, you can expect LZW to compress text, executable code, and similar data files to about one-half their original size. LZW also performs well when presented with extremely redundant data files, such as tabulated numbers, computer source code, and acquired signals. Compression ratios of 5:1 are common for these cases. LZW is the basis of several personal computer utilities that claim to "double the capacity of your hard drive."

LZW compression is always used in GIF image files, and is offered as an option in TIFF and PostScript. LZW compression is protected under U.S. patent number 4,558,302, granted December 10, 1985 to Sperry Corporation (now the Unisys Corporation). For information on commercial licensing, contact: Welch Licensing Department, Law Department, M/SC2SW1, Unisys Corporation, Blue Bell, Pennsylvania, 19424-0001.

LZW compression uses a code table, as illustrated in Fig. 27-6. A common choice is to provide 4096 entries in the table. In this case, the LZW encoded data consists entirely of 12 bit codes, each referring to one of the entries in the code table. Uncompression is achieved by taking each code from the compressed file and translating it through the code table to find what character or characters it represents. Codes 0-255 in the code table are always assigned to represent single bytes from the input file. For example, if only these first 256 codes were used, each byte in the original file would be converted into 12 bits in the LZW encoded file, resulting in a 50% larger file size. During uncompression, each 12 bit code would be translated via the code table back into the single bytes. Of course, this would not be a useful situation.
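The 50% growth mentioned above follows directly from the arithmetic: two 12 bit codes occupy three bytes. A minimal Python sketch of this packing is shown below. The function name `pack_12bit` and the most-significant-bit-first ordering are assumptions made for illustration; real file formats such as GIF and TIFF each define their own bit ordering.

```python
def pack_12bit(codes):
    """Pack a sequence of 12 bit codes into bytes.

    Assumption: bits are written most significant first, and the final
    byte is padded with zero bits. This is an illustrative layout only.
    """
    out = bytearray()
    acc = 0      # bit accumulator holding bits not yet written
    nbits = 0    # number of valid bits currently in the accumulator
    for code in codes:
        if not 0 <= code < 4096:
            raise ValueError("code does not fit in 12 bits")
        acc = (acc << 12) | code
        nbits += 12
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # keep only the unwritten bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the last byte
    return bytes(out)


# If only codes 0-255 are used, every input byte becomes one 12 bit code.
original = bytes(range(256)) * 4           # 1024 bytes of sample data
packed = pack_12bit(list(original))
print(len(original), len(packed))          # 1024 input bytes, 1536 output bytes
```

Under these assumptions, 1024 input bytes become 1536 output bytes, which is the 50% expansion described in the text.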

The LZW method achieves compression by using codes 256 through 4095 to represent sequences of bytes. For example, code 523 may represent the sequence of three bytes: 231 124 234. Each time the compression algorithm encounters this sequence in the input file, code 523 is placed in the encoded file. During uncompression, code 523 is translated via the code table to recreate the true 3 byte sequence. The longer the sequence assigned to a single code, and the more often the sequence is repeated, the higher the compression achieved.
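The translation step can be pictured as a simple table lookup. The following Python sketch uses a dictionary as the code table and installs the hypothetical entry from the example above (code 523 for the bytes 231 124 234) by hand. In a real LZW program the entry would be created by the algorithm itself, and the helper name `translate` is an assumption for illustration.

```python
# Codes 0-255 always stand for the single byte of the same value.
code_table = {code: bytes([code]) for code in range(256)}

# Hypothetical multi-byte entry taken from the example in the text.
code_table[523] = bytes([231, 124, 234])


def translate(codes, table):
    """Replace each code with the byte sequence it represents."""
    return b"".join(table[code] for code in codes)


decoded = translate([65, 523, 66], code_table)
print(list(decoded))   # [65, 231, 124, 234, 66]
```

Here three 12 bit codes (36 bits) stand for five bytes (40 bits); the saving grows as sequences get longer and recur more often.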

Although this is a simple approach, there are two major obstacles that need to be overcome: (1) how to determine what sequences should be in the code table, and (2) how to provide the uncompression program with the same code table used by the compression program. The LZW algorithm exquisitely solves both of these problems.

When the LZW program starts to encode a file, the code table contains only the first 256 entries, with the remainder of the table being blank. This means that the first codes going into the compressed file are simply the single bytes from the input file being converted to 12 bits. As the encoding continues, the LZW algorithm identifies repeated sequences in the data and adds them to the code table. Compression starts the second time a sequence is encountered. The key point is that a sequence from the input file is not added to the code table until it has already been placed in the compressed file as individual characters (codes 0 to 255). This is important because it allows the uncompression program to reconstruct the code table directly from the compressed data, without having to transmit the code table separately.

Figure 27-7 shows a flowchart for LZW compression. Table 27-3 provides the step-by-step details for an example input file consisting of 45 bytes, the ASCII text string: the/rain/in/Spain/falls/mainly/on/the/plain. When we say that the LZW algorithm reads the character "a" from the input file, we mean it reads the value: 01100001 (97 expressed in 8 bits), where 97 is "a" in ASCII. When we say it writes the character "a" to the encoded file, we mean it writes: 000001100001 (97 expressed in 12 bits).

The compression algorithm uses two variables: CHAR and STRING. The variable, CHAR, holds a single character, i.e., a single byte value between 0 and 255. The variable, STRING, is a variable length string, i.e., a group of one or more characters, with each character being a single byte. In box 1 of Fig. 27-7, the program starts by taking the first byte from the input file and placing it in the variable, STRING. Table 27-3 shows this action in line 1. This is followed by the algorithm looping for each additional byte in the input file, controlled in the flow diagram by box 8. Each time a byte is read from the input file (box 2), it is stored in the variable, CHAR. The code table is then searched to determine if the concatenation of the two variables, STRING+CHAR, has already been assigned a code (box 3). If a match in the code table is not found, three actions are taken, as shown in boxes 4, 5 & 6. In box 4, the 12 bit code corresponding to the contents of the variable, STRING, is written to the compressed file. In box 5, a new code is created in the table for the concatenation of STRING+CHAR. In box 6, the variable, STRING, takes the value of the variable, CHAR. An example of these actions is shown in lines 2 through 10 in Table 27-3, for the first 10 bytes of the example file.

When a match in the code table is found (box 3), the concatenation of STRING+CHAR is stored in the variable, STRING, without any other action taking place (box 7). That is, if a matching sequence is found in the table, no action should be taken before determining if there is a longer matching sequence also in the table. An example of this is shown in line 11, where the sequence: STRING+CHAR = in, is identified as already having a code in the table. In line 12, the next character from the input file, /, is added to the sequence, and the code table is searched for: in/. Since this longer sequence is not in the table, the program adds it to the table, outputs the code for the shorter sequence that is in the table (code 262), and starts over searching for sequences beginning with the character, /. This flow of events is continued until there are no more characters in the input file.
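The compression loop just described can be sketched in a few lines of Python. This is a minimal illustration rather than a production encoder: the function name `lzw_compress`, the use of a dictionary for the code table, the list of integers as output, and the choice to simply stop adding entries once 4096 codes exist are all assumptions made here. The box numbers in the comments follow the description of Fig. 27-7 in the text.

```python
def lzw_compress(data: bytes, max_codes: int = 4096) -> list:
    """Return the list of LZW codes for `data` (illustrative sketch)."""
    # The table starts with only the 256 single-byte entries.
    table = {bytes([value]): value for value in range(256)}
    next_code = 256
    if not data:
        return []

    codes = []
    string = data[:1]                        # box 1: first byte into STRING
    for value in data[1:]:                   # box 8: loop over remaining bytes
        char = bytes([value])                # box 2: next byte into CHAR
        candidate = string + char
        if candidate in table:               # box 3: is STRING+CHAR in the table?
            string = candidate               # box 7: keep extending the match
        else:
            codes.append(table[string])      # box 4: output the code for STRING
            if next_code < max_codes:        # assumption: stop adding when full
                table[candidate] = next_code # box 5: new entry for STRING+CHAR
                next_code += 1
            string = char                    # box 6: STRING takes the value of CHAR
    codes.append(table[string])              # end of input: output the code for STRING
    return codes


# Illustrative input, chosen so that "in" is the seventh sequence added
# to the table and therefore receives code 262, as in the passage above.
codes = lzw_compress(b"the/rain/in/")
print(codes)   # [116, 104, 101, 47, 114, 97, 105, 110, 47, 262, 47]
```

With this input the first nine codes are plain single bytes, and the tenth is 262, the code assigned to "in" the first time that pair was seen. Each code would then be written to the compressed file as 12 bits.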
The program is wrapped up with the code corresponding to the current value of STRING being written to the compressed file (as illustrated in box 9 of Fig. 27-7 and line 45 of Table 27-3).

A flowchart of the LZW uncompression algorithm is shown in Fig. 27-8. Each code is read from the compressed file and compared to the code table to provide the translation. As each code is processed in this manner, the code table is updated so that it continually matches the one used during the compression. However, there is a small complication in the uncompression routine. There are certain combinations of data that result in the uncompression algorithm receiving a code that does not yet exist in its code table. This contingency is handled in boxes 4, 5 & 6.

Only a few dozen lines of code are required for the most elementary LZW programs. The real difficulty lies in the efficient management of the code table. The brute force approach results in large memory requirements and a slow program execution. Several tricks are used in commercial LZW programs to improve their performance. For instance, the memory problem
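As a companion to the description of uncompression, the following Python sketch rebuilds the code table while reading the codes, including the contingency of a code that is not yet in the table. It is a brute force illustration under stated assumptions: the name `lzw_decompress`, the dictionary-based table, the list-of-integers input, and the rule of not adding entries beyond 4096 codes are choices made here, not details taken from Fig. 27-8.

```python
def lzw_decompress(codes, max_codes: int = 4096) -> bytes:
    """Rebuild the original bytes from a list of LZW codes (illustrative sketch)."""
    # Same starting table as the compressor: 256 single-byte entries.
    table = {value: bytes([value]) for value in range(256)}
    next_code = 256
    if not codes:
        return b""

    previous = table[codes[0]]       # the first code is always a single byte
    output = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Contingency: the compressor used an entry it had only just
            # created. That entry must be the previous sequence followed
            # by its own first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid code in compressed data")
        output += entry
        if next_code < max_codes:    # assumption: stop adding when full
            # Mirror the compressor: previous sequence plus the first
            # byte of the sequence that followed it.
            table[next_code] = previous + entry[:1]
            next_code += 1
        previous = entry
    return bytes(output)


print(lzw_decompress([116, 104, 101, 47, 114, 97, 105, 110, 47, 262, 47]))
# b'the/rain/in/'
print(lzw_decompress([97, 256]))   # b'aaa', which exercises the contingency
```

In the second call, code 256 arrives before the uncompression table contains it, which is the situation the text describes; the sketch resolves it from the previously decoded sequence.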
