

Implementation of Lempel-Ziv algorithm for lossless compression using VHDL

Prof. Minaldevi K. Tank


HOD – Digital Electronics, Babasaheb Gawde Institute of Technology, Mumbai, India.

S.J. Pise (ed.), ThinkQuest 2010, DOI 10.1007/978-81-8489-989-4_51,
© Springer India Pvt. Ltd. 2011

1. Introduction

In computer science and information theory, data compression, or source coding, is the process of encoding information using fewer bits than an unencoded representation would use, through the use of specific encoding schemes. As with any communication, compressed data communication only works when both the sender and the receiver of the information understand the encoding scheme. For example, this text makes sense only if the receiver understands that it is intended to be interpreted as characters representing the English language. Similarly, compressed data can only be understood if the decoding method is known by the receiver. Compression is useful because it helps reduce the consumption of expensive resources, such as hard disk space or transmission bandwidth. On the downside, compressed data must be decompressed to be used, and this extra processing may be detrimental to some applications. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed (the option of decompressing the video in full before watching it may be inconvenient, and requires storage space for the decompressed video). The design of data compression schemes therefore involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and decompress the data.

2. What Is Compression?

Data compression enables devices to transmit or store the same amount of data in fewer bits. Compression is broadly classified into two types: lossless and lossy compression.

2.1 Lossless compression

Lossless compression algorithms usually exploit statistical redundancy to represent the sender’s data more concisely without error. Lossless compression is possible because most real-world data has statistical redundancy.

2.2 Lossy compression

Lossy compression, also known as perceptual coding, is possible if some loss of fidelity is acceptable. Generally, lossy data compression is guided by research on how people perceive the data in question. For example, the human eye is more sensitive to subtle variations in luminance than it is to variations in color. JPEG image compression works in part by “rounding off” some of this less important information. Lossy data compression provides a way to obtain the best fidelity for a given amount of compression. In some cases, transparent (unnoticeable) compression is desired; in other cases, fidelity is sacrificed to reduce the amount of data as much as possible.

Lossless compression schemes are reversible, so the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression. However, lossless data compression algorithms will always fail to compress some files; indeed, any compression algorithm will necessarily fail to compress data containing no discernible patterns. Attempts to compress data that has already been compressed will therefore usually result in an expansion, as will attempts to compress all but the most trivially encrypted data. An example of lossless and lossy compression is the string 25.888888888. This string can be compressed as 25.[9]8. Interpreted as “twenty-five point nine eights”, the original string is perfectly
recreated, just written in a smaller form. In a lossy system, using 26 instead, the exact original data is lost in exchange for a smaller file size.

3. The Lempel-Ziv Algorithm

The Lempel-Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. The Lempel-Ziv algorithms compress by building a dictionary of previously seen strings. Unlike PPM (prediction by partial matching), which uses previously seen contexts to predict the probability of each character and codes each character separately based on its context, the Lempel-Ziv algorithms code groups of characters of varying lengths. The original algorithms also did not use probabilities: strings were either in the dictionary or not, and all strings in the dictionary were given equal probability. Some of the newer variants, such as gzip, do take some advantage of probabilities. At the highest level, the algorithms can be described as follows. Given a position in a file, look through the preceding part of the file to find the longest match to the string starting at the current position, and output some code that refers to that match. Then move the current position past the match. The two main variants of the algorithm were described by Ziv and Lempel in two separate papers in 1977 and 1978, and are often referred to as LZ77 and LZ78. The algorithms differ in how far back they search and how they find matches. The LZ77 algorithm is based on the idea of a sliding window: it only looks for matches within a window extending a fixed distance back from the current position. Gzip and ZIP are based on LZ77. The LZ78 algorithm is based on a more conservative approach to adding strings to the dictionary. Unix compress, the GIF format, and V.42bis (a standard modem protocol) are all based on LZ78.

4. Why VHDL?

4.1 Using the same language, it is possible to simulate as well as design complex logic.
4.2 Design reuse is possible.
4.3 Designs can be described at various levels of abstraction.
4.4 It provides for modular design and testing.
4.5 The use of VHDL has greatly reduced the “time to market” for both large and small designs.
4.6 VHDL designs are portable across synthesis and simulation tools that adhere to the IEEE 1076 standard.
4.7 Using VHDL makes the design device independent.
4.8 The design description can easily be targeted to PLDs, ASICs, and FPGAs.
4.9 On the downside, the designer has very little control at the gate level.
4.10 The logic generated for the same description may vary from tool to tool. This may be due to the algorithms used by the tools, which might be proprietary.

5. Lempel-Ziv-Welch

LZW (Lempel-Ziv-Welch), a variant of LZ78, is the one most commonly used in practice. The algorithm is used to encode byte streams (i.e., each message is a byte). The algorithm maintains a dictionary of strings (sequences of bytes). The dictionary is initialized with one entry for each of the 256 possible byte values; these are strings of length one. As the algorithm progresses, it adds new strings to the dictionary such that each string is only added if a prefix one byte shorter is already in the dictionary. For example, “John” is only added if “Joh” had previously appeared in the message sequence. Each entry of the dictionary is given an index, and these indices are typically assigned incrementally starting at 256.

6. LZ78 encoding and decoding

The basic idea is to parse the input sequence into non-overlapping blocks of different lengths while constructing a dictionary of blocks seen thus far.

6.1 Encoding

A dictionary is initialized to contain the single-character strings corresponding to all the possible input characters (and nothing else except the clear and stop codes, if they are being used). The algorithm works by scanning through the input string for successively longer substrings until it finds one that is not in the dictionary. When such a string is found, the index for the string without its last character (i.e., the longest substring that is in the dictionary) is retrieved from the dictionary and sent to the output, and the new string (including the last character) is added to the dictionary with the next available code. The last input character is then used as the next starting point to scan for substrings. In this way, successively longer strings are registered in the dictionary and made available for subsequent encoding as single output values. The algorithm works best on data with repeated patterns, so the initial parts of a message will see little compression. As the message grows, however, the compression ratio tends asymptotically toward its maximum.

6.2 Decoding

The decoding algorithm works by reading a value from the encoded input and outputting the corresponding string from the initialized dictionary. At the same time, it obtains the next value from the input and adds to the dictionary the concatenation of the string just output and the first character of the string obtained by decoding the next input value. The decoder then proceeds to the next input value (which was already read in as the “next value” in the previous pass) and repeats the process until there is no more input, at which point the final input value is decoded without any more additions to the dictionary. In this way the decoder builds up a dictionary which is identical to that used by the encoder,
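The dictionary-based encoding and decoding steps of Sections 5, 6.1 and 6.2 can be illustrated with a small software model. The sketch below is written in Python rather than VHDL and is not the author's hardware design; it assumes byte input, an unbounded dictionary, and no clear or stop codes, and the function names `lzw_encode` and `lzw_decode` are hypothetical.

```python
from typing import Dict, List


def lzw_encode(data: bytes) -> List[int]:
    """LZW-style encoder sketch (assumption: unbounded dictionary, no clear/stop codes)."""
    # One single-byte string per possible byte value, with codes 0..255.
    dictionary: Dict[bytes, int] = {bytes([i]): i for i in range(256)}
    next_code = 256  # new entries are numbered incrementally from 256
    current = b""
    output: List[int] = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            # Keep extending the match while it is still in the dictionary.
            current = candidate
        else:
            # Emit the code of the longest known prefix, register the new
            # string, and restart the scan from the last input byte.
            output.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        output.append(dictionary[current])  # flush the final match
    return output


def lzw_decode(codes: List[int]) -> bytes:
    """Rebuild the input while growing the same dictionary as the encoder."""
    if not codes:
        return b""
    dictionary: Dict[int, bytes] = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code names the entry the encoder created one step ahead
            # of us, so it must be previous + first byte of previous.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        # New entry: string just output + first byte of the next decoded string.
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


if __name__ == "__main__":
    message = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_encode(message)
    print(codes)  # [84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]
    assert lzw_decode(codes) == message
```

Because the decoder rebuilds its dictionary one step behind the encoder, it can receive a code that is not yet in its dictionary (for example when encoding `AAAA`); the `elif` branch handles that case, which the prose description does not spell out. A hardware version would typically also fix the code width and the maximum dictionary size, which this sketch leaves unbounded.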
