
Chapter 7

LOSSLESS COMPRESSION
ALGORITHMS
Contents
7.1. Introduction
7.2. Basics of Information Theory
7.3. Run Length Coding
7.4. Variable-Length Coding (VLC)
7.5. Huffman Coding
7.6. The Shannon-Fano Encoding Algorithm
7.7. Lempel-Ziv Encoding
7.8. Arithmetic Coding
7.9. Lossless Image Compression
7.1. Introduction
The Need for Compression
Take, for example, a video signal with a resolution of 320×240 pixels, 256 colors (8 bits per pixel), and 30 frames per second. The raw
bit rate = 320 × 240 × 8 × 30
= 18,432,000 bits per second
= 2,304,000 bytes per second ≈ 2.3 MB per second
• A 90-minute movie would take about 2.3 × 60 × 90 MB ≈ 12.44 GB. Without compression, data storage and transmission
would pose serious problems! Figure 7.1 depicts a general data compression scheme, in which compression is
performed by an encoder and decompression is performed by a decoder. We call the output of the encoder codes
or code words. The intermediate medium could be either data storage or a communication/computer network. If
the compression and decompression processes induce no information loss, the compression scheme is lossless;
otherwise, it is lossy.
• The next several chapters deal with lossy compression algorithms, as they are commonly used for image, video,
and audio compression. Here, we concentrate on lossless compression.
Figure 7.1 A general data compression scheme.

7.2. Basics of Information Theory
• Information theory is the scientific study of the quantification, storage, and communication of digital
information.
• A key measure in information theory is entropy.
• Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a
random process.
• Information theory is the mathematical treatment of the concepts, parameters and rules governing the
transmission of messages through communication systems. It was founded by Claude Shannon toward the
middle of the twentieth century and has since then evolved into a vigorous branch of mathematics fostering the
development of other scientific fields, such as statistics, biology, behavioral science, neuroscience, and statistical
mechanics.
• The techniques used in information theory are probabilistic in nature and some view information theory as a
branch of probability theory.
• In a given set of possible events, the information of a message describing one of these events quantifies the
symbols needed to encode the event in an optimal way.
• ‘Optimal’ means that the obtained code word will determine the event unambiguously, isolating it from all others
in the set, and will have minimal length, that is, it will consist of a minimal number of symbols.
• Information theory also provides methodologies to separate real information from noise and to determine the
channel capacity required for optimal transmission conditioned on the transmission rate.
7.2. Basics of Information Theory…
• The foundation of information theory was laid in a 1948 paper by Shannon titled, “A Mathematical Theory of
Communication.” Shannon was interested in how much information a given communication channel could
transmit. In neuroscience, you are interested in how much information the neuron’s response can communicate
about the experimental stimulus.
• Information theory is based on a measure of uncertainty known as entropy (designated “H”). For example, the
entropy of the stimulus S is written H(S) and is defined as follows:
H(S) = − Σ_s P(s) · log2 P(s)
• The subscript s underneath the summation simply means to sum over all possible stimuli s = 1, 2, …, 8. This
expression is called “entropy” because it is similar to the definition of entropy in thermodynamics. Thus, the
preceding expression is sometimes referred to as “Shannon entropy.”
• The entropy of the stimulus can be intuitively understood as “how long of a message (in bits) do I need to convey
the value of the stimulus?” For example, suppose the center-out task had only two peripheral targets (“left” and
“right”), which appeared with an equal probability. It would take only one bit (a 0 or a 1) to convey which target
appeared; hence, you would expect the entropy of this stimulus to be 1 bit. That is what the preceding expression
gives you, as P(S)=0.5 and log2(0.5)=−1. The center-out stimulus in the dataset can take on eight possible values
with equal probability, so you expect its entropy to be 3 bits.

7.2. Basics of Information Theory
• What is entropy? In science, entropy is a measure of the disorder of a system: the more entropy, the more
disorder. Typically, we add negative entropy to a system when we impart more order to it.
For example, suppose we sort a deck of cards.
• (Think of a bubble sort for the deck, though perhaps this is not the usual way you actually sort cards.) For every
decision to swap or not, we impart 1 bit of information to the card system and transfer 1 bit of negative entropy to the
card deck. Now suppose we wish to communicate those swapping decisions, via a network, say. If we had to make
two consecutive swap decisions, the possible number of outcomes would be 4.
• If all outcomes have an equal probability of 1/4, then the number of bits to send is on average
4 × (1/4) × log2(1/(1/4)) = 2 bits, no surprise here.
• To communicate (transmit) the results of our two decisions, we would need to transmit 2 bits. But if the probability
of one of the outcomes were higher than the others, the average number of bits we'd send would be different. (This
situation might occur if the deck were already partially ordered, so that the probability of a not-swap were higher
than that of a swap.) Suppose the probability of one of our four states were 1/2, and the other three states each had
probability 1/6 of occurring. To extend our modeling of how many bits to send on average, we need to go to
non-integer powers of 2 for probabilities. Then we can use a logarithm to ask how many (fractional) bits of information
must be sent to transmit the information content. The entropy formula says that in this case, we'd have to send just
(1/2) × log2(2) + 3 × (1/6) × log2(6) = 1.7925 bits, a value less than 2. This reflects the idea that if we could somehow
encode our four states such that the most frequently occurring one needs fewer bits, we'd do better (fewer bits) on average.
7.2. Basics of Information Theory
• According to the famous scientist Claude E. Shannon of Bell Labs, the entropy η of an information source with
alphabet S = {s1, s2, ..., sn} is defined as

• Equation 7.2.1:   η = H(S) = Σ_{i=1..n} p_i · log2(1/p_i)

• Equation 7.2.2:   η = − Σ_{i=1..n} p_i · log2 p_i

• where p_i is the probability that symbol s_i in S will occur.

• The term log2(1/p_i) indicates the amount of information (the so-called self-information defined by Shannon)
contained in s_i, which corresponds to the number of bits needed to encode s_i.
For example, if the probability of having the character n in a manuscript is 1/32, the amount of information associated
with receiving this character is log2 32 = 5 bits. In other words, the character string nnn will require 15 bits to code.
This is the basis for possible data reduction in text compression, since it leads to character coding
schemes different from the ASCII representation, in which each character requires at least 7 bits.
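The definitions above translate directly into a few lines of code. The following minimal Python sketch (illustrative only, not part of the original slides) computes the self-information of a symbol and the entropy of a source, and checks two worked examples from this section: the character with probability 1/32 (5 bits) and the four-state source with probabilities {1/2, 1/6, 1/6, 1/6} (about 1.79 bits).

    import math

    def self_information(p):
        # Number of bits needed to encode a symbol that occurs with probability p.
        return math.log2(1.0 / p)

    def entropy(probabilities):
        # Shannon entropy (Eq. 7.2.1): the probability-weighted average of the
        # self-information of each symbol. Zero-probability symbols are skipped,
        # since log2(0) is undefined.
        return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

    print(self_information(1 / 32))               # 5.0 bits
    print(entropy([1 / 2, 1 / 6, 1 / 6, 1 / 6]))  # ~1.7925 bits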

7.2. Basics of Information Theory
• We can use a variable-length coding scheme for entropy coding: frequently occurring symbols are given codes
that are quickly transmitted, while infrequently occurring ones are given longer codes. For example, E occurs
frequently in English, so we should give it a shorter code than Q.
• If a symbol occurs rarely, its probability p_i is low (e.g., 1/100), and thus its self-information
log2(1/p_i) = log2 100 is a relatively large number.
• This reflects the fact that it takes a longer bitstring to encode it. The probabilities p_i sitting outside the logarithm
in Eq. (7.2.2) say that, over a long stream, the symbols occur with an average frequency equal to the
probability of their occurrence. This weighting should multiply the long or short information content given by the
element of "surprise" in seeing a particular symbol.
Figure 7.2.1 Histograms for two gray-level images

7.2. Basics of Information Theory
• One wrinkle in the algorithm implied by Eq. (7.2.2.) is that if a symbol occurs with zero frequency, we simply
don’t count it into the entropy: we cannot take a log of zero.
• As another concrete example, if the information source S is a gray-level digital image, each s_i is a gray-level
intensity ranging from 0 to (2^k − 1), where k is the number of bits used to represent each pixel in an
uncompressed image. The range is often [0, 255], since 8 bits are typically used: this makes a convenient 1 byte
per pixel. The image histogram is a way of calculating the probability p_i of having pixels with gray-level intensity
i in the image.
• Figure 7.2.1a shows the histogram of an image with a uniform distribution of gray-level intensities, that is,
p_i = 1/256 for all i. Hence, the entropy of this image is

• η = Σ_{i=0..255} (1/256) · log2 256 = 8

• As can be seen in Eq. (7.2.1), the entropy η is a weighted sum of the terms log2(1/p_i); hence, it represents the average
amount of information contained per symbol in the source S. For a memoryless source S, the entropy η represents
the minimum average number of bits required to represent each symbol in S. In other words, it specifies the lower
bound for the average number of bits needed to code each symbol in S.

7.2. Basics of Information Theory
• If we use Ī to denote the average length (measured in bits) of the codewords produced by the encoder, the
Shannon coding theorem states that the entropy is the best we can do (under certain conditions):
η≤ Ī
• Coding schemes aim to get as close as possible to this theoretical lower bound.
• It is interesting to observe that in the above uniform-distribution example we found that η = 8: the minimum
average number of bits to represent each gray-level intensity is at least 8. No compression is possible for this
image! In the context of imaging, this corresponds to the "worst case," where neighboring pixel values have
no similarity. Figure 7.2.1b shows the histogram of another image, in which one third of the pixels are rather dark
and two thirds of them are rather bright. The entropy of this image is
• η = 1/3 · log2 3 + 2/3 · log2(3/2)
• η ≈ 0.33 × 1.59 + 0.67 × 0.59 = 0.52 + 0.40 = 0.92
• In general, the entropy is greater when the probability distribution is flat and smaller when it is more peaked.
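As a quick check on these two histogram examples, the short Python sketch below (illustrative only, not from the slides; the gray-level values chosen are arbitrary placeholders) computes the entropy directly from pixel counts.

    import math
    from collections import Counter

    def image_entropy(pixels):
        # Build the histogram, convert counts to probabilities, then apply Eq. 7.2.2.
        counts = Counter(pixels)
        total = len(pixels)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    # Uniform histogram over 256 gray levels: entropy = 8 bits per pixel.
    print(image_entropy(list(range(256))))

    # One third dark pixels (level 10), two thirds bright pixels (level 200): ~0.918 bits.
    print(image_entropy([10] * 100 + [200] * 200))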

7.2. Basics of Information Theory
Multimedia Data Compression
• Data compression is about finding ways to reduce the number of bits or bytes used to store or transmit
the content of multimedia data. It is the process of encoding information using fewer bits, e.g., the ZIP file
format. As with any communication, compressed data communication only works when both the
sender and receiver of the information understand the encoding scheme.
Is compression useful?
• Compression is useful because it helps reduce the consumption of resources, such as hard disk space
or transmission bandwidth.
– save storage space requirement
– speed up document transmission time
• On the downside,
– compressed data must be decompressed to be used, and
– this extra processing may be harmful to some applications. For instance, a compression scheme for video
may require expensive hardware for the video to be decompressed fast enough to be viewed as it’s being
decompressed.
– The option of decompressing the video in full before watching it may be inconvenient, and requires storage
space for the decompressed video.
7.2. Basics of Information Theory
Tradeoffs in Data Compression
The design of data compression schemes therefore involves trade-offs among various factors, including
– The degree of compression: to what extent the data should be compressed?
– The amount of distortion introduced: to what extent quality loss is tolerated.
– The computational resources required to compress and uncompress the data: do we have enough
memory for compressing and uncompressing the data?
Types of Compression
• Lossless Compression
• The original content of the data is not lost/changed when it is compressed (encoded). It is used mainly
for compressing symbolic data such as database records, spreadsheets, texts, executable programs, etc.
• Lossless compression recovers the exact original data after decompression. It is used where exact replication of
the original is essential and changing even a single bit cannot be tolerated.
Examples: Run Length Encoding (RLE), Lempel Ziv (LZ), Huffman Coding.

7.2. Basics of Information Theory
Lossy Compression
• The original content of the data is lost to a certain degree when compressed. For visual and audio data, some loss of
quality can be tolerated without losing the essential nature of the data. Lossy compression is used for image
compression in digital cameras (JPEG), for audio compression (MP3), and for video compression on DVDs (MPEG
format).
Figure 7.2 Lossless and lossy compression techniques

7.2. Basics of Information Theory
Lossy and Lossless Compression
• GIF image files and WinZip use lossless compression. For this reason, zip software is popular for compressing program
and data files. Lossless compression does not lose any data in the compression process.
Lossless compression has advantages and disadvantages.
– The advantage is that the compressed file will decompress to an exact duplicate of the original file, mirroring its
quality.
– The disadvantage is that the compression ratio is not all that high, precisely because no data is lost.
To get a higher compression ratio (to reduce a file significantly beyond 50%), you must use lossy compression.
Lossless vs. Lossy compression
• Lossless and lossy compression have become part of our everyday vocabulary due to the popularity of MP3 music files,
JPEG image files, and MPEG video files. A sound file in WAV format converted to an MP3 file will lose much data, as MP3
employs lossy compression. JPEG uses lossy compression, while GIF follows a lossless compression technique.
An example of lossless vs. lossy compression is the following string: 25.888888888.
✓ This string can be compressed losslessly as 25.9!8, interpreted as "twenty-five point 9 eights": the original string is perfectly
recreated, just written in a smaller form.
In a lossy system it can be compressed as 26,
in which case the original data is lost, for the benefit of a smaller file size. The above is a very simple example
of run-length encoding.
7.3. Run Length Coding
• Data often contains sequences of identical bytes. By replacing these repeated byte sequences with the
number of occurrences, a substantial reduction of data can be achieved. In Run-length encoding, large
runs of consecutive identical data values are replaced by a simple code with the data value and length
of the run, i.e. (dataValue, LengthOfTheRun)
• This encoding scheme tallies each data value (Xi) along with its run length, i.e., (Xi,
Length_of_Xi).
• It compresses data by storing runs of data (that is, sequences in which the same data value occurs in
many consecutive data elements) as a single data value and count. For example, consider the following
image with long runs of white pixels (W) and short runs of black pixels (B).
WWWWWWWWWWBWWWWWWWWWBBBWWWWWWWWWWWW
• With the RLE data compression algorithm, the compressed code is 10W1B9W3B12W (interpreted as ten
W's, one B, nine W's, three B's, twelve W's).
• Original sequence: 111122233333311112222 can be encoded as: (1,4),(2,3),(3,6),(1,4),(2,4)
• Run-length encoding performs lossless data compression.
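The following short Python sketch (illustrative only, not part of the original slides) implements this idea: it walks the input once, emits (value, run-length) pairs, and reproduces the two examples above.

    def rle_encode(data):
        # Collapse each run of identical values into a (value, run_length) pair.
        runs = []
        for value in data:
            if runs and runs[-1][0] == value:
                runs[-1][1] += 1
            else:
                runs.append([value, 1])
        return [(value, length) for value, length in runs]

    def rle_decode(runs):
        # Expand each (value, run_length) pair back into the original string.
        return "".join(value * length for value, length in runs)

    pixels = "WWWWWWWWWWBWWWWWWWWWBBBWWWWWWWWWWWW"
    encoded = rle_encode(pixels)
    print(encoded)                        # [('W', 10), ('B', 1), ('W', 9), ('B', 3), ('W', 12)]
    print(rle_decode(encoded) == pixels)  # True, confirming the encoding is lossless
    print(rle_encode("111122233333311112222"))  # [('1', 4), ('2', 3), ('3', 6), ('1', 4), ('2', 4)]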

7.3. Run Length Coding
• Lossless vs. Lossy compression
Generally, the difference between the two compression techniques is that:
– Lossless compression schemes are reversible so that the original data can be reconstructed,
– Lossy schemes accept some loss of data in order to achieve higher compression.
• These Lossy data compression methods typically offer a three-way tradeoff between
– Computer resource requirement (compression speed, memory consumption)
– Compressed data size and
– Quality loss.
• Lossless compression is a method of reducing the size of computer files without losing any information. That means when you
compress a file, it will take up less space, but when you decompress it, it will still contain exactly the same information. The idea is
to get rid of any redundancy in the information; this is exactly what is done in ZIP and GIF files. This differs from
lossy compression, such as in JPEG files, which loses some information that is not very noticeable.
• Common compression methods
– Statistical methods: require prior information about the occurrence of symbols.
E.g., Huffman coding: estimate the probabilities of symbols, code one symbol at a time, and assign shorter codes to symbols with high
probabilities.
– Dictionary-based coding: the previous algorithms (entropy coding and Huffman coding) require statistical knowledge. Dictionary-based
techniques such as Lempel-Ziv (LZ) do not require prior information to compress strings; rather,
they replace strings of symbols with pointers to dictionary entries.

7.4. Variable-Length Coding (VLC)
• Since the entropy indicates the information content in an information source S, it leads to a family of coding
methods commonly known as entropy coding methods. Variable-length coding (VLC) is one of the best known
such methods.

• In coding theory, a variable-length code is a code which maps source symbols to a variable number of bits.
Variable-length codes can allow sources to be compressed and decompressed with zero error and still be read
back symbol by symbol.

• In the next two sections we discuss Huffman coding and the Shannon-Fano encoding algorithm.

7.5. Huffman Coding
• Huffman coding was developed in the early 1950s by David Huffman. It is widely used for text compression, multimedia coding,
and message transmission.
Example 7.5.1. : The problem: Given a set of n symbols and their weights (or frequencies), construct a tree structure
(a binary tree for binary code) with the objective of reducing memory space and decoding time per symbol.
For instance, Huffman coding is constructed based on frequency of occurrence of letters in text documents.
• The output of the Huffman encoder is determined by the model (the symbol probabilities). The higher the probability of
occurrence of a symbol, the shorter the code assigned to that symbol, and vice versa. This makes it easy to handle
the most frequently occurring symbols in the data efficiently and also reduces the time taken to decode each
symbol.

7.5. Huffman Coding
• How to construct Huffman coding
Step 1: Create a forest of single-node trees, one for each symbol t1, t2, …, tn
Step 2: Sort the forest of trees according to falling probabilities of symbol occurrence
Step 3: WHILE more than one tree exists DO
Merge the two trees t1 and t2 with the least probabilities p1 and p2
Label their root with the sum p1 + p2
Associate binary codes: 1 with the right branch and 0 with the left branch
Step 4: Create a unique codeword for each symbol by traversing the tree from the root to the leaf.
Concatenate all encountered 0s and 1s together during traversal
Example 7.5.2: Consider a 7-symbol alphabet given in the following table to construct the Huffman coding.

• At each step, the Huffman encoding algorithm picks the two symbols with the smallest frequencies and combines them.
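As a rough illustration of these steps, here is a minimal Python sketch (not taken from the slides; the input text used to derive the frequencies is a placeholder, since the table of Example 7.5.2 is not reproduced here). It keeps the forest in a priority queue, repeatedly merges the two least probable trees, and reads the codes off by traversal.

    import heapq
    from collections import Counter

    def huffman_codes(frequencies):
        # Steps 1-2: a forest of single-symbol trees ordered by frequency (min-heap).
        # Each entry is (frequency, tie_breaker, tree); a tree is a symbol or a pair of subtrees.
        heap = [(freq, i, sym) for i, (sym, freq) in enumerate(frequencies.items())]
        heapq.heapify(heap)
        counter = len(heap)
        # Step 3: repeatedly merge the two trees with the least total frequency.
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)
            f2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
            counter += 1
        # Step 4: traverse from the root, collecting 0 on left branches and 1 on right branches.
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:
                codes[tree] = prefix or "0"   # degenerate single-symbol alphabet
        walk(heap[0][2], "")
        return codes

    freqs = Counter("this is an example of huffman coding")   # placeholder frequencies
    print(huffman_codes(freqs))

Symbols with high frequency end up near the root and therefore receive short codes, exactly as described above.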

7.5. Huffman Coding
• How to construct Huffman coding

Huffman Coding Tree


• Using the Huffman coding tree, a table can be constructed by working down the tree from the root to each leaf. This gives the binary
equivalent for each symbol in terms of 1s and 0s.

7.5. Huffman Coding
• How to construct Huffman coding
Example 7.5.3.

7.5. Huffman Coding
• How to construct Huffman coding

Example 7.5.4. : construct the tree & binary code by using Huffman coding

7.6. The Shannon-Fano Encoding Algorithm
Steps:
1. Calculate the frequency of each of the symbols in the list.
2. Sort the list in (decreasing) order of frequencies.
3. Divide the list into two halves, with the total frequency counts of each half being as close as possible to each
other.
4. The right half is assigned a code of 1 and the left half a code of 0.
5. Recursively apply steps 3 and 4 to each of the halves, until each symbol has become a corresponding code leaf on
the tree. That is, treat each split as a list and apply splitting and code assigning till you are left with lists of single
elements.
6. Generate code word for each symbol
Let us assume the source alphabet S = {X1, X2, X3, …, Xn} and the associated probabilities P = {P1, P2, P3, …, Pn}. The
steps to encode data using the Shannon-Fano coding algorithm are as follows: order the source letters into a sequence
according to their probability of occurrence in non-increasing (i.e., decreasing) order.

7.6. The Shannon-Fano Encoding Algorithm
ShannonFano (sequence S)
• If S has two letters,
attach 0 to the codeword of one letter and 1 to the codeword of the other;
• Else if S has more than two letters,
divide S into two subsequences S1 and S2 with the minimal difference between the total probabilities of the two subsequences;
extend the codeword of each letter in S1 by attaching 0, and the codeword of each letter in S2 by attaching 1;
• ShannonFano(S1);
• ShannonFano(S2);
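A minimal recursive Python sketch of this procedure (illustrative only, not taken from the slides) chooses, at every level, the split point that minimizes the difference between the two halves' totals, as described above.

    def shannon_fano(symbols):
        # symbols: list of (symbol, frequency) pairs sorted by decreasing frequency.
        codes = {sym: "" for sym, _ in symbols}

        def split(group):
            if len(group) <= 1:
                return
            # Find the split point with the minimal difference between the two totals.
            total = sum(freq for _, freq in group)
            running, best_i, best_diff = 0, 1, float("inf")
            for i in range(1, len(group)):
                running += group[i - 1][1]
                diff = abs(total - 2 * running)
                if diff < best_diff:
                    best_i, best_diff = i, diff
            left, right = group[:best_i], group[best_i:]
            for sym, _ in left:
                codes[sym] += "0"   # left half gets 0
            for sym, _ in right:
                codes[sym] += "1"   # right half gets 1
            split(left)
            split(right)

        split(symbols)
        return codes

    # The five symbols of Example 1 below, already sorted by decreasing frequency:
    print(shannon_fano([("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)]))
    # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}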
Example 1: Given five symbols A to E with their frequencies being 15, 7, 6, 6 & 5; encode them using Shannon-Fano
entropy encoding

• Solution:
• Step1: Say, we are given that there are five symbols (A to E) that can occur in a source with their frequencies
being 15 7 6 6 and 5. First, sort the symbols in decreasing order of frequency.
7.6. The Shannon-Fano Encoding Algorithm
Example 1: Given five symbols A to E with their frequencies being 15, 7, 6, 6 & 5; encode them using Shannon-Fano
entropy encoding…
Step 2: Divide the list into two halves such that the total counts of the two halves are as close as possible to each other.
In this case we split the list between B and C and assign 0 and 1.
Step3: We recursively repeat the steps of splitting and assigning code until each symbol become a code leaf on the
tree. That is, treat each split as a list, apply splitting, and code assigning until you are left with lists of single
elements.
Step 4: Note that we split the list containing C, D and E between C and D because the difference between the split
lists is 11 minus 6, which is 5, if we were to have divided between D and E we would get a difference of 12-5
which is 7.
• Step5: We complete the algorithm and as a result have codes assigned to the symbols.

7.6. The Shannon-Fano Encoding Algorithm
• Example 2: Suppose the following source with related probabilities: S = {A, B, C, D, E},
P = {0.35, 0.17, 0.17, 0.16, 0.15}. Message to be encoded = "ABCDE". The probabilities are already arranged in
non-increasing order.
• First, we divide the alphabet into {A, B} and {C, D, E}. Why? This gives the smallest difference between the total
probabilities of the two groups.
S1 = {A, B}, P = {0.35, 0.17}, total = 0.52. S2 = {C, D, E}, P = {0.17, 0.16, 0.15}, total = 0.48. The difference is only
0.52 − 0.48 = 0.04, the smallest possible difference when we divide the alphabet. Attach 0 to S1 and 1 to S2. Subdivide S1 into
subgroups: S11 = {A}, attach 0; S12 = {B}, attach 1. Again subdivide S2 into subgroups considering the
probabilities: S21 = {C}, P = 0.17; S22 = {D, E}, P = {0.16, 0.15}, total = 0.31. Attach 0 to S21 and 1 to S22. Since S22
has more than one letter in it, we have to subdivide it: S221 = {D}, attach 0; S222 = {E}, attach 1.
Figure 7.4 Shannon-Fano coding tree

7.6. The Shannon-Fano Encoding Algorithm
• The message is transmitted using the following code (by traversing the tree)
• A=00 B=01
• C=10 D=110
• E=111
• Instead of transmitting ABCDE, we transmit 000110110111.
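To see that this code can be decoded without any separators, here is a small illustrative Python check (not from the slides): encoding "ABCDE" with the table above gives 000110110111, and the prefix property lets the decoder read the bits back one codeword at a time.

    code = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}

    def encode(message):
        # Concatenate the codeword of each symbol; no separators are needed.
        return "".join(code[ch] for ch in message)

    def decode(bits):
        # Because no codeword is a prefix of another, greedy matching is unambiguous.
        reverse = {v: k for k, v in code.items()}
        decoded, current = [], ""
        for bit in bits:
            current += bit
            if current in reverse:
                decoded.append(reverse[current])
                current = ""
        return "".join(decoded)

    print(encode("ABCDE"))          # 000110110111
    print(decode("000110110111"))   # ABCDE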

7.7. Lempel-Ziv Encoding
• Data compression up until the late 1970s was mainly directed towards creating better methodologies for Huffman
coding. An innovative, radically different method was introduced in 1977 by Abraham Lempel and Jacob Ziv. The
zip and unzip utilities use the LZH technique, while UNIX's compress belongs to the LZW and LZC classes.
Lempel-Ziv compression
• The problem with Huffman coding is that it requires knowledge about the data before encoding takes place:
Huffman coding requires the frequencies of symbol occurrence before codewords can be assigned to symbols.
• Lempel-Ziv compression does not rely on previous knowledge about the data; rather, it builds this knowledge in the
course of data transmission/data storage. The Lempel-Ziv algorithm (called LZ) uses a table of codewords created
during data transmission and transmits the index of a symbol/word instead of the word itself. Each time, it
replaces strings of characters with a reference to a previous occurrence of the string.
Lempel-Ziv Compression Algorithm
• The multi-symbol patterns are of the form: C0C1 . . . Cn-1 Cn. The prefix of a pattern consists of all the pattern
symbols except the last: C0C1 . . . Cn-1
Lempel-Ziv Output: there are three options in assigning a code to each symbol in the list
– If one-symbol pattern is not in dictionary, assign (0, symbol)
– If multi-symbol pattern is not in dictionary, assign (dictionaryPrefixIndex, lastPatternSymbol)
– If the input ends with a pattern that is already in the dictionary, assign (dictionaryPrefixIndex, )
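A compact Python sketch of the encoder (illustrative only; this follows the general LZ78-style scheme described above rather than any particular implementation) builds the dictionary while scanning the input and emits (prefixIndex, symbol) pairs.

    def lz_encode(text):
        dictionary = {}   # maps each known pattern to its index (1-based); 0 means "no prefix"
        output = []
        pattern = ""
        for symbol in text:
            if pattern + symbol in dictionary:
                # Keep extending the current pattern while it is still in the dictionary.
                pattern += symbol
            else:
                # Emit (index of the longest known prefix, the new symbol) and store the new pattern.
                output.append((dictionary.get(pattern, 0), symbol))
                dictionary[pattern + symbol] = len(dictionary) + 1
                pattern = ""
        if pattern:
            # The input ended on a pattern already in the dictionary: emit its index alone.
            output.append((dictionary[pattern], ""))
        return output

    print(lz_encode("ABBCBCABABCAABCAAB"))
    # [(0, 'A'), (0, 'B'), (2, 'C'), (3, 'A'), (2, 'A'), (4, 'A'), (6, 'B')]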
7.7. Lempel-Ziv Encoding
Example 1: Encode (i.e., compress) the string ABBCBCABABCAABCAAB using the LZ algorithm.
Figure 7.5 Lempel-Ziv algorithm

7.7. Lempel-Ziv Encoding
• The compressed message is: (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)
Note: the above is just a representation; the commas and parentheses are not
transmitted.
Consider the string ABBCBCABABCAABCAAB given in example 1. The compressed
string consists of codewords and the corresponding codeword index as shown below:
• Codeword: (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B)
Codeword index: 1 2 3 4 5 6 7
The actual compressed message is: 0A0B10C11A010A100A110B where each character
is replaced by its binary 8-bit ASCII code.
7.7. Lempel-Ziv Encoding
Example (decompression): Decode (i.e., decompress) the sequence
(0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B)
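The decoder rebuilds the same dictionary as it goes: each pair (prefixIndex, symbol) expands to the dictionary entry at that index followed by the symbol, and the expansion itself becomes the next dictionary entry. A small illustrative Python sketch (not from the slides):

    def lz_decode(codewords):
        dictionary = {0: ""}   # index 0 stands for the empty prefix
        output = []
        for prefix_index, symbol in codewords:
            entry = dictionary[prefix_index] + symbol
            dictionary[len(dictionary)] = entry   # new entries receive indices 1, 2, 3, ...
            output.append(entry)
        return "".join(output)

    print(lz_decode([(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A"), (4, "A"), (6, "B")]))
    # ABBCBCABABCAABCAAB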

7.8. Arithmetic Coding
• Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a string of
characters is represented using a fixed number of bits per character, as in the ASCII code. An arithmetic coding
algorithm encodes an entire file as a sequence of symbols into a single decimal number. The input symbols are
processed one at a time, one per iteration. The interval derived at the end of this division process is used to decide the
codeword for the entire sequence of symbols.
Example: Arithmetic coding of the word “BELBA”

7.8. Arithmetic Coding
Example: Arithmetic coding of the word “BELBA”
• UL = LL + d(U, L) × d(f), where LL is the lower limit of the current interval, d(U, L) is the difference between its upper and
lower limits, and d(f) is the cumulative frequency of the letter.
For the first letter B, the interval is [0, 0.4): the lower limit is zero and the upper limit is 0.4. Applying UL = LL + d(U, L) × d(f)
with the cumulative frequencies of B (0.4), E (0.6), L (0.8), and A (1.0) subdivides this interval:
B = 0 + (0.4 − 0) × 0.4 = 0.16
E = 0 + (0.4 − 0) × 0.6 = 0.24
L = 0 + (0.4 − 0) × 0.8 = 0.32
A = 0 + (0.4 − 0) × 1.0 = 0.40
The remaining letters of the word are handled similarly.
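An illustrative Python sketch of this interval-narrowing loop (not taken from the slides; the symbol ranges are the cumulative frequencies of the letters of "BELBA" used above, i.e., B = 2/5 and E, L, A = 1/5 each) shows how each successive symbol shrinks the interval:

    # Cumulative probability ranges for the letters of "BELBA".
    ranges = {"B": (0.0, 0.4), "E": (0.4, 0.6), "L": (0.6, 0.8), "A": (0.8, 1.0)}

    def arithmetic_encode(message):
        low, high = 0.0, 1.0
        for symbol in message:
            width = high - low
            sym_low, sym_high = ranges[symbol]
            # Narrow [low, high) to the sub-interval assigned to this symbol.
            low, high = low + width * sym_low, low + width * sym_high
        # Any number inside the final interval identifies the whole message.
        return low, high

    print(arithmetic_encode("BELBA"))   # the final interval; a number inside it is transmitted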
• A message is represented by a half-open interval [a, b), where a and b are real numbers between 0 and 1. Initially,
the interval is [0, 1). When the message becomes longer, the length of the interval shortens, and the number of
bits needed to represent the interval increases. Suppose the alphabet is [A, B, C, D, E, F, $], in which $ is a
special symbol used to terminate the message, and the known probability distribution is listed below.

7.8. Arithmetic Coding
• Example:

7.9. Lossless Image Compression
• One of the most commonly used compression techniques in multimedia data compression is differential coding.
• The basis of data reduction in differential coding is the redundancy in consecutive symbols in a data stream.
Audio is a signal indexed by one dimension, time. Here we consider how to apply the lessons learned from audio
to the context of digital image signals that are indexed by two spatial dimensions (x, y).
• Let's consider differential coding in the context of digital images. In a sense, we move from signals with a domain
in one dimension to signals indexed by numbers in two dimensions (x, y), the rows and columns of an image.
Later, we'll look at video signals. These are even more complex, in that they are indexed by space and time (x, y,
t). Because of the continuity of the physical world, the gray-level intensities (or colors) of background and
foreground objects in images tend to change relatively slowly across the image frame. Since we were dealing
with signals in the time domain for audio, practitioners generally refer to images as signals in the spatial domain.
This generally slowly changing nature of image content is the redundancy that differential coding exploits.
Lossless JPEG
• Lossless JPEG is a special case of the JPEG image compression standard. It differs drastically from the other JPEG modes in
that the algorithm has no lossy steps. Thus we treat it here and consider the more commonly used JPEG methods in Chapter
9. Lossless JPEG is invoked when the user selects a 100% quality factor in an image tool. Essentially, lossless
JPEG is included in the JPEG compression standard simply for completeness. The following predictive method is
applied on the unprocessed original image (or on each color band of the original color image). It essentially involves
two steps: forming a differential prediction and encoding.
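As a rough sketch of the first of these two steps (illustrative Python only; the predictor shown, the previous pixel in the same row, is just one simple choice among the predictors lossless JPEG allows), differential prediction replaces each pixel by its difference from a prediction, producing many small values that a subsequent entropy coder can compress well.

    def differential_prediction(image):
        # image: a list of rows of pixel intensities (0..255).
        # Each pixel is predicted by its left neighbour; the first pixel of a row is kept as-is.
        residuals = []
        for row in image:
            prev = 0
            residual_row = []
            for pixel in row:
                residual_row.append(pixel - prev)   # small where the image changes slowly
                prev = pixel
            residuals.append(residual_row)
        return residuals

    image = [[100, 101, 103, 103],
             [ 99, 100, 102, 104]]
    print(differential_prediction(image))
    # [[100, 1, 2, 0], [99, 1, 2, 2]]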
Review Questions
1. Given the following symbols and their corresponding frequency of occurrence, find an
optimal binary code for compression

I. Using the Huffman algorithm


II. Using Entropy coding scheme
2. Encode (i.e., compress) the following string using the Lempel-Ziv algorithm:
ABBCDBBBDBCCBCCB
3. Encode using Arithmetic coding for the word “HELLO”
4. Encode this by using RLE
a. 4444666667777779999999
b. MMMEEEEDDDDIIIIIIIAAAAAA

