CHAPTER 7
Multimedia Data Compression
By: Moti B.
What is Compression?
Compression
• A process of deriving more compact (i.e., smaller) representations of data.
• A coding process that effectively reduces the total number of bits needed to represent
certain information.
• Representation of a file in the fewest possible bits, as accurately as possible
Goal of Compression
Remove redundancy
Reduce irrelevance
Constraints on Compression
• Perfect or near-perfect reconstruction (lossless/lossy)
Cont'd…
• Strategies for Compression
Reducing redundancies
Exploiting (manipulating) the characteristics of human vision
• If compression and decompression processes induce no information loss, then the
compression scheme is lossless; otherwise, it is lossy.
• Lossy does not necessarily mean loss of quality. In fact, the output can sometimes be "better" than
the input.
• Drop random noise in images (dust on the lens)
• Drop background noise in music
• Fix spelling errors in text and put it into better form
• Writing is the art of lossy text compression
Why Compression?
• To reduce the volume of data to be transmitted (text, fax, images)
• To reduce the bandwidth required for transmission and to reduce storage
requirements (speech, audio, video)
Compression
• How is compression possible?
• Redundancy in digital audio, image, and video data
• Properties of human perception
• Digital audio is a series of sample values; image is a rectangular array of pixel
values; video is a sequence of images played out at a certain rate
Data Compression
• Data compression is the reduction or elimination of redundancy in data representation in
order to save storage space, speed up file transfer, and decrease costs for storage
hardware and network bandwidth.
• Compression is used to reduce the size of one or more files. When a file is compressed, it
takes up less disk space than the uncompressed version and can be transferred to other
systems more quickly. There are two types of data compression.
Types of Compression
• Lossless data compression
• Lossless data compression uses algorithms that allow the exact original data to be
reconstructed from the compressed data
• The original can be recovered exactly. Higher quality, bigger files.
• Lossless compression is used for legal and medical documents, computer programs
• Exploits only data redundancy
• Error-free compression
• Lossy data compression
• Lossy data compression does not allow the exact original data to be reconstructed
from the compressed data.
• Only an approximation of the original can be recovered. Lower quality, smaller files.
• Used for digital audio, images and video, where some errors or loss can be tolerated
• Exploits both data redundancy and the properties of human perception
• Error-containing compression
Cont'd…
Summary
Lossless:
• Original data and the data after compression and decompression are exactly the same.
• Redundant data is removed during compression and added back during decompression.
• Used when we can't afford to lose any data: legal and medical documents, computer programs.
• Used to compress text, images, sound and programs.
Lossy:
• Original data and the data after compression and decompression are not exactly the same.
• A lossy algorithm removes information that it cannot later restore.
• Used for compressing images and video files (our eyes cannot distinguish subtle changes, so some loss is acceptable).
• Used to compress still images, video and audio.
Lossless Compression
Common methods to remove redundancy
• Basics of Information Theory
• Run Length coding
• Huffman coding, etc.
Information Theory
Information theory is a branch of science that is concerned with quantifying
information.
• Information theory is defined to be the study of efficient coding and its
consequences.
• An interface between modeling and coding
• It is the field of study concerned with the storage and transmission of data.
• It is concerned with source coding and channel coding.
Source coding: involves compression
Channel coding: how to transmit data, how to overcome noise, etc.
• Data compression may be viewed as a branch of information theory in which the
primary objective is to minimize the amount of data to be transmitted.
Entropy
• The measure of information of a set is known as the Shannon entropy or entropy.
• Entropy is a measure of the number of specific ways in which a system may be
arranged, commonly understood as a measure of the disorder of a system.
• The change in information before and after the split is known as the information gain.
• The entropy η of an information source with alphabet S = {s1, s2, ..., sn} is defined as
η = H(S) = Σi pi log2(1/pi) = −Σi pi log2 pi
• where pi is the probability that symbol si in S will occur.
• The term log2(1/pi) indicates the amount of information (the so-called self-information)
contained in si.
Cont'd…
• If all four outcomes have an equal probability of 1/4, then the average number of bits needed
is 4 × (1/4) × log2(1/(1/4)) = 2 bits. To communicate (transmit) the results of our two binary
decisions, we would need to transmit 2 bits.
• Example: Calculate the entropy of an image whose histogram shows a uniform distribution of
gray-level intensities, that is, pi = 1/256 for all i. The entropy of this image is:
η = 256 × (1/256) × log2 256 = 8 bits
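As a concrete illustration of the formula, the following short Python sketch (the helper function entropy is just for this example) reproduces both results above:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: sum of p * log2(1/p) over the non-zero probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Four equally likely outcomes -> 2 bits per symbol.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0

# Uniform histogram over 256 gray levels -> 8 bits per pixel.
print(entropy([1.0 / 256] * 256))          # 8.0
```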
Run Length coding (RLC)
• Run-length coding is one of the simplest forms of data compression.
• It can be used to compress data made of any combination of symbols.
• It does not need to know the frequency of occurrence of symbols and can be very
efficient if data is represented as 0s and 1s.
• The general idea behind this method is to replace consecutive repeating occurrences
of a symbol by one occurrence of the symbol followed by the number of
occurrences.
• The method can be even more efficient if the data uses only two symbols (for
example 0 and 1) in its bit pattern and one symbol is more frequent than the other.
Cont'd…
Compress repeated 'runs' of the same character by storing the length of that run, and
provide a function to reverse the compression.
Input: WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWW
WWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
Output: 12W1B12W3B24W1B14W
Run-length encoding for two symbols.
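A minimal Python sketch of this character-run scheme (the function names rle_encode and rle_decode are illustrative, not from any particular library):

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of a repeated character by '<count><char>'."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))

def rle_decode(encoded):
    """Expand each '<count><char>' pair back into a run of characters."""
    return "".join(char * int(count) for count, char in re.findall(r"(\d+)(\D)", encoded))

# The input from the example above: 12 W, 1 B, 12 W, 3 B, 24 W, 1 B, 14 W.
data = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
encoded = rle_encode(data)
print(encoded)                      # 12W1B12W3B24W1B14W
assert rle_decode(encoded) == data  # the round trip recovers the original exactly
```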
Example
Original bit stream :
• 000000111111111111110000000000000111111111
• Size: 42 bits, because we have 6 zeros, 14 ones, 13 zeros and 9 ones (6 + 14 + 13 + 9 = 42)
• The compressed bit stream is:
• 0:6, 1:14, 0:13, 1:9
• Each run is encoded in 5 bits: one bit for the run's symbol followed by a 4-bit run length
• 00110 11110 01101 11001 (compressed bits)
• Size: 20 bits
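The fixed-width variant used here can be sketched the same way, under the assumption that each run is stored as one symbol bit plus a 4-bit length (so runs longer than 15 would have to be split):

```python
def rle_encode_bits(bits):
    """Encode each run as: 1 symbol bit + 4-bit run length (runs assumed to be <= 15)."""
    out, i = [], 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1                          # extend the current run
        out.append(bits[i] + format(j - i, "04b"))
        i = j
    return " ".join(out)

# The 42-bit stream from the example: 6 zeros, 14 ones, 13 zeros, 9 ones.
stream = "0" * 6 + "1" * 14 + "0" * 13 + "1" * 9
print(rle_encode_bits(stream))  # 00110 11110 01101 11001  -> 20 bits instead of 42
```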
Huffman Coding
• Huffman coding is a lossless data compression algorithm, developed by David
Huffman in the early 1950s while he was a PhD student at MIT.
• The algorithm is based on a binary-tree, frequency-sorting method that allows any
message sequence to be encoded into a shorter bit string, together with a method to
reassemble the original message without losing any data.
• The algorithm is based on the frequency of occurrence of each data item (byte).
• The most frequent data items are encoded with a smaller number of bits.
• The main idea of the algorithm is to create a binary tree, called the Huffman tree, based
on the byte frequencies in the data, where the leaves are the byte symbols, and the
path from the root to a leaf determines the new representation (code word) of that leaf's byte.
Building the Huffman Tree
• Each node of the tree is labelled with a byte symbol and the frequency of that byte
in the data.
• The creation of the Huffman tree has the following steps:
1. Scan the data and calculate the frequency of occurrence of each byte;
2. Insert those nodes into a reverse priority queue based on the frequencies (the lowest
frequency is given the highest priority);
3. Repeat the following until only one node remains in the queue:
4. Remove the two lowest-frequency nodes from the queue and combine them into an internal
node whose frequency is the sum of the two nodes' frequencies;
5. Make the two removed nodes the children of the new internal node;
6. Insert the new internal node into the queue;
7. The last node remaining in the queue is the root of the tree.
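The steps above can be sketched in a few lines of Python, using heapq as the priority queue (the function name huffman_codes and the tuple-based tree representation are illustrative choices, not part of the original slides):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman tree for the symbols in `data` and return {symbol: code word}."""
    freq = Counter(data)                                  # step 1: symbol frequencies
    # Heap entries: (frequency, tie-breaker, tree). A leaf is just the symbol;
    # an internal node is a (left, right) pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)                                   # step 2: priority queue
    count = len(heap)
    while len(heap) > 1:                                  # steps 3-6: repeat until one node is left
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    _, _, root = heap[0]                                  # step 7: root of the Huffman tree

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):                       # internal node
            walk(node[0], prefix + "0")                   # left edge labelled 0
            walk(node[1], prefix + "1")                   # right edge labelled 1
        else:                                             # leaf: record its code word
            codes[node] = prefix or "0"
        return codes
    return walk(root, "")
```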
Cont'd…
• Using the text HELLO as an example (frequencies H:1, E:1, L:2, O:1) and applying these
steps, we obtain the following tree:
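To see a concrete result, one can run the sketch above on the same text; ties between equal frequencies can be broken in different ways, so the codes below are one valid assignment and may differ from the tree drawn on the slide:

```python
print(huffman_codes("HELLO"))
# {'H': '00', 'E': '01', 'O': '10', 'L': '11'}  (one of several equally good trees)
```

Either way, the five letters of HELLO need only about 10 bits in total, compared with 5 × 8 = 40 bits in plain 8-bit ASCII.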
Cont'd…
• Huffman coding assigns shorter codes to symbols that occur more frequently
and longer codes to those that occur less frequently.
• For example, imagine we have a text file that uses only five characters (A, B, C, D, E).
• Before we can assign bit patterns to each character, we assign each character a
weight based on its frequency of use.
• In this example, assume that the frequency of the characters is as shown in Table
below.
Cont'd…
Character:  A    B    C    D    E
Code:       00   010  011  10   11
Example 2
Cont'd…
• Step 1: List the symbols with their frequencies, in ascending order:
A:1  G:1  M:1  T:1  E:2  H:2  _:3  I:3  S:5
Cont'd…
• Step 2: Combine the two lowest-frequency nodes, A(1) and G(1), into an internal node of
weight 2, and likewise combine M(1) and T(1) into a second internal node of weight 2.
Cont'd…
• Steps 3–4: Combine two of the weight-2 nodes into an internal node of weight 4.
Cont'd…
• Step 5: Combine _(3) and I(3) into an internal node of weight 6.
Cont'd…
• Step 6: After the remaining weight-2 nodes are combined, the partial trees are
4 {A, G, M, T}, 4 {E, H} and 6 {_, I}, plus the leaf S(5).
Cont'd…
• Step 7: Combine the two weight-4 nodes into a node of weight 8, and combine the
weight-6 node with S(5) into a node of weight 11.
Cont'd…
• Step 8: Combine the nodes of weight 8 and 11 into the root node of weight 19; only one
node remains, so the Huffman tree is complete.
Cont'd…
• Label each left edge of the tree with 0 and each right edge with 1:
Root 19: left = 8, right = 11
Node 8: left = 4 {A, G, M, T}, right = 4 {E, H}
Node 11: left = 6 {_, I}, right = S(5)
Node 4 {A, G, M, T}: left = 2 {A, G}, right = 2 {M, T}
Node 2 {A, G}: left = A, right = G; Node 2 {M, T}: left = M, right = T
Node 4 {E, H}: left = E, right = H; Node 6 {_, I}: left = _, right = I
• The code word of a symbol is the sequence of edge labels (0 for left, 1 for right) on the
path from the root to that symbol's leaf.
Cont'd…
• Huffman code & encoded message
• Reading the paths from the labelled tree gives the code words:
S: 11   I: 101   _: 100   E: 010   H: 011   A: 0000   G: 0001   M: 0010   T: 0011
• The encoded message is obtained by replacing each character of the original text with its code word.
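As a quick check (assuming the code words reconstructed above), encoding the 19-symbol example takes: A, G, M, T at 4 bits × 1 occurrence each = 16 bits; E and H at 3 bits × 2 occurrences each = 12 bits; _ and I at 3 bits × 3 occurrences each = 18 bits; and S at 2 bits × 5 occurrences = 10 bits, for a total of 16 + 12 + 18 + 10 = 56 bits. A fixed-length code for 9 distinct symbols needs 4 bits per symbol (19 × 4 = 76 bits), and plain 8-bit ASCII needs 19 × 8 = 152 bits.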
Lossy Compression
Our eyes and ears cannot distinguish subtle changes. In such cases, we can use a
lossy data compression method.
These methods are cheaper—they take less time and space when it comes to
sending millions of bits per second for images and video.
Several methods have been developed using lossy compression techniques. JPEG
(Joint Photographic Experts Group) encoding is used to compress pictures and
graphics, MPEG (Moving Picture Experts Group) encoding is used to compress
video, and MP3 (MPEG audio layer 3) for audio compression.
Image Compression Standards
• JPEG (Joint Photographic Experts Group)
• An image compression standard
• Accepted as an international standard in 1992.
• A lossy image compression method based on the DCT (Discrete Cosine Transform)
• Useful when the image content changes relatively slowly
• Humans are less likely to notice the loss of very high spatial-frequency components
• Visual acuity is much greater for gray (luminance) than for color (chrominance).
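As a rough illustration of why the DCT helps (a toy sketch only, not the real JPEG pipeline, which also performs color-space conversion, quantization tables, zig-zag scanning and entropy coding), the 2-D DCT of a smooth 8×8 block concentrates almost all of its energy in a few low-frequency coefficients, which is what lets JPEG coarsely quantize or drop the rest:

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D type-II DCT of a square block, computed from its definition."""
    n = block.shape[0]
    k = np.arange(n)
    # 1-D DCT-II basis: C[u, x] = alpha(u) * cos((2x + 1) * u * pi / (2n))
    c = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c *= np.sqrt(2.0 / n)
    c[0, :] = np.sqrt(1.0 / n)
    return c @ block @ c.T

# A smooth 8x8 block: intensity changes slowly across the block.
x = np.arange(8)
block = 100 + 5 * x[None, :] + 3 * x[:, None]

coeffs = dct2(block.astype(float))
kept = np.abs(coeffs) > 1.0   # for this smooth block, only a few low-frequency terms are significant
print(int(kept.sum()), "of 64 DCT coefficients are larger than 1 in magnitude")
```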
Exercise
1. Compress the following bit stream using Run-Length Encoding:
11111111111100000000000111110000000000000011111111
2. Based on the following data given to you:
A. Build the Huffman tree
B. Find the code words
C. Compare the number of bits needed with and without Huffman coding
THE END!