
Unit 1: Introduction to Data Compression
Data Compression (CA209)
by
Zohaib Hasan Khan
Assistant Professor
Department of Electronics and Communication Engg.
Integral University, Lucknow
Content
• UNIT-I: Introduction to Compression Techniques: Lossless
compression, Lossy compression, Measures of performance,
Modeling and coding, Mathematical preliminaries for lossless
compression.
• Introduction to Information Theory and Models: Physical
models, Probability models, Markov models.

2 February 2021 2
What is Data Compression?
• Data Compression = Modeling + Coding
• Data compression consists of taking a stream of symbols
and transforming them into codes. If the compression is
effective, the resulting stream of codes will be smaller than
the original symbols.
• The decision to output a certain code for a certain symbol
or set of symbols is based on a model.
• The model is simply a collection of data and rules used to
process input symbols and determine which code(s) to
output.
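A minimal sketch of the "Modeling + Coding" split, assuming a simple frequency-count model and a Huffman coder (neither is prescribed by these slides; the function names `build_model` and `build_codes` are hypothetical). The model counts how often each symbol occurs, and the coder uses those counts to give frequent symbols shorter codes.

```python
import heapq
from collections import Counter


def build_model(symbols):
    """Model: count how often each symbol occurs in the input."""
    return Counter(symbols)


def build_codes(model):
    """Coder: derive a Huffman prefix code from the model's counts."""
    # Each heap entry is (weight, tie_breaker, {symbol: code_so_far}).
    # The unique integer tie-breaker means the dicts are never compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(model.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {s: "0" for s in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # least frequent subtree
        w2, _, t2 = heapq.heappop(heap)  # second least frequent subtree
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


message = "aaaabbc"
model = build_model(message)   # a: 4, b: 2, c: 1
codes = build_codes(model)     # the most frequent symbol gets the shortest code
encoded = "".join(codes[s] for s in message)

print(codes)
print(len(encoded), "bits instead of", 8 * len(message))  # 10 bits instead of 56
```

Swapping in a different model (for example, one that looks at the previous symbol) changes the codes without changing the coder, which is the point of keeping the two parts separate.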
Other Definitions
• Data compression is the process of converting an input data stream
(the source stream or the original raw data) into another data
stream (the output, the bitstream, or the compressed stream) that
has a smaller size. A stream is either a file or a buffer in memory.
• The field of data compression is often called source coding. We
imagine that the input symbols (such as bits, ASCII codes, bytes,
audio samples, or pixel values) are emitted by a certain information
source and have to be coded before being sent to their destination.
The source can be memoryless, or it can have memory.

Need for Compression
• Why Data Compression?
– There are two practical motivations for compression:
• Make optimal use of limited storage space (Reduction of storage
requirements)
• Save time and help to optimize resources
– If compression and decompression are done in the I/O processor,
less time is required to move data to or from the storage
subsystem, freeing the I/O bus for other work
– When sending data over a communication line, less time is needed
to transmit it and less storage is needed at the host
Data Compression
• Data compression, source coding, or bit-rate reduction is the process of
encoding information using fewer bits than the original representation.
Any particular compression is either lossy or lossless.
• Lossless compression reduces bits by identifying and eliminating statistical
redundancy. No information is lost in lossless compression.
• Lossy compression reduces bits by removing unnecessary or less
important information.
• Typically, a device that performs data compression is referred to as an
encoder, and one that performs the reversal of the process
(decompression) as a decoder.

Data Compression contd…
• When we speak of a compression technique or compression
algorithm, we are actually referring to two algorithms.
• There is the compression algorithm that takes an input
X and generates a representation Xc that requires
fewer bits, and there is a reconstruction algorithm
(decompression algorithm) that operates on the
compressed representation Xc to generate the
reconstruction Y.
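As an illustration of the two algorithms, here is a minimal sketch using Python's standard `zlib` module (a lossless DEFLATE implementation chosen only as an example; the slides do not name a specific codec). The variable names `x`, `xc` and `y` mirror X, Xc and Y above.

```python
import zlib

# X: the original input. It is highly repetitive, so it compresses well.
x = b"ABABABABABABABAB" * 64

# Compression algorithm: X -> Xc (a representation that needs fewer bits).
xc = zlib.compress(x)

# Reconstruction (decompression) algorithm: Xc -> Y.
y = zlib.decompress(xc)

print(len(x), len(xc))  # original size vs. compressed size, in bytes
print(y == x)           # True: zlib is lossless, so Y is identical to X
```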

Data Compression contd…

Fig.1. Compression and Reconstruction


Process of Data Compression

• Based on the requirements of reconstruction,
data compression schemes can be divided into
two broad classes:
• lossless compression schemes, in which Y is
identical to X, and
• lossy compression schemes, which generally
provide much higher compression than lossless
compression but allow Y to be different from X.
Types of Data Compression
• Data compression is about storing and sending a smaller number of bits.
• There are two major categories for methods to compress data: lossless and lossy
methods.

Lossless Compression Methods
• In lossless methods, original data and the data
after compression and decompression are exactly
the same.
• Redundant data is removed in compression and
added during decompression.
• Lossless methods are used when we can’t afford
to lose any data: legal and medical documents,
computer programs.
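To make "redundant data is removed in compression and added during decompression" concrete, here is a small sketch using run-length encoding. This technique is chosen purely for illustration and is not named in these slides; `rle_encode` and `rle_decode` are hypothetical helper names.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated character with a (char, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs):
    """Expand each (char, count) pair back into the original run."""
    return "".join(ch * count for ch, count in pairs)


original = "A" * 6 + "B" * 3 + "C" * 8 + "D"
encoded = rle_encode(original)   # the repeated characters are removed
restored = rle_decode(encoded)   # and added back during decompression

print(encoded)               # [('A', 6), ('B', 3), ('C', 8), ('D', 1)]
print(restored == original)  # True: nothing was lost
```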
Lossy Compression Methods
• Used for compressing images and video files (our eyes
cannot distinguish subtle changes, so some loss of data is
acceptable).
• These methods are cheaper: they require less time and
space.
• Several methods:
– JPEG: compress pictures and graphics
– MPEG: compress video
– MP3: compress audio
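The idea of discarding detail that is hard to notice can be sketched with a toy uniform quantizer. This is an assumed illustration only, not the JPEG, MPEG or MP3 algorithm; the sample values and the step size of 8 are made up.

```python
def quantize(samples, step):
    """Lossy encoder: keep only the index of the nearest multiple of `step`."""
    return [round(s / step) for s in samples]


def dequantize(indices, step):
    """Decoder: map each index back to its multiple of `step`."""
    return [i * step for i in indices]


x = [11, 13, 15, 200, 203, 198, 77, 80]  # e.g. pixel intensities
step = 8

xc = quantize(x, step)     # fewer distinct values, so cheaper to code
y = dequantize(xc, step)   # the reconstruction

print(xc)      # [1, 2, 2, 25, 25, 25, 10, 10]
print(y)       # [8, 16, 16, 200, 200, 200, 80, 80]
print(y == x)  # False: the small differences are gone for good
```

A larger step gives more compression and a coarser reconstruction, which is the usual trade-off in lossy methods.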
Measure of Performance
• A compression algorithm can be evaluated in a
number of different ways.
• We could measure:
– the relative complexity of the algorithm,
– the memory required to implement the algorithm,
– how fast the algorithm performs on a given machine,
– the amount of compression, and
– how closely the reconstruction resembles the original.
1. Compression Ratio
• A very logical way of measuring how well a compression
algorithm compresses a given set of data is to look at the ratio
of the number of bits required to represent the data before
compression to the number of bits required to represent the
data after compression. This ratio is called the compression
ratio.

Example
• Suppose storing an image made up of a square array of
256×256 pixels requires 65,536 bytes. The image is
compressed and the compressed version requires 16,384
bytes.
• The compression ratio for the above compression is given by:
$$\text{Compression Ratio} = \frac{\text{Original Size}}{\text{Compressed Size}} = \frac{65536}{16384} = 4:1$$

• We can also represent the compression ratio by expressing the
reduction in the amount of data required as a percentage of the size
of the original data.
• Total compression as a percentage:
$$\text{Compression (\%)} = \frac{\text{Original} - \text{Compressed}}{\text{Original}} \times 100\% = \frac{65536 - 16384}{65536} \times 100\% = 75\%$$
• In this particular example, the compression ratio calculated in this
manner would be 75%.
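Both ways of reporting the amount of compression can be checked with a short sketch. The function names `compression_ratio` and `percent_reduction` are hypothetical; the sizes are the ones from the image example.

```python
def compression_ratio(original_size, compressed_size):
    """Bits (or bytes) before compression divided by bits (or bytes) after."""
    return original_size / compressed_size


def percent_reduction(original_size, compressed_size):
    """Reduction in size as a percentage of the original size."""
    return (original_size - compressed_size) / original_size * 100


original_bytes = 256 * 256   # 65,536 bytes for the 256x256 image
compressed_bytes = 16_384

print(compression_ratio(original_bytes, compressed_bytes))  # 4.0, i.e. 4:1
print(percent_reduction(original_bytes, compressed_bytes))  # 75.0
```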

2. Rate of Compression
• Compression performance can also be reported by providing
the average number of bits required to represent a single
sample.
• This is generally referred to as the rate.
• For example, in the case of the compressed image described
previously, if we assume 8 bits per byte (or pixel), the
compressed version takes 16,384 × 8 = 131,072 bits for 65,536
pixels, so the average number of bits per pixel in the
compressed representation is 2.
• Thus, we would say that the rate is 2 bits per pixel.
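The same rate can be computed directly, as in this small sketch (the helper name `rate_bits_per_sample` is hypothetical, and 8 bits per byte is assumed as above):

```python
def rate_bits_per_sample(compressed_bytes, num_samples, bits_per_byte=8):
    """Average number of bits used to represent one sample (here, one pixel)."""
    return compressed_bytes * bits_per_byte / num_samples


num_pixels = 256 * 256       # 65,536 pixels in the example image
compressed_bytes = 16_384

rate = rate_bits_per_sample(compressed_bytes, num_pixels)
print(rate)  # 2.0 bits per pixel
```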
3. Distortion
• In lossy compression, the reconstruction differs from the
original data.
• Therefore, in order to determine the efficiency of a
compression algorithm, we have to have some way of
quantifying the difference.
• The difference between the original and the reconstruction is
often called the distortion.
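One common way to quantify this difference is the mean squared error between the original and the reconstruction. The slides do not name a specific measure, so treat the choice of measure and the sample values below as assumptions.

```python
def mean_squared_error(original, reconstruction):
    """Average of the squared sample-by-sample differences."""
    if len(original) != len(reconstruction):
        raise ValueError("sequences must have the same length")
    total = sum((a - b) ** 2 for a, b in zip(original, reconstruction))
    return total / len(original)


x = [10, 20, 30, 40]   # original samples
y = [12, 18, 30, 41]   # reconstruction after lossy compression

mse = mean_squared_error(x, y)
print(mse)                       # 2.25, from (4 + 4 + 0 + 1) / 4
print(mean_squared_error(x, x))  # 0.0: a lossless scheme has zero distortion
```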

What is meant by stochastic?
The word stochastic comes from the Greek word stokhazesthai, meaning to
aim or guess. In the real world, uncertainty is a part of everyday life, so
a stochastic model could represent almost anything. The opposite is a
deterministic model, whose output is fully determined by its inputs.

What is the difference between probabilistic and stochastic?


As adjectives, the difference between probabilistic and stochastic is
that probabilistic means (in mathematics) of, pertaining to, or derived
using probability, while stochastic means random, randomly determined,
or relating to stochastics.

What are Markov models used for?
Markov models are often used to model the probabilities of different states and
the rates of transitions among them. The method is generally used to
model systems. Markov models can also be used to recognize patterns, make
predictions and to learn the statistics of sequential data.
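Learning "the statistics of sequential data" can be sketched as estimating the transition probabilities of a first-order Markov model from an observed sequence. This is a minimal illustration under assumed data; the function name `estimate_transitions` and the sample sequence are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequence):
    """Estimate P(next state | current state) by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


sequence = "aabaabaab"
model = estimate_transitions(sequence)

print(model["a"])  # {'a': 0.5, 'b': 0.5}
print(model["b"])  # {'a': 1.0}: in this data, 'b' is always followed by 'a'
```

In a compression setting, such a model tells the coder which symbols are likely after the current one, so they can be given shorter codes.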
Applications of Markov models
Genome annotation is the process of attaching biological
information to sequences. It consists of three main steps:
– identifying portions of the genome that do not code for proteins,
– identifying elements on the genome, a process called gene
prediction, and
– attaching biological information to these elements.
