Data compression


CSC 101

What is Compression Technology?
• Compression denotes the compact representation of data. In this lecture,
we cover only the compression of digital data.
• Typical examples of data you may want to compress are:
– text
– source code
– arbitrary files
– images
– video
– audio data
– speech

Why do we still need compression?
• Compression technology is used to make efficient use of storage space
and to save transmission capacity and transmission time.
• Compression reduces the size of a file:
– to save space when storing it;
– to save time when transmitting it.
• This is possible because most files contain a lot of redundancy.

Why is it possible to compress data?
• Compression is enabled by statistical and other properties of most
data types. However, some types of data cannot be compressed because
they contain (almost) no redundancy, e.g. various kinds of noise or encrypted data.
• Compression-enabling properties are:
• Statistical redundancy: in uncompressed data, all symbols are represented
with the same number of bits regardless of their relative frequency
(fixed-length representation), so frequent symbols cost as many bits as rare ones.
• Correlation: adjacent data samples tend to be equal or similar (e.g. think of
images or video data).
• There are different types of correlation:
– spatial correlation
– spectral correlation
– temporal correlation
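As a rough illustration of both properties, the Python sketch below computes the empirical Shannon entropy, i.e. the average number of bits per symbol an ideal code would need for the observed symbol frequencies. A skewed text needs far fewer than the 8 bits per symbol of a fixed-length byte code, and a slowly rising signal becomes much cheaper once each sample is replaced by its difference from the previous one. Delta (difference) coding is used here only as one simple way to exploit correlation; the function names and test data are hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Shannon entropy H = -sum(p * log2(p)) of the empirical symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def deltas(samples):
    """Replace each sample by its difference from the previous one (first sample kept)."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


# Statistical redundancy: a fixed-length code spends 8 bits on every character,
# but with skewed frequencies far fewer bits are needed on average.
text = "aaaaaaab" * 100
print("fixed-length: 8 bits/symbol, entropy:",
      round(entropy_bits_per_symbol(text), 3), "bits/symbol")

# Correlation: a slowly rising signal has many distinct values,
# but its sample-to-sample differences are almost all 0 or 1.
signal = [10 + i // 2 for i in range(200)]   # 10, 10, 11, 11, 12, ...
print("raw samples:", round(entropy_bits_per_symbol(signal), 3), "bits/sample")
print("differences:", round(entropy_bits_per_symbol(deltas(signal)), 3), "bits/sample")
```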

Why is it possible to compress data?
• In addition, many data types contain a significant amount of
irrelevant information, since the human brain cannot process
and/or perceive all of the data. Such data can therefore be
omitted without degrading perception.
• Furthermore, some data have more abstract properties that are
independent of time, location, and resolution and can be
described very efficiently (e.g. fractal properties).

History of compression technologies
• 1st century B.C.: stenography (shorthand writing)
• 19th century: Morse code and the Braille alphabet
• 1950s: compression technologies exploiting statistical redundancy are
developed; bit patterns of varying length represent individual symbols
according to their relative frequency.
• 1970s: dictionary algorithms are developed; symbol sequences are mapped
to shorter indices using dictionaries.
• 1970s: with the ongoing digitization of telephone lines, telecommunication
companies became interested in methods for fitting more channels onto a
single wire.
• early 1980s: fax transmission over analog telephone lines.
• 1980s: the first applications involving digital images appear on the
market, and the “digital revolution” starts with the compression of audio data.
• 1990s: video broadcasting, video on demand, etc.
Data Encoding

• Data encoding is the process of converting data into a form
suitable for various types of information processing.
• Encoding is used for data transmission, data storage, and
data compression. It is also used in applications for file
conversion.

Types of data encoding algorithms
• Data encoding algorithms fall into two types:
• Lossy encoding
– Discrete Cosine Transform
– Fractal Compression
• Lossless encoding
– Run Length Encoding
– Lempel-Ziv-Welch Algorithm
– Huffman coding
– Shannon-Fano coding
– Arithmetic coding
Lossy encoding
• In lossy encoding, data whose loss would not be noticeable is
removed.
• This reduces the data size. However, the decoded data cannot be
restored to its original state.
• Some quality is also lost, since the removed bits are gone for good.
• Lossy encoding generally achieves a high degree of
compression.
Application: It is used for compressing audio and video.
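Where does the loss come from? A minimal sketch, assuming simple uniform quantization (one common lossy step; real audio and video codecs combine it with transforms and perceptual models): every sample is rounded to a multiple of a step size, so several inputs share one output and the original values cannot be recovered. The function names, step size, and sample values are illustrative.

```python
def quantize(samples, step):
    """Lossy step: replace each sample by the index of its quantization interval.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return [round(x / step) for x in samples]


def dequantize(indices, step):
    """Decoder side: map each index back to a representative value."""
    return [i * step for i in indices]


samples = [12, 15, 17, 40, 41, 44, 100, 103]   # illustrative input values
step = 10                                       # larger step: fewer distinct values, more loss

indices = quantize(samples, step)
restored = dequantize(indices, step)
print("indices: ", indices)
print("restored:", restored)
print("max error:", max(abs(a - b) for a, b in zip(samples, restored)))
```

The indices take fewer distinct values than the input (cheaper to store), and the error per sample is at most half the step size, but it can never be undone.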

Lossy Algorithm
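• Discrete Cosine Transform (DCT)
• The DCT represents a block of data as a sum of cosine functions of
different frequencies. For typical signals such as images, most of the
information is concentrated in a few low-frequency coefficients, so the
high-frequency coefficients can be stored coarsely or dropped. The DCT is
closely related to the discrete Fourier transform and can be computed
with fast Fourier transform (FFT) techniques.
• The DCT underlies widely used formats such as JPEG, which make it
practical to transmit images over networks.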
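To illustrate the energy-compaction claim, the sketch below implements the orthonormal DCT-II directly from its definition (O(N²) per block, no FFT) on one illustrative row of 8 pixel values, keeps only the 3 lowest-frequency coefficients, and transforms back. The function names and the choice of 3 coefficients are assumptions for this example; real codecs such as JPEG work on 2-D 8×8 blocks and quantize the coefficients rather than simply dropping them.

```python
import math


def _scale(k, n):
    """Orthonormal scaling factor for frequency index k."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct_ii(x):
    """Orthonormal DCT-II: X[k] = s_k * sum_i x[i] * cos(pi * (i + 0.5) * k / n)."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct_ii(coeffs):
    """Inverse of dct_ii (the transpose of the orthonormal DCT-II matrix)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


block = [52, 55, 61, 66, 70, 61, 64, 73]   # illustrative row of 8 pixel values
coeffs = dct_ii(block)

keep = 3                                   # keep only the lowest-frequency coefficients
approx = idct_ii(coeffs[:keep] + [0.0] * (len(coeffs) - keep))

energy = [c * c for c in coeffs]
print("coefficients:", [round(c, 1) for c in coeffs])
print("energy in first", keep, "coefficients:", round(sum(energy[:keep]) / sum(energy), 4))
print("approximation:", [round(v, 1) for v in approx])
```

Because the transform is orthonormal, the squared reconstruction error equals the energy of the dropped coefficients, which is small for smooth data like this.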

Lossy Algorithm
• Fractal Compression
• Fractal compression exploits the fact that parts of an image often
resemble other parts of the same image (self-similarity), so the image
can be represented as a fractal.
• Hence the image can be stored as a collection of transformations
instead of as pixels, which can be useful for compressing the data.
• The algorithm converts these parts into fractal codes, which can
then be used to recreate the entire image.
• The achievable compression ratio is hard to state, because the
image can be decoded at any scale.
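A heavily simplified 1-D sketch of the idea, assuming a partitioned scheme: the signal is split into range blocks of length r, and each one is approximated by a downsampled, scaled, and offset copy of a larger domain block from the same signal. Only the triples (domain index, scale s, offset o) are stored. Decoding applies these maps repeatedly, starting from an arbitrary signal; because |s| is clamped below 1, the iteration converges to (nearly) the same result regardless of the start. All names, the block size, and the 0.9 clamp are illustrative choices; image coders work on 2-D blocks and usually also try rotations and reflections, and this sketch does not show decoding at other scales.

```python
import random


def downsample(x):
    """Average neighbouring pairs so a domain block shrinks to range-block size."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def fit(domain, target):
    """Least-squares scale s and offset o so that s * domain + o approximates target."""
    n = len(target)
    md = sum(domain) / n
    mt = sum(target) / n
    var = sum((d - md) ** 2 for d in domain)
    cov = sum((d - md) * (t - mt) for d, t in zip(domain, target))
    s = cov / var if var > 0 else 0.0
    s = max(-0.9, min(0.9, s))  # keep |s| < 1 so that decoding converges
    o = mt - s * md
    err = sum((s * d + o - t) ** 2 for d, t in zip(domain, target))
    return s, o, err


def encode(signal, r):
    """Store (domain index, s, o) for each range block of length r.

    Assumes len(signal) is a multiple of r and at least 2 * r.
    Domain block k covers signal[k * r : k * r + 2 * r].
    """
    domains = [downsample(signal[start:start + 2 * r])
               for start in range(0, len(signal) - 2 * r + 1, r)]
    code = []
    for start in range(0, len(signal), r):
        target = signal[start:start + r]
        k, s, o, _ = min(((k,) + fit(d, target) for k, d in enumerate(domains)),
                         key=lambda t: t[3])
        code.append((k, s, o))  # the "fractal code" for this range block
    return code


def decode(code, length, r, iterations=100, start=None):
    """Apply the stored maps repeatedly, starting from zeros or a given signal."""
    x = list(start) if start is not None else [0.0] * length
    for _ in range(iterations):
        x = [s * v + o
             for k, s, o in code
             for v in downsample(x[k * r:k * r + 2 * r])]
    return x


signal = [10.0, 12, 14, 16, 18, 20, 22, 24, 24, 22, 20, 18, 16, 14, 12, 10]
code = encode(signal, r=4)            # 4 triples instead of 16 samples
from_zeros = decode(code, len(signal), r=4)
from_noise = decode(code, len(signal), r=4,
                    start=[random.uniform(0, 255) for _ in signal])
print(code)
print([round(v, 2) for v in from_zeros])
print(max(abs(a - b) for a, b in zip(from_zeros, from_noise)))
```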

Lossless encoding
• Lossless encoding keeps all of the data, including details that
are not noticeable.
• Hence, after decoding, the data can be restored to its
original state and there is no reduction in quality.
• Lossless encoding achieves less size reduction than lossy
encoding, but the data can be restored to its original quality.
Application: It is used for compressing text as well as audio
and video, for instance when creating a ZIP file.
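The lossless round trip can be checked with Python's standard `zlib` module, which implements DEFLATE, the compression method most commonly used inside ZIP files (zlib's own container format is not the ZIP format, though). The input text is an arbitrary, highly repetitive example.

```python
import zlib

original = ("the quick brown fox jumps over the lazy dog. " * 50).encode("utf-8")

compressed = zlib.compress(original, 9)   # level 9: best compression
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes")
print(restored == original)               # lossless: every byte comes back
```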

Lossless compression Algorithm
• Run Length Encoding
• Run Length Encoding follows a straightforward algorithm: it
reads the next character and appends that character and the
number of its consecutive occurrences to the encoded string.
For the string
“WWWWWWWWWWWWWBBWWWBBBBBBBBBBB”, the
code is “W13B2W3B11”.
• The length is reduced from 29 characters to 10 characters.
This algorithm is most useful for data with many runs and long
run lengths; data with few repeats can even grow (“AB” becomes “A1B1”).
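A minimal sketch of the scheme described above, assuming the same output format as the slide (character followed by its run length) and assuming the input contains no digits, since the decoder relies on digits to find the run lengths. The function names are hypothetical.

```python
import re
from itertools import groupby


def rle_encode(text):
    """Emit each character followed by the length of its run, e.g. 'WWWB' -> 'W3B1'."""
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))


def rle_decode(code):
    """Inverse of rle_encode; assumes the original text contained no digits."""
    return "".join(ch * int(n) for ch, n in re.findall(r"(\D)(\d+)", code))


s = "W" * 13 + "B" * 2 + "W" * 3 + "B" * 11
encoded = rle_encode(s)
print(encoded, len(s), len(encoded))
print(rle_decode(encoded) == s)
```

For the example string from the slide this yields “W13B2W3B11” (29 characters down to 10), and decoding restores the original exactly.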
Lossless compression Algorithm
• Huffman coding
• Huffman coding is another popular algorithm for encoding
data.
• Huffman coding is a greedy algorithm that produces a prefix code
with the smallest possible average code length (expected number of
bits per symbol). It is a tree-based encoding technique.
• This method generates variable-length bit sequences, called
codes, in such a way that the most frequently occurring
character has the shortest code.
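A compact Python sketch of the greedy construction: repeatedly merge the two least frequent subtrees, prefixing 0 to the codes in one and 1 to the codes in the other. Instead of explicit tree nodes it keeps a partial code table per subtree, which is equivalent but shorter. The function name and sample text are hypothetical. Ties can be broken in different ways, so other implementations may produce different bit patterns, but the total encoded length is the same minimum.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Map each symbol to a bit string using Huffman's greedy merging."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:                        # one distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    tiebreak = count()                        # makes heap entries with equal weight comparable
    # Each heap entry stands for one subtree: (weight, tiebreak, {symbol: code so far}).
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # greedy step: merge the two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


text = "aaaaaaaabbbccd"                       # frequencies a:8, b:3, c:2, d:1
codes = huffman_code(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(len(text) * 8, "bits as 8-bit characters vs", len(encoded), "bits Huffman-coded")
```

For this sample, “a” gets a 1-bit code, “b” 2 bits, and “c” and “d” 3 bits each, so the 14 characters need 23 bits instead of 112.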

Encoding vs. Encryption
• The idea behind encoding is to convert data into a format
that other processes and systems can read.
• Encoding is also done to reduce the size of the data and to
store it easily.
• Encoding uses publicly known algorithms and has a
standard way of decoding/decompressing.
• Encryption, on the other hand, produces data that only
authorized parties can decipher.
• It is used to protect sensitive data from theft, e.g. while
it is being transferred.
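To illustrate “publicly known algorithms with a standard way of decoding”, the sketch below uses Base64 from Python's standard library, an encoding that represents arbitrary bytes as text. No key is involved, so anyone can decode the result; this is what separates encoding from encryption. Base64 is chosen only as an illustration (the message is made up); unlike compression, it makes the data about one third larger.

```python
import base64

message = "Meet at 10:00".encode("utf-8")

encoded = base64.b64encode(message)   # public, standardized algorithm: no key needed
decoded = base64.b64decode(encoded)   # anyone can reverse it

print(encoded)
print(decoded.decode("utf-8"))
print(len(message), "bytes ->", len(encoded), "bytes")  # 4 output bytes per 3 input bytes
```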
Encoding vs. Encryption

• Encryption algorithms scramble the data and convert it
into ciphertext. The data can be decrypted only with the
appropriate key. Common encryption techniques are RSA
and AES.
