
Data Compression

What is Data Compression?


Definition
 -Reducing the amount of data required to represent a source of information (while preserving the original content as much as possible).

Objectives

1- Reduce the amount of data storage space required.


2- Reduce the time needed to transmit data over the network.

Data Compression Methods


Data compression is about storing and sending a smaller number of bits.
There are two major categories of methods for compressing data: lossless and lossy.


Lossy Data Compression




The original message cannot be recovered exactly as it was before it was compressed.

-Not suitable for critical data, where we cannot afford to lose even a single bit.

-Used mostly in sound, video, and image compression, where the losses can be tolerated.

-A threshold level is used for truncation. (For example, in a sound file, very high and very low frequencies, which the human ear cannot hear, may be truncated from the file.)


-Examples: JPEG, MPEG

-Lossy techniques achieve much higher compression ratios than lossless methods. The higher the compression ratio, the more noise is added to the data.
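The threshold idea above can be sketched with a toy example. This is only an illustration under stated assumptions: it works on raw sample amplitudes rather than on frequencies, and the names `lossy_compress`, `lossy_decompress`, `threshold`, and `step` are hypothetical, not taken from JPEG or MPEG. Samples smaller than the threshold are dropped, and the rest are rounded to a coarse grid, so the restored data only approximates the original.

```python
def lossy_compress(samples, threshold, step):
    """Toy lossy scheme (an assumption for illustration only):
    drop samples whose magnitude is below `threshold`, and store the
    others as small integers by rounding to multiples of `step`."""
    levels = []
    for s in samples:
        if abs(s) < threshold:
            levels.append(0)                 # truncated: information is lost
        else:
            levels.append(round(s / step))   # coarse level instead of exact value
    return levels


def lossy_decompress(levels, step):
    """Rebuild approximate samples from the stored levels."""
    return [q * step for q in levels]


samples = [3, -1, 12, 47, -52, 2, 30, -4]
levels = lossy_compress(samples, threshold=5, step=10)
restored = lossy_decompress(levels, step=10)

print(levels)    # [0, 0, 1, 5, -5, 0, 3, 0]
print(restored)  # [0, 0, 10, 50, -50, 0, 30, 0]
print(restored == samples)  # False: the original cannot be recovered exactly
```

A larger `step` or `threshold` leaves fewer distinct values to store, which compresses better but increases the difference between the restored and original samples.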

Lossless Compression Methods


In lossless methods, the original data and the data after compression and decompression are exactly the same.
Redundant data is removed during compression and added back during decompression.
Lossless methods are used when we cannot afford to lose any data: legal and medical documents, computer programs.


Popular algorithms: LZW (Lempel-Ziv-Welch), RLE (Run-Length Encoding), Huffman coding, arithmetic coding, delta encoding.
GIF images are an example of lossless image compression.
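The defining property of lossless compression, an exact round trip, can be checked with Python's standard `zlib` module. Note that `zlib` implements DEFLATE (an LZ77 variant combined with Huffman coding), not LZW, so it stands in here only as a readily available lossless compressor. The sample data is made up for illustration.

```python
import zlib

# Highly redundant sample data (an assumption chosen so it compresses well).
original = b"AAAAABBBBBCCCCC" * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless: the data after compression and decompression is identical.
print(restored == original)
# The compressed form is smaller because the redundancy was removed.
print(len(original), len(compressed))
```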

Run-length encoding



Simplest method of compression.


How: replace each run of consecutive occurrences of a symbol with one occurrence of the symbol, followed by the number of occurrences.

The method can be more efficient if the data uses only two symbols (0s and 1s in bit patterns) and one symbol is more frequent than the other.
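A minimal sketch of run-length encoding as described above. The function names are hypothetical. The second encoder shows one possible variant for two-symbol data, under the assumption that runs alternate starting with '0', so only the run lengths need to be stored.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated symbol with (symbol, run length)."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def rle_decode(pairs):
    """Expand each (symbol, run length) pair back into the original run."""
    return "".join(symbol * count for symbol, count in pairs)


def rle_encode_bits(bits):
    """Variant for data made only of '0' and '1': store run lengths only.
    Assumed convention: runs alternate and the first run counts '0's,
    so a leading 0 is recorded when the data starts with '1'."""
    runs = [len(list(run)) for _, run in groupby(bits)]
    return runs if bits.startswith("0") else [0] + runs


message = "AAAABBBCCD"
encoded = rle_encode(message)
print(encoded)                         # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded) == message)  # True
print(rle_encode_bits("0000001000"))   # [6, 1, 3]
```

When runs are short (for example, text with few repeated characters), the encoded form can be larger than the input, which is why the method suits data with long runs.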

Huffman Coding




Assign fewer bits to symbols that occur more frequently and more bits to symbols that appear less often.
Huffman codes are not unique, but every Huffman code for a given source has the same average code length.
Algorithm:

1- Make a leaf node for each code symbol. Add the generation probability of each symbol to its leaf node.
2- Take the two nodes with the smallest probability and connect them into a new node. Add 1 or 0 to each of the two branches. The probability of the new node is the sum of the probabilities of the two connected nodes.
3- If there is only one node left, the code construction is complete. If not, go back to (2).
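The construction above can be sketched as follows, together with encoding and decoding. This is one possible implementation, not a definitive one: the function names and the sample probabilities are assumptions, a heap is used to find the two smallest nodes, and each node is represented simply as a table of the codewords built so far.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Build a prefix code from a non-empty {symbol: probability} mapping."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    # One leaf node per symbol: (probability, tie, {symbol: codeword so far}).
    heap = [(p, next(tie), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Take the two nodes with the smallest probability.
        p0, _, low = heapq.heappop(heap)
        p1, _, high = heapq.heappop(heap)
        # Label the two branches 0 and 1 by prefixing every codeword below them.
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        # The new node's probability is the sum of the two merged nodes.
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def encode(text, code):
    """Concatenate the codeword of each symbol."""
    return "".join(code[s] for s in text)


def decode(bits, code):
    """Read bits until they match a codeword; prefix-freeness makes this unambiguous."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
code = huffman_code(probs)
bits = encode("ABACAD", code)
average_length = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(bits, decode(bits, code))
print(average_length)  # about 1.9 bits per symbol, versus 2 for a fixed-length code
```

Swapping which branch gets 0 and which gets 1 yields a different but equally good code, which is why the Huffman code is not unique while the average length stays the same.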

[Figures: Huffman coding example, encoding, decoding]

Applications: Why Do We Need Data Compression?


The two most important points are:
1- Data storage
 -Modern data processing applications require storage of large volumes of data.
 -Compressing a file to half of its original size is equivalent to doubling the capacity of the storage medium.
2- Data transmission
 -Modern communication networks require massive transfers of data over communication channels.
 -Reducing the amount of data to be transmitted is equivalent to increasing the capacity of the communication channel.
 -The smaller a file, the faster it can be transferred over the channel.
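The storage and transmission points can be made concrete with a little arithmetic. This is a simplified sketch: the function names and the file and channel sizes are assumptions, and protocol overhead and latency are ignored.

```python
def compression_ratio(original_size, compressed_size):
    """Ratio of original size to compressed size (2.0 means halved)."""
    return original_size / compressed_size


def transfer_time_seconds(size_bytes, bandwidth_bits_per_second):
    """Idealized transfer time: total bits divided by channel capacity."""
    return size_bytes * 8 / bandwidth_bits_per_second


original_size = 10_000_000    # a 10 MB file (assumed)
compressed_size = 5_000_000   # compressed to half its size
bandwidth = 8_000_000         # an 8 Mbit/s channel (assumed)

print(compression_ratio(original_size, compressed_size))  # 2.0
print(transfer_time_seconds(original_size, bandwidth))    # 10.0
print(transfer_time_seconds(compressed_size, bandwidth))  # 5.0
```

Halving the file halves the transfer time, which has the same effect as doubling the channel capacity for the uncompressed file.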