# DATA COMPRESSION TECHNIQUES

Presented by: Archit Gupta & Gaurav Gupta

CONTENT
- Distinguish between lossless and lossy compression.
- Describe run-length encoding and how it achieves compression.
- Describe Lempel Ziv encoding and the role of the dictionary in encoding and decoding.
- Describe Huffman coding and how it achieves compression.

LOSSLESS AND LOSSY COMPRESSION
Data compression means sending or storing a smaller number of bits. Compression methods can be divided into two broad categories: lossless and lossy.

- In lossless data compression, the integrity of the data is preserved. The original data and the data after compression and decompression are exactly the same.
- In lossy data compression, the integrity of the data is not exactly preserved. Our eyes and ears cannot distinguish subtle changes. In such cases, we can use a lossy data compression method.

DATA COMPRESSION METHODS

RUN-LENGTH ENCODING
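Run-length encoding is probably the simplest method of compression. It replaces each run of consecutive identical symbols with a single copy of the symbol and the length of the run. It can be used to compress data made of any combination of symbols, and it can be very efficient if the data is in the form of icons, bitmaps, etc.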

EXAMPLE OF RUN LENGTH ENCODING
Original Data:

BBBBBBBAAAAAAAAAAAANMMMMM

Characters:  B   A   N   M
Frequency:   07  12  01  05

Compressed Data:

B07A12N01M05
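The example above can be sketched in Python. This is a minimal illustration, not a standard format: the function names `rle_encode` and `rle_decode` are hypothetical, and the two-digit count per run is an assumption read off the `B07A12N01M05` output.

```python
from itertools import groupby

def rle_encode(data: str) -> str:
    """Replace each run with the symbol followed by a two-digit count."""
    pieces = []
    for symbol, run in groupby(data):
        count = len(list(run))
        # Assumption: counts are two digits wide, so longer runs are split.
        while count > 99:
            pieces.append(f"{symbol}99")
            count -= 99
        pieces.append(f"{symbol}{count:02d}")
    return "".join(pieces)

def rle_decode(encoded: str) -> str:
    """Invert rle_encode by reading one symbol and two count digits at a time."""
    return "".join(
        encoded[i] * int(encoded[i + 1:i + 3])
        for i in range(0, len(encoded), 3)
    )

# 7 B's, 12 A's, 1 N and 5 M's, matching the frequency table.
data = "B" * 7 + "A" * 12 + "N" + "M" * 5
encoded = rle_encode(data)
print(encoded)  # B07A12N01M05
```

Here 25 symbols shrink to 12. Note that the isolated `N` grows from one character to three, which is why run-length encoding pays off only when long runs dominate.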

A 17 x 17 image (figure not reproduced)

Compressed Data
2W4R5W4R3W6R3W6R2W6R3W6R1W8R1W16R1W59R1W15R2W15R3W13R5W11R7W9R9W7R11W5R13W3R15W1R8W

LEMPEL ZIV ENCODING 
Lempel Ziv (LZ) encoding is an example of a category of algorithms called dictionary-based encoding.

The idea is to create a dictionary (a table) of strings used during the communication session.

COMPRESSION
In this phase there are two concurrent events: building an indexed dictionary and compressing a string of symbols. The algorithm extracts the smallest substring that cannot be found in the dictionary from the remaining uncompressed string. It then stores a copy of this substring in the dictionary as a new entry and assigns it an index value.

Compression occurs when the substring, except for the last character, is replaced with the index found in the dictionary. The process then inserts the index and the last character of the substring into the compressed string.
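The compression phase described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `lz_compress` is hypothetical, dictionary indexes start at 1, each output token is written as an optional index followed by the last character, and the symbols themselves are not digits. The source does not say what to do when the input ends in the middle of a known substring, so emitting the bare index in that case is also an assumption.

```python
def lz_compress(text: str) -> tuple[list[str], dict[str, int]]:
    """Return the compressed tokens and the dictionary built along the way."""
    dictionary: dict[str, int] = {}  # substring -> index, starting at 1
    output: list[str] = []
    current = ""
    for ch in text:
        current += ch
        if current not in dictionary:
            # current is the smallest substring not yet in the dictionary:
            # store it as a new entry with the next index.
            dictionary[current] = len(dictionary) + 1
            prefix = current[:-1]
            # Replace everything except the last character with its index.
            index = dictionary[prefix] if prefix else ""
            output.append(f"{index}{ch}")
            current = ""
    if current:
        # Assumption: leftover input that is already in the dictionary
        # is emitted as its index alone.
        output.append(str(dictionary[current]))
    return output, dictionary

tokens, table = lz_compress("BAABABBBA")
print(tokens)  # ['B', 'A', '2B', '3B', '1A']
print(table)   # {'B': 1, 'A': 2, 'AB': 3, 'ABB': 4, 'BA': 5}
```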

EXAMPLE OF LEMPEL ZIV ENCODING
Original Data:

BAABABBBA

Each step extracts the smallest substring that is not yet in the dictionary:

Step  Substring  Parsed string    Dictionary
1     B          B                1 = B
2     A          B,A              1 = B, 2 = A
3     AB         B,A,2B           1 = B, 2 = A, 3 = AB
4     ABB        B,A,2B,3B        1 = B, 2 = A, 3 = AB, 4 = ABB
5     BA         B,A,2B,3B,1A     1 = B, 2 = A, 3 = AB, 4 = ABB, 5 = BA

HUFFMAN CODING
The main idea behind Huffman Coding is that it assigns shorter codes to symbols that occur more frequently and longer codes to those that occur less frequently.

CONSTRUCTION ALGORITHM
Given the frequencies of the characters, we wish to compute a trie such that the length of the encoding is the minimum possible. Each character is a leaf of the trie.

The number of bits used to encode a character is the level number of its leaf. Thus, if $f_i$ is the frequency of the $i$th character and $l_i$ is the level of the leaf corresponding to it, we want to find a tree that minimizes $\sum_i f_i l_i$.
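To make the cost function concrete, the short sketch below evaluates the sum of frequency times level for the text ABRACADABRA, using the frequencies and code lengths of the example in this presentation. The comparison with a fixed-length code is an added illustration, and the variable names are hypothetical.

```python
import math

# Frequencies of the characters in ABRACADABRA.
frequencies = {"A": 5, "B": 2, "R": 2, "C": 1, "D": 1}
# Leaf levels (code lengths) taken from the codes 0, 100, 101, 110, 111.
levels = {"A": 1, "B": 3, "R": 3, "C": 3, "D": 3}

# Cost of the tree: sum over all characters of frequency times level.
total_bits = sum(frequencies[ch] * levels[ch] for ch in frequencies)

# For comparison: a fixed-length code for 5 symbols needs 3 bits each.
fixed_width = math.ceil(math.log2(len(frequencies)))
fixed_bits = fixed_width * sum(frequencies.values())

print(total_bits)  # 23
print(fixed_bits)  # 33
```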

HUFFMAN ENCODING TRIE
Let our text be ABRACADABRA.

Characters:  A  B  R  C  D
Frequency:   5  2  2  1  1

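Building the trie (figures not reproduced):

Step 1: merge C (1) and D (1) into a node of weight 2.
Step 2: merge B (2) and R (2) into a node of weight 4.
Step 3: merge the node of weight 4 and the node of weight 2 into a node of weight 6.
Step 4: merge A (5) and the node of weight 6 into the root of weight 11.

Labeling each left branch 0 and each right branch 1 gives the codes:

A = 0, B = 100, R = 101, C = 110, D = 111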

Text:  A  B    R    A  C    A  D    A  B    R    A
Code:  0  100  101  0  110  0  111  0  100  101  0

Total = 23 bits
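The construction can be sketched with a priority queue that repeatedly merges the two lowest-frequency nodes. This is a minimal sketch rather than the presenters' implementation: the function name `huffman_codes` is hypothetical, and ties between equal frequencies are broken by insertion order, so the individual codes may differ from the ones on the slide even though the total length is the same.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman trie for text and return a symbol -> code mapping."""
    tie = count()  # unique tie-breaker so the heap never compares nodes
    heap = [(freq, next(tie), sym) for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        # Assumption: a lone symbol still gets a one-bit code.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two lowest-frequency nodes into one internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")  # left branch is labeled 0
            walk(node[1], prefix + "1")  # right branch is labeled 1
        else:                            # leaf: a single character
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes

text = "ABRACADABRA"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(len(encoded))  # 23
```

Whichever way the ties fall, A receives a one-bit code and the other four characters receive three-bit codes, which gives the 23 bits shown above.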


Thank You