
Compression Techniques

- Saves storage
- Fewer I/O operations for access
- Extra CPU processing

Different Techniques for Compression

- Front Compression
- Rear Compression
- Hierarchic Compression
- Huffman Coding

Front Compression
Given the first four employee entries:
ROBERTON
ROBERTSON
ROBERTSTONE
ROBINSON
Suppose employee names are 12 characters long (b = blank)
ROBERTONbbbb
ROBERTSONbbb
ROBERTSTONEb
ROBINSONbbbb

Replace characters at the front of each entry that are the same as
those in the previous entry by a corresponding count.
Front compression
0 - ROBERTONbbbb
6 - SONbbb
7 - TONEb
3 - INSONbbbb
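
The replacement rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `common_prefix_len`, `front_compress` and `front_decompress` are hypothetical, and the letter "b" stands for a blank as in the example.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_compress(entries):
    """Return (count, remainder) pairs; count = characters shared with the previous entry."""
    result = []
    previous = ""
    for entry in entries:
        n = common_prefix_len(previous, entry)
        result.append((n, entry[n:]))
        previous = entry
    return result


def front_decompress(pairs):
    """Rebuild the entries by copying `count` characters from the previous entry."""
    entries = []
    previous = ""
    for n, remainder in pairs:
        entry = previous[:n] + remainder
        entries.append(entry)
        previous = entry
    return entries


names = ["ROBERTON", "ROBERTSON", "ROBERTSTONE", "ROBINSON"]
padded = [name.ljust(12, "b") for name in names]  # pad to 12 characters
compressed = front_compress(padded)
for count, remainder in compressed:
    print(count, "-", remainder)
```

Because each count refers to the previous entry, decompression has to proceed sequentially from the start of the list; in this sketch `front_decompress(compressed)` returns the original padded entries.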

Rear Compression
This compression is achieved by dropping all characters to the
right of the one required to distinguish the entry in question from
its two immediate neighbors.
ROBERTONbbbb
ROBERTSONbbb
ROBERTSTONEb
ROBINSONbbbb

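Problem with decompression: the dropped characters cannot be recovered from the compressed entry.

ROBERTO?????
ROBERTSO????
ROBERTST????
ROBI????????

Rear compression, with a count of the characters dropped (trailing blanks not counted)
1 - ROBERTO
1 - ROBERTSO
3 - ROBERTST
4 - ROBI

Rear compression, with a count of the characters dropped (trailing blanks counted)
5 - ROBERTO
4 - ROBERTSO
4 - ROBERTST
8 - ROBI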
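
As a rough sketch of the rear compression rule, the Python below keeps each entry up to and including the first character that differs from both neighbors, and records how many characters were dropped. The function names are hypothetical, and "b" again stands for a blank. Run on the padded entries it gives the counts 5, 4, 4, 8; run on the unpadded names it gives 1, 1, 3, 4.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def rear_compress(entries):
    """Return (dropped_count, kept_prefix) pairs for a sorted list of entries."""
    result = []
    for i, entry in enumerate(entries):
        shared = 0
        if i > 0:  # compare with the previous neighbor
            shared = max(shared, common_prefix_len(entry, entries[i - 1]))
        if i + 1 < len(entries):  # compare with the next neighbor
            shared = max(shared, common_prefix_len(entry, entries[i + 1]))
        keep = min(shared + 1, len(entry))  # one character past the shared prefix
        result.append((len(entry) - keep, entry[:keep]))
    return result


names = ["ROBERTON", "ROBERTSON", "ROBERTSTONE", "ROBINSON"]
padded = [name.ljust(12, "b") for name in names]

with_blanks = rear_compress(padded)    # counts include trailing blanks
without_blanks = rear_compress(names)  # counts ignore trailing blanks
for count, kept in with_blanks:
    print(count, "-", kept)
```

The count tells a reader how long the original entry was, but not which characters were removed, so the entries cannot be rebuilt from this output alone.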

Front and Rear Compression

Apply front and rear compression together.

ROBERTONbbbb
ROBERTSONbbb
ROBERTSTONEb
ROBINSONbbbb

Front & Rear compression
(characters shared with the previous entry - characters recorded - recorded characters)
0 - 7 - ROBERTO
6 - 2 - SO
7 - 1 - T
3 - 1 - I

Hierarchic Compression (intra-file)

In this type of compression, the records of a file are sequenced by the
value(s) of some stored field F.

Field F is then used to create a single hierarchic stored record.
Hierarchic Compression (inter-file)

Each stored record consists of two parts:

- A fixed part (the city field)
- A varying part (the set of suppliers), i.e. a repeating group
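
The two-part stored record can be sketched as follows: the city is stored once as the fixed part, followed by the suppliers located there as the repeating group. This is only an illustration under stated assumptions. The supplier rows are made-up sample data and the function name `hierarchic_compress` is hypothetical.

```python
from itertools import groupby

# Hypothetical supplier records (supplier number, name, city), for illustration only.
suppliers = [
    ("S1", "Smith", "London"),
    ("S4", "Clark", "London"),
    ("S2", "Jones", "Paris"),
    ("S3", "Blake", "Paris"),
    ("S5", "Adams", "Athens"),
]


def hierarchic_compress(records):
    """Group records by city: each result is (city, [(number, name), ...])."""
    ordered = sorted(records, key=lambda r: r[2])  # sequence the file by city
    return [
        (city, [(number, name) for number, name, _ in group])
        for city, group in groupby(ordered, key=lambda r: r[2])
    ]


compressed = hierarchic_compress(suppliers)
for city, group in compressed:
    print(city, group)
```

Each city value is stored once instead of once per supplier, which is where the saving comes from.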

Huffman Coding
- Bit string encodings are assigned to represent characters.
- Different characters are represented by strings of different lengths.
- The most commonly occurring characters are represented by the shortest
strings.
Character   Frequency   Code
E           35%         1
A           30%         01
D           20%         001
C           10%         0001
B            5%         0000

Example: decoding the bit string 00110001010011

001 1 0001 01 001 1
D   E C    A  D   E

Result: DECADE
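
The decoding step can be sketched in Python using the code table above. Since no code in the table is a prefix of another, the decoder can read bits left to right and emit a character as soon as the bits collected so far match a code. The names `CODES`, `FREQ_PERCENT`, `encode` and `decode` are hypothetical and exist only for this illustration.

```python
# Code table and frequencies (in percent) taken from the table above.
CODES = {"E": "1", "A": "01", "D": "001", "C": "0001", "B": "0000"}
FREQ_PERCENT = {"E": 35, "A": 30, "D": 20, "C": 10, "B": 5}


def encode(text, codes=CODES):
    """Concatenate the bit string of each character."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes=CODES):
    """Read bits left to right, emitting a character whenever a full code is matched."""
    lookup = {code: ch for ch, code in codes.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(out)


decoded = decode("00110001010011")
print(decoded)

# Average code length in bits per character, weighted by frequency.
average_bits = sum(FREQ_PERCENT[ch] * len(CODES[ch]) for ch in CODES) / 100
print(average_bits)
```

With these frequencies the average works out to 2.15 bits per character, compared with the 3 bits a fixed-length code would need for five distinct characters.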
