0% found this document useful (0 votes)
168 views12 pages

Data Compression Techniques Explained

Data compression algorithms reduce the size of data representations. Lossless compression, like Lempel-Ziv-Storer-Szymanski (LZSS), identifies and eliminates statistical redundancies without losing information. LZSS works by replacing repeated strings with references to a dictionary. Huffman coding assigns variable length codes to characters based on frequency, with more common characters having shorter codes. The Burrows-Wheeler transform permutes the characters in a string to group common substrings together, making run-length encoding more effective.

Uploaded by

shashank jakka
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
168 views12 pages

Data Compression Techniques Explained

Data compression algorithms reduce the size of data representations. Lossless compression, like Lempel-Ziv-Storer-Szymanski (LZSS), identifies and eliminates statistical redundancies without losing information. LZSS works by replacing repeated strings with references to a dictionary. Huffman coding assigns variable length codes to characters based on frequency, with more common characters having shorter codes. The Burrows-Wheeler transform permutes the characters in a string to group common substrings together, making run-length encoding more effective.

Uploaded by

shashank jakka
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

DATA

COMPRESS
TEXT

DATA COMPRESSION
In Data Compression or bit-rate reduction involves encoding information
using fewer bits than the original representation.

Compression can be either lossy or lossless. Lossless compression reduces


bits by identifying and eliminating statistical redundancy.

No information is lost in lossless compression.

Lossy compression reduces bits by identifying marginally important


information and removing it.
TEXT

LEMPEL-ZIV-STORER-SZYMANSKI(LZSS)
LZSS is a dictionary encoding technique.

Unlike Huffman coding, which attempts to reduce the


average amount of bits required to represent a symbol,
LZSS attempts to replace a string of symbols with a
reference to a dictionary location for the same string.

It is intended that the dictionary reference should be shorter


than the string it replaces.
TEXT

LEMPEL-ZIV-STORER-SZYMANSKI(LZSS)
Step 1. Initialise the dictionary to a known value.
Step 2. Read an original string that is the length of the maximum allowable match.
Step 3. Search for the longest matching string in the dictionary.
Step 4. If a match is found greater than or equal to the minimum allowable match
length:

Write the encoded flag, then the offset and length to the encoded output.

Otherwise, write the original flag and the first original symbol to the encoded output.
Step 5. Shift a copy of the symbols written to the encoded output from the original
string to the dictionary.
Step 6. Read a number of symbols from the original input equal to the number of
symbols written in Step 4.
Step 7. Repeat from Step 3, until all the entire input has been encoded.
LEMPEL-ZIV-STORER-SZYMANSKI(LZSS)
Lets take an string : thisisthe
This would take 72 bits ! We can do better.
For this tiny string we would need 9 bits allowing up
to 512 unique characters.

Current Next Output Add to the


dictionary
t h t 116 th
256
h i h 104 hi
257
LEMPEL-ZIV-STORER-SZYMANSKI(LZSS)
Current Next Output Add to the dictionary
i s i 105 is
258
s i s 115 si
259
i s is is in the dictionary
is t is 258 ist
260
t h th is in the dictionary
th e th 256 the
261
e - e 101 -
TEXT

RUN LENGTH ALGORITHM


Run-length encoding (RLE) is a very simple form of data
compression in which runs of data (that is, sequences in which
the same data value occurs in many consecutive data elements)
are stored as a single data value and count, rather than as the
original run.
Let us take a hypothetical single scan line, with B representing a
black pixel and W representing white:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWW
WWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
If we apply the run-length encoding (RLE) data compression
algorithm to the above hypothetical scan line, we get the
following: 12W1B12W3B24W1B14W
TEXT

HUFFMAN CODING
Huffman coding is a lossless data compression
algorithm. The idea is to assign variable length codes
to input characters, lengths of the assigned codes
are based on the frequencies of corresponding
characters.
The most frequent character gets the smallest code
and the least frequent character gets the largest
code.
TEXT

HUFFMAN CODING
1. Create a leaf node for each unique character and build a min
heap of all leaf nodes (Min Heap is used as a priority queue. The
value of frequency field is used to compare two nodes in min heap.
Initially, the least frequent character is at root)
2. Extract two nodes with the minimum frequency from the min
heap.
3. Create a new internal node with frequency equal to the sum of
the two nodes frequencies. Make the first extracted node as its left
child and the other extracted node as its right child. Add this node
to the min heap.
4. Repeat steps#2 and #3 until the heap contains only one node.
The remaining node is the root node and the tree is complete.
TEXT

BURROW-WHEELER ALGORITHM
When a character string is transformed by the BWT, none
of its characters change value.
The transformation permutes the order of the characters.
If the original string had several substrings that occurred
often, then the transformed string will have several places
where a single character is repeated multiple times in a
row.
This is useful for compression, since it tends to be easy to
compress a string that has runs of repeated characters by
techniques such as move-to-front transform and run-
length encoding.
TEXT
TEXT

You might also like