
Entropy (contd)

Statistical Encoding
Uses a set of codewords to transmit source information, e.g. 7/8-bit ASCII codes.
Two steps:
- identify the most frequent bit or byte patterns in the data
- code these patterns with fewer bits than initially represented

The Prefix Property


To see why the prefix property is essential, consider the codewords given below, in which e is encoded with 110, which is a prefix of the codeword for f:

character  codeword
a          0
b          101
c          100
d          111
e          110
f          1100

The decoding of 11000100110 is ambiguous:

11000100110 => face   (1100 0 100 110)
11000100110 => eaace  (110 0 0 100 110)
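The ambiguity above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function names `is_prefix_free` and `all_decodings` are illustrative. It tests the prefix property and enumerates every way of splitting the bit string into codewords.

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of a different codeword."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


def all_decodings(bits, codes):
    """Return every message whose encoding is exactly `bits`."""
    if not bits:
        return [""]
    results = []
    for char, word in codes.items():
        if bits.startswith(word):
            # Consume one codeword, then decode the remainder recursively.
            for rest in all_decodings(bits[len(word):], codes):
                results.append(char + rest)
    return results


codes = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "110", "f": "1100"}
print(is_prefix_free(codes))                          # False: 110 is a prefix of 1100
print(sorted(all_decodings("11000100110", codes)))    # ['eaace', 'face']
```

A code that satisfies the prefix property always yields at most one decoding, which is why the receiver never has to backtrack.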

Source Encoding
Differential encoding (predictive encoding)
Used extensively in applications where the amplitude of a value or signal covers a large range but the difference in amplitude between successive values/signals is relatively small.
Can be lossless or lossy:
Lossless - if the number of bits used is sufficient to cater for the maximum difference value.
Lossy - if a difference value exceeds the maximum value that can be represented with the number of bits being used.
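The lossless/lossy distinction can be illustrated with a small Python sketch. It is an assumption-laden illustration rather than a standard codec: the names `delta_encode` and `delta_decode` are hypothetical, and out-of-range differences are handled by clamping, which is only one possible policy.

```python
def delta_encode(samples, bits):
    """Encode samples as the first value plus signed differences of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    first = samples[0]
    deltas = []
    prev = first
    for s in samples[1:]:
        d = max(lo, min(hi, s - prev))  # clamp to the representable range (assumed policy)
        deltas.append(d)
        prev += d                       # track the value the decoder will reconstruct
    return first, deltas


def delta_decode(first, deltas):
    """Rebuild the sample sequence by accumulating the differences."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


smooth = [1000, 1003, 1001, 1006, 1004]   # differences fit in 4 bits (-8..7)
jumpy = [1000, 1020, 1021]                # a difference of 20 does not fit

print(delta_decode(*delta_encode(smooth, 4)) == smooth)  # True: lossless
print(delta_decode(*delta_encode(jumpy, 4)))             # [1000, 1007, 1014]: lossy
```

In the first case every difference fits in the 4 bits available, so the original is recovered exactly; in the second the clamped difference introduces an error that the decoder cannot undo.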

Source Encoding (contd)


Transform Encoding
Transforms source information from one form into another. Used in applications involving both images and video.
Digitization of a monochromatic image produces a two-dimensional matrix of pixels, each of which represents the level of gray at a particular position in the image.
Spatial frequency - the rate of change in magnitude as one traverses the matrix.
Frequency components - scanning the matrix in either the horizontal or the vertical direction gives rise to the horizontal and vertical frequency components.

Source Encoding (contd)


Transform Encoding (contd)
If the amplitude of a higher-frequency component falls below a certain threshold, it will not be detected by the eye.
DCT (discrete cosine transform) - a mathematical technique that transforms a two-dimensional matrix of pixel values into an equivalent matrix of spatial frequency components.
The transformation operation itself is lossless.
Frequency components in the matrix whose amplitude is less than a defined threshold can be dropped; it is this step that discards information.
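A minimal pure-Python sketch of the idea follows. It assumes a square block and the orthonormal DCT-II definition; the helper names (`dct_2d`, `idct_2d`, `drop_small`) and the sample block are illustrative and not taken from the slides. It shows that the transform alone is reversible, while thresholding is what loses information.

```python
import math


def _scale(k, n):
    # Orthonormal scaling factor for the DCT-II basis.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def _basis(i, k, n):
    return math.cos((2 * i + 1) * k * math.pi / (2 * n))


def dct_2d(block):
    """Forward 2-D DCT of an n x n block of pixel values."""
    n = len(block)
    return [[_scale(u, n) * _scale(v, n) * sum(
                 block[x][y] * _basis(x, u, n) * _basis(y, v, n)
                 for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def idct_2d(coeffs):
    """Inverse 2-D DCT: rebuild pixel values from frequency coefficients."""
    n = len(coeffs)
    return [[sum(_scale(u, n) * _scale(v, n) * coeffs[u][v]
                 * _basis(x, u, n) * _basis(y, v, n)
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]


def drop_small(coeffs, threshold):
    """Zero every coefficient whose amplitude is below the threshold."""
    return [[c if abs(c) >= threshold else 0.0 for c in row] for row in coeffs]


block = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
coeffs = dct_2d(block)
exact = idct_2d(coeffs)                     # matches block up to rounding error
approx = idct_2d(drop_small(coeffs, 5.0))   # close to block, but not identical
```

The naive quadruple loop is chosen for readability; practical codecs use fast factorizations of the same transform.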

Text Compression
Three different types of text:
- Unformatted
- Formatted
- Hypertext
Any compression algorithm associated with text must be lossless, since the loss of just a single character could modify the meaning of a complete string.
- Static Huffman Coding
- Dynamic Huffman Coding

Text Compression(contd)
Static Huffman Coding
Static Huffman coding assigns variable-length codes to symbols based on their frequency of occurrence in the given message. Low-frequency symbols are encoded using many bits, and high-frequency symbols are encoded using fewer bits.
The message to be transmitted is first analyzed to find the relative frequencies of its constituent characters.
The coding process generates a binary tree, the Huffman code tree, with branches labeled with bits (0 and 1).
The Huffman tree (or the character-codeword pairs) must be sent with the compressed information to enable the receiver to decode the message.

Static Huffman Coding Algorithm


Find the frequency of each character in the file to be compressed;
For each distinct character create a one-node binary tree containing the character and its frequency as its priority;
Insert the one-node binary trees in a priority queue in increasing order of frequency;
while (there is more than one tree in the priority queue) {
    dequeue two trees t1 and t2;
    create a tree t that contains t1 as its left subtree and t2 as its right subtree; // 1
    priority(t) = priority(t1) + priority(t2);
    insert t in its proper location in the priority queue; // 2
}
Assign 0 and 1 weights to the edges of the resulting tree, such that the left and right edge of each node do not have the same weight; // 3
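The algorithm above can be sketched in Python using the standard `heapq` module as the priority queue. This is one possible rendering, not the slides' own code: the function name `huffman_codes`, the tuple-based tree representation, and the insertion-order tie-breaking are assumptions, and a different tie-breaking rule can produce different (equally optimal) codewords.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman code from a {symbol: frequency} mapping."""
    order = count()  # tie-breaker so the heap never has to compare trees
    heap = [(f, next(order), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # dequeue the two lowest-priority trees
        f2, _, t2 = heapq.heappop(heap)
        # t1 becomes the left subtree and t2 the right subtree of the new tree.
        heapq.heappush(heap, (f1 + f2, next(order), (t1, t2)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: 0 = left, 1 = right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                             # leaf: record the accumulated path
            codes[tree] = prefix or "0"   # a one-symbol alphabet still needs one bit

    walk(heap[0][2], "")
    return codes


# Hypothetical three-symbol alphabet, for illustration only.
print(huffman_codes({"x": 5, "y": 2, "z": 1}))
```

Leaves are stored as plain symbols and internal nodes as `(left, right)` tuples, which keeps the sketch short; symbols that are themselves tuples would need a dedicated node class.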

Static Huffman Coding example


Example: Information to be transmitted over the internet contains the following characters with their associated frequencies:

Character  Frequency
a          45
e          65
l          13
n          45
o          18
s          22
t          53

Use the Huffman technique to answer the following questions:
- Build the Huffman code tree for the message.
- Use the Huffman tree to find the codeword for each character.
- If the data consists of only these characters, what is the total number of bits to be transmitted? What is the compression ratio?
- Verify that your computed Huffman codewords satisfy the prefix property.

Static Huffman Coding example (contd)

The sequences of zeros and ones on the arcs of the path from the root to each leaf node are the desired codes:

character  Huffman codeword
a          110
e          10
l          0110
n          111
o          0111
s          010
t          00

Static Huffman Coding example (contd)


If we assume the message consists of only the characters a, e, l, n, o, s, t, then the number of bits for the compressed message will be 696:

45*3 + 65*2 + 13*4 + 45*3 + 18*4 + 22*3 + 53*2 = 696

If the message is sent uncompressed with an 8-bit ASCII representation for the characters, we have 261 (the sum of all frequencies) * 8 = 2088 bits.

Static Huffman Coding example (contd)


Assume that the number of character-codeword pairs and the pairs themselves are included at the beginning of the binary file containing the compressed message, in the following format:

7 a110 e10 l0110 n111 o0111 s010 t00 <sequence of zeroes and ones for the compressed message>

The count 7 is in binary (significant bits only). Characters are in 8-bit ASCII codes.

Number of bits for the transmitted file
= bits(7) + bits(characters) + bits(codewords) + bits(compressed message)
= 3 + (7*8) + 21 + 696 = 776

Compression ratio = bits for ASCII representation / number of bits transmitted = 2088 / 776 = 2.69
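The arithmetic can be reproduced with a few lines of Python. This sketch simply follows the accounting used in the example (count in significant bits, 8 bits per character, codewords stored back to back); the function name `transmitted_bits` is illustrative.

```python
# Frequencies and codewords copied from the worked example.
freqs = {"a": 45, "e": 65, "l": 13, "n": 45, "o": 18, "s": 22, "t": 53}
codes = {"a": "110", "e": "10", "l": "0110", "n": "111",
         "o": "0111", "s": "010", "t": "00"}


def transmitted_bits(freqs, codes, char_bits=8):
    """Total file size in bits under the example's header format."""
    count_bits = len(codes).bit_length()                  # 7 -> 3 significant bits
    char_table_bits = char_bits * len(codes)              # 7 characters * 8 bits
    codeword_bits = sum(len(w) for w in codes.values())   # 21 bits of codewords
    message_bits = sum(freqs[c] * len(codes[c]) for c in freqs)  # 696 bits
    return count_bits + char_table_bits + codeword_bits + message_bits


uncompressed = 8 * sum(freqs.values())   # 261 characters * 8 bits = 2088
total = transmitted_bits(freqs, codes)
print(total, round(uncompressed / total, 2))   # 776 2.69
```

This accounting includes no delimiters or length fields between the variable-length codewords, so a practical file format would likely need a few extra header bits.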

Encoding and decoding examples


Encode (compress) the message tenseas using the following codewords:

character  Huffman codeword
a          110
e          10
l          0110
n          111
o          0111
s          010
t          00

Decode a bit-stream by starting at the root and proceeding down the tree according to the bits in the message (0 = left, 1 = right). When a leaf is encountered, output the character at that leaf and restart at the root. If a leaf cannot be reached, the bit-stream cannot be decoded.

(a) 0110011101000 => lost
(b) 11101110101011

The decoding of (b) fails because, after nose has been decoded, the corresponding node for the remaining bits 11 is not a leaf.
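Both exercises can be checked with a short Python sketch. Instead of walking an explicit tree, it accumulates bits until they match a codeword, which is equivalent for a prefix code; the names `encode` and `decode` and the choice of raising `ValueError` on leftover bits are assumptions made for this illustration.

```python
CODES = {"a": "110", "e": "10", "l": "0110", "n": "111",
         "o": "0111", "s": "010", "t": "00"}


def encode(message, codes=CODES):
    """Concatenate the codeword of every character in the message."""
    return "".join(codes[ch] for ch in message)


def decode(bits, codes=CODES):
    """Decode a bit string; raise ValueError if it ends in the middle of a codeword."""
    lookup = {word: ch for ch, word in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:          # reached a leaf: emit it and restart at the root
            out.append(lookup[buffer])
            buffer = ""
    if buffer:                        # stopped at an internal node
        raise ValueError(f"undecodable trailing bits {buffer!r} after {''.join(out)!r}")
    return "".join(out)


print(encode("tenseas"))         # 001011101010110010
print(decode("0110011101000"))   # lost
try:
    decode("11101110101011")
except ValueError as err:
    print(err)                   # reports the trailing bits '11' after 'nose'
```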

Image Compression
Types of images:
Computer-generated images (graphical images) - represented in the form of a computer program, hence require less memory space.
Digitized images - represented as a two-dimensional matrix.
A compression algorithm is used to compress a bitmap image before transferring it across the network.
Two schemes are used to transfer digitized images:
1) a combination of run-length and statistical encoding
2) a combination of transform, differential and run-length encoding

Image Compression (contd)


Graphics Interchange Format (GIF) (contd)
24-bit pixel image - 8 bits each for red, green and blue.
GIF reduces the number of possible colors by choosing the 256 colors from the original set that most closely match those used in the original image.
Instead of sending each pixel as a 24-bit value, only the 8-bit index of the table entry that contains the closest matching color is sent. This results in a compression ratio of 3:1.
Global color table: the table of colors that relates to the whole image.
Local color table: the table of colors that relates to a portion of the image.
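The index substitution can be sketched as follows. This is only an illustration of table lookup: the five-entry table, the function name `nearest_index`, and the squared-distance metric are assumptions, and it does not show how an encoder selects the 256 table colors in the first place.

```python
def nearest_index(pixel, table):
    """Return the index of the table color closest to an (R, G, B) pixel."""
    return min(
        range(len(table)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, table[i])),
    )


# Hypothetical, tiny color table; a GIF table holds up to 256 entries.
table = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
pixels = [(250, 10, 5), (10, 10, 10), (240, 240, 250)]

indices = [nearest_index(p, table) for p in pixels]
print(indices)   # [1, 0, 4]

# Each pixel shrinks from 24 bits to one 8-bit index, hence the 3:1 ratio.
print((24 * len(pixels)) / (8 * len(indices)))   # 3.0
```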

Image Compression (contd)


Graphics Interchange Format (GIF) (contd)
In the case of text, detect the color value of the text:
- If the color is not present in the table, add that color to the color table after the first 256 colors.
- If the color is present in the table, add the index value of that color to the color table after the first 256 colors.

So the number of entries in the table is allowed to increase incrementally by extending the length of the index by 1 bit.
GIF allows an image to be stored and transferred over the network in an interlaced mode. In this mode, the data is organized so that the decompressed image is built up progressively as the data arrives.

Image Compression (contd)


Graphics Interchange Format (GIF) (contd)
To achieve this, the compressed data is divided into four groups:
1) 1/8
2) 1/8
3) 1/4
4) 1/2
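As an illustration, the sketch below assumes the row-interleaved scheme of the GIF89a specification, in which four passes carry 1/8, 1/8, 1/4 and 1/2 of the image rows respectively. The function name `interlaced_row_order` is hypothetical.

```python
def interlaced_row_order(height):
    """Return the rows sent in each of the four interlace passes."""
    # (first row, step) for passes 1-4, as defined for GIF interlacing.
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]
    return [list(range(start, height, step)) for start, step in passes]


groups = interlaced_row_order(16)
for number, rows in enumerate(groups, start=1):
    print(number, rows, f"{len(rows)}/16 of the rows")
# 1 [0, 8] 2/16 of the rows
# 2 [4, 12] 2/16 of the rows
# 3 [2, 6, 10, 14] 4/16 of the rows
# 4 [1, 3, 5, 7, 9, 11, 13, 15] 8/16 of the rows
```

After the first pass the receiver can already display a coarse version of the whole image, and each later pass fills in the rows between those already shown.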

Image Compression (contd)


Tagged Image File Format (TIFF)
It supports pixel resolutions of up to 48 bits - 16 bits each for R, G and B.
