
# List 5: Efficient codes and compression

1) Consider an 8-faced die whose faces are labeled with the letters A to H. The
probabilities of the faces are: A (1/2), B (1/4), C (1/8), D (1/16), E (1/32),
F (1/64), G (1/128) and H (1/128).
a) Find the Shannon-Fano and Huffman encodings for the symbols emitted
by this source
b) Compute the entropy of the source and compare it with the average length
of the code words obtained in a), determining the coding efficiency of the
Shannon-Fano and Huffman techniques
2) An unfair die with 5 faces has probability __ of giving face A and __ of
giving face B. The other three faces C, D and E each have probability __.
a) Find the Shannon-Fano coding for symbols emitted by this source.
b) Compute the entropy of the source and compare it with the average length
of the code words obtained in a), determining the coding efficiency of the
Shannon-Fano and Huffman techniques
3) Create the Shannon-Fano and Huffman codes for the following set of
symbols, then compare the average code-word length with the entropy H(X),
determining the efficiency:
| Symbol | Probability |
|---|---|
| x1 | 0.2 |
| x2 | 0.18 |
| x3, x4, x5 | 0.1 each |
| x6 | 0.061 |
| x7 | 0.059 |
| x8, x9, x10, x11 | 0.04 each |
| x12 | 0.03 |
| x13 | 0.01 |
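For checking answers, here is a minimal Shannon-Fano sketch, assuming the usual split rule: divide the probability-sorted list into two groups whose totals are as equal as possible, prepend 0 and 1 respectively, and recurse.

```python
from math import log2

probs = [("x1", 0.2), ("x2", 0.18), ("x3", 0.1), ("x4", 0.1), ("x5", 0.1),
         ("x6", 0.061), ("x7", 0.059), ("x8", 0.04), ("x9", 0.04),
         ("x10", 0.04), ("x11", 0.04), ("x12", 0.03), ("x13", 0.01)]

def shannon_fano(items, prefix=""):
    """items must be sorted by descending probability."""
    if len(items) == 1:
        return {items[0][0]: prefix or "0"}
    total = sum(p for _, p in items)
    best, split = float("inf"), 1
    for i in range(1, len(items)):       # find the most balanced cut
        acc = sum(p for _, p in items[:i])
        if abs(2 * acc - total) < best:
            best, split = abs(2 * acc - total), i
    codes = shannon_fano(items[:split], prefix + "0")
    codes.update(shannon_fano(items[split:], prefix + "1"))
    return codes

code = shannon_fano(probs)
H = sum(p * log2(1/p) for _, p in probs)             # entropy
L = sum(p * len(code[s]) for s, p in probs)          # average length
print(code)
print(H, L, H / L)                                   # efficiency H/L
```

The resulting code is prefix-free by construction, so its average length can be compared directly with H(X).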
4) You want to transmit the phrase "this list is very easy" to a receiver,
using the ASCII character set to map characters to 7-bit sequences.
a) How many bits are needed to encode the sequence above?
b) What would this sequence look like after applying the Shannon-Fano coding?
What is the average length?
c) What would this sequence look like after applying the Huffman coding? What
is the average length?
d) Compute the entropy of the source and the efficiency of the codings found
in a), b), and c).

5) You are developing a new computer with memory based on DNA molecules. You
will store the data using the 4 possible bases.
a) How many bases are needed to store 1 Terabyte of data?
b) You want to store a file containing only 8 symbols with probabilities 10%,
20%, 20%, 15%, 15%, 10%, 5% and 5%. Propose the most compact possible
self-punctuating coding using the bases A, C, T and G, then determine its
coding efficiency.
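A quick back-of-the-envelope for item a), assuming the decimal convention 1 TB = 10^12 bytes (the binary convention, 2^40 bytes, gives a slightly larger count):

```python
from math import log2

bits_per_base = log2(4)      # each base A, C, T or G can carry 2 bits
total_bits = 10**12 * 8      # 1 TB = 10**12 bytes, expressed in bits
bases = total_bits / bits_per_base
print(f"{bases:.0e} bases")  # 4e+12
```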
6) You were asked to compactly encode a tongue-twister. This is the phrase:
"peter piper picked a peck of pickled peppers". The frequency distribution of
each symbol is:

a) One way to encode this sequence would be a fixed-size code, with code words
long enough to distinguish the 14 different symbols. How many bytes would be
needed to transmit this 44-character phrase using such a fixed-size code?
b) Determine the minimum number of bits required to encode the phrase
assuming that each character is independent of its surrounding characters.
c) What is the theoretical contribution of each one of the 14 symbols to the
average information?
d) Build a code dictionary using the Huffman algorithm for the 14 symbols
e) Encode the phrase using the code sequence of item d):
i. How many bits are needed?
ii. How does this number compare with the number of bits needed when using
the code obtained in a)?
iii. How does this number compare with the information content of the phrase
calculated in item b)?
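The symbol frequencies can be tallied directly from the phrase itself; the sketch below also computes the fixed-size cost of item a) and the Shannon lower bound of item b):

```python
from collections import Counter
from math import ceil, log2

phrase = "peter piper picked a peck of pickled peppers"
freq = Counter(phrase)
n = len(phrase)                  # 44 characters
k = len(freq)                    # 14 distinct symbols (13 letters + space)

fixed_bits = n * ceil(log2(k))   # item a): ceil(log2 14) = 4 bits/symbol
# item b): information content assuming independent characters
info_bits = sum(c * log2(n / c) for c in freq.values())
# item c): contribution -log2 p(s) of each symbol
surprisal = {s: log2(n / c) for s, c in freq.items()}

print(dict(freq))
print(k, fixed_bits, round(info_bits, 1))
```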

7) Consider a source X with symbols x1, x2, x3, x4 encoded with the following
codes:

a) Which of them are instantaneous codes? Justify.

b) Which of them are uniquely decodable? Justify.
8) Explain the codes below:

9) A source X has four symbols x1, x2, x3 and x4, with p(x1) = 1/2, p(x2) = 1/4
and p(x3) = p(x4) = 1/8. Build the Shannon-Fano code for X. Show that this code
has 100% efficiency.
10) Given a source X with m equiprobable symbols xi, i = 1, ..., m, let n be
the length of a code word in a fixed-size encoding. Show that, if n = log2 m,
then the code efficiency is 100%.
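For reference, the required argument is a one-line computation: for equiprobable symbols the entropy is log2 m, so

```latex
H(X) = \sum_{i=1}^{m} \frac{1}{m}\,\log_2 m = \log_2 m,
\qquad
\eta = \frac{H(X)}{n} = \frac{\log_2 m}{n} = 1
\quad \text{when } n = \log_2 m .
```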
11) In a DNA string there are 4 kinds of bases: G, T, C and A. What is the
information content of a DNA string of length 10 in the following cases?
a) All bases are equiprobable
b) Bases G and T are twice as probable as bases C and A

12) Consider the codes below:
a) Identify the prefix codes and build their decision trees

b) Decode the sequence 0011011111000 using the prefix codes identified in a).
Try to repeat the same procedure with the other codes.
13) Consider an 8-faced die whose faces are labeled with the characters A to H.
Assuming that all faces are equiprobable, show whether or not it is possible to
create a code that is more efficient than the fixed-size code in this case.
14) A source X has 5 symbols with the following probabilities: p(x1) = 0.4,
p(x2) = 0.19, p(x3) = 0.16, p(x4) = 0.15 and p(x5) = 0.1.
a) Create a Shannon-Fano code for X and compute the code efficiency.
b) Repeat for the Huffman code and compare its results.
15) A source X has 5 equiprobable symbols.
a) Create a Shannon-Fano code for X and compute the code efficiency.
b) Repeat for the Huffman code and compare the results.
16) Consider an unfair die with the following probabilities: face 1: 0.05;
face 6: 0.3; faces 2 to 5: 0.1625 each.
a) Determine the efficiency of the following codes:

b) Determine the Shannon-Fano and Huffman codes and their respective
efficiencies.
17) Given a discrete memoryless source whose alphabet consists of K
equiprobable symbols, what conditions must be satisfied by K and by the
(fixed) code-word length so that the efficiency is 100%?
18) Joãozinho claims it is possible to design a method that produces more
efficient codes than Huffman's. He created the following iterative method:
- 1st iteration: assign 1-bit code words, following the binary order, to the
2 most probable symbols
- 2nd iteration: assign 2-bit code words, following the binary order, to the
next 4 most probable symbols
- 3rd iteration: assign 3-bit code words, following the binary order, to the
next 8 most probable symbols
...
- Nth iteration: assign N-bit code words, following the binary order, to the
next 2^N most probable symbols
a) Apply Joãozinho's method to the following probabilities of an unfair die:
P(X = 1) = 1/4; P(X = 2) = 1/4; P(X = 3) = 1/4; P(X = 4) = 1/8; P(X = 5) = 1/16; P(X = 6) = 1/16

b) Determine the efficiencies of both Joãozinho's and Huffman's codes for the
unfair die of item a). Is there something strange about Joãozinho's code? If
so, justify.
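Joãozinho's assignment rule can be sketched directly; computing the Kraft sum of the resulting code words exposes what is "strange" about it. One assumption here: symbols are sorted by decreasing probability before assignment, as the method implies.

```python
from math import log2

probs = [1/4, 1/4, 1/4, 1/8, 1/16, 1/16]  # dice faces 1..6

def joaozinho(probs):
    """At iteration n, give n-bit words (in binary order)
    to the next 2**n most probable symbols."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    codes, n, start = {}, 1, 0
    while start < len(order):
        for j, s in enumerate(order[start:start + 2**n]):
            codes[s] = format(j, f"0{n}b")
        start += 2**n
        n += 1
    return codes

codes = joaozinho(probs)
L = sum(probs[s] * len(w) for s, w in codes.items())   # average length
H = sum(p * log2(1/p) for p in probs)                  # entropy
kraft = sum(2**-len(w) for w in codes.values())        # Kraft-McMillan sum
print(codes)
print(L, H, kraft)  # L < H and Kraft sum > 1: not uniquely decodable
```

An apparent efficiency above 100% (L < H) is impossible for a uniquely decodable code; the Kraft sum greater than 1 shows the code words cannot all be decoded unambiguously.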

19) Codes can be classified into 4 types:
i. singular
ii. non-singular, but not uniquely decodable
iii. uniquely decodable, but not instantaneous
iv. instantaneous
a) Determine the type of each of the following codes and justify:
a1) Morse code:

a2) A = 00, B = 01, C = 10, D = 11

a3) A = 010, B = 100, C = 101, D = 100
a4) A = 10, B = 11, C = 00, D = 110
a5) A = 0, B = 011, C = 10, D = 01
b) Why does Morse code need a pause (a longer time interval) between the
emission of two characters, and an even longer pause between two words?
20) Alice rolled two fair dice and wrote down the sum of their results. Bob
must ask a series of yes/no questions to discover this number. Describe a
strategy that minimizes the number of questions asked. This strategy must be
better than the one obtained in exercise 8 of the previous list.
21) Consider the message 122121213. Assuming that each character is an 8-bit
ASCII character, how would it be transmitted using run-length encoding? What
is the compression ratio?
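A minimal run-length sketch for this message, assuming each run is transmitted as a symbol byte followed by a count byte (both 8-bit ASCII):

```python
def rle(s):
    """Collapse each run into a (symbol, run length) pair."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

msg = "122121213"
runs = rle(msg)
encoded = "".join(f"{c}{n}" for c, n in runs)
# original: 9 bytes; encoded: 2 bytes (symbol + count) per run
ratio = len(msg) / (2 * len(runs))
print(runs)
print(encoded, ratio)  # the runs are short, so RLE expands the message
```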
22) Consider this famous phrase from JFK (in Portuguese):
"A pergunta de cada um de nós não deve ser o que o país pode fazer por nós;
mas sim o que cada um de nós pode fazer pelo país"
a) How many bytes does this phrase occupy?
b) What is the frequency of each of its words?
c) Build a static dictionary for this phrase, based on the frequencies from b)
d) How much space does the dictionary occupy?

23) Why can run-length encoding be considered an example of redundancy
detection? Does this algorithm guarantee space savings? In what situations
does the algorithm produce a file bigger than the original?
24) How can the following sequences be encoded as run-length sequences?
a) AAABBBBBBYYYYPPPPPPPPPTKKKKKKKK
b) 111112223333312222221111111333333333
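A quick way to check the run-length encodings of both sequences, assuming one byte per symbol and one byte per run count:

```python
def rle(s):
    """Collapse each run into a (symbol, run length) pair."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

a = "AAABBBBBBYYYYPPPPPPPPPTKKKKKKKK"
b = "111112223333312222221111111333333333"
for seq in (a, b):
    runs = rle(seq)
    # compression ratio: original bytes / (2 bytes per run)
    print(runs, len(seq) / (2 * len(runs)))
```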
25) What compression ratio was obtained in a) and b) of the previous exercise?
26) In what cases is lossy or lossless compression recommended? Justify.
27) Create a simple method to quantize an image with 8-bit gray levels into an
image with 4-bit gray levels. Can the method be easily extended to color
(RGB) images?
28) Given the following messages:
i) AAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBB
ii) ABABABABABABABABABABABABABABABABABABABAB
Would run-length encoding be a good choice to encode message i)? And
message ii)? Justify.
29) Consider an image of the German flag at a resolution of 30×40 pixels in
the RGB standard with 1 byte per channel, where the first 10 lines are
black, the following 10 lines are red, and the last 10 lines are yellow.
a) What is the size of a file containing this image (in bits and bytes)? Justify.
b) Consider the run-length compression method in which consecutive pixels
of same color are encoded by a binary pattern followed by a value in
brackets [X], where X is the number of consecutive repetitions of that
color. Suppose that the brackets are represented by 1 byte (ASCII
character) and X is an integer number represented by 4 bytes. Also, the
compressed file must start with a header which informs the file resolution
(two 2 bytes integers). What is the file size of the previous file after this
compression (in bits and bytes)? What is the compression ratio relative
to the file obtained in a)? Justify.
c) Which compression method would be more adequate to compress the German
flag image: run-length or Huffman coding? Justify.
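The arithmetic of items a) and b) can be laid out explicitly. One assumption is needed, since the statement leaves the size of the "binary pattern" open: it is taken here as one 3-byte RGB pixel value.

```python
rows, cols = 30, 40
raw_bytes = rows * cols * 3              # 1 byte per channel, 3 channels
runs = 3                                 # black, red and yellow blocks
run_bytes = 3 + 1 + 4 + 1                # pattern + '[' + 4-byte X + ']'
header = 2 * 2                           # resolution: two 2-byte integers
rle_bytes = header + runs * run_bytes
print(raw_bytes, raw_bytes * 8)          # uncompressed size, bytes and bits
print(rle_bytes, raw_bytes / rle_bytes)  # compressed size and ratio
```

Because each solid-color block collapses into a single run, this image is close to a best case for run-length encoding.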