
List 5 – Efficient codes and compression

1) Consider a die with 8 faces on which the letters from A to H are written. The
probabilities for each face are: A (1/2), B (1/4), C (1/8), D (1/16), E (1/32),
F (1/64), G (1/128) and H (1/128).
a) Find the Shannon-Fano and Huffman encodings for the symbols emitted
by this source
b) Compute the entropy of the source and compare it with the average length
of the code words obtained in a), determining the coding efficiency of
the Shannon-Fano and Huffman techniques
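
For items such as 1b), the quantities involved are the source entropy H(X) = -sum p(x) log2 p(x), the average codeword length L = sum p(x)·len(codeword(x)), and the efficiency H(X)/L. The sketch below is only an illustration added to this list: it computes these values for the probabilities of exercise 1, and the codeword table is one possible optimal assignment, shown just so the computation has something to run on.

```python
import math

def entropy(probs):
    """Source entropy in bits/symbol: H(X) = -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def average_length(probs, code):
    """Average codeword length: L = sum p(x) * len(codeword(x))."""
    return sum(probs[s] * len(code[s]) for s in probs)

# Probabilities from exercise 1.
probs = {'A': 1/2, 'B': 1/4, 'C': 1/8, 'D': 1/16,
         'E': 1/32, 'F': 1/64, 'G': 1/128, 'H': 1/128}

# Because these probabilities are powers of 1/2, an optimal prefix code uses
# codewords of length -log2(p); the bit patterns below are one such assignment.
code = {'A': '0', 'B': '10', 'C': '110', 'D': '1110',
        'E': '11110', 'F': '111110', 'G': '1111110', 'H': '1111111'}

H = entropy(probs)
L = average_length(probs, code)
print(f"H(X) = {H:.4f} bits/symbol, L = {L:.4f}, efficiency = {H / L:.2%}")
```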
2) An unfair die with 5 faces has probability 1/8 of showing face A and 1/8 of
showing face B. The other three faces, C, D and E, have probability 1/4 each.
a) Find the Shannon-Fano coding for the symbols emitted by this source.
b) Compute the entropy of the source and compare it with the average length
of the code words obtained in a), determining the coding efficiency of
the Shannon-Fano and Huffman techniques
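
One common formulation of the Shannon-Fano construction asked for in exercises 1a) and 2a) is: sort the symbols by decreasing probability, split the list into two groups of (as nearly as possible) equal total probability, append 0 to one group and 1 to the other, and recurse on each group. The sketch below follows that formulation; the split rule used in class may break ties differently.

```python
def shannon_fano(probs):
    """Return a {symbol: codeword} dict built by recursive probability splits."""
    symbols = sorted(probs, key=probs.get, reverse=True)
    code = {s: '' for s in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(probs[s] for s in group)
        running, cut, best_diff = 0.0, 1, float('inf')
        for i in range(1, len(group)):          # find the most balanced cut point
            running += probs[group[i - 1]]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_diff, cut = diff, i
        for s in group[:cut]:
            code[s] += '0'
        for s in group[cut:]:
            code[s] += '1'
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return code

# Source of exercise 2: A and B with probability 1/8, C, D and E with 1/4 each.
print(shannon_fano({'A': 1/8, 'B': 1/8, 'C': 1/4, 'D': 1/4, 'E': 1/4}))
```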
3) Create the Shannon-Fano and Huffman codings for the following set of
symbols, then compare the average length of the code words obtained with the
entropy H(X), determining the efficiency of each code:
Symbol              Probability
x1                  0.2
x2                  0.18
x3, x4, x5          0.1 each
x6                  0.061
x7                  0.059
x8, x9, x10, x11    0.04 each
x12                 0.03
x13                 0.01
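
For the Huffman codes requested throughout the list, a compact priority-queue sketch is shown below (again only an illustration added to this list). The counter is just a tie-breaker so the heap never compares dictionaries; the resulting codeword lengths are optimal, although the exact bit patterns may differ from a tree built by hand.

```python
import heapq
from itertools import count

def huffman(probs):
    """Return a {symbol: codeword} dict for a binary Huffman code over probs."""
    tick = count()                              # tie-breaker for equal probabilities
    heap = [(p, next(tick), {s: ''}) for s, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                          # degenerate single-symbol source
        return {s: '0' for s in probs}
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)         # merge the two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c1.items()}
        merged.update({s: '1' + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]

# Probabilities from exercise 3.
probs = {'x1': 0.2, 'x2': 0.18, 'x3': 0.1, 'x4': 0.1, 'x5': 0.1,
         'x6': 0.061, 'x7': 0.059, 'x8': 0.04, 'x9': 0.04,
         'x10': 0.04, 'x11': 0.04, 'x12': 0.03, 'x13': 0.01}
for sym, word in sorted(huffman(probs).items(), key=lambda kv: len(kv[1])):
    print(sym, word)
```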
4) You want to transmit the following phrase to a receiver: “this list is very easy”,
using the ASCII character set to map characters to 7-bit sequences.
a) How many bits are needed to encode the sequence above?
b) What would this sequence look like after applying the Shannon-Fano coding? What
is the average length?
c) What would this sequence look like after applying the Huffman coding? What is the
average length?
d) Compute the entropy of the source and the efficiency of the codings found in
a), b), and c).
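
In exercise 4 the source alphabet is the set of characters that actually occur in “this list is very easy”; treating the spaces as symbols and using relative frequencies as probabilities is an assumption made here for illustration. A minimal sketch of the bookkeeping:

```python
from collections import Counter
import math

phrase = "this list is very easy"
counts = Counter(phrase)                        # spaces are counted as symbols here
n = len(phrase)
probs = {ch: c / n for ch, c in counts.items()}

ascii_bits = 7 * n                              # item a): fixed 7-bit ASCII code
entropy = -sum(p * math.log2(p) for p in probs.values())

print(f"{n} characters, {len(probs)} distinct symbols")
print(f"7-bit ASCII total: {ascii_bits} bits")
print(f"H(X) = {entropy:.4f} bits/symbol, lower bound ~ {entropy * n:.1f} bits")
```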

5) You are developing a new computer with memory based on DNA molecules. You will
store the data using the 4 possible bases.
a) How many bases are needed to store 1 Terabyte of data?
b) You want to store a file containing only 8 symbols with probabilities: 20%, 20%,
15%, 15%, 10%, 10%, 5% and 5%. Propose the most compact possible self-punctuating
coding using the bases A, C, T and G, then determine its coding efficiency.
6) You were asked to compactly encode a tongue-twister. This is the phrase: “peter
piper picked a peck of pickled peppers”. The frequency distribution for each of the
14 symbols (44 characters in total, spaces included) is: p: 9, e: 8, space: 7, r: 3,
i: 3, c: 3, k: 3, d: 2, t: 1, a: 1, o: 1, f: 1, l: 1, s: 1.
a) A way to encode this sequence would use a fixed size code, with code words long
enough to encode the 14 different symbols. How many bytes would be needed to
transmit this phrase with 44 characters using a fixed code size?
b) Determine the minimum number of bits required to encode the phrase, assuming
that each character is independent of its surrounding characters.
c) What is the theoretical contribution of each one of the 14 symbols to the
average information?
d) Build a code dictionary using the Huffman algorithm for the 14 symbols.
e) Encode the phrase using the code of item d):
i. How many bits are needed?
ii. How does this number compare with the number of bits needed when using the code
obtained in a)?
iii. How does this number compare with the information content of the phrase
calculated in item b)?
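
For exercise 5a), each base can take one of 4 values and therefore carries at most log2 4 = 2 bits of information, so the number of bases is the number of bits divided by 2. A quick check, assuming the binary convention 1 TB = 2^40 bytes (with the decimal convention of 10^12 bytes the arithmetic is analogous):

```python
import math

bits_per_base = math.log2(4)        # a 4-valued base carries at most 2 bits
terabyte_bits = 2 ** 40 * 8         # assumption: 1 TB = 2**40 bytes
print(f"{terabyte_bits / bits_per_base:.3e} bases needed for 1 TB")
```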

7) Consider a source X with symbols x1, x2, x3, x4 encoded with the following codes:
a) Which of them are instantaneous codes? Justify.
b) Which of them are uniquely decodable? Justify.
8) Explain the codes below:
9) A source X has four symbols x1, x2, x3 and x4 with p(x1) = 1/2, p(x2) = 1/4 and
p(x3) = p(x4) = 1/8. Build the Shannon-Fano code for X. Show that this code has
100% efficiency.
10) Given a source X with m equiprobable symbols xi, i = 1, ..., m, let n be the size
of a word in a fixed size encoding. Show that, if n = log2 m, then the code
efficiency is 100%.
11) In a DNA string there are 4 kinds of bases: G, T, C and A. What is the
information contained in a DNA string of size 10 in the following cases?
a) All bases are equiprobable
b) The bases G and T are twice as probable as the bases C and A
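
Exercises 7, 8 and 12 all hinge on the same test: a code is instantaneous exactly when no codeword is a prefix of another (unique decodability is a weaker property and needs, for example, the Kraft inequality plus the Sardinas-Patterson test, not sketched here). Below is a minimal prefix-condition check with made-up codes, since the code tables referred to by exercises 7 and 8 are not reproduced in this list.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(i != j and v.startswith(w)
                   for i, w in enumerate(words)
                   for j, v in enumerate(words))

# Hypothetical examples, not the codes referred to by exercises 7 and 8:
print(is_prefix_free({'x1': '0', 'x2': '10', 'x3': '110', 'x4': '111'}))  # True
print(is_prefix_free({'x1': '0', 'x2': '01', 'x3': '011', 'x4': '111'}))  # False
```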

12) Given the four codes below:
a) Identify the prefix codes and build their decision trees
b) Decode the sequence 0011011111000 using the prefix codes identified in a). Try
to repeat the same procedure with the other codes.
13) Consider a die with 8 faces on which the characters from A to H are written.
Considering that all faces have equal probability, show whether it is possible or
not to create a code which is more efficient than the fixed size code in this case.
14) A source X has 5 symbols with the following probabilities: p(x1) = 0.4,
p(x2) = 0.19, p(x3) = 0.16, p(x4) = 0.15 and p(x5) = 0.1.
a) Create a Shannon-Fano code for X and compute the code efficiency.
b) Repeat for the Huffman code and compare the results.
15) A source X has 5 equiprobable symbols.
a) Determine the efficiency of the following codes:
b) Determine the Shannon-Fano and Huffman codes and their respective efficiencies.
16) Consider an unfair die with the following probabilities: 1: 0.05, 2 to 5: 0.1625
each, 6: 0.3.
a) Create a Shannon-Fano code for X and compute the code efficiency.
b) Repeat for the Huffman code and compare its results.
17) Given a discrete source without memory in which the alphabet consists of K
equiprobable symbols, what conditions must be satisfied by K and by the size of the
code words (fixed size) so that the efficiency becomes 100%?
18) Joãozinho says that it is possible to design a method which produces more
efficient codes than Huffman. He created the following iterative method:
1st iteration: for the 2 most probable symbols, assign 1-bit codewords following
the binary order
2nd iteration: for the next 4 most probable symbols, assign 2-bit codewords
following the binary order
3rd iteration: for the next 8 most probable symbols, assign 3-bit codewords
following the binary order
...
Nth iteration: for the next 2^N most probable symbols, assign N-bit codewords
following the binary order
a) Apply the method described by Joãozinho for the following probabilities of an
unfair die: P(X = 1) = 1/4, P(X = 2) = 1/4, P(X = 3) = 1/4, P(X = 4) = 1/8,
P(X = 5) = 1/16, P(X = 6) = 1/16
b) Determine the efficiencies of both Joãozinho and Huffman codes for the unfair
die of item a). Is there something strange with Joãozinho's code? If yes, justify.
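
A useful sanity check for exercise 18 is the Kraft inequality: any uniquely decodable binary code with codeword lengths l_i must satisfy sum 2^(-l_i) <= 1. The sketch below applies it to the lengths that Joãozinho's rule, as stated above, would assign to the 6 faces (two 1-bit codewords in the first iteration, four 2-bit codewords in the second); verify those lengths yourself when solving the exercise.

```python
def kraft_sum(lengths):
    """Kraft sum for a binary code; it must be <= 1 for unique decodability."""
    return sum(2 ** -l for l in lengths)

joaozinho_lengths = [1, 1, 2, 2, 2, 2]   # iteration 1: 2 symbols, iteration 2: 4 symbols
print(kraft_sum(joaozinho_lengths))      # compare the result against 1
```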

19) The codes can be classified into 4 types:
i – singular
ii – non-singular, but not uniquely decodable
iii – uniquely decodable, but non-instantaneous
iv – instantaneous
a) Determine the type of each one of the following codes and justify:
a1) Morse code:
a2) A = 00, B = 11, C = 10, D = 11
a3) A = 010, B = 011, C = 101, D = 100
a4) A = 10, B = 100, C = 00, D = 110
a5) A = 0, B = 01, C = 10, D = 01
b) Why does Morse code need a pause (larger time interval) between the emission of
two characters and an even larger pause between two words?
20) Two fair dice were rolled by Alice and the sum of their results was written down
by her. Bob must ask a series of yes/no questions to find this number. Describe a
strategy to do it minimizing the number of questions asked. This strategy must be
better than the one obtained in exercise 8 of the previous list.
21) Consider the message “122121213”. How would it be transmitted using run-length
encoding? What is the compression ratio?
22) Consider this famous phrase from JFK (in Portuguese): “A pergunta de cada um de
nós não deve ser o que o país pode fazer por nós, mas sim o que cada um de nós pode
fazer pelo país”. Assuming that each character is an 8-bit ASCII character:
a) How many bytes does this phrase occupy?
b) What is the frequency of each one of these words?
c) Build a static dictionary for this phrase, based on the frequencies from b).
d) How much space does the dictionary occupy?
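
Items b) and c) of exercise 22 only require a word-frequency count over the phrase. The sketch below folds case and strips the comma before counting, and indexes the dictionary by decreasing frequency; both choices are assumptions, since the exercise does not fix a normalization or a dictionary format.

```python
from collections import Counter

phrase = ("A pergunta de cada um de nós não deve ser o que o país pode fazer "
          "por nós, mas sim o que cada um de nós pode fazer pelo país")
words = phrase.lower().replace(',', '').split()
freq = Counter(words)

# One possible static dictionary: most frequent words get the smallest indices.
dictionary = {w: i for i, (w, _) in enumerate(freq.most_common())}
print(freq.most_common(5))
print(dictionary)
```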

23) Why can run-length encoding be considered an example of redundancy detection?
Does this algorithm ensure space saving? In what situations does the algorithm
result in a file bigger than the original?
24) How can the following sequences be encoded as run-length sequences?
a) AAABBBBBBYYYYPPPPPPPPPTKKKKKKKK
b) 111112223333312222221111111333333333
25) What compression rate was obtained in a) and b) of the previous exercise?
26) In what cases are lossy or lossless compression recommended? Justify.
27) Create a simple method to quantize an image with 8-bit gray levels into another
image with 4-bit gray levels. Can the method be easily extended to colored images
(RGB)?
28) Given the following messages:
i) AAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBB
ii) ABABABABABABABABABABABABABABABABABABABAB
Would run-length be a good choice to encode message i)? And message ii)? Justify.
29) Consider an image of the Germany flag at a resolution of 30x40 pixels in the RGB
standard with 1 byte per channel, where the first 10 lines correspond to the color
black, the following 10 lines correspond to the color red, and the last 10 lines
correspond to the color yellow.
a) What is the size of a file containing this image (in bits and bytes)? Justify.
b) Consider the run-length compression method in which consecutive pixels of the
same color are encoded by a binary pattern followed by a value in brackets [X],
where X is the number of consecutive repetitions of that color. Suppose that the
brackets are represented by 1 byte (ASCII character) each and X is an integer number
represented by 4 bytes. Also, the compressed file must start with a header which
informs the file resolution (two 2-byte integers). What is the size of the previous
file after this compression (in bits and bytes)? What is the compression rate with
regard to the file obtained in a)?
c) What compression methods would be the most adequate to compress the Germany flag
image? Run-length or Huffman coding? Justify.
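
A minimal run-length encoder in the spirit of exercises 21, 24 and 28; the (symbol, run length) output format is just one convention, and the exercises may expect a slightly different notation such as count-before-symbol.

```python
from itertools import groupby

def run_length_encode(message):
    """Collapse runs of identical symbols into (symbol, run length) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(message)]

print(run_length_encode("AAABBBBBBYYYYPPPPPPPPPTKKKKKKKK"))
print(run_length_encode("122121213"))
```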