Creating a Huffman Tree
By the end of this worksheet you should be able to:
• explain how data can be compressed
using Huffman coding.
A Huffman tree can look confusing at first, as most binary
trees do.
The key thing to remember is that it is called a binary tree
because each node can have at most two branches.
A node with no branches of its own (the end of a branch) is
called a leaf node.
So how do we create a Huffman tree in the first place?
Well, let’s take a sentence:
THE CAT SAT ON THE MAT
Now let’s go through the steps…
1. Create a table showing the frequency of each
character in the sentence (including spaces,
punctuation and any other special characters):
Character Frequency
T 5
H 2
E 2
SPACE 5
C 1
A 3
S 1
O 1
N 1
M 1
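The frequency table can be double-checked with a few lines of Python. This is only a minimal sketch and not part of the original worksheet; the names `sentence` and `freq` are illustrative assumptions.

```python
from collections import Counter

sentence = "THE CAT SAT ON THE MAT"

# Count every character in the sentence, including the spaces.
freq = Counter(sentence)

# Print the table from most to least frequent, showing a space as SPACE.
for char, count in freq.most_common():
    label = "SPACE" if char == " " else char
    print(label, count)

# The frequencies should add up to the length of the sentence.
print("Total characters:", sum(freq.values()))  # Total characters: 22
```

The final line is a handy check when doing this by hand too: the frequencies should always add up to the number of characters in the text.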
4. Now take the two items furthest to the right and pair
them together, joining their branches at a new node
labelled with the sum of their frequencies.
Continue until you have made all the pairs you can,
leaving any spare item aside for the moment:
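The pairing in this step might be sketched in Python as follows. It assumes the characters have already been lined up from highest to lowest frequency, left to right, which is the same order used in the Code table later in this worksheet. The list `items` and the other names are illustrative assumptions, not part of the worksheet.

```python
# Assumption: characters ordered from highest to lowest frequency.
items = [("T", 5), ("SPACE", 5), ("A", 3), ("H", 2), ("E", 2),
         ("C", 1), ("S", 1), ("O", 1), ("N", 1), ("M", 1)]

pairs = []
remaining = list(items)

# Repeatedly take the two right-most items and sum their frequencies.
while len(remaining) >= 2:
    right = remaining.pop()
    left = remaining.pop()
    pairs.append((left[0], right[0], left[1] + right[1]))

for left, right, total in pairs:
    print(left, "+", right, "=", total)

# Anything still in `remaining` is a spare to deal with later.
print("Spares:", remaining)  # Spares: []
```

With ten characters there are five pairs and no spares; with an odd number of characters one item would be left over.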
6. Continue adding each character and joining the pairs
together until all the characters are shown on the
diagram:
7. You should now have all the characters on your
diagram. These are known as the leaf nodes.
Now return to the right-most side of the diagram
and begin joining the paired values together:
9. Continue pairing up the tree until you come to a
single node, called the root node:
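For comparison, here is a minimal sketch of the usual textbook way of building a Huffman tree in code: keep joining the two lowest totals until a single root node is left. This is a swapped-in priority-queue version rather than the worksheet's pencil-and-paper method, so it may pair the characters differently from your diagram, but for this sentence it needs the same number of bits overall. The function names `build_tree` and `code_lengths` are illustrative assumptions.

```python
import heapq
from collections import Counter
from itertools import count


def build_tree(text):
    """Return (root, total); internal nodes are (left, right) tuples, leaves are characters."""
    tie = count()  # unique tie-breaker so the heap never has to compare nodes
    heap = [(f, next(tie), ch) for ch, f in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)  # lowest total
        f2, _, b = heapq.heappop(heap)  # next lowest total
        heapq.heappush(heap, (f1 + f2, next(tie), (a, b)))
    total, _, root = heap[0]
    return root, total


def code_lengths(node, depth=0):
    """Map each character to its depth in the tree (its code length in bits)."""
    if isinstance(node, str):
        return {node: depth}
    left, right = node
    lengths = code_lengths(left, depth + 1)
    lengths.update(code_lengths(right, depth + 1))
    return lengths


sentence = "THE CAT SAT ON THE MAT"
root, root_total = build_tree(sentence)
lengths = code_lengths(root)
total_bits = sum(lengths[ch] for ch in sentence)
print(root_total, total_bits)  # 22 67
```

The root node's total (22) is always the number of characters in the text, which is a quick way to check a hand-drawn tree.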
12. Again, starting from the root node, find the path to
each leaf node (each character), noting the 0 and 1
branches you follow to reach it.
Write each path in the Code column of your table:
Character Code
T 00
SPACE 01
A 100
H 101
E 1100
C 1101
S 11100
O 11101
N 11110
M 11111
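Reading the codes off a tree can be sketched in Python like this. The nested tuples in `tree` are a reconstruction of the tree implied by the Code table, and labelling each left branch 0 and each right branch 1 is an assumption that happens to reproduce the table. The name `assign_codes` is illustrative.

```python
# Assumed shape of the finished tree: each tuple is (left branch, right branch).
tree = (("T", "SPACE"),
        (("A", "H"),
         (("E", "C"),
          (("S", "O"), ("N", "M")))))


def assign_codes(node, path=""):
    """Walk from the root, adding 0 for a left branch and 1 for a right branch."""
    if isinstance(node, str):  # a leaf node: the path so far is its code
        return {node: path}
    left, right = node
    codes = assign_codes(left, path + "0")
    codes.update(assign_codes(right, path + "1"))
    return codes


codes = assign_codes(tree)
for char, code in codes.items():
    print(char, code)
```

Notice that the most frequent characters (T and SPACE) sit closest to the root, so they get the shortest codes.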
13. You now have the code for each character.
Simply replace each character with its code and
you have your Huffman-coded data:
T   H    E     SP  C     A    T   SP
00  101  1100  01  1101  100  00  01

S      A    T   SP  O      N      SP
11100  100  00  01  11101  11110  01

T   H    E     SP  M      A    T
00  101  1100  01  11111  100  00
The data is…
0010111000111011000001111001000001111011111001
001011100011111110000
…which adds up to 67 bits.
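The encoding and the bit count can be checked with a short Python sketch. The dictionary `codes` is copied from the Code table, with SPACE written as an actual space character; the `decode` function is an illustrative addition showing that the data can be turned back into the sentence.

```python
codes = {"T": "00", " ": "01", "A": "100", "H": "101", "E": "1100",
         "C": "1101", "S": "11100", "O": "11101", "N": "11110", "M": "11111"}

sentence = "THE CAT SAT ON THE MAT"

# Replace each character with its code and join the codes together.
encoded = "".join(codes[ch] for ch in sentence)
print(encoded)
print(len(encoded), "bits")  # 67 bits


def decode(bits, codes):
    """Read bits one at a time until they match a code, then start again."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)


print(decode(encoded, codes))  # THE CAT SAT ON THE MAT
```

Decoding works without any separators between the codes because no code is the start of another code.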
EXAM ALERT
The exam might ask you to calculate how many bits some
text takes up when stored as ASCII.
Remember that ASCII uses 7 bits per character, not 8.
However, since computers store data in bytes, and ASCII
is normally stored and transmitted in blocks of 8 bits, the
exam will accept any calculation worked out using 8 bits
instead of 7.
Why warn you?
Only so that if you see an answer given with 7 bits instead
of 8 you will know why!
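As a worked example, here is the same calculation for the sentence used in this worksheet, sketched in Python. The variable names are illustrative, and the figure of 67 bits is the Huffman-coded size found earlier.

```python
sentence = "THE CAT SAT ON THE MAT"

ascii_7 = len(sentence) * 7  # 7 bits per character
ascii_8 = len(sentence) * 8  # 8 bits per character
huffman_bits = 67            # size of the Huffman-coded data

print(ascii_7, ascii_8)  # 154 176

# Saving compared with 8-bit ASCII, as a percentage.
saving = 1 - huffman_bits / ascii_8
print(f"{saving:.0%}")  # 62%
```

Either 154 bits or 176 bits would be an acceptable ASCII size for this sentence, depending on whether 7 or 8 bits per character is used.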