
Huffman Coding

Lawrence M. Brown

Huffman Coding†

• Huffman Coding
• Variable Length Encoding
• Building a Tree
• Decoding
• Encoding

†Adapted from: [Link]

25 September, 1999

Huffman Coding
• Huffman Coding is a variable-length prefix encoding algorithm for
compression of character streams.

• Codes are assigned to characters such that the length of the code depends
  on the relative frequency of the corresponding character.
Examples:
• File compression: JPEG images, MPEG movies.
• Transmission of data over band-limited channels: modem data compression.

Letter  Frequency    Letter  Frequency
A        77          N        67
B        17          O        67
C        32          P        20
D        42          Q         5
E       120          R        59
F        24          S        67
G        17          T        85
H        50          U        37
I        76          V        12
J         4          W        22
K         7          X         4
L        42          Y        22
M        24          Z         2

Frequency of occurrence per 1000 letters.¹

¹ Shaffer, Clifford A., A Practical Introduction to Data Structures and Algorithm Analysis, Java Edition, Prentice Hall (1998).

Data Representation
Bits and Bytes

• Digital computers store data in binary, or base-2, format.

• A binary digit (bit) is represented by a 0 or 1.

• A byte is an 8-bit number and is typically the smallest binary number
  represented on a computer.

01001011₂ = 0·2⁷ + 1·2⁶ + 0·2⁵ + 0·2⁴ + 1·2³ + 0·2² + 1·2¹ + 1·2⁰
          = 64 + 8 + 2 + 1
          = 75₁₀

Longer words (16-bit, 32-bit, 64-bit) are constructed from 8-bit bytes.
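As a quick check of the base-2 expansion above, Java's Integer.parseInt with radix 2 performs the same conversion (a minimal sketch; the class name is illustrative):

```java
public class BinaryDemo {
    public static void main(String[] args) {
        // parseInt with radix 2 applies the positional expansion shown above.
        int value = Integer.parseInt("01001011", 2);
        System.out.println(value); // prints 75
    }
}
```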


Unicode and ASCII


• Unicode is an International Standard that defines a universal character set
  (16-bit unsigned integers).

• Unicode characters range from 0 to 65,535 (\u0000 to \uFFFF) and
  incorporate all languages (English, Russian, Asian, etc.).

• Java stores characters (char) as Unicode.

• The standard set of ASCII characters still ranges from 32 to 127
  (\u0020 to \u007F in Unicode).

• ASCII characters occupy the lowest 7 bits of the Unicode set, with the
  upper 9 bits set to zero.

ASCII Character Set

        0    1    2    3    4    5    6    7
   0   NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL
   8   BS   HT   NL   VT   NP   CR   SO   SI
  16   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB
  24   CAN  EM   SUB  ESC  FS   GS   RS   US
  32   SP   !    "    #    $    %    &    '
  40   (    )    *    +    ,    -    .    /
  48   0    1    2    3    4    5    6    7
  56   8    9    :    ;    <    =    >    ?
  64   @    A    B    C    D    E    F    G
  72   H    I    J    K    L    M    N    O
  80   P    Q    R    S    T    U    V    W
  88   X    Y    Z    [    \    ]    ^    _
  96   `    a    b    c    d    e    f    g
 104   h    i    j    k    l    m    n    o
 112   p    q    r    s    t    u    v    w
 120   x    y    z    {    |    }    ~    DEL
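The relationship between Java char values and the ASCII table can be checked directly (a small illustrative sketch):

```java
public class AsciiDemo {
    public static void main(String[] args) {
        // A Java char is a 16-bit Unicode code unit; for ASCII characters the
        // upper 9 bits are zero, so the numeric value matches the table above.
        char c = 'A';
        System.out.println((int) c);        // row 64, column 1 of the table
        System.out.println((int) '\u0041'); // same character via a Unicode escape
    }
}
```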


Variable-length Encoding
• Unicode and ASCII are fixed-length encoding schemes. All characters
require the same amount of storage (16 bits and 8 bits, respectively).

• Huffman coding is a variable-length encoding scheme. The number of bits
  required to store a coded character varies according to the relative
  frequency, or weight, of the character.

• A significant space savings is achieved for frequently used characters
  (requiring only one, two, or three bits).

• Little space saving is achieved for infrequent characters.

Letter  Frequency
E       120
I        10

A frequent letter such as E receives a much shorter Huffman code than an
infrequent letter such as I.


Huffman Coding Tree

• A Huffman Coding Tree is built from the observed frequencies of characters
  in a document.

• The document is scanned and the occurrence of each character is recorded.

• Next, a binary tree is built in which the external nodes store the
  characters and the corresponding character frequencies observed in the
  document.

• Often, pre-scanning a document and generating a custom Huffman Coding Tree
  is impractical. Instead, typical frequencies for the language are used in
  place of frequencies measured from the particular document.


Building a Huffman Coding Tree


• Consider the observed frequency of characters in a string that requires
encoding:

Character C D E F K L U Z
Frequency 32 42 120 24 7 42 37 2

• The first step is to construct a priority queue and insert each
  frequency-character (key-element) pair into the queue.

• Step 1:

2 7 24 32 37 42 42 120
Z K F C U L D E

Sorted, sequence-based, priority queue.
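Step 1 can be sketched with java.util.PriorityQueue (an illustrative sketch; ties between equal frequencies, such as D and L, are broken alphabetically here, whereas the slide's ordering of the two 42s is arbitrary):

```java
import java.util.PriorityQueue;

public class Step1Demo {
    public static void main(String[] args) {
        // Frequency-character pairs from the slide.
        int[]  freq  = {32, 42, 120, 24, 7, 42, 37, 2};
        char[] chars = {'C', 'D', 'E', 'F', 'K', 'L', 'U', 'Z'};

        // Order by frequency (the key); break frequency ties alphabetically.
        PriorityQueue<int[]> q = new PriorityQueue<>(
            (a, b) -> a[0] != b[0] ? a[0] - b[0] : a[1] - b[1]);
        for (int i = 0; i < freq.length; i++)
            q.add(new int[]{freq[i], chars[i]});

        // Removing minimums yields the sorted sequence shown on the slide
        // (with D before L here because of the alphabetical tie-break).
        StringBuilder sb = new StringBuilder();
        while (!q.isEmpty()) sb.append((char) q.remove()[1]);
        System.out.println(sb); // prints ZKFCUDLE
    }
}
```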


Building a Huffman Coding Tree


• In the second step, the two Items with the lowest key values are
removed from the priority queue.

• A new Binary Tree is created with the lowest-key Item as the left
external node, and the second lowest-key Item as the right external
node.
• The new Tree is then inserted back into the priority queue.

• Step 2:

24 32 37 42 42 120
9 F C U L D E

2 7
Z K


Building a Huffman Coding Tree


• The process is continued until only one node (the Binary Tree) is left in
the priority queue.

37 42 42 120
• Step 3: 32
C
33 U L D E

9 24
F

2 7
Z K

• Step 4: 37 42 42 120
U L D 65 E

32
33
C

9 24
F

2 7
Z K


Building a Huffman Coding Tree


• Step 5:

42 120
D 65 79 E

32 37 42
33
C U L

9 24
F

2 7
Z K


Building a Huffman Coding Tree


• Final tree, after n = 8 steps:

306

120 186
E

79 107

37 42 42 65
U L D

32 33
C

9 24
F

2 7
Z K


Building a Huffman Coding Tree


Algorithm Huffman( X ):
Input: String X of length n.
Output: Coding tree for X.

Compute frequency f(c) of each character c in X.
Initialize a priority queue Q.
for each character c in X do
    Create a single-node tree T storing c.
    Insert T into Q with key f(c).
while Q.size() > 1 do
    f1 ← Q.minKey()
    T1 ← Q.removeMinElement()
    f2 ← Q.minKey()
    T2 ← Q.removeMinElement()
    Create a new tree T with left subtree T1 and right subtree T2.
    Insert T into Q with key f1 + f2.
return Q.removeMinElement()    // the Huffman coding tree
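The algorithm above can be sketched in Java (an illustrative sketch using java.util.PriorityQueue in place of the minKey/removeMinElement interface; the heap breaks ties between equal frequencies arbitrarily, so codes for equal-frequency letters such as D and L may differ from the slides, though the total cost does not):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanBuild {
    // External (leaf) nodes store a character; internal nodes have two children.
    static class Node {
        int freq; char ch; Node left, right;
        Node(int freq, char ch) { this.freq = freq; this.ch = ch; }
        Node(Node left, Node right) {
            this.freq = left.freq + right.freq;
            this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null; }
    }

    // Insert one single-node tree per character, then repeatedly merge the
    // two minimum-frequency trees until one tree remains.
    static Node build(Map<Character, Integer> freqs) {
        PriorityQueue<Node> q = new PriorityQueue<>((a, b) -> a.freq - b.freq);
        for (Map.Entry<Character, Integer> e : freqs.entrySet())
            q.add(new Node(e.getValue(), e.getKey()));
        while (q.size() > 1) {
            Node t1 = q.remove();    // lowest key becomes the left subtree
            Node t2 = q.remove();    // second lowest becomes the right subtree
            q.add(new Node(t1, t2));
        }
        return q.remove();
    }

    // Walk the tree: 0 for a left branch, 1 for a right branch.
    static void codes(Node n, String path, Map<Character, String> out) {
        if (n.isLeaf()) { out.put(n.ch, path); return; }
        codes(n.left, path + "0", out);
        codes(n.right, path + "1", out);
    }

    public static void main(String[] args) {
        Map<Character, Integer> freqs = new HashMap<>();
        freqs.put('C', 32); freqs.put('D', 42); freqs.put('E', 120);
        freqs.put('F', 24); freqs.put('K', 7);  freqs.put('L', 42);
        freqs.put('U', 37); freqs.put('Z', 2);

        Map<Character, String> code = new HashMap<>();
        codes(build(freqs), "", code);

        // E (120) is only merged in the final step, as the lighter child.
        System.out.println(code.get('E')); // prints 0

        // Total bits = sum of frequency x code length; any Huffman tree for
        // this distribution is optimal with the same total cost.
        int total = 0;
        for (Map.Entry<Character, Integer> e : freqs.entrySet())
            total += e.getValue() * code.get(e.getKey()).length();
        System.out.println(total); // prints 785
    }
}
```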

Decoding
• To decode a bit stream (from the leftmost bit), start at the root node of the Tree:
• move to the left child if the bit is a “0”.
• move to the right child if the bit is a “1”.
• When an external node is reached, the character at the node is sent to the
decoded string.
• The next bit is then decoded from the root of the tree.

Decode: 1011001110111101

  101    → L    (remaining: 1001110111101)
  100    → U    (remaining: 1110111101)
  1110   → C    (remaining: 111101)
  111101 → K

Decoded string: LUCK

(The tree used is the final tree from the previous slide, with each left
branch labeled 0 and each right branch labeled 1.)
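The decoding walk can also be sketched without an explicit tree by exploiting the prefix property: accumulate bits until they match a complete code (an illustrative sketch; the code table is read off the tree above):

```java
import java.util.HashMap;
import java.util.Map;

public class HuffmanDecode {
    public static void main(String[] args) {
        // Code table read off the tree (0 = left branch, 1 = right branch).
        Map<String, Character> code = new HashMap<>();
        code.put("0", 'E');      code.put("100", 'U');   code.put("101", 'L');
        code.put("110", 'D');    code.put("1110", 'C');  code.put("11111", 'F');
        code.put("111100", 'Z'); code.put("111101", 'K');

        String bits = "1011001110111101";
        StringBuilder decoded = new StringBuilder(), buf = new StringBuilder();
        for (char b : bits.toCharArray()) {
            buf.append(b);                    // extend the current path from the root
            Character c = code.get(buf.toString());
            if (c != null) {                  // reached an external node
                decoded.append(c);
                buf.setLength(0);             // restart at the root for the next bit
            }
        }
        System.out.println(decoded); // prints LUCK
    }
}
```

Because no code is a prefix of another, the first complete match is always the correct character.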

Encoding
• Create a lookup table storing the binary code corresponding to the path
to each letter.
• If encoding ASCII text, a 128-element array suffices:

  String[] encoder = new String[128];
  encoder['C'] = "1110";

Encode DEED:

  DEED → 110 EED
       → 110 0 ED
       → 110 0 0 D
       → 110 0 0 110 = 11000110

Character  Frequency  Code    # bits
C          32         1110    4
D          42         110     3
E          120        0       1
F          24         11111   5
K          7          111101  6
L          42         101     3
U          37         100     3
Z          2          111100  6

• The ASCII representation of DEED would require 32 bits.

• The Huffman encoding requires only 8 bits.
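The lookup-table encoding can be sketched as follows (illustrative class name; the table entries are the codes from the slide):

```java
public class HuffmanEncode {
    public static void main(String[] args) {
        // Lookup table indexed directly by ASCII character value.
        String[] encoder = new String[128];
        encoder['C'] = "1110";   encoder['D'] = "110";    encoder['E'] = "0";
        encoder['F'] = "11111";  encoder['K'] = "111101"; encoder['L'] = "101";
        encoder['U'] = "100";    encoder['Z'] = "111100";

        // Concatenate the code for each character of the message.
        StringBuilder bits = new StringBuilder();
        for (char c : "DEED".toCharArray()) bits.append(encoder[c]);

        System.out.println(bits);          // prints 11000110
        System.out.println(bits.length()); // 8 bits, versus 32 in 8-bit ASCII
    }
}
```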

Analysis
• Define fᵢ = frequency of letter lᵢ, i = 1, …, n.
• Define cᵢ = cost of letter lᵢ (number of bits in its code).

• Expected cost per character:

      ECPC = ( Σᵢ cᵢ·fᵢ ) / ( Σᵢ fᵢ )  bits/character, summing over i = 1, …, n.

• Actual message length, ML = ECPC · N bits, where N is the total number of
  characters in the message.

Character  Frequency  Code    # bits
C          32         1110    4
D          42         110     3
E          120        0       1
F          24         11111   5
K          7          111101  6
L          42         101     3
U          37         100     3
Z          2          111100  6

ECPC = (4·32 + 3·42 + 1·120 + 5·24 + 6·7 + 3·42 + 3·37 + 6·2) bits
       / (32 + 42 + 120 + 24 + 7 + 42 + 37 + 2) characters
     = 785 / 306
     ≈ 2.57 bits/character.

A fixed-length encoding of 8 characters would require 3 bits per character,
giving an ML of 3 · 306 = 918 bits (versus 785 bits for the Huffman encoding).
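The ECPC computation above can be checked with a short sketch (illustrative class name):

```java
public class Ecpc {
    public static void main(String[] args) {
        int[] freq = {32, 42, 120, 24, 7, 42, 37, 2};  // C D E F K L U Z
        int[] bits = { 4,  3,   1,  5, 6,  3,  3, 6};  // code lengths from the table

        int cost = 0, total = 0;
        for (int i = 0; i < freq.length; i++) {
            cost  += bits[i] * freq[i];  // numerator: sum of c_i * f_i
            total += freq[i];            // denominator: total characters
        }

        // 785 / 306, versus 3 bits/character for a fixed-length code.
        System.out.printf("%.2f%n", (double) cost / total); // prints 2.57
    }
}
```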


Summary
• Huffman codes are variable length and are based on the observed
frequency of characters.

• No Huffman code for a character in the set is a prefix of another
  character's code (the codes are prefix-free).

• The best space savings for Huffman Coding compression is when the
variation in the frequencies of the letters is large.
