
Telecommunications 1

• Week 6
Agenda

• Recap +
• Source encoding
• Channel encoding
• Block codes
• Error correction
• Other Order of Business
Types of Coding
• Source Coding - Code data to more efficiently represent the
information
– Reduces “size” of data
– Analog - Encode analog source data into a binary format
– Digital - Reduce the “size” of digital source data
• Channel Coding - Code data for transmission over a noisy
communication channel
– Increases “size” of data
– Digital - add redundancy to identify and correct errors
– Analog - represent digital values by analog signals
Source Encoding (compression)
• The digital result of the A/D conversion can be encoded (represented) in
many ways.
• One purpose of the source encoder is to eliminate redundant binary digits
from the digitized signal

• Thermometer code
– N bits for N levels
• Binary code
– log2 N bits for N levels
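A minimal sketch of the two fixed-length schemes above (Python; N = 8 and the function names are my own, not from the slides):

def thermometer_code(level, n_levels):
    """N-bit code: one '1' per level reached, e.g. level 5 of 8 -> 00011111."""
    return "0" * (n_levels - level) + "1" * level

def binary_code(level, n_levels):
    """log2(N)-bit code, e.g. level 5 of 8 -> 101."""
    width = max(1, (n_levels - 1).bit_length())
    return format(level, "0{}b".format(width))

print(thermometer_code(5, 8))   # 00011111 : 8 bits for 8 levels
print(binary_code(5, 8))        # 101      : 3 bits for 8 levels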
Huffman codes

• Both examples on the previous page use a fixed-length binary
character code: for an input alphabet of size n, represent each
character as a binary string of ⌈log2 n⌉ bits.
 Example: 8-bit ASCII code.
• A more space-efficient representation can be obtained using
variable-length coding.
 Example: Huffman codes
Purpose of Huffman Coding

• Proposed by Dr. David A. Huffman in 1952


– “A Method for the Construction of Minimum Redundancy Codes”
• Applicable to many forms of data transmission
– Our example: text files
The Basic Algorithm

• Huffman coding is a form of statistical coding


• Not all characters occur with the same frequency!
• Yet all characters are allocated the same amount of
space
– 1 char = 1 byte, be it e or x
The Basic Algorithm

• Are there any savings in tailoring codes to the frequency of each character?


• Code word lengths are no longer fixed like ASCII.
• Code word lengths vary and will be shorter for the more
frequently used characters.
The (Real) Basic Algorithm
1. Scan text to be compressed and tally occurrence of all
characters.
2. Sort or prioritize characters based on number of
occurrences in text.
3. Build Huffman code tree based on prioritized list.
4. Perform a traversal of tree to determine all code words.
5. Scan text again and create new file using the Huffman
codes.
Building a Tree
Scan the original text

• Consider the following short text:

Eerie eyes seen near lake.

• Count up the occurrences of all characters in the text


Building a Tree
Scan the original text

Eerie eyes seen near lake.


• What characters are present?

E, e, r, i, space, y, s, n, a, l, k, .
Building a Tree
Scan the original text
Eerie eyes seen near lake.
• What is the frequency of each character in the text?

Char    Freq.     Char    Freq.     Char    Freq.
E       1         y       1         k       1
e       8         s       2         .       1
r       2         n       2
i       1         a       2
space   4         l       1
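One way to reproduce this tally, as a quick sketch using Python's collections.Counter (the variable names are just illustrative):

from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)                      # character -> number of occurrences

for ch, count in freq.most_common():
    print("space" if ch == " " else ch, count)
print("total characters:", sum(freq.values()))   # 26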
Building a Tree
Prioritize characters

• Create binary tree nodes with character and frequency of


each character
• Place nodes in a priority queue
– The lower the occurrence, the higher the priority in the queue
Building a Tree

• The queue after inserting all nodes

E i y l k . r s n a sp e
1 1 1 1 1 1 2 2 2 2 4 8

• Null Pointers are not shown


Building a Tree
• While priority queue contains two or more
nodes
– Create new node
– Dequeue node and make it left subtree
– Dequeue next node and make it right subtree
– Frequency of new node equals sum of frequency of left
and right children
– Enqueue new node back into queue
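A sketch of the loop above using Python's heapq module as the priority queue; the node representation (a character for a leaf, a (left, right) pair for an internal node) is my own choice, not something the slides prescribe:

import heapq
from collections import Counter

def build_huffman_tree(text):
    """Repeatedly merge the two lowest-frequency nodes into a new parent."""
    freq = Counter(text)
    # Heap entries are (frequency, tie_breaker, subtree). A leaf is just a
    # character; an internal node is a (left, right) pair. The tie_breaker
    # keeps comparisons well defined when frequencies are equal.
    heap = [(count, i, ch) for i, (ch, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # lowest frequency -> left subtree
        f2, _, right = heapq.heappop(heap)   # next lowest -> right subtree
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    total, _, root = heap[0]
    return root, total                        # total equals len(text)

root, total = build_huffman_tree("Eerie eyes seen near lake.")
print(total)                                  # 26, the number of characters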
Building a Tree

• The original slides step through every merge as a tree diagram. Following
the loop above: the six single-occurrence characters are paired first (E+i,
y+l, k+.), then r+s and n+a form subtrees of frequency 4, E/i joins y/l
(frequency 4), the k/. subtree joins space (frequency 6), r/s joins n/a
(frequency 8), then 4+6 = 10, e+8 = 16, and finally 10+16 = 26.
• What is happening to the characters with a low number of occurrences?

Building a Tree
• After enqueueing this last combined node there is only one node left in the
priority queue.
• Dequeue the single node left in the queue. This tree contains the new code
words for each character.
• The frequency of the root node should equal the number of characters in the
text: “Eerie eyes seen near lake.” has 26 characters.


Encoding the File
Traverse Tree for Codes

• Perform a traversal of the tree to obtain the new code words
• Going left is a 0, going right is a 1
• A code word is only completed when a leaf node is reached
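A sketch of the traversal described above; it assumes the same node representation as the tree-building sketch earlier (leaves are single characters, internal nodes are (left, right) pairs):

def assign_codes(node, prefix=""):
    """Depth-first walk: going left appends '0', going right appends '1';
    a code word is complete only when a leaf (single character) is reached."""
    if isinstance(node, str):                 # leaf
        return {node: prefix or "0"}          # lone-symbol edge case
    left, right = node                        # internal node: (left, right)
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes

# Tiny hand-built tree: e at depth 1, E and i at depth 2.
print(assign_codes(("e", ("E", "i"))))        # {'e': '0', 'E': '10', 'i': '11'}

Applied to the root from the earlier tree-building sketch, this gives an optimal code for the example text; the exact 0/1 patterns (and possibly the lengths assigned to equal-frequency characters) depend on how ties were broken when the tree was built.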
Encoding the File
Traverse Tree for Codes

Char    Code
E       0000
i       0001
y       0010
l       0011
k       0100
.       0101
space   011
e       10
r       1100
s       1101
n       1110
a       1111
Encoding the File
• Rescan the text and encode the file using the new code words

Eerie eyes seen near lake.

Char    Code
E       0000
i       0001
y       0010
l       0011
k       0100
.       0101
space   011
e       10
r       1100
s       1101
n       1110
a       1111

000010110000011001110001010110101111011010111001
111101011111100011001111110100100101

 Why is there no need for a separator character?
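No separator is needed because a Huffman code is prefix-free: no code word is a prefix of any other, so the decoder always knows where one code word ends and the next begins. A sketch of encoding and decoding with the table above (the function names are mine):

table = {"E": "0000", "i": "0001", "y": "0010", "l": "0011",
         "k": "0100", ".": "0101", " ": "011",  "e": "10",
         "r": "1100", "s": "1101", "n": "1110", "a": "1111"}

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits left to right; because no code word is a prefix of another,
    the first complete match is always the right one."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

bits = encode("Eerie eyes seen near lake.", table)
print(len(bits))             # 84
print(decode(bits, table))   # Eerie eyes seen near lake.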
Encoding the File
Results

• Have we made things any better?
• Huffman coding: 84 bits to encode the text
• ASCII would take 8 * 26 = 208 bits
• A fixed-length code tailored to this 12-character alphabet would still need
4 bits per character, for a total of 4 * 26 = 104 bits, so the savings over
such a code are smaller but still real.
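As a quick check of the bit counts above, computed from the frequency table and the code-word lengths (a sketch; the dictionaries just restate the earlier tables):

freq = {"E": 1, "e": 8, "r": 2, "i": 1, " ": 4, "y": 1,
        "s": 2, "n": 2, "a": 2, "l": 1, "k": 1, ".": 1}
code_len = {"E": 4, "e": 2, "r": 4, "i": 4, " ": 3, "y": 4,
            "s": 4, "n": 4, "a": 4, "l": 4, "k": 4, ".": 4}

n_chars = sum(freq.values())                               # 26
huffman_bits = sum(freq[c] * code_len[c] for c in freq)    # 84
print(huffman_bits, 4 * n_chars, 8 * n_chars)              # 84 104 208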
Other data compression algorithms
• LZ77 (Lempel Ziv)
• LZR
• LZSS
• Deflate
• LZMA
• LZMA2
• Multi-Layer Perceptron (MLP)-based compression
• DeepCoder - Deep Neural Network Based Video Compression
• Convolutional Neural Network (CNN)-based compression
• Generative Adversarial Network (GAN)-based compression
• Prediction by partial matching (PPM)
• Run-length encoding (RLE)
• bzip2
• Huffman coding
• Zstandard
Channel coding
Our plan to deal with bit errors:

Message bitstream → Channel Coding → (bitstream with redundant information
used for dealing with errors) → Digital Transmitter → Digital Receiver →
(redundant bitstream, possibly with errors) → Error Correction → Recovered
message bitstream

We’ll add redundant information to the transmitted bit stream (a process
called channel coding) so that we can detect errors at the receiver. Ideally
we’d like to correct commonly occurring errors, e.g., error bursts of bounded
length. Otherwise, we should detect uncorrectable errors and use, say,
retransmission to deal with the problem.
Error detection and correction

Suppose we wanted to reliably transmit the result of a single coin flip:

Heads: “0”    Tails: “1”

[The slide shows a coin: “This is a prototype of the ‘bit’ coin for the new
information economy. Value = 12.5¢”]

Further suppose that during transmission a single-bit error occurs, i.e., a
single “0” is turned into a “1” or a “1” is turned into a “0”, so “heads” is
received as “tails” or vice versa.
Hamming Distance
(Richard Hamming, 1950)

[Cartoon: “I wish he’d increase his hamming distance.”]

HAMMING DISTANCE: the number of digit positions in which the corresponding
digits of two encodings of the same length are different.

The Hamming distance between a valid binary code word and the same code word
with a single-bit error is 1.

The problem with our simple encoding is that the two valid code words (“0”
and “1”) also have a Hamming distance of 1. So a single-bit error changes a
valid code word into another valid code word:

“heads” 0  ↔  1 “tails”   (a single-bit error flips one into the other)
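A one-line version of this definition as a sketch (the function name is mine):

def hamming_distance(a, b):
    """Number of digit positions in which two equal-length encodings differ."""
    assert len(a) == len(b), "defined only for encodings of the same length"
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("0", "1"))          # 1: our two valid code words
print(hamming_distance("10011", "10110"))  # 2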
Error Detection
What we need is an encoding where a single-bit
error doesn’t produce another valid code word.

The valid code words are now “heads” = 00 and “tails” = 11. A single-bit
error produces 01 or 10, neither of which is a valid code word.

If D is the minimum Hamming distance between code words, we can detect up to
(D-1)-bit errors.

We can add single-bit error detection to any length code word by adding a
parity bit chosen to guarantee the Hamming distance between any two valid
code words is at least 2. In the example above, we’re using “even parity”,
where the added bit is chosen to make the total number of 1’s in the code
word even.
Can we correct detected errors? Not yet…
Parity check
• A parity bit can be added to any length message and is chosen to make
the total number of “1” bits even (aka “even parity”).
• To check for a single-bit error (actually any odd number of errors), count
the number of “1”s in the received word and if it’s odd, there’s been an
error.

011001010011 original word with parity


011000010011 single-bit error (detected)
011000110011 2-bit error (not detected)

• One can “count” by summing the bits in the word modulo 2 (which is
equivalent to XOR’ing the bits together).
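A sketch of computing and checking an even parity bit; the helper names are mine, and the example reuses the data bits of the word above:

def even_parity_bit(data_bits):
    """Parity bit that makes the total number of 1s even (XOR of all bits)."""
    p = 0
    for b in data_bits:
        p ^= int(b)
    return str(p)

def parity_ok(word):
    """True if the received word (data + parity bit) has an even number of 1s."""
    return word.count("1") % 2 == 0

data = "01100101001"
word = data + even_parity_bit(data)            # 011001010011, as above
print(word, parity_ok(word))                   # True: no error detected
flipped = word[:5] + ("1" if word[5] == "0" else "0") + word[6:]
print(flipped, parity_ok(flipped))             # False: single-bit error detected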
Error Correction
The valid code words are now “heads” = 000 and “tails” = 111. The other six
3-bit words (100, 010, 001, 110, 101, 011) each differ from exactly one valid
code word by a single bit.

If D is the minimum Hamming distance between code words, we can correct up to
(D-1)/2-bit errors.

By increasing the Hamming distance between valid code words to 3, we
guarantee that the sets of words produced by single-bit errors don’t overlap.
So if we detect an error, we can perform error correction since we can tell
what the valid code was before the error happened.
• Can we safely detect double-bit errors while correcting 1-bit errors?
• Do we always need to triple the number of bits?
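A sketch of correcting by choosing the nearest valid code word for this 3-bit repetition code (the helper names are mine):

def correct(received, valid_codewords=("000", "111")):
    """Return the valid code word closest in Hamming distance to what was received."""
    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(valid_codewords, key=lambda cw: distance(received, cw))

print(correct("010"))   # 000 -> "heads" (one bit away from 000, two from 111)
print(correct("110"))   # 111 -> "tails"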
Error Correcting Codes (ECC)
Basic idea:
– Use multiple parity bits, each covering a subset of the data bits.
– The subsets overlap, i.e., each data bit belongs to multiple subsets.
– No two data bits belong to exactly the same subsets, so a single-bit
error will generate a unique set of parity check errors.

P1 = D1 XOR D2 XOR D4
P2 = D1 XOR D3 XOR D4
P3 = D2 XOR D3 XOR D4

• Suppose we check the parity and discover that P2 and P3 indicate an error?
– Bit D3 must have flipped (D3 is the only data bit covered by P2 and P3 but
not P1).
• What if only P3 indicates an error?
– P3 itself had the error!
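A sketch of these three overlapping checks and the resulting diagnosis; the function names and return values are my own choices:

def parity_bits(d1, d2, d3, d4):
    """The three overlapping parity bits defined above."""
    return d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4   # P1, P2, P3

def diagnose(d1, d2, d3, d4, p1, p2, p3):
    """Name the single bit that must have flipped, given which checks fail."""
    failing = frozenset(name for name, bad in {
        "P1": p1 != (d1 ^ d2 ^ d4),
        "P2": p2 != (d1 ^ d3 ^ d4),
        "P3": p3 != (d2 ^ d3 ^ d4),
    }.items() if bad)
    return {
        frozenset(): "no error",
        frozenset({"P1"}): "P1", frozenset({"P2"}): "P2", frozenset({"P3"}): "P3",
        frozenset({"P1", "P2"}): "D1", frozenset({"P1", "P3"}): "D2",
        frozenset({"P2", "P3"}): "D3", frozenset({"P1", "P2", "P3"}): "D4",
    }[failing]

d = (1, 0, 1, 1)
p = parity_bits(*d)
print(diagnose(*d, *p))                              # no error
print(diagnose(d[0], d[1], d[2] ^ 1, d[3], *p))      # D3 (P2 and P3 both fail)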
(n,k) Block Codes
• The entire n-bit block (k message bits followed by n-k parity bits) is
called a “codeword”
• Split message into k-bit blocks
• Add (n-k) parity bits to each block, making each block n bits long. How
many parity bits do we need to correct single-bit errors?
– Need enough combinations of parity bit values to identify which of the
n bits has the error (remember that parity bits can get errors too!), or
to indicate no error at all, so 2^(n-k) ≥ n+1 or, equivalently, 2^(n-k) > n
– Multiply both sides by 2^k and divide by n: 2^n / n > 2^k
– Most efficient (i.e., we use all possible encodings of the parity bits)
when 2^(n-k) - 1 = n
• (7,4), (15,11), (31,26) Hamming codes

This code is shown on the previous slide

What (n,k) code does one use?
• The minimum Hamming distance d between codewords determines how we can use
the code:
– To detect E-bit errors: d > E
– To correct E-bit errors: d > 2E
– So to correct 1-bit errors or detect 2-bit errors we need d ≥ 3. To do both,
we need d ≥ 4 in order to avoid double-bit errors being interpreted as
correctable single-bit errors.
– Sometimes code names include the minimum Hamming distance: (n,k,d)
• To conserve bandwidth we want to maximize a code’s code rate, defined as k/n.
• Parity is an (n+1,n,2) code
– Efficient, but only 1-bit error detection
• Replicating each bit r times is an (r,1,r) code
– A simple way to get great error correction, but inefficient
A simple (8,4,3) code

Idea: start with a rectangular array of data bits and add a parity check for
each row and column. A single-bit error in the data will show up as parity
errors in a particular row and column, pinpointing the bit that has the error.

D1 D2 | P1      P1 is the parity bit for row #1, P2 for row #2
D3 D4 | P2
P3 P4           P3 and P4 are the parity bits for columns #1 and #2

No errors (parity for each row and column is correct):
0 1 | 1
1 1 | 0
1 0

Bit D4 is incorrect (parity check fails for row #2 and column #2):
0 1 | 1
1 0 | 0
1 0

Bit P2 is incorrect (parity check only fails for row #2):
0 1 | 1
1 1 | 1
1 0

If we add an overall parity bit P5, we get a (9,4,4) code!
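A sketch of the row/column checks for the 2 x 2 data block above; the bit ordering D1 D2 D3 D4 P1 P2 P3 P4 and the function names are my own choices:

def encode_8_4(d1, d2, d3, d4):
    """Row parities P1, P2 and column parities P3, P4 for a 2x2 data block."""
    p1 = d1 ^ d2              # parity of row 1
    p2 = d3 ^ d4              # parity of row 2
    p3 = d1 ^ d3              # parity of column 1
    p4 = d2 ^ d4              # parity of column 2
    return [d1, d2, d3, d4, p1, p2, p3, p4]

def locate_error(word):
    """Index of a single flipped bit (0..7), or None if every check passes."""
    d1, d2, d3, d4, p1, p2, p3, p4 = word
    bad_row = [p1 != (d1 ^ d2), p2 != (d3 ^ d4)]
    bad_col = [p3 != (d1 ^ d3), p4 != (d2 ^ d4)]
    if not any(bad_row) and not any(bad_col):
        return None                              # no error detected
    if any(bad_row) and any(bad_col):            # a data bit: failing row + column
        return 2 * bad_row.index(True) + bad_col.index(True)
    if any(bad_row):                             # only a row check fails: row parity bit
        return 4 + bad_row.index(True)
    return 6 + bad_col.index(True)               # only a column check fails: column parity bit

word = encode_8_4(0, 1, 1, 1)
word[3] ^= 1                                     # flip D4, as in the middle example
print(locate_error(word))                        # 3, i.e. D4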


Burst Errors
Correcting single-bit errors is nice, but in many
situations errors come in bursts many bits long
(e.g., damage to storage media, burst of
interference on wireless channel, …). How does
single-bit error correction help with that?

Well, can we think of a way to turn a B-bit error burst into B single-bit
errors?

[The slide shows B codewords stacked as rows; they can be transmitted either
row by row or column by column.]

Problem: with row-by-row transmission, bits from a particular codeword are
transmitted sequentially, so a B-bit burst produces multi-bit errors in a
single codeword.

Solution: transmit column by column, i.e., interleave bits from B different
codewords. Now a B-bit burst produces 1-bit errors in B different codewords.
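A sketch of interleaving B codewords by transmitting column by column (the function names are mine):

def interleave(codewords):
    """Transmit column by column: bit i of every codeword goes out together,
    so a burst of B consecutive channel errors lands in B different codewords."""
    return [bits[i] for i in range(len(codewords[0])) for bits in codewords]

def deinterleave(stream, n_codewords):
    length = len(stream) // n_codewords
    return ["".join(stream[i + j * n_codewords] for j in range(length))
            for i in range(n_codewords)]

words = ["0000", "1111", "1010"]        # B = 3 codewords of 4 bits each
sent = interleave(words)
print("".join(sent))                    # 011010011010 (column by column)
print(deinterleave(sent, 3))            # ['0000', '1111', '1010']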
Cyclic codes
• Special linear block codes with one extra property
• If a codeword is cyclically shifted (rotated), the result is another codeword
• If 1011000 is a codeword and we cyclically left-shift, then 0110001 is also a
codeword
• Cyclic Redundancy Check (CRC) is a subset of cyclic codes
• Advantages of CRC
• Good performance in detecting:
• Single-bit errors
• Double errors
• Odd number of errors
• Burst errors
• Easy and fast implementation
Polynomial Codes
• Bit string as polynomial w/ 0 and 1 coeffs
– ex: a k-bit frame gives terms from x^(k-1) down to x^0
– ex: 10001 is 1·x^4 + 0·x^3 + 0·x^2 + 0·x^1 + 1·x^0 = x^4 + x^0
• Polynomial arithmetic mod 2

  10011011      11110000      00110011
 +11001010     -10100110     +11001101
  --------      --------      --------
  01010001      01010110      11111110
• Long division same, except subtract as above
• “Ok, so how do I use this information?”
Doing CRC
• Sender + receiver agree generator polynomial
– G(x), ahead of time, part of protocol
– with low and high bits a ‘1’, say 1001
• Compute a checksum for the frame M(x) (m bits)
– M(x) followed by the checksum must be evenly divisible by G(x)
• Receiver will divide by G(x)
– If no remainder, frame is ok
– If remainder then frame has error, so discard
• “But how do we compute the checksum?”
Computing Checksums
• Let r be the degree of G(x)
– If G(x) = x^2 + x^0 = 101, then r is 2
• Append r zero bits to frame M(x)
– get x^r·M(x)
– ex: 1001 with “00” appended gives 100100
• Divide x^r·M(x) by G(x) using mod 2 division (bitwise XOR)
– ex: 100100 / 101
• Care about remainder
• “Huh? Do you have an example?”
Dividing xrM(x) by G(x)
____1011__
101 | 100100
101
011
000
110
101
110
101
11  Remainder

“Ok, now what?”


Computing Checksum Frame
• Subtract (mod 2) the remainder from x^r·M(x)
   100100
 -     11
   ------
   100111
• Result is the checksum frame to be transmitted
– T(x) = 100111
• What if we divide T(x) by G(x)?
– It comes out evenly, with no remainder
– Decimal analogy: 210,278 / 10,941 leaves remainder 2,399
– and 210,278 - 2,399 is divisible by 10,941
• “Cool!”
Let’s See if it Worked
____1011__
101 | 100111
101
011
000
111
101
101
101
0  yeah!
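A sketch of the whole CRC procedure, with the mod-2 long division implemented as repeated XOR; the function names are mine, and the example reuses M(x) = 1001 and G(x) = 101 from the slides:

def mod2_divide(dividend, divisor):
    """Return the remainder of dividend / divisor using XOR subtraction."""
    remainder = list(dividend[:len(divisor)])
    for next_bit in list(dividend[len(divisor):]) + [None]:
        if remainder[0] == "1":                       # leading 1: subtract divisor
            remainder = [str(int(a) ^ int(b)) for a, b in zip(remainder, divisor)]
        remainder.pop(0)                              # shift out the leading bit
        if next_bit is not None:
            remainder.append(next_bit)                # bring down the next bit
    return "".join(remainder)

def crc_frame(message, generator):
    """Append r = len(generator) - 1 checksum bits so G(x) divides T(x) evenly."""
    r = len(generator) - 1
    remainder = mod2_divide(message + "0" * r, generator)
    return message + remainder

print(mod2_divide("100100", "101"))   # 11, the remainder from the worked example
frame = crc_frame("1001", "101")
print(frame)                          # 100111
print(mod2_divide(frame, "101"))      # 00: no remainder, the frame checks out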
Summary
• To detect D-bit errors: Hamming distance > D
• To correct D-bit errors: Hamming distance > 2D
• (n,k,d) codes have code rate of k/n
• For our purposes, we want to correct single-bit errors and detect double-bit errors, so we need d ≥ 4
• Handle B-bit burst errors by interleaving B codewords
• Add sync pattern to interleaved codeword block so receiver can find start of block.
• Use checksum/CRC to detect uncorrected errors in message
So far…
• Redundancy is achieved through various coding schemes
• Sender adds redundant bits through a process that creates a relationship between
redundant bits and the actual data bits
• The receiver checks the relationships between the two sets of bits to detect errors
• The ratio of redundant bits to data bits and the robustness of the process are
important factors in any coding scheme
• Coding schemes can be divided into 2 broad categories:
• Block Coding (linear block codes, see Hamming)
• Convolutional Coding
