
Week 5

Speech Coder

Channel Coder

Interleaving
• Speech to Radio
Burst Assembly

Ciphering

Modulation
From source information to radio waves

Kevin McDermott 2
Speech Coding          Speech Decoding

Channel Coding         Channel Decoding

Interleaving           De-Interleaving

Burst Assembly         Burst Disassembly

Ciphering              Deciphering

Modulation             Demodulation
            Transmission

From Speech Source to Radio Waves
Speech coding
• Voice is generally assumed not to contain any useful
information above frequencies of 4 kHz.
• Hence, a sampling rate of 8 kHz is typically sufficient for an
acceptable voice quality.
• There is an internationally agreed-upon standard for voice
coding using 8-kHz sampling, known as pulse code modulation
(PCM).
• This utilises eight-bit sampling at 8 kHz, resulting in a bit rate
of 64 kbit/s.
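
The PCM figures above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted on the slide; the variable names are illustrative.

```python
# PCM bit rate from the figures quoted above.
SAMPLE_RATE_HZ = 8_000   # 8 kHz sampling (twice the 4 kHz voice bandwidth)
BITS_PER_SAMPLE = 8      # eight-bit samples

pcm_bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # bits per second
bits_per_20ms = pcm_bit_rate * 20 // 1000         # bits in one 20 ms block

print(pcm_bit_rate)    # 64000 bit/s, i.e. 64 kbit/s
print(bits_per_20ms)   # 1280 bits
```

Note that the result is in bits per second, not bytes per second.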

Speech coding
• More optimal voice coders would model the vocal tract of the
speaker based on their first few syllables and then send
information on how this vocal tract was generating sound.
• GSM utilises a coder type known as regular pulse excited–long-
term prediction (RPE-LTP)
• The LTP part sends some parameters showing what the vocal
tract is doing and the RPE shows how it is generating sound
(“being excited”).
• The speech signal is divided into blocks of 20 ms.
• These blocks are then passed to the speech codec, which has a
rate of 13 kbps, in order to obtain blocks of 260 bits.
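
As a quick check of the block size, the sketch below derives the 260 bits from the 13 kbps rate and the 20 ms block length, and compares the codec rate with 64 kbit/s PCM. The constant names are illustrative.

```python
# Speech codec block arithmetic, using the figures quoted above.
CODEC_RATE_BPS = 13_000   # RPE-LTP output rate
BLOCK_MS = 20             # block duration
SAMPLE_RATE_HZ = 8_000    # input sampling rate
PCM_RATE_BPS = 64_000     # 8-bit PCM at 8 kHz

samples_per_block = SAMPLE_RATE_HZ * BLOCK_MS // 1000   # input samples per block
bits_per_block = CODEC_RATE_BPS * BLOCK_MS // 1000      # coded bits per block
compression = PCM_RATE_BPS / CODEC_RATE_BPS             # reduction relative to PCM

print(samples_per_block)       # 160
print(bits_per_block)          # 260
print(round(compression, 2))   # 4.92
```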

Physical Model

When you speak:
• Air is pushed from your lungs through your vocal tract and out
of your mouth comes speech.
• For certain voiced sounds, your vocal cords vibrate (open and
close). The rate at which the vocal cords vibrate determines
the pitch of your voice. Women and young children tend to
have high pitch (fast vibration) while adult males tend to have
low pitch (slow vibration).
• For certain fricative and plosive (or unvoiced) sounds, your
vocal cords do not vibrate but remain constantly open.
• The shape of your vocal tract determines the sound that you
make.

When you speak:
• As you speak, your vocal tract changes its shape,
producing different sounds.
• The shape of the vocal tract changes relatively slowly
(on the scale of 10 msec to 100 msec).
• The amount of air coming from your lungs determines
the loudness of your voice.

Formant Frequencies
• Voiced sounds are produced by air flowing from the lungs over the
vocal cords, causing them to vibrate in a periodic
pattern, generating a series of air pulses called ‘glottal
pulses’.
• The rate of vibration of the vocal cords determines the
‘pitch’ of the sound produced.
• As these air pulses pass along the vocal tract, some of
the frequencies resonate.
• These frequencies are called the ‘formant frequencies’ of
the voice being produced.

• Unvoiced sounds are those which do not
cause vibration of the vocal cords.
• The vocal tract is modelled as a time-varying
filter.
• It amplifies certain sound frequencies and
attenuates other frequencies.
• The sound is produced when a sound source
excites the vocal tract filter.

Mathematical Model

LPC Model
• The above model is often called the LPC
Model.
• The model says that the digital speech signal is
the output of a digital filter (called the LPC
filter) whose input is either a train of impulses
or a white noise sequence.
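
The LPC model can be sketched as an all-pole filter driven by one of the two excitations. The code below is a minimal illustration, not the GSM or LPC-10 implementation: the two filter coefficients in `A` are hypothetical (a single resonance with poles at radius 0.9 and angle π/4), and the function names are assumptions made for this example.

```python
import random

def lpc_synthesize(excitation, a, gain=1.0):
    """All-pole LPC filter: s[n] = gain*e[n] + sum_k a[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                s += ak * out[n - 1 - k]
        out.append(s)
    return out

def impulse_train(length, period):
    """Voiced excitation: one unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

def white_noise(length, seed=0):
    """Unvoiced excitation: white Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

# Hypothetical coefficients: a1 = 2*r*cos(theta), a2 = -r^2 with r = 0.9, theta = pi/4.
A = [1.2728, -0.81]

# One 20 ms frame at 8 kHz (160 samples).
voiced = lpc_synthesize(impulse_train(160, 80), A)         # 100 Hz pitch
unvoiced = lpc_synthesize(white_noise(160), A, gain=0.1)   # noise-excited

print(voiced[:3])
```

Switching the input between `impulse_train` and `white_noise` is what distinguishes voiced from unvoiced output in this model; the filter itself stays the same.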

Where is LPC10?
• Taxonomy of Speech Coders

Speech Coders
– Waveform Coders
    Time Domain: PCM, ADPCM
    Frequency Domain: Sub-band coders, Adaptive transform coder
– Vocoders
    Linear Predictive Coder (LPC 10), Formant coders

Waveform Coders: Preserve the signal waveform; not specific to speech
Vocoders: Analyze speech, extract parameters, use
parameters to synthesize speech
Vocoder

Encoder
Original Speech --> Analysis:
• Voiced/Unvoiced decision
• Pitch Period (voiced only)
• Signal power (Gain)

Decoder
• Pulse Train (spaced at the Pitch Period) or Random Noise, selected by the V/U decision
• Gain G set by the Signal Power
• Vocal Tract Model
--> Synthesized Speech
Voicing Classification(1)

Voiced Source
– Generated by vocal cords’ vibrations
– Periodic; the pulse spacing is the pitch period (1/F0)

Unvoiced Source
– Generated without vibrations
– Excitation is modeled by a White Gaussian Noise source
– No pitch

How to discriminate? Fisher’s Method
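
One way to read "Fisher's Method" is as Fisher's linear discriminant applied to per-frame features. The sketch below assumes that reading, and also assumes two features (log energy and zero-crossing rate) and synthetic training frames; none of these choices come from the slides, and a real coder would train on labelled speech.

```python
import math
import random

def frame_features(frame):
    """Return (log10 energy, zero-crossing rate) for one frame of samples."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (math.log10(energy + 1e-12), crossings / (len(frame) - 1))

def class_mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    """Entries (s00, s01, s11) of the 2x2 within-class scatter matrix."""
    s00 = sum((p[0] - m[0]) ** 2 for p in points)
    s01 = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    s11 = sum((p[1] - m[1]) ** 2 for p in points)
    return s00, s01, s11

def fisher_train(voiced_pts, unvoiced_pts):
    """Fisher direction w = Sw^-1 (m_voiced - m_unvoiced), midpoint threshold."""
    mv, mu = class_mean(voiced_pts), class_mean(unvoiced_pts)
    sv, su = scatter(voiced_pts, mv), scatter(unvoiced_pts, mu)
    s00, s01, s11 = sv[0] + su[0], sv[1] + su[1], sv[2] + su[2]
    det = s00 * s11 - s01 * s01
    d0, d1 = mv[0] - mu[0], mv[1] - mu[1]
    w = ((s11 * d0 - s01 * d1) / det, (s00 * d1 - s01 * d0) / det)
    threshold = 0.5 * (w[0] * (mv[0] + mu[0]) + w[1] * (mv[1] + mu[1]))
    return w, threshold

def is_voiced(frame, w, threshold):
    f = frame_features(frame)
    return w[0] * f[0] + w[1] * f[1] > threshold

# Synthetic frames (illustrative only): noisy tones vs. weak white noise.
rng = random.Random(1)

def voiced_frame(n=160, fs=8000):
    f0 = rng.uniform(80, 250)   # assumed pitch range in Hz
    return [math.sin(2 * math.pi * f0 * i / fs) + rng.gauss(0, 0.05) for i in range(n)]

def unvoiced_frame(n=160):
    return [rng.gauss(0, 0.1) for _ in range(n)]

w, threshold = fisher_train(
    [frame_features(voiced_frame()) for _ in range(50)],
    [frame_features(unvoiced_frame()) for _ in range(50)],
)
test_voiced = is_voiced(voiced_frame(), w, threshold)
test_unvoiced = is_voiced(unvoiced_frame(), w, threshold)
print(test_voiced, test_unvoiced)
```

Voiced frames tend to have high energy and few zero crossings, unvoiced frames the opposite, which is why these two features separate the classes well along a single projected axis.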


Channel coding
• Channel coding adds redundancy bits to the original
information in order to detect and correct, if possible, errors
that occurred during the transmission.
• The channel coder accepts 260 bits of data from the
speech coder every 20 ms, i.e. 13 kbps.
• It codes these bits using both block codes and a
convolutional code to produce a block of 456 data bits every
20 ms, i.e. 22.8 kbps.

Voice Coding

Voice Encoder
• Analog signal is sampled using PCM at 64 kbps.
• The signal is broken into 20 ms samples, which contain
1280 bits each
• A Regular Pulse Excited - Linear Predictive Coder (RPE-LPC)
is used to compress the audio data, which outputs a
260 bit sample that represents 20 ms of analog voice
signal.

260 bits:
IA – 50 bits     Most critical
IB – 132 bits    Very Important
II – 78 bits     Icing
Channel Coding - Blocks

Channel Encoder
• The 260 bit (20ms) sample is divided into class IA, IB and II,
based on how important the bits are in determining the
sound quality.
• IA uses a 3 bit CRC. If the CRC fails, the whole sample is
thrown out.
• IA and IB together have a 4-bit trailer. This is then put
into a 1/2 convolutional coder of length 4 that doubles
the number of bits.
• II bits are appended unencoded, giving an overall sample
of 456 bits.

One sample is 20ms of speech
--> 456 bits
--> 8 blocks
One block is 2.5ms of speech
--> 57 bits

456 bits: IA – 50 | IB – 132 bits | IB – 132 bits | II – 78 bits
--> 8 x Block - 57

• The 456 bit encoded sample is divided into 8 blocks of 57 bits each (each contains
the equivalent of 2.5 ms of speech) – these are the basic units of transmission.
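
The bit budget of the channel coder can be tallied directly from the class sizes, CRC and trailer lengths given above. This is a sketch of the arithmetic only; it does not implement the CRC or the convolutional code, and the constant names are illustrative.

```python
# Channel coding bit budget for one 20 ms speech sample.
CLASS_IA, CLASS_IB, CLASS_II = 50, 132, 78
CRC_BITS = 3       # parity bits protecting class IA
TRAILER_BITS = 4   # tail bits appended to IA + IB
BLOCK_BITS = 57

speech_bits = CLASS_IA + CLASS_IB + CLASS_II                  # input to the channel coder
protected = CLASS_IA + CRC_BITS + CLASS_IB + TRAILER_BITS     # input to the convolutional coder
coded = 2 * protected                                         # rate 1/2 doubles the bit count
total = coded + CLASS_II                                      # class II appended unencoded
blocks, leftover = divmod(total, BLOCK_BITS)
rate_bps = total * 1000 // 20                                 # one sample every 20 ms

print(speech_bits, protected, coded, total)   # 260 189 378 456
print(blocks, leftover)                       # 8 0
print(rate_bps)                               # 22800
```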
TDMA Bursts

• Blocks are gathered together to form a TDMA burst

Channel Encoder
• 2 separate speech sample blocks are gathered together
• Interleaved to protect against burst errors
• 26-bit training sequence
  – To characterize multipath and filter it out
• 16.25 tail/guard bits
• Total Burst is 156.25 bits

One burst is two blocks
--> Two 2.5ms samples of speech from same source

First sample (20ms):  IA – 50 | IB – 132 bits | IB – 132 bits | II – 78 bits --> 8 x Block - 57
Second sample (20ms): IA – 50 | IB – 132 bits | IB – 132 bits | II – 78 bits --> 8 x Block - 57

Burst (156.25 bits): T/G | Block - 57 | Training - 26 | Block - 57 | T/G
Sharing the channel – TDMA Frames

• Eight bursts (from different sources) make up a TDMA frame
• This allows eight sources to share a channel

Channel
One TDMA frame is eight bursts
--> 8 sources x (2 x 2.5ms sample of speech)

Burst: T/G | Block - 57 | Training - 26 | Block - 57 | T/G
TDMA Frame: 8 x Burst
TDMA Frame - 8 bursts - 8 x 2 x 2.5ms sample of speech - 1250 bits - 4.62 ms

Each burst comes from a different source (phone).
Eight phones share a channel using TDM.
Sharing the channel
• 8 Bursts per TDMA frame (2 x 2.5ms sample each)
• 26 TDMA frames make up one Multi-frame
  – 24 are for data (speech)
  – 1 is for control, 1 is unused

Channel
One TDMA MultiFrame is 26 Frames (24 data)
--> 8 sources x (24 x (2 x 2.5ms sample of speech))
--> 8 sources x 2 x 60ms sample of speech
--> 8 sources, Two 60ms samples of speech

TDMA Frame: 8 x Burst
MultiFrame: 26 x F (frame)
MultiFrame - 26 Frames - 24 x 8 x 2 x 2.5ms sample of speech - 32500 bits

One TDMA MultiFrame takes 120ms
--> 8 sources, Two 60ms samples of speech each
--> Each of eight sources can transmit 2 60ms samples of speech
every 120 ms
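
The burst, frame and multiframe sizes quoted on these slides follow from one another. The sketch below recomputes them with exact fractions; the derived frame duration and gross bit rate are consequences of the 120 ms multiframe, and the variable names are illustrative.

```python
from fractions import Fraction

BLOCK_BITS = 57
TRAINING_BITS = 26
TAIL_GUARD_BITS = Fraction(65, 4)   # 16.25 bits
MULTIFRAME_MS = 120

burst_bits = 2 * BLOCK_BITS + TRAINING_BITS + TAIL_GUARD_BITS   # one burst
frame_bits = 8 * burst_bits                                      # one TDMA frame
multiframe_bits = 26 * frame_bits                                # one multiframe

frame_ms = Fraction(MULTIFRAME_MS, 26)               # duration of one TDMA frame
burst_ms = frame_ms / 8                              # duration of one burst (time slot)
gross_rate_kbps = multiframe_bits / MULTIFRAME_MS    # bits per ms equals kbit/s

# Speech carried per source in one multiframe: 24 data frames x 2 blocks x 2.5 ms.
speech_ms_per_source = 24 * 2 * Fraction(5, 2)

print(float(burst_bits), int(frame_bits), int(multiframe_bits))   # 156.25 1250 32500
print(round(float(frame_ms), 3), round(float(burst_ms), 3))       # 4.615 0.577
print(round(float(gross_rate_kbps), 3))                           # 270.833
print(int(speech_ms_per_source))                                  # 120
```

The last line confirms that each source sends 120 ms worth of speech every 120 ms, so the channel keeps up with real-time speech.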
Interleaving
• In a radio environment, the signal strength can drop rapidly
for short periods of time due to (Rayleigh) fading and
shadowing.
• This introduces high error rates in short bursts.
• For error correction codes to work effectively, the
errors should be evenly distributed in time.
• By using interleaving, the risk of losing consecutive data bits
is greatly reduced.

Interleaving
• A normal burst in GSM transmits two blocks of 57 data bits
• Therefore the 456 bits corresponding to the output of the
channel coder fit into four bursts (4*114 = 456).
• The 456 bits are divided into eight blocks of 57 bits.
• The first block of 57 bits contains the bit numbers (0, 8,
16, .....448), the second one the bit numbers (1, 9,
17, .....449), etc.
• The last block of 57 bits will then contain the bit numbers (7,
15, .....455).
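
The partition described above amounts to taking every eighth bit, starting at a different offset for each block. A minimal sketch, with function names chosen for this example:

```python
def split_into_subblocks(bits, n_blocks=8):
    """Block k receives bits k, k + n_blocks, k + 2*n_blocks, ..."""
    if len(bits) % n_blocks:
        raise ValueError("length must be a multiple of n_blocks")
    return [bits[k::n_blocks] for k in range(n_blocks)]

def merge_subblocks(blocks):
    """Inverse operation, as performed by the de-interleaver."""
    n = len(blocks)
    out = [None] * (n * len(blocks[0]))
    for k, block in enumerate(blocks):
        out[k::n] = block
    return out

# Use the bit numbers themselves as data so the pattern is visible.
bit_numbers = list(range(456))
blocks = split_into_subblocks(bit_numbers)

print(blocks[0][:3], blocks[0][-1])   # [0, 8, 16] 448
print(blocks[7][:2], blocks[7][-1])   # [7, 15] 455
print(len(blocks), len(blocks[0]))    # 8 57
```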

Bits written row by row (8 per row):
  0   1   2   3   4   5   6   7
  8   9  10  11  12  13  14  15
  .   .   .   .   .   .   .   .
440 441 442 443 444 445 446 447
448 449 450 451 452 453 454 455

Blocks read column by column:
Block 0: 0  8 ..... 440 448
Block 1: 1  9 ..... 441 449
Block 2: 2 10 ..... 442 450
Block 3: 3 11 ..... 443 451
Block 4: 4 12 ..... 444 452
Block 5: 5 13 ..... 445 453
Block 6: 6 14 ..... 446 454
Block 7: 7 15 ..... 447 455

Interleaving
• The outputs of the interleaver are then grouped into bursts
that are modulated and transmitted.
• Each sub block is carried by a different burst and in a different
TDMA frame as shown below.
• The interleaving pattern will vary depending on whether we
are talking about a control channel, speech channel or data
channel.
• With interleaving the bursty noise is effectively spread out
which will allow the convolutional code to recover the
corrupted bits. So a sudden bursty deterioration of the S/N
ratio is not a problem.
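
The spreading effect can be illustrated by comparing the longest run of consecutive bit errors with and without interleaving. The sketch assumes a fade that wipes out one whole 57-bit sub-block; the choice of sub-block 3 and the helper name are arbitrary.

```python
def longest_error_run(error_flags):
    """Length of the longest run of consecutive True values."""
    longest = current = 0
    for flag in error_flags:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest

N_BITS, N_BLOCKS, BLOCK_LEN = 456, 8, 57
LOST_BLOCK = 3   # hypothetical: this sub-block is hit by a deep fade

# Without interleaving, 57 corrupted bits sit next to each other.
plain = [LOST_BLOCK * BLOCK_LEN <= i < (LOST_BLOCK + 1) * BLOCK_LEN for i in range(N_BITS)]

# With interleaving, sub-block k carries bits k, k+8, k+16, ...
# so after de-interleaving the same 57 errors land on every 8th bit.
interleaved = [i % N_BLOCKS == LOST_BLOCK for i in range(N_BITS)]

print(sum(plain), longest_error_run(plain))               # 57 57
print(sum(interleaved), longest_error_run(interleaved))   # 57 1
```

The number of errors is the same in both cases; only their distribution changes, which is what the convolutional code needs.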

Block n-1 (456 bits)     Block n (456 bits)

0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7

Interleaving

114 bits 114 bits 114 bits 114 bits

TRANSMITTER
Read in by rows:
 1  2  3  4  5  6
 7  8  9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
31 32 33 34 35 36
Read out by column:
1 7 13 19 25 31 2 8 etc

RECEIVER
Read in by column:
 1  2 ...
 7  8 ...
13 ...
19 ...
25 ...
31 ...
Read out by rows

Interleaving
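
The row/column scheme in the figure can be sketched as a pair of index permutations. The 6 x 6 size matches the figure; the function names are illustrative.

```python
def block_interleave(symbols, rows, cols):
    """Write row by row into a rows x cols array, read column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("need exactly rows * cols symbols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def block_deinterleave(symbols, rows, cols):
    """Write column by column, read row by row (undoes block_interleave)."""
    if len(symbols) != rows * cols:
        raise ValueError("need exactly rows * cols symbols")
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]

data = list(range(1, 37))                  # 1..36 as in the figure
sent = block_interleave(data, 6, 6)        # transmitter side
received = block_deinterleave(sent, 6, 6)  # receiver side

print(sent[:8])           # [1, 7, 13, 19, 25, 31, 2, 8]
print(received == data)   # True
```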

Interleaving in GSM
(Shown for two consecutive 20 ms speech frames)

160 samples, 2048 bits (20 ms)
--> RPE-LTP Speech Encoder
--> 260 bits
--> Channel Encoding
--> 456 bits
--> Sub-blocks 1 2 3 4 5 6 7 8
--> Stream of Bursts
GSM: Modulation
Modulator
• GSM uses Gaussian-filtered Minimum Shift Keying (GMSK).
  – MSK is a minimum-shift form of FSK
  – Gaussian pre-filter reduces bandwidth
• MSK gives the best spectral efficiency of any digital
bandpass signal set.
• FSK only has one amplitude level, allowing for a simpler
amplifier in the handset
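
The single amplitude level can be seen in a small baseband sketch of plain MSK, where each bit advances or retards the carrier phase by π/2 over one bit period. This is a simplified illustration: the Gaussian pre-filter that turns MSK into GMSK is omitted, and the function name, bit pattern and samples-per-bit value are assumptions.

```python
import cmath
import math

def msk_baseband(bits, samples_per_bit=8):
    """Complex baseband MSK: phase ramps by +pi/2 (bit 1) or -pi/2 (bit 0) per bit."""
    phase = 0.0
    step = math.pi / 2 / samples_per_bit
    samples = []
    for bit in bits:
        direction = 1 if bit else -1
        for _ in range(samples_per_bit):
            phase += direction * step
            samples.append(cmath.exp(1j * phase))
    return samples

signal = msk_baseband([1, 0, 1, 1, 0])
envelope = [abs(s) for s in signal]

print(len(signal))                                  # 40
print(max(abs(e - 1.0) for e in envelope) < 1e-12)  # True: constant envelope
```

Because the information is carried entirely in the phase, the envelope stays at 1, which is the property that lets the handset use a simpler amplifier.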
