Guided by: Mr. Vijayendra Desai

Prepared by: Patel Hetal - 82914, Nakrani Ankita - 6247, Vora Mahesh - 5267, Zalavadiya Ashish - 7284

Speech coding is concerned with obtaining a compact digital representation of voice signals for more efficient transmission or smaller storage size.

The objective is to represent the speech signal with the minimum number of bits while maintaining its perceptual quality.

The human vocal apparatus consists of:
– lungs
– trachea (windpipe)
– larynx: contains two folds of skin called vocal cords, which blow apart and flap together as air is forced through
– oral tract
– nasal tract

Speech coder: a device that converts speech to digital form
Types of speech coders:
– Waveform coders: convert any analog signal to digital form
– Vocoders (parametric coders): try to exploit special properties of the speech signal to reduce the bit rate; build a model of speech and transmit the parameters of the model
– Hybrid coders: combine features of waveform coders and vocoders

Types of speech codecs
– Waveform codecs (e.g. ADPCM): high quality, high bit rate
– Vocoders (e.g. LPC): low bit rate, low quality
– Hybrid codecs (e.g. CELP): medium bit rate, good quality

Waveform codecs
◦ Sample and code
◦ High quality and not complex
◦ Require a large amount of bandwidth

Source codecs (vocoders)
◦ Match the incoming signal to a mathematical model
◦ Linear-predictive filter model of the vocal tract
◦ A voiced/unvoiced flag for the excitation
◦ The model information is sent rather than the signal
◦ Low bit rates, but sounds synthetic
◦ Higher bit rates do not improve quality much

Hybrid codecs
◦ Attempt to provide the best of both
◦ Perform a degree of waveform matching
◦ Utilize the sound production model
◦ Quite good quality at low bit rates

Similar to images, we can also compress speech to make it smaller and easier to store and transmit. General compression methods such as DPCM can also be used. More compression can be achieved by taking advantage of the speech production model.
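The DPCM idea can be sketched in a few lines. This is a minimal illustration, not a standard codec: the first-order predictor (the previous reconstructed sample), the uniform step size `step`, the test tone, and the function names are all assumptions chosen for the example.

```python
import math

def dpcm_encode(samples, step=0.05):
    """Quantize the difference between each sample and its prediction."""
    codes = []
    prediction = 0.0  # the decoder starts from the same state
    for x in samples:
        code = round((x - prediction) / step)  # quantized prediction error
        codes.append(code)
        prediction += code * step              # track the decoder's reconstruction
    return codes

def dpcm_decode(codes, step=0.05):
    """Rebuild the signal by accumulating the quantized differences."""
    out = []
    prediction = 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out

# Slowly varying test tone: 200 Hz sampled at 8 kHz (assumed values).
signal = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]
codes = dpcm_encode(signal)
decoded = dpcm_decode(codes)
```

Because neighbouring samples are close to each other, the difference codes stay much smaller than the codes needed to quantize each sample directly, which is where the bit saving comes from.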

Why Adaptive Multi-Rate (AMR)?
• Major challenge in coder design: high-quality speech throughout a wide variety of channel conditions
• Traditionally, the source/channel bit allocation is fixed
• Solution: variable bit rate allocation between the source coder and the channel coder

To satisfy the requirement of a variable bit rate:
• The quantization parameters of the fixed-rate coders are changed.
• In CELP (Code Excited Linear Predictive coder), the codebook size, gain, linear predictive parameters, etc. are changed to obtain a variable bit rate.
• CELP suffers from long processing time due to its stochastic codebook.
• ACELP uses an algebraic (well-structured) codebook to solve this problem of CELP.
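A rough sketch of why an algebraic codebook is cheaper than a stochastic one: a codevector is only a few signed unit pulses, so it is fully described by positions and signs and never has to be stored. The subframe length of 40, the five interleaved tracks, and one pulse per track are illustrative assumptions, not the bit allocation of any particular AMR mode.

```python
SUBFRAME = 40   # samples per subframe (assumed)
TRACKS = 5      # interleaved position tracks (assumed)

def track_positions(track):
    """Positions allowed for a pulse on the given track: track, track+5, ..."""
    return list(range(track, SUBFRAME, TRACKS))

def build_codevector(pulses):
    """pulses: list of (track, slot, sign) with sign = +1 or -1."""
    vec = [0] * SUBFRAME
    for track, slot, sign in pulses:
        vec[track_positions(track)[slot]] += sign
    return vec

def bits_per_pulse():
    """Bits to send one pulse: position within its track plus one sign bit."""
    slots = SUBFRAME // TRACKS              # candidate positions per track
    return (slots - 1).bit_length() + 1

code = build_codevector([(0, 2, +1), (1, 0, -1), (2, 7, +1), (3, 3, -1), (4, 5, +1)])
```

With these assumed sizes the excitation is sparse (5 non-zero samples out of 40), so filtering and correlating it during the search takes far fewer operations than for a dense random vector.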

Operating Modes of AMR

Performance Comparison Between some Standardized Coders

The Speech Signal

[Figure: speech waveform with the background signal, the pitch period, and an unvoiced (noise-like) segment marked]

Speech Waveforms and Spectra
• S: silence or background, no speech
• U: unvoiced, no vocal cord vibration (aspiration, unvoiced sounds)
• V: voiced, quasi-periodic speech

Voiced Vs Unvoiced
– Voiced stops are transient sounds produced by building up pressure behind a total constriction in the oral tract and then suddenly releasing the pressure, resulting in a pop-like sound
   • /B/: constriction at lips
   • /D/: constriction at back of teeth
   • /G/: constriction at velum
– Unvoiced stops have no vocal cord vibration during the period of closure => brief period of frication (due to sudden turbulence of escaping air) and aspiration (steady air flow from the


Pitch and formants
• For certain voiced sounds, the vocal cords vibrate (open and close).
• The rate at which the vocal cords vibrate determines the pitch of the voice.
• For men, the pitch period is 4-20 ms (50-250 Hz).
• For women, the pitch period is 2-8 ms (125-500 Hz).
• The resonant frequencies of the vocal tract tube are called formants.
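Pitch frequency is the reciprocal of the pitch period, so the ranges above can be checked with a short calculation. The 8 kHz sampling rate used to express the period in samples is an assumption (a common choice for telephone-band speech), and the function names are illustrative.

```python
def pitch_hz(period_ms):
    """Pitch frequency in Hz for a pitch period given in milliseconds."""
    return 1000.0 / period_ms

def period_in_samples(period_ms, sample_rate_hz=8000):
    """Pitch period expressed as a lag in samples (sampling rate assumed)."""
    return round(period_ms * sample_rate_hz / 1000.0)

for period_ms in (2, 4, 8, 20):
    print(period_ms, "ms ->", pitch_hz(period_ms), "Hz,",
          period_in_samples(period_ms), "samples")
```

The lag in samples is the quantity a coder actually searches for when it estimates the pitch of a frame.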

Speech production Model


• Non-uniform probability density function (PDF)
• Non-zero autocorrelation between successive speech samples
• Existence of voiced and unvoiced segments
• Quasi-periodicity of the speech signal
• Speech signals are band-limited, so they can be sampled at a finite rate and reconstructed from these samples

Probability density function (PDF):
◦ Non-uniform PDF of the speech signal
◦ Very high probability of near-zero amplitudes
◦ Significant probability of very high amplitudes
◦ Monotonically decreasing function in between
◦ The PDF has a distinct peak at x = 0, due to frequent pauses and low-level speech segments.
◦ A non-uniform quantizer attempts to match the distribution of quantization levels to the PDF of speech.

Autocorrelation function

Strong correlation exists between adjacent samples of a speech segment. So a large component of every speech sample can be predicted from the previous samples with only a small random error. All differential and predictive coders are designed based on this property.
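The benefit of this correlation can be illustrated with a first-order predictor. The sketch below is only an assumption-laden example: it uses a synthetic 200 Hz tone at 8 kHz in place of real speech, and the helper names are made up for illustration.

```python
import math

def autocorr(x, lag):
    """Unnormalized autocorrelation of x at the given lag."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))

def first_order_predictor(x):
    """Coefficient a that minimizes sum((x[n] - a*x[n-1])**2)."""
    return autocorr(x, 1) / sum(v * v for v in x[:-1])

def residual_energy(x, a):
    """Energy of the prediction error x[n] - a*x[n-1]."""
    return sum((x[n] - a * x[n - 1]) ** 2 for n in range(1, len(x)))

# Synthetic stand-in for a voiced segment (assumed test signal).
signal = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
a = first_order_predictor(signal)
signal_energy = sum(v * v for v in signal[1:])
gain_db = 10 * math.log10(signal_energy / residual_energy(signal, a))
```

The prediction error carries far less energy than the signal itself, so it can be quantized with fewer bits for the same distortion; `gain_db` measures that saving for this test signal.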

Characteristics of speech signal
Power spectral density function (PSD)
• The non-flat PSD of speech makes it possible to obtain significant compression by coding speech in the frequency domain.
• The long-term average PSD of speech shows that high-frequency components contribute very little to the total speech energy.
• So coding speech separately in different frequency bands can lead to significant coding gain.

A quantizer removes irrelevance in the signal; the operation is irreversible.
• Uniform quantization
• Non-uniform quantization
• Adaptive quantization
• Vector quantization
• Amplitude level quantizer
• A-law and µ-law companding
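µ-law companding can be sketched with the continuous compression curve F(x) = sgn(x) ln(1 + µ|x|) / ln(1 + µ). This is a simplified illustration, not the segmented 8-bit encoding used in telephony: µ = 255, the 8-bit uniform quantizer, and the function names are assumptions for the example, and inputs are assumed to lie in [-1, 1].

```python
import math

MU = 255.0  # common choice for mu-law (assumed here)

def mu_compress(x, mu=MU):
    """Compress x in [-1, 1] with the logarithmic mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize_uniform(v, bits=8):
    """Uniform quantizer on [-1, 1] with 2**(bits-1) - 1 steps per side."""
    half = 2 ** (bits - 1) - 1
    return round(v * half) / half

def companded_quantize(x, bits=8):
    """Compress, quantize uniformly, then expand: a non-uniform quantizer."""
    return mu_expand(quantize_uniform(mu_compress(x), bits))

x = 0.01  # a low-level sample, the most common case in speech
uniform_error = abs(quantize_uniform(x) - x)
companded_error = abs(companded_quantize(x) - x)
```

For low-level samples the companded quantizer has a much smaller error than the plain uniform one, which matches the idea of placing more quantization levels where the speech PDF is concentrated.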

Vocoders
• Channel vocoder
• Formant vocoders
• Cepstrum vocoder
LPC
• LPC vocoder
• Multipulse Excited LPC
• Code-Excited LPC

It uses an analysis and synthesis approach:
• The signal needs to be analyzed at the transmitter.
• It determines the envelope of the speech signal for a number of frequency bands, then samples, encodes, and multiplexes these samples with the encoded output of the filter.
• The voiced/unvoiced decision, the energy information for each band, and the pitch frequency are packed and transmitted.

Speech Production Models

Physical Model Mathematical Model

LPC Decoder
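[Block diagram: the LPC bit stream is unpacked into the voicing decision, pitch period index, power index, and LPC index, which feed the pitch period decoder, power decoder, and LPC decoder. An impulse train generator (voiced) or a white noise generator (unvoiced) provides the excitation, which is scaled by the gain computation and passed through the synthesis filter and de-emphasis to give the synthesized speech.]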
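The decoder chain (excitation, gain, synthesis filter, de-emphasis) can be sketched for a single frame as follows. Everything numeric here is an assumption for illustration: the frame length of 160 samples, the second-order coefficients, the de-emphasis factor `alpha`, and the function names. Index decoding and unpacking of the bit stream are left out.

```python
import random

def excitation(voiced, pitch_period, length, seed=0):
    """Impulse train for voiced frames, white noise for unvoiced frames."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def synthesis_filter(exc, lpc, gain):
    """All-pole filter: s[n] = gain*exc[n] + sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(exc):
        s = gain * e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                s += a * out[n - 1 - k]
        out.append(s)
    return out

def de_emphasis(x, alpha=0.9):
    """Undo a pre-emphasis 1 - alpha*z^-1: y[n] = x[n] + alpha*y[n-1]."""
    out, prev = [], 0.0
    for v in x:
        prev = v + alpha * prev
        out.append(prev)
    return out

def decode_frame(voiced, pitch_period, gain, lpc, length=160):
    """Excitation -> gain and synthesis filter -> de-emphasis."""
    exc = excitation(voiced, pitch_period, length)
    return de_emphasis(synthesis_filter(exc, lpc, gain))

# Example frame with assumed parameters (stable second-order filter).
frame = decode_frame(voiced=True, pitch_period=40, gain=0.5, lpc=[1.2, -0.6])
```

Only the voicing flag, pitch period, gain, and filter coefficients are needed per frame, which is why the bit rate is so low and also why the output sounds synthetic.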


Analysis-by-Synthesis Excitation Coding

CELP

[Figure: original speech sample, with the value pairs (1,1), (8,8), (3,5), (6,8)]

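CELP Encoder Block Diagram

[Block diagram: the input speech S(n) is buffered and passed through LP analysis, which gives the LP parameters and the pitch estimate P. An entry ρ_k(n) of the Gaussian excitation codebook (index k) is scaled by the gain θ0 and passed through the pitch synthesis filter θ_P(z) (long-term analysis) and the LP synthesis filter θ(z) (short-term analysis) to produce the synthesized speech S*(n). The error between S(n) and S*(n) passes through the perceptual weighting filter W(z), and error energy minimization selects the excitation parameters. The encoder sends the LP parameters and the excitation parameters to the channel. Signal labels in the figure: S(n), S*(n), e(n), ρ_k(n), ζ, ζ_w.]

Pitch synthesis filter:

$\theta_P(z) = \frac{1}{1 - \beta z^{-P}}$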
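The analysis-by-synthesis search can be sketched as: synthesize every codebook entry, compute its best gain, and keep the entry with the smallest error energy. This is a simplified sketch under stated assumptions: the perceptual weighting filter W(z) is omitted, filter memories start at zero, and the codebook size, subframe length, filter coefficients, and function names are illustrative. The pitch filter implements 1/(1 - βz^-P).

```python
import random

def pitch_synthesis(x, beta, P):
    """Long-term filter 1/(1 - beta*z^-P): y[n] = x[n] + beta*y[n-P]."""
    y = []
    for n, v in enumerate(x):
        y.append(v + (beta * y[n - P] if n >= P else 0.0))
    return y

def lp_synthesis(x, a):
    """Short-term all-pole filter: y[n] = x[n] + sum_k a[k]*y[n-1-k]."""
    y = []
    for n, v in enumerate(x):
        y.append(v + sum(c * y[n - 1 - k] for k, c in enumerate(a) if n - 1 - k >= 0))
    return y

def synthesize(code, beta, P, a):
    """Codebook entry -> pitch synthesis filter -> LP synthesis filter."""
    return lp_synthesis(pitch_synthesis(code, beta, P), a)

def search_codebook(target, codebook, beta, P, a):
    """Return (index, gain, error) of the entry whose scaled synthesis is closest."""
    best = None
    for k, code in enumerate(codebook):
        y = synthesize(code, beta, P, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (k, gain, err)
    return best

# Gaussian codebook and filter parameters are assumed values.
rng = random.Random(1)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]
beta, P, a = 0.5, 20, [0.8]
# Build a target from a known entry so the search result can be checked.
target = [2.0 * v for v in synthesize(codebook[17], beta, P, a)]
index, gain, err = search_codebook(target, codebook, beta, P, a)
```

Every entry has to be filtered before it can be compared, so the cost grows with the codebook size; this is the processing burden of a stochastic codebook.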

CELP bit allocation for AMR

Algebraic Code Excited Linear Predictive (ACELP) Coder

Results
Five.wav
Bitrate (kbps)   Process delay   SNR        MSE
4.75             1.4             11.392     3.73E-04
5.15             1.958           12.3201    3.01E-04
5.9              2.484           10.0201    5.11E-04
6.7              3.555           8.7001     6.93E-04
7.4              5.955           7.6269     8.87E-04
7.95             10.234          7.8191     8.49E-04
10.2             70.658          7.3258     9.51E-04
12.2             284.194         6.2923     0.0012

Bitrate (kbps)   Process delay   SNR        MSE
4.75             3.443           1.6833     0.0195
5.15             4.496           1.9458     1.84E-02
5.9              4.978           2.5418     1.60E-02
6.7              7.297           2.7128     1.54E-02
7.4              12.721          2.7049     1.54E-02
7.95             22.891          3.7974     1.20E-02
10.2             165.706         3.7305     1.22E-02
12.2             652.134         4.312      1.07E-02

Bitrate (kbps)   Process delay   SNR        MSE
4.75             7.073           9.0667     4.03E-04
5.15             6.776           9.6937     3.49E-04
5.9              10.589          9.7603     3.44E-04
6.7              15.565          10.1021    3.19E-04
7.4              25.128          10.3836    2.98E-04
7.95             47.284          11.198     2.48E-04
10.2             387.415         11.9593    2.08E-04
12.2             1387.386        12.292     1.92E-04
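The slides do not state how SNR and MSE were computed or in which units. A sketch using the common definitions (mean squared sample error, and SNR in dB as the ratio of signal energy to error energy) would look like this; the function names and the toy data are assumptions.

```python
import math

def mse(original, decoded):
    """Mean squared error between original and decoded samples."""
    return sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)

def snr_db(original, decoded):
    """Signal-to-noise ratio in dB: signal energy over error energy."""
    signal = sum(o * o for o in original)
    noise = sum((o - d) ** 2 for o, d in zip(original, decoded))
    return 10.0 * math.log10(signal / noise)

# Toy data standing in for an original and a decoded waveform.
original = [0.5, -0.5, 0.5, -0.5]
decoded = [0.4, -0.4, 0.4, -0.4]
print(mse(original, decoded), snr_db(original, decoded))
```

Both measures compare waveforms sample by sample, so they reward waveform matching and do not necessarily track perceived quality.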