speech compression

Published by: Amol Chaudhari on Oct 07, 2010
Speech is a very basic way for humans to convey information to one another. With a bandwidth of only 4 kHz, speech can convey information with the emotion of a human voice. People want to be able to hear someone's voice from anywhere in the world as if the person were in the same room. As a result, greater emphasis is being placed on the design of new and efficient speech coders for voice communication and transmission. Today, applications of speech coding and compression have become very numerous.

This paper looks at a new technique for analyzing and compressing speech signals using wavelets. Any signal can be represented by a set of scaled and translated versions of a basic function called the mother wavelet. Taking the wavelet transform of the original signal yields the wavelet coefficients at different scales and positions. The coefficients represent the signal in the wavelet domain, and all data operations can be performed using just the corresponding wavelet coefficients.

Speech is a non-stationary random process due to the time-varying nature of the human speech production system. Non-stationary signals are characterized by numerous transitory drifts, trends and abrupt changes. The localization feature of wavelets, along with their time-frequency resolution properties, makes them well suited for coding speech signals.

In designing a wavelet-based speech coder, the major issues explored in this paper are:
i. Choosing optimal wavelets for speech,
ii. Decomposition level in wavelet transforms,
iii. Threshold criteria for the truncation of coefficients,
iv. Efficiently representing zero-valued coefficients and
v. Quantizing and digitally encoding the coefficients.

The performance of the wavelet compression scheme in coding speech signals and the quality of the reconstructed signals are also evaluated.
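The steps listed above (decomposition to a chosen level, truncation of small coefficients, and compact representation of the resulting zeros) can be illustrated with a minimal sketch. It is not the coder evaluated in this paper: the Haar wavelet, the hard-threshold rule, the threshold value, the synthetic test signal and all function names are assumptions chosen only to keep the example short and dependency-free.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Return [cA_n, cD_n, ..., cD_1]; len(signal) must be divisible by 2**levels."""
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        coeffs.insert(0, detail)   # coarser detail bands end up first
    coeffs.insert(0, approx)
    return coeffs

def haar_reconstruct(coeffs):
    """Invert haar_decompose by merging approximation and detail level by level."""
    s = 1 / math.sqrt(2)
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        merged = []
        for a, d in zip(approx, detail):
            merged.append((a + d) * s)
            merged.append((a - d) * s)
        approx = merged
    return approx

def hard_threshold(coeffs, threshold):
    """Zero every detail coefficient with magnitude below threshold."""
    kept = [list(coeffs[0])]       # the approximation band is left untouched
    for band in coeffs[1:]:
        kept.append([c if abs(c) >= threshold else 0.0 for c in band])
    return kept

def encode_zero_runs(values):
    """Replace each run of zeros with a single ('Z', run_length) token."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
            continue
        if run:
            out.append(("Z", run))
            run = 0
        out.append(v)
    if run:
        out.append(("Z", run))
    return out

# Synthetic stand-in for a speech frame: a slow tone plus a short transient.
frame = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
frame[40] += 0.8

coeffs = haar_decompose(frame, levels=3)
truncated = hard_threshold(coeffs, threshold=0.1)   # assumed threshold value
flat = [c for band in truncated for c in band]
tokens = encode_zero_runs(flat)
restored = haar_reconstruct(truncated)
max_error = max(abs(a - b) for a, b in zip(frame, restored))

print("coefficients:", len(flat), "zeros:", flat.count(0.0), "tokens:", len(tokens))
print("max reconstruction error:", max_error)
```

Raising the threshold zeroes more coefficients and shortens the token stream, at the cost of a larger reconstruction error; quantizing and entropy-coding the surviving coefficients (issue v) is left out of this sketch.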
1.1 Speech Signals
Human speech in its pristine form is an acoustic signal. For the purpose of communication and storage, it is necessary to convert it into an electrical signal. This is accomplished with the help of instruments called 'transducers'. This electrical representation of speech has certain properties.
1. It is a one-dimensional signal, with time as its independent variable.
2. It is random in nature.
3. It is non-stationary, i.e. the frequency spectrum is not constant in time.
4. Although human beings have an audible frequency range of 20 Hz to 20 kHz, human speech has significant frequency components only up to 4 kHz, a property that is exploited in the compression of speech.

1.1.1 Digital representation of speech
With the advent of digital computers, it was proposed to exploit their power for processing speech signals. This required a digital representation of speech. To achieve this, the analog signal is sampled at some frequency and then quantized at discrete levels. Thus, the parameters of digital speech are
1. Sampling rate
2. Bits per second
3. Number of channels.
The sound files can be stored and played on digital computers. Various formats have been proposed by different manufacturers, for example '.wav' and '.au', to name a few. In this thesis, the '.wav' format is used extensively due to the convenience of recording it with the 'Sound Recorder' software shipped with the Windows OS.
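As a rough illustration of sampling, quantization and the parameters stored in a '.wav' file, the sketch below synthesizes a short tone, writes it to an in-memory WAV file with Python's standard `wave` module, and reads the parameters back. The 8 kHz sampling rate follows from the 4 kHz speech bandwidth (the Nyquist rate is twice the highest frequency); the 16-bit resolution, the mono channel, the 440 Hz test tone and the helper names are assumptions made for the example, not values taken from this thesis.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000     # Hz: twice the 4 kHz speech bandwidth
BITS_PER_SAMPLE = 16   # assumed quantizer resolution
CHANNELS = 1           # assumed mono recording

def quantize(x, bits):
    """Map x in [-1.0, 1.0] to a signed integer level of the given bit depth."""
    max_level = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, x))
    return int(round(clipped * max_level))

def synth_tone(freq_hz, n_samples):
    """Sample a half-amplitude sine wave and quantize each sample."""
    return [
        quantize(0.5 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE), BITS_PER_SAMPLE)
        for n in range(n_samples)
    ]

def write_wav(samples):
    """Pack 16-bit samples into an in-memory WAV file and return its bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(BITS_PER_SAMPLE // 8)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

def read_params(wav_bytes):
    """Return (sampling rate, bits per sample, channels, frame count)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getframerate(), 8 * w.getsampwidth(), w.getnchannels(), w.getnframes()

wav_bytes = write_wav(synth_tone(440.0, 800))   # 800 samples = 0.1 s at 8 kHz
rate, bits, channels, frames = read_params(wav_bytes)
bit_rate = rate * bits * channels               # uncompressed bits per second
print(rate, bits, channels, frames, bit_rate)   # prints: 8000 16 1 800 128000
```

With these assumed settings the uncompressed stream needs 128,000 bits per second, which is the baseline a speech coder tries to reduce.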
1.2 Compression - An Overview
In recent years, large-scale information transfer by remote computing and the development of massive storage and retrieval systems have grown tremendously. To cope with the growth in the size of databases, additional storage devices need to be installed, and modems and multiplexers have to be continuously upgraded to permit large amounts of data transfer between computers and remote terminals. This leads to an increase in cost as well as equipment. One solution to these problems is "compression", where the database and the transmission sequence can be encoded efficiently.
1.3 Coding Techniques
There are various methods of coding the speech signal

