
A Comparative Study between Discrete Wavelet Transform and Linear Predictive Coding

D. Ambika
Research Scholar,
Department of Computer Science,
Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India
ambikaphdscholar@gmail.com

V. Radha
Associate Professor,
Department of Computer Science,
Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India
radhasrimail@gmail.com

Abstract- In this paper, the compression process is analyzed by comparing the compressed signal against the original signal. To do this, two powerful speech analysis and compression techniques, Linear Predictive Coding (LPC) and Discrete Wavelet Transform (DWT), were implemented using MATLAB. Nine samples of spoken words were collected from different speakers and used for the implementation. The results obtained from LPC were compared with those of DWT. Finally, the results were evaluated in terms of compression ratio (CR), peak signal-to-noise ratio (PSNR) and normalized root-mean-square error (NRMSE). The results show that DWT performed better than LPC for these samples.

Keywords- Speech compression, LPC, CR, DWT, PSNR, NRMSE.

I. INTRODUCTION

Compression algorithms help to reduce the bandwidth requirement and also provide a level of security for the data being transmitted. This is especially important in teleconferencing and wireless communication, where compression schemes must retain the integrity of the speech. If the data is distorted in some way, it becomes difficult to understand [1]. Thus, speech compression needs to be performed in a way which retains the key qualities of the data. Speech compression finds application in mobile satellite communication, cellular phones, audio conferencing systems, etc. Today, applications of speech coding and compression have become very numerous. Compression techniques can be classified into one of two main categories: lossless and lossy. In lossless compression, the original file can be perfectly recovered from the compressed file [2]. In lossy compression, the original file cannot be perfectly recovered from the compressed file, but by discarding less-critical data it achieves greater compression than lossless coding while keeping the best possible quality. Speech coding is a lossy type of coding, in which the output signal does not sound exactly like the input signal.

The paper is organized as follows. Section 2 explains the types of speech coding techniques. Section 3 explains speech compression using DWT. Section 4 deals with speech compression based on LPC. Section 5 explores the performance evaluation for the adopted techniques. Finally, the conclusion is summarized in section 6.

II. TYPES OF SPEECH CODING TECHNIQUES

Although human beings have an audible frequency range of 20 Hz–20 kHz, human speech has significant frequency components only up to 4 kHz, a property that is exploited in the compression of speech [1]. Fig. 1 shows the various methods for coding the speech signal [2].

Several techniques of speech coding such as LPC, waveform coding and sub band coding exist. Waveform coding is used to analyze, code and reconstruct the original speech sample by sample. It includes time domain coding and frequency domain coding. Methods such as Pulse Code Modulation (PCM), Differential PCM (DPCM) [3], Adaptive DPCM (ADPCM), Delta Modulation (DM), and Adaptive PCMID are some of the popular time domain waveform coding techniques, and Transform Coding (TC) and Sub band Coding (SBC) are a few spectral domain waveform coding techniques.

PCM [3] is used to digitize signals through signal conversion. The input to DPCM can be an analog or a digital signal. It uses the baseline of PCM but adds functionality based on the prediction of the samples of the signal. In DPCM, an estimate of each sample is first found by prediction from the past few samples, and then the difference between the original sample and this estimate is taken. DPCM can provide PCM quality of speech at 56 kbps. Adaptive Differential Pulse Code Modulation (ADPCM) [3] is used to provide much lower data rates by using a functional model of the human speaking mechanism at the receiver end [4]. The frequency domain includes sub band coding and transform coding. In transform coding, the signal is transformed to its representation in another domain in which it can be compressed better than in its original form. This type of coding uses information about the human vocal and auditory systems. Using transformation schemes such as Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT), the important frequency components can be encoded with more precision than others.
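The DPCM idea described in Section II (predict each sample from past samples, then code only the difference) can be illustrated with a minimal sketch. It is not taken from the paper: the first-order predictor (the previous reconstructed sample), the uniform quantizer, and the names `dpcm_encode`, `dpcm_decode` and `step` are all assumptions chosen for illustration.

```python
def dpcm_encode(samples, step):
    """Encode samples as quantized prediction errors (illustrative first-order DPCM)."""
    codes = []
    prediction = 0.0  # assumption: the predictor starts from zero
    for s in samples:
        # Quantize the difference between the sample and its prediction.
        code = round((s - prediction) / step)
        codes.append(code)
        # Track what the decoder will reconstruct, so both sides stay in sync.
        prediction += code * step
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    out = []
    value = 0.0
    for c in codes:
        value += c * step
        out.append(value)
    return out


samples = [0.0, 0.9, 2.1, 2.0]
codes = dpcm_encode(samples, step=0.5)
decoded = dpcm_decode(codes, step=0.5)
```

Because the encoder predicts from the reconstructed value rather than the true previous sample, the quantization error does not accumulate; in this sketch each decoded sample stays within half a quantizer step of the original.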

978-1-4673-4804-1 ©2012 IEEE
Figure 1. Types of speech coding techniques

In this paper, two promising techniques, LPC and the transformation technique DWT, are used for coding the speech signal and are evaluated for their performance.

III. DISCRETE WAVELET TRANSFORM

Many techniques are available for analyzing and compressing speech signals. The Discrete Wavelet Transform (DWT) is a special case of the Wavelet Transform that provides a compact representation of a signal. An important advantage of using wavelets for speech coding is that the compression ratio can easily be varied, while most other techniques have fixed compression ratios. It was mainly introduced to represent non-stationary signals more effectively than the Fourier transform, since it retains both the time and frequency aspects of the signal. The main advantage of the wavelet transform is that it has a varying window size, being broad at low frequencies and narrow at high frequencies, thus leading to an optimal time-frequency resolution in all frequency ranges [5].

The DWT is computed by successive low pass and high pass filtering of the discrete time domain signal, a procedure called the Mallat algorithm [6]. The DWT of the original signal can be obtained by concatenating the low frequency components a[n] and high frequency components d[n], starting from the last level of decomposition. It uses multi resolution filter banks for the analysis and is defined by the following equation

$$W(J,K) = \sum_{J}\sum_{k} X(k)\, 2^{-J/2}\, \psi(2^{-J}N - K) \tag{1}$$

where $\psi(t)$ is the basic analyzing function, called the mother wavelet [7]. Here the time-scale representation of a signal can be obtained using digital filtering techniques. The filtering and decimation process is continued until the desired level is reached. The successive high pass and low pass filtering of the signal can be represented using the following equations

$$Y_{high}[k] = \sum_{n} x[n]\, g[2k - n] \tag{2}$$

$$Y_{low}[k] = \sum_{n} x[n]\, h[2k - n] \tag{3}$$

where $Y_{high}$ and $Y_{low}$ are the outputs of the high pass and low pass filters obtained by subsampling by 2 [8]. Assembling the processed signal back into the original signal without loss of information is called synthesis, and the mathematical manipulation that effects synthesis is called the inverse discrete wavelet transform (IDWT). Different types of wavelets such as Haar, db3, db7 and db10 were tried during implementation.

IV. LINEAR PREDICTIVE CODING

Linear Predictive Coding (LPC) is used to compress the speech signal without losing its audibility. It breaks the speech into segments and then sends the voiced or unvoiced information, the pitch period and the coefficients for the filter. It is one of the most powerful methods used in audio and speech signal processing and is used to estimate basic speech parameters like pitch, formants and spectra. The principle behind the use of LPC is to minimize the sum of the squared differences between the original speech and the estimated speech signal over a finite duration [9]. It allows encoding of good quality speech at a low bit rate and also provides extremely accurate estimates of speech parameters. The most important aspect of LPC is the linear predictive filter, which allows the value of the next sample to be determined by a linear combination of previous samples [7]. The predictor coefficients are denoted by $a_k$ and are normally estimated for every frame, each 20 ms long. Another important parameter is the gain (G). The transfer function of the time varying digital filter is given by [10]

$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} \tag{4}$$

The summation runs from k=1 up to p, which is 10 for the LPC-10 algorithm. This means that only the first 10 coefficients are transmitted to the LPC synthesizer. To guarantee the stability of the system H(z), the Levinson-Durbin recursion is used to compute the required parameters for the auto-correlation method [13]. The LPC analysis of each frame also involves deciding whether a sound is voiced or unvoiced. A pitch detecting algorithm is employed to determine the correct pitch period or the frequency. The

2012 World Congress on Information and Communication Technologies 967


performance of the compression techniques LPC and DWT used for the datasets is illustrated in Fig. 5. Nine samples are used for implementation, of which five are given in this paper.

V. PERFORMANCE EVALUATION

The performance of the compression is analyzed by comparing the compressed and decompressed signal against the original signal [11]. The techniques can be evaluated in two ways: objectively and subjectively. Objective analysis is done by evaluating parameters such as Compression Ratio (CR), Peak Signal to Noise Ratio (PSNR), and Normalized Root Mean Square Error Rate (NRMSE) [11], whereas subjective analysis is based on listening to the reconstructed signal and making a judgment, which is done by Mean Opinion Score (MOS) [12]. For calculating the performance, nine speech samples are taken from various speakers, and each file has a different size. In this paper, objective analysis is done in order to evaluate the parameters, and the formulas are given below.

A. Compression Ratio (CR)

$$\mathrm{CR} = \frac{\mathrm{Length}(x(n))}{\mathrm{Length}(r(n))} \tag{5}$$

The compression ratio can be calculated using the above formula, where x(n) is the original signal and r(n) is the reconstructed signal [2].

B. Peak Signal to Noise Ratio (PSNR)

$$\mathrm{PSNR} = 10 \log_{10} \frac{N X^{2}}{\lVert x - r \rVert^{2}} \tag{6}$$

The PSNR can be calculated using the above formula, where N is the length of the reconstructed signal, $X^{2}$ is the maximum absolute square value of the signal x, and $\lVert x - r \rVert^{2}$ is the energy of the difference between the original and reconstructed signals [12].

C. Normalized Root Mean Square Error (NRMSE)

$$\mathrm{NRMSE} = \sqrt{\frac{(x(n) - r(n))^{2}}{(x(n) - \mu_x(n))^{2}}} \tag{7}$$

The NRMSE can be calculated using the above formula, where x(n) is the speech signal, r(n) is the reconstructed signal and $\mu_x(n)$ is the mean of the speech signal [7].

Based on the performance evaluation, DWT gives higher PSNR and lower NRMSE than LPC. In terms of compression ratio, DWT also performs better than LPC. Table I shows the performance analysis for DWT and LPC based on PSNR, NRMSE and CR. Figs. 2, 3 and 4 show the performance of the two techniques in graphical form.

[Figure 2 plot: "Performance analysis based on CR"; x-axis: Number of samples (1 to 9); y-axis: Range; series: LPC, DWT]
Figure 2. Performance evaluation based on CR

[Figure 3 plot: "Performance analysis based on PSNR"; x-axis: Number of samples (1 to 9); y-axis: Range; series: LPC, DWT]
Figure 3. Performance evaluation based on PSNR

[Figure 4 plot: "Performance analysis based on NRMSE"; x-axis: Number of samples (1 to 9); y-axis: Range; series: LPC, DWT]
Figure 4. Performance evaluation based on NRMSE
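The filtering and subsampling of equations (2) and (3), and the synthesis step of Section III, can be sketched for the Haar wavelet, the simplest of the wavelets the paper mentions. This is an illustrative sketch, not the authors' MATLAB implementation. It assumes Haar filters aligned so that each output coefficient uses one non-overlapping pair of samples, and a signal length divisible by 2 raised to the number of levels. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_analysis(x):
    """One decomposition level: low/high pass filtering followed by subsampling by 2."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]  # low pass
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]  # high pass
    return approx, detail


def haar_synthesis(approx, detail):
    """Invert one level: rebuild each sample pair from its (approx, detail) pair."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


def haar_dwt(x, levels):
    """Multi-level DWT; returns [a_L, d_L, ..., d_1], starting from the last level."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    coeffs = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_analysis(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs


def haar_idwt(coeffs):
    """Inverse DWT: apply synthesis from the coarsest level back to the finest."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_synthesis(approx, detail)
    return approx


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_dwt(signal, levels=2)
reconstructed = haar_idwt(coeffs)
```

Without any thresholding the synthesis recovers the input up to floating-point rounding. Compression would come from zeroing or coarsely quantizing small detail coefficients before synthesis, which is why the compression ratio of a wavelet coder can be varied.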
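The auto-correlation method with the Levinson-Durbin recursion mentioned in Section IV can be sketched as follows. This is a minimal illustration, not the paper's MATLAB code. The function names are hypothetical, and the sign convention follows equation (4), so that the predicted sample is the sum of `a[k] * x[n-k]`.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation values R(0)..R(max_lag) of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for a_1..a_p from autocorrelation values r.

    Returns (coefficients, prediction_error), where the prediction of x[n]
    is sum(a_k * x[n - k]) for k = 1..order.
    """
    a = [0.0] * (order + 1)  # a[0] is unused, kept so indices match k
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at zero
        # Reflection coefficient for this model order.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)
    return a[1:], error


# Autocorrelation of an idealized first-order process with coefficient 0.5.
coefficients, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For LPC-10, `order` would be 10 and `r` would come from `autocorrelation(frame, 10)` on each 20 ms frame. A common choice, assumed here rather than stated in the paper, is to take the gain G as the square root of the returned prediction error.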
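The three objective measures of Section V, equations (5) to (7), can be computed as in the sketch below. Two points are assumptions rather than statements from the paper: the squared terms in the NRMSE are summed over all samples, and the PSNR numerator is read as N times the squared peak magnitude of x. The function names are hypothetical.

```python
import math


def compression_ratio(x, r):
    """Equation (5): ratio of the lengths of the two signals."""
    return len(x) / len(r)


def psnr(x, r):
    """Equation (6): 10*log10(N * X^2 / ||x - r||^2), in dB.

    Assumes x is not identically zero.
    """
    n = len(r)
    peak_sq = max(abs(v) for v in x) ** 2
    err_energy = sum((a - b) ** 2 for a, b in zip(x, r))
    if err_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(n * peak_sq / err_energy)


def nrmse(x, r):
    """Equation (7): error energy normalized by the signal's spread about its mean.

    Assumes x is not constant, otherwise the denominator is zero.
    """
    mean_x = sum(x) / len(x)
    num = sum((a - b) ** 2 for a, b in zip(x, r))
    den = sum((a - mean_x) ** 2 for a in x)
    return math.sqrt(num / den)


x = [1.0, 2.0, 3.0, 4.0]
r = [1.0, 2.0, 3.0, 3.0]
psnr_db = psnr(x, r)
error = nrmse(x, r)
```

In this toy case the error energy is 1, so the PSNR is 10 log10(64) dB and the NRMSE is the square root of 1/5. A better reconstruction raises the first and lowers the second, which is the sense in which the paper compares DWT and LPC.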



TABLE I. PERFORMANCE ANALYSIS USING DWT AND LPC

Speech sample               DWT                           LPC
                            PSNR      NRMSE    CR         PSNR      NRMSE    CR
Friday Rocks                15.18606  0.00255  0.9684     14.57136  0.00364  1.0146
Have a Good Weekend         14.84219  0.00311  0.3684     14.21918  0.00446  1.0244
Get Away From Me            14.26461  0.00434  0.982      13.66169  0.00614  1.0010
Water                       15.19349  0.00254  0.7807     14.58647  0.00361  1.0057
Do U Like It                14.64838  0.00348  0.6150     14.04234  0.00493  1.0046
Dinner is Served            15.35273  0.00232  0.7748     14.74981  0.00328  1.0010
Come On You Can Do It       14.54993  0.00368  0.6280     13.92235  0.00529  1.0298
Can You Keep a Secret       14.93618  0.00295  0.8096     14.32655  0.00419  1.0087
Am I Totally Screwed Or     15.57102  0.00204  0.6147     14.96809  0.00289  1.0010

VI. CONCLUSION

Data compression is the technology of representing information with the lowest number of bits. In general, a good reconstructed signal should have high PSNR and low NRMSE, which means the signal has low error and high reliability. Comparing the LPC- and DWT-based compression algorithms, DWT performs very well, achieving higher PSNR and lower NRMSE. The compression ratio provided by DWT is also better than that of LPC. However, the compression ratio differs according to the wavelet used, such as db3, Haar, db5 or db10. If the frame size is larger, the LPC technique is not able to analyze localized events accurately. The wavelet transform proved to be a useful tool for the analysis of non-stationary signals. It uses short windows at high frequencies and long windows at low frequencies. This results in multi-resolution analysis, by which the signal is analyzed with different resolutions at different frequencies.

REFERENCES

[1] Shijo M. Joseph, Firoz Shah A, and Babu Anto P, "Spoken Digit Compression: A Comparative Study between Discrete Wavelet Transforms and Linear Predictive Coding," International Journal of Computer Applications (0975 – 8887), Vol. 6, September 2010.
[2] Dr. V. Radha, Vimala C., M. Krishnaveni, "Comparative Analysis of Compression Techniques for Tamil Speech Datasets," IEEE, ICRTIT, June 3-5, 2011.
[3] D. Ambika, V. Radha, "Secure Speech Communication – A Review," International Journal of Engineering Research and Applications (IJERA), Vol. 2, Issue 5, September-October 2012, pp. 1044-1049.
[4] Jorgen Ahlberg, "Speech & Audio Coding," TSBK01 Image Coding and Data Compression, Lecture 11, 2003.
[5] Elif Derya Ubeyil, "Combined Neural Network Model Employing Wavelet Coefficients for ECG Signals Classification," Digital Signal Processing, vol. 19, pp. 297-308, 2008.
[6] Shijo M. Joseph, Firoz Shah A, and Babu Anto P, "Comparing Speech Compression Using Waveform Coding and Parametric Coding," International Journal of Electronics Engineering, 3(1), pp. 35-38, 2011.
[7] Sonia Sunny, David Peter S, K Poulose Jacob, "Recognition of Speech Signals: An Experimental Comparison of Linear Predictive Coding and Discrete Wavelet Transforms," International Journal of Engineering Science and Technology (IJEST), vol. 4, April 2012.
[8] Y. T. Chan, "Wavelet Basics," Kluwer Academic Publishers, 1995.
[9] Amol R Madane, Zalak Shah and Raina Shah, "Speech compression using Linear Predictive coding," International workshop on Machines Intelligence Research MIR labs, 2009.
[10] Jalal Karam, "Various Speech Processing Techniques for Speech Compression and Recognition," World Academy of Science, Engineering and Technology, 2007.
[11] J. S. Walker, "Wavelets and their Scientific Applications," Chapman and Hall/CRC, 1999.
[12] Y. Yorozu, M. Hirano, K. Oka, and Y. Tagawa, "Electron spectroscopy studies on magneto-optical media and plastic substrate interface," IEEE Transl. J. Magn. Japan, vol. 2, pp. 740–741, August 1987 [Digests 9th Annual Conf. Magnetics Japan, p. 301, 1982].
[13] Nikhil Rao, "Speech Compression Using Wavelets," ELEC 4801 Thesis Project, School of Information Technology and Electrical Engineering, Qld 4108, October 18, 2001.



[Figure 5 plots: waveform panels in four columns: Samples, Original signal, Reconstructed signal using LPC, Reconstructed signal using DWT]
Figure 5. Performance of the compression techniques using LPC and DWT

