Coding
D. Ambika
Research Scholar,
Department of Computer Science,
Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India
ambikaphdscholar@gmail.com

V. Radha
Associate Professor,
Department of Computer Science,
Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India
radhasrimail@gmail.com
Abstract— In this paper, the compression process is analyzed by comparing the compressed signal against the original signal. To do this, two of the most powerful speech analysis and compression techniques, Linear Predictive Coding (LPC) and Discrete Wavelet Transform (DWT), were implemented using MATLAB. Nine samples of spoken words were collected from different speakers and used for the implementation. The results obtained from LPC were compared with those of the Discrete Wavelet Transform. Finally, the results were evaluated in terms of compression ratio (CR), peak signal-to-noise ratio (PSNR) and normalized root-mean-square error (NRMSE). The results show that DWT performed better than the LPC method for these samples.

Keywords- Speech compression, LPC, CR, DWT, PSNR, NRMSE.

I. INTRODUCTION

Compression algorithms help to reduce the bandwidth requirement and also provide a level of security for the data being transmitted. This is especially important in teleconferencing and wireless communication, where compression schemes must retain the integrity of the speech. If the data is distorted in some way, it becomes difficult to understand [1]. Thus, speech compression needs to be performed in a way that retains the key qualities of the data. Speech compression finds application in mobile satellite communication, cellular phones and audio conferencing systems, and applications of speech coding and compression have become very numerous. Compression techniques can be classified into two main categories: lossless and lossy. In lossless compression, the original file can be perfectly recovered from the compressed file [2]. In lossy compression, the original file cannot be perfectly recovered from the compressed file, but by discarding less-critical data it achieves greater compression than lossless methods while keeping the best possible quality. Speech coding is a lossy type of coding, in which the output signal does not sound exactly like the input signal.

The paper is organized as follows. Section 2 explains the types of speech coding techniques. Section 3 explains speech compression using LPC. Section 4 deals with speech compression based on DWT. Section 5 explores the performance evaluation of the adopted techniques. Finally, the conclusion is summarized in Section 6.

II. TYPES OF SPEECH CODING TECHNIQUES

Although human beings have an audible frequency range of 20 Hz–20 kHz, human speech has significant frequency components only up to 4 kHz, a property that is exploited in the compression of speech [1]. Fig. 1 shows the various methods for coding the speech signal [2].

Several techniques of speech coding exist, such as LPC, waveform coding and sub-band coding. Waveform coding is used to analyze, code and reconstruct the original speech sample by sample. It includes time domain coding and frequency domain coding. Methods such as Pulse Code Modulation (PCM), Differential PCM (DPCM) [3], Adaptive DPCM (ADPCM), Delta Modulation (DM) and Adaptive PCMID are some of the popular time domain waveform coding techniques, while Transform Coding (TC) and Sub-band Coding (SBC) are a few spectral domain waveform coding techniques.

PCM [3] is used to digitize signals through signal conversion. The input to DPCM can be an analog signal or a digital signal. DPCM uses the baseline of PCM but adds some functionality based on prediction of the samples of the signal. In DPCM, an estimate of each sample is first formed by prediction from the past few samples, and the difference between the original sample and this estimate is then computed. DPCM can provide PCM-quality speech at 56 kbps. Adaptive Differential Pulse Code Modulation (ADPCM) [3] is used to provide much lower data rates by using a functional model of the human speaking mechanism at the receiver end [4]. Frequency domain coding includes sub-band coding and transform coding. In transform coding, the signal is transformed to its representation in another domain in which it can be compressed better than in its original form. This type of coding uses information about the human vocal and auditory systems. Using transformation schemes such as the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT), the important frequency components can be encoded with more precision than others.
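The DPCM idea described above (predict each sample from past samples, then code the difference) can be sketched as follows. This is a minimal illustration and not the scheme evaluated in this paper: the first-order predictor, the uniform quantizer step of 0.05, the 200 Hz test tone sampled at 8 kHz and the function names dpcm_encode and dpcm_decode are all assumptions chosen for the example.

```python
import numpy as np

STEP = 0.05  # hypothetical uniform quantizer step size


def dpcm_encode(x, step=STEP):
    """First-order DPCM (illustrative): predict each sample as the previous
    reconstructed sample and quantize the prediction error."""
    codes = np.empty(len(x), dtype=np.int64)
    prediction = 0.0
    for n, sample in enumerate(x):
        error = sample - prediction           # difference from the estimate
        codes[n] = int(np.round(error / step))
        prediction += codes[n] * step         # track what the decoder will see
    return codes


def dpcm_decode(codes, step=STEP):
    """Rebuild the signal by accumulating the dequantized differences."""
    recon = np.empty(len(codes))
    prediction = 0.0
    for n, code in enumerate(codes):
        prediction += code * step
        recon[n] = prediction
    return recon


# Assumed test input: 20 ms of a 200 Hz tone sampled at 8 kHz.
t = np.arange(160) / 8000.0
x = 0.8 * np.sin(2 * np.pi * 200 * t)

codes = dpcm_encode(x)
recon = dpcm_decode(codes)
max_error = np.max(np.abs(x - recon))
direct_codes = np.round(x / STEP)  # plain uniform quantization, for comparison
```

Because the encoder predicts from the reconstructed signal rather than the original, the quantization error does not accumulate. For a slowly varying signal such as this tone, the difference codes span a much smaller range than codes obtained by quantizing the samples directly, which is what allows fewer bits per sample.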
978-1-4673-4804-1 © 2012 IEEE
Figure 1. Types of speech coding techniques

In this paper, two promising techniques, LPC and the transformation technique DWT, are used for coding the speech signal and are evaluated for their performance.

high pass and low pass filtering of the signal can be represented using the following equations

where Y_high and Y_low are the outputs of the high pass and low pass filters obtained by subsampling by 2 [8]. Assembling the processed signal back into the original signal without loss of information is called synthesis, and the mathematical manipulation that effects synthesis is called the inverse discrete wavelet transform (IDWT). Different types of wavelets, such as Haar, db3, db7 and db10, were tried during implementation.
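The analysis and synthesis steps described above can be illustrated with a one-level Haar transform. The sketch assumes the usual textbook form of the analysis equations, y_low[k] = sum over n of x[n] h[2k - n] and y_high[k] = sum over n of x[n] g[2k - n], with h and g the low pass and high pass filters. The filter symbols, the function names haar_dwt and haar_idwt, the sign convention of the high pass branch and the test signal are assumptions made for illustration. The sketch is not the MATLAB implementation used in this paper.

```python
import numpy as np


def haar_dwt(x):
    """One-level Haar analysis: low pass and high pass filtering,
    each followed by subsampling by 2."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    y_low = (even + odd) / np.sqrt(2.0)    # approximation coefficients
    y_high = (even - odd) / np.sqrt(2.0)   # detail coefficients
    return y_low, y_high


def haar_idwt(y_low, y_high):
    """One-level Haar synthesis (IDWT): reassemble the original samples."""
    x = np.empty(2 * len(y_low))
    x[0::2] = (y_low + y_high) / np.sqrt(2.0)
    x[1::2] = (y_low - y_high) / np.sqrt(2.0)
    return x


# Hypothetical test signal of even length.
signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
y_low, y_high = haar_dwt(signal)
restored = haar_idwt(y_low, y_high)
```

Each branch holds half as many samples as the input, and synthesis recovers the input exactly (up to floating-point rounding) when no coefficients are altered. Compression comes from discarding or coarsely coding small detail coefficients before synthesis, which makes the scheme lossy.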
analysis is based on hearing the reconstructed signal and making a judgment, which is done by Mean Opinion Score (MOS) [12]. To calculate the performance, nine speech samples were taken from various speakers, and each file has a different size from the others. In this paper, objective analysis is carried out to evaluate the parameters; the formulas are given below.

A. Compression Ratio (CR)

reconstructed signal [2]

B. Peak Signal to Noise Ratio (PSNR)

$$\mathrm{PSNR} = 10 \log_{10} \frac{N X^{2}}{\lVert x - r \rVert^{2}} \tag{6}$$

The PSNR can be calculated using the above formula, where N is the length of the reconstructed signal, X is the maximum absolute square value of the signal x and ||x - r||^2 is the energy of the difference between the original and reconstructed signals [12].

C. Normalized Root Mean Square Error (NRMSE)

$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{n} \left( x(n) - r(n) \right)^{2}}{\sum_{n} \left( x(n) - \mu_{x}(n) \right)^{2}}} \tag{7}$$

The NRMSE can be calculated using the above formula, where x(n) is the speech signal, r(n) is the reconstructed signal and μx(n) is the mean of the speech signal [7].

[Figure 2. Performance evaluation based on CR. Series: LPC, DWT; x-axis: Number of samples (1 to 9); y-axis: Range.]

[Figure 3. Performance evaluation based on PSNR. Series: LPC, DWT; x-axis: Number of samples (1 to 9).]

[Figure 4. Performance evaluation based on NRMSE. Series: LPC, DWT; x-axis: Number of samples (1 to 9); y-axis: Range.]
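The PSNR and NRMSE definitions in (6) and (7) can be computed as in the sketch below. It reads X as the largest absolute sample value of x, so that X squared is the peak squared value, and takes the squared terms in (7) as sums over all samples; both readings are assumptions. The function names psnr and nrmse and the four-sample test vectors are hypothetical, and this is not the MATLAB code used for the reported results.

```python
import numpy as np


def psnr(x, r):
    """PSNR per (6): 10*log10(N * X^2 / ||x - r||^2), in dB."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)                               # length of reconstructed signal
    peak_sq = np.max(np.abs(x)) ** 2         # assumed reading of X^2
    err_energy = np.sum((x - r) ** 2)        # energy of the difference
    return 10.0 * np.log10(n * peak_sq / err_energy)


def nrmse(x, r):
    """NRMSE per (7): error energy normalized by the energy of x about its mean."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    num = np.sum((x - r) ** 2)
    den = np.sum((x - np.mean(x)) ** 2)
    return np.sqrt(num / den)


# Hypothetical original and reconstructed signals.
x = np.array([0.0, 1.0, 0.0, -1.0])
r = np.array([0.0, 0.9, 0.0, -0.9])

psnr_db = psnr(x, r)
nrmse_value = nrmse(x, r)
```

A higher PSNR and a lower NRMSE both indicate a reconstruction closer to the original. The PSNR is undefined when the two signals are identical, since the error energy in the denominator is then zero.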