Department of ECE, GZS-PTU Campus, Bathinda, Punjab, India

Voice communication is a primary means of transferring information in verbal form. Because voice signals are
considerably larger than text, speech compression techniques are needed to reduce the amount of data communicated.
In this work, a hybrid model is presented that first reduces the signal noise and then compresses the voice signal.
It works in three stages: (i) spectral subtraction to perform high-level filtration, (ii) linear predictive coding (LPC)
to reduce the signal noise and size, and (iii) the discrete wavelet transform (DWT) to perform the compression.
The results show a high compression percentage with little distortion of the signal. The evaluation, based on
Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR) and Compression Percentage, is carried out after the
LPC stage and after the DWT stage.
KEYWORDS: Voice Compression, LPC, Spectral Subtraction, DWT
Sound is an effective form of energy used for interactive communication among humans, either directly or at a
distance. Speech and hearing science has recently drawn special attention from researchers studying the dynamics and
processes of speech production and perception. A number of speech processing techniques for acoustic signals are now
available in the literature. The representation and processing speed of speech data play a vital role in effective
voice communication.
A variety of approaches have been suggested to resolve the various speed-related issues. Speech processing
involves voice encoding, synthesis, recognition, speaker recognition and verbal language translation. Speech encoding
ensures encoded transmission over the channel [2]. Different voice communication media and diverse encoding
mechanisms are available for compressing the voice signal so that it can be transmitted effectively while conserving
bandwidth. Speech encoding also draws special attention for secure communication over the network and for keeping the
maximum data within the minimum memory space. Speech synthesis is an integral component of a speech processing system
that converts the information from one form to another [3]. The input media available for voice include microphones,
automatically generated voice and voice captured from telephone lines. The speech signal is then converted to a
standard form to achieve effective and reliable communication with the least distortion.
International Journal of Electronics, Communication
& Instrumentation Engineering Research
and Development (IJECIERD)
ISSN(P): 2249-684X; ISSN(E): 2249-7951
Vol. 4, Issue 2, Apr 2014, 155-162
TJPRC Pvt. Ltd.
Hitesh Garg, R K Bansal & Savina Bansal

Impact Factor (JCC): 4.9467 Index Copernicus Value (ICV): 3.0

Figure 1: Speech Processing Schematic
Speech compression, which reduces the size of the speech data, is a major requirement for any speech processing
application. The speech compression approach is generally divided into two main layers: the first enhances the speech
quality by reducing the noise in the speech, and the second performs the compression.
The rest of this paper is organized as follows. Section II reviews the available speech compression techniques and
Section III presents the proposed work. Section IV discusses the results obtained and the last section concludes the
work.
In 2008, Kekre and Sarode [1] achieved effective speech compression using vector quantization and analyzed its
performance using mean absolute error, SNR and compression ratio. S Kumar et al [5] achieved a compression ratio of up
to 50% by performing signal encoding together with speech compression. Their analysis comprised frequency analysis
and distortion analysis of the high-quality voice signal used for Internet-based communication. Najih et al [6]
presented a DWT-based compression scheme. The authors compared wavelet filters to identify the best filter for speech
processing and to provide a low bit rate with reduced complexity. The authors [6-7] worked with five distinct steps:
DWT, thresholding, quantization, Huffman encoding and reconstruction of the signal. Najih et al [7] also compared the
DWT approach with other schemes under parameters including compression ratio and bit error rate. Radha [8] compared
different compression techniques (LPC, DCT and DWT) on high-quality speech from Tamil speech datasets and reports an
effective compression ratio. Another DWT-based compression approach is explored by Elaydi [9]. It is a lossy
compression scheme of the kind often used to compress information such as speech signals. That work presents a lossy
algorithm that compresses speech signals using DWT techniques to address the limited bandwidth problem facing the
Palestinian cellular company Jawwal, and compares the DWT performance for speech compression with other techniques
such as the μ-law speech coder. Neville and Hussain [10] applied the DWT to speech with different coefficient vectors
and analyzed the effects of wavelet compression, using two wavelet compression techniques: thresholding and
low-subband filtering. Talele and Gandhe [11] proposed, in 2012, a simple speech compression algorithm using sub-band
division and the ADPCM algorithm. Speech data are stored on semiconductor memory devices, but both the memory capacity
and the available network capacity are limited. It is therefore necessary to compress the data as much as possible.
Improved Speech Compression Using LPC and DWT Approach
In the present work, a hybrid approach is defined to perform the compression. It combines spectral subtraction,
LPC and DWT. The work first filters the speech to remove unwanted noise, which makes the signal cleaner and also
reduces its size. At the first layer, spectral subtraction combined with LPC is used to estimate the noise and remove
it from the speech file. Next, DWT is used to compress the speech with respect to its content. The reduction is
performed so as to preserve the linearity of the signal, so that minimum distortion occurs. During compression, an
analytical measure is applied to control the error ratio: if the ratio rises above a set value, the compression is
stopped. The complete work is divided into three main stages. In the first stage, spectral subtraction is used to
perform high-level filtration. In the second stage, LPC is used to perform low-level filtration and to remove the noise
from the signal. In the third stage, a DWT-based approach is used to achieve speech compression. Figure 2 shows the
compression model: noise reduction is performed in the first stages, and compression is then achieved using the DWT
approach. The three layers used in the process are described below.

Figure 2: Presented Model
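The error-controlled stopping rule mentioned above is not specified in detail, so the following Python fragment is only a minimal sketch of one possible reading. It assumes that the error measure is the MSE, that the coefficients come from an orthonormal transform (so the squared reconstruction error equals the energy of the discarded coefficients), and that the smallest coefficients are discarded first. The function name count_discardable and the sample values are hypothetical.

```python
def count_discardable(coeffs, mse_limit):
    """Count how many of the smallest-magnitude coefficients can be zeroed
    while the reconstruction MSE stays at or below mse_limit.

    Assumption: the coefficients come from an orthonormal transform, so the
    signal-domain squared error equals the energy of the zeroed coefficients.
    """
    n = len(coeffs)
    energies = sorted(c * c for c in coeffs)  # smallest energy first
    discarded_energy = 0.0
    count = 0
    for energy in energies:
        if (discarded_energy + energy) / n > mse_limit:
            break  # stop: discarding one more coefficient would exceed the limit
        discarded_energy += energy
        count += 1
    return count


# Hypothetical transform coefficients, for illustration only.
coeffs = [4.0, -0.1, 0.2, 3.0, -0.05, 0.0, 1.0, 0.1]
discarded = count_discardable(coeffs, mse_limit=0.01)
kept = len(coeffs) - discarded
print(discarded, kept)
```

Tightening mse_limit keeps more coefficients and lowers the compression; relaxing it does the opposite.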
3.1 Layer 1
Spectral subtraction [12] is an effective signal restoration approach based on the magnitude spectrum of the signal.
It is used to remove additive noise from the signal. It performs high-level filtration by identifying the average
noise in the signal. Once the noise has been estimated, it is subtracted directly from the signal. The effectiveness
of this algorithm depends on the estimate of the noise spectrum. The work assumes that the noise is stationary, that
is, it does not change significantly over time. The discrete Fourier transform of the signal is computed, the noise
estimate is subtracted from its magnitude, and the time-domain signal is restored by combining the modified magnitude
with the phase of the noisy signal. Only the magnitude is changed; the phase is not altered by this approach. The
discrete noise-corrupted input signal y(n) is composed of the clean speech signal s(n) and the uncorrelated additive
noise signal d(n), so the noisy signal can be represented as:
y(n) = s(n)+d(n)
Spectral subtraction assumes that the signals are stationary, but speech is not a stationary signal. The processing
is therefore carried out on a short-time basis (frame by frame): a time-limited window multiplies the original speech,
the noisy speech signal and the noise. The windowed signals can thus be represented as:
yw(n) = sw(n) + dw(n)
Finally, the noise in the signal is estimated using the FFT. Once this estimate is available, the noise is
subtracted from the signal to derive the reconstructed signal, which is taken to be noise free.
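The frame-level procedure described in this layer can be sketched in Python as follows. This is an illustrative sketch and not the implementation used in this work: it assumes a Hamming window, a naive DFT (adequate for a short frame), a noise magnitude estimate taken from a noise-only frame, and clipping of negative magnitudes to zero. The sinusoidal stand-ins for s(n) and d(n) and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectral_subtract(noisy_frame, noise_mag):
    """Subtract an estimated noise magnitude spectrum from one windowed frame.

    Only the magnitude is modified; the phase of the noisy frame is reused.
    Negative magnitudes are clipped to zero.
    """
    cleaned = []
    for value, noise in zip(dft(noisy_frame), noise_mag):
        magnitude = max(abs(value) - noise, 0.0)
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(cleaned)


frame_len = 32
window = hamming(frame_len)
# Hypothetical stand-ins: a tone for s(n) and a weaker tone for d(n).
speech = [math.sin(2 * math.pi * 4 * i / frame_len) for i in range(frame_len)]
noise = [0.2 * math.cos(2 * math.pi * 11 * i / frame_len) for i in range(frame_len)]

s_w = [w * s for w, s in zip(window, speech)]
d_w = [w * d for w, d in zip(window, noise)]
y_w = [a + b for a, b in zip(s_w, d_w)]      # yw(n) = sw(n) + dw(n)

noise_mag = [abs(v) for v in dft(d_w)]       # estimate from a noise-only frame
restored = spectral_subtract(y_w, noise_mag)

err_before = sum((y - s) ** 2 for y, s in zip(y_w, s_w))
err_after = sum((r - s) ** 2 for r, s in zip(restored, s_w))
print(err_before, err_after)
```

In practice the noise estimate would be averaged over several speech-free frames, and overlapping frames would be recombined by overlap-add; both steps are omitted here for brevity.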
3.2 LPC (Linear Predictive Coding)
LPC is a speech analysis and synthesis approach that models the vocal signal as a linear system [8]. It uses an
IIR (all-pole) filter as the system transfer function. The approach is based on the vocal pole parameters of the
filter [4]. The parameters used in this work are p, the number of poles; G, the filter gain; and a[k], the
coefficients that determine the poles. The frequency of the signal is represented by Fo and its pitch period by
1/Fo. The speech is generated by exciting the filter model with a periodic impulse train. The next step is to compute
the linear predictor coefficients.

Figure 3: LPC Filtration Process
LPC(X, N) finds the coefficients, A = [A(1) A(2) ... A(N+1)], of an Nth-order forward linear predictor:
Xp(n) = -A(2)*X(n-1) - A(3)*X(n-2) - ... - A(N+1)*X(n-N)
The main objective is to minimize the prediction error over the signal:
err(n) = X(n) - Xp(n)
X can be a vector or a matrix. If X is a matrix containing a separate signal in each column, LPC returns a model
estimate for each column in the rows of A. N specifies the order of the polynomial A(z) and must be a positive integer.
N must be less than or equal to the length of X. If X is a matrix, N must be less than or equal to the length of each
column of X.
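For a single vector X, the predictor coefficients can be computed as in the Python sketch below. It is not the routine used in this work: it assumes the autocorrelation method solved with the Levinson-Durbin recursion, handles only the vector case, and returns the unnormalized residual energy. The list a is 0-based, so a[1] corresponds to A(2) in the notation above and a[0] = 1. The decaying test signal and the function names are hypothetical.

```python
def lpc(x, order):
    """Return (a, residual): a = [1, a[1], ..., a[order]] and the final
    prediction-error energy, using the autocorrelation method solved by
    the Levinson-Durbin recursion. Assumes x is not all zeros."""
    n = len(x)
    # Autocorrelation at lags 0..order.
    r = [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    residual = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / residual              # reflection coefficient for step m
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        residual *= (1.0 - k * k)
    return a, residual


def predict(x, a, n):
    """Xp(n) = -a[1]*x[n-1] - ... - a[N]*x[n-N]; requires n >= N."""
    return -sum(a[k] * x[n - k] for k in range(1, len(a)))


# Hypothetical test signal: the impulse response of a one-pole filter (pole at 0.9).
x = [0.9 ** i for i in range(50)]
a, residual = lpc(x, order=2)
errors = [x[n] - predict(x, a, n) for n in range(2, len(x))]  # err(n) = X(n) - Xp(n)
print(a, max(abs(e) for e in errors))
```

Because the test signal has a single pole at 0.9, a[1] comes out close to -0.9 and a[2] close to zero, and the prediction error is correspondingly small.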
3.3. DWT
DWT offers a frequency-based subdivision of a signal. It provides a way to decompose the signal into frequency bands
and to reconstruct it from them [9]. The speech itself is considered as a two-dimensional signal space. The signal is
then passed through a series of low-pass and high-pass filters. The transform compacts much of the energy into a few
coefficients; these coefficients are preserved and the other coefficients are discarded with little loss.
LL: Approximate sub band; HL: Horizontal sub band; LH: Vertical sub band; HH: Diagonal sub band

Figure 4: Level 1 Decomposition
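The idea of keeping the energy-carrying coefficients and discarding the rest can be illustrated with the Python sketch below. It is a simplification and not the decomposition used in this work: it assumes a one-dimensional, single-level Haar wavelet (the wavelet family is not named in the text), an even-length signal, and a smooth sinusoid as a hypothetical input. The function names are illustrative.

```python
import math


def haar_dwt(x):
    """Single-level Haar DWT of an even-length sequence.
    Returns (approx, detail): the low-pass and high-pass halves."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def hard_threshold(coeffs, limit):
    """Zero every coefficient whose magnitude is below limit."""
    return [c if abs(c) >= limit else 0.0 for c in coeffs]


# Hypothetical smooth test signal (one period of a sinusoid).
signal = [math.sin(2 * math.pi * i / 64) for i in range(64)]
approx, detail = haar_dwt(signal)

rebuilt = haar_idwt(approx, detail)                       # no coefficients dropped
compressed = haar_idwt(approx, [0.0] * len(detail))       # detail band discarded

mse = sum((a - b) ** 2 for a, b in zip(signal, compressed)) / len(signal)
print(len(approx), len(signal), mse)
```

Discarding the whole detail band keeps half of the coefficients; hard_threshold offers a finer-grained alternative that removes only the coefficients below a chosen magnitude.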
This work is implemented on speech signals obtained through secondary data collection; the input data were taken
from a web search. The results are shown in Figure 5.

Sample 1 Sample 2 Sample 3 Sample 4

Figure 5: Waveforms
The final results are presented in the above figure. The top row shows the input signals and the second row shows
the results after spectral subtraction. The third row shows the results after LPC filtration, where it can be observed
that most of the error or noise has been pruned. Spectral subtraction removes noise up to one level, and LPC further
improves the signal and removes noise from it. In the last step, compression is performed using DWT. Once the signal
has been enhanced, the LPC stage also compresses it, to approximately 66.66% of its original size. The DWT then
compresses the signal to 50% of its original size. In the present paper the results are also compared before and
after the discrete wavelet transform. The work is implemented for four different speech signals. The tabular
representations and the graphs of the signals obtained after the third and final stage are shown. The comparison is
made on the basis of Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR) and Compression Percentage.
Table 1: Performance Based on MSE, PSNR, Compression Percentage
Sample   MSE                        PSNR                          Compression percentage
1        0.0015  0.0012  0.0011     42.7762   46.4137  49.1558    66.6667  50.0033  49.1558
2        0.0009  0.0016  0.0016     64.8876   38.2345  38.3025    66.6674  50.0038  49.9973
3        0.001   0.0008  0.00070    75.1867   90.803   98.4347    66.667   50.0028  49.9972
4        0.0004  0.0006  0.00060    149.0975  98.3523  97.7627    66.6672  50.0017  49.9966
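The three evaluation measures can be computed as in the Python sketch below. The exact formulas behind the tabulated values are not stated in the text, so the sketch assumes the conventional definitions: MSE as the mean squared sample difference, PSNR as 10*log10(peak^2 / MSE) with an assumed peak amplitude of 1.0, and compression percentage as the compressed size relative to the original size. The sample values and sizes are hypothetical and are not taken from Table 1.

```python
import math


def mse(original, processed):
    """Mean squared error between two equal-length sample sequences."""
    return sum((o - p) ** 2 for o, p in zip(original, processed)) / len(original)


def psnr(original, processed, peak=1.0):
    """Peak signal-to-noise ratio in dB; peak is the assumed maximum amplitude."""
    error = mse(original, processed)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)


def compression_percentage(original_size, compressed_size):
    """Compressed size expressed as a percentage of the original size."""
    return 100.0 * compressed_size / original_size


# Hypothetical samples and sizes, for illustration only.
original = [0.0, 0.5, -0.5, 1.0]
processed = [0.1, 0.5, -0.4, 0.9]
print(mse(original, processed), psnr(original, processed))
print(compression_percentage(3000, 2000), compression_percentage(3000, 1500))
```

Under this convention, a signal reduced to two thirds of its samples gives roughly 66.67% and one reduced to half gives 50%.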

The above table evaluates the performance of the techniques on the basis of mean square error, peak signal to noise
ratio and compression percentage. Figures 6(a), 6(b) and 6(c) present the same performance in graphical form.

Figure 6(a)

Figure 6(b)

Figure 6 (c)
In this paper, an LPC-improved DWT approach is presented to compress the speech signal. The work is divided into
two main stages. In the first stage, noise reduction is performed on the signal; in the second, voice compression is
achieved, aided by the noise removal performed by LPC. The work is analyzed under different parameters, and the
results show a high level of compression with acceptable PSNR and MSE values. The signal is compressed to half of its
original size, so the combination of the two techniques can reduce the size very effectively. Because the compression
is lossy, there is some loss in speech quality. Future work can address the improvement of the speech quality.
1. H. B. Kekre and T K. Sarode, "Speech data compression using vector quantization", International Journal of
Computer and Information Engineering, vol. 2, No.8, pp 535-538, 2008.
2. M H Johnson and A. Alwan, "Speech coding: fundamentals and applications", Wiley Encyclopedia of Telecom.,
pp 1-20, 2003.
3. D. Suendermann, et al "Challenges in speech synthesis", Speech Technology, Springer Science-Business Media,
pp 19-32, 2010.
4. B. Singh, et al "Speech recognition with Hidden- Markov model: A review", International Journal of Advanced
Research in Computer Science and Software Engineering, vol. 2, No. 3, 2012.
5. V. K Chaudhari, et al "A new algorithm for voice signal compression and analysis suitable for limited storage
devices using Matlab", Intl Journal of Computer and Electrical Engg, vol. 1, No. 5, pp 656-665, 2009.
6. M A Najih, et al, Proceedings IEEE, 4th National Conference on Telecommunication Technology,
pp 1-4, Malaysia, 2003.
7. M A Najih, et al "Comparing speech compression using wavelets with other speech compression schemes",
Proceedings IEEE, pp 55-58, Malaysia, 2003.
8. V Radha, "Comparative analysis of compression techniques for Tamil speech datasets", IEEE-International
conference on recent trends in Information Technology, pp 712-716, Chennai, 2011.
9. H Elaydi, "Speech compression using wavelets", Electrical & Computer Engineering department-Islamic
University of Gaza, Palestine, 2010.
10. K. L Neville and Z. M Hussain, "Effects of wavelet compression of speech on its Mel-Cepstral coefficients",
International Conference on Communication, Computer and Power (ICCCP'09), pp 387-390, Muscat, 2009.
11. KT Talele and S T Gandhe, "Speech compression using ADPCM", IJCA Proceedings on International Conference
in Computational Intelligence, vol. 8, New York, USA, 2012.
12. P. Kaur and P. Bahl, "Comparative analysis between DWT and WPD techniques of speech compression",
IOSR Journal of Engineering, vol. 2, No. 8, pp 120-128, 2012.