International Conference on Intelligent and Advanced Systems 2007

Adaptive Algorithm for Speech Compression using Cosine Packet Transform
P.Prakasam and M.Madheswaran
Center for Advanced Research, Department of Electronics and Communication Engineering
Muthayammal Engineering College, Rasipuram 637 408, Tamilnadu, India.
Phone: +91 4287 226737, Fax: +91 4287 226537
email: prakasamp@gmail.com, madheswaran.dr@gmail.com
Abstract: This paper presents a new adaptive algorithm for
speech compression using the Cosine Packet Transform. The
proposed algorithm uses packet decomposition, which reduces the
computational complexity of the system. The paper compares the
compression ratios obtained with the Wavelet Transform, the Cosine
Transform, the Wavelet Packet Transform and the proposed adaptive
algorithm based on the Cosine Packet Transform for different speech
signal samples. The mean compression ratio is calculated for each
method and compared. The results show that the proposed
compression algorithm gives the highest mean compression ratio
of the four methods for speech signals.
Keywords: Discrete Cosine Transform, Discrete Wavelet Transform,
Wavelet Packet Transform, Cosine Packets, adaptive thresholding.

I. INTRODUCTION
With the rapid deployment of speech compression technologies,
more and more speech content is stored and transmitted in
compressed formats. Speech signals have unique properties that
differ from those of general audio/music signals. Speech is a
more structured signal, and it is band-limited to around 4 kHz.
These two facts can be exploited through different models and
approaches and, in the end, make speech easier to compress. Today,
applications of speech compression involve real-time
processing in mobile satellite communications, cellular
telephony, internet telephony, and audio for videophones or video
teleconferencing systems, among others. Other applications
include storage and synthesis systems used, for example,
in voice mail systems, voice memo wristwatches, voice logging
recorders and interactive PC software [1]. The idea of speech
compression is to compress the speech signal so that it takes up
less storage space and less transmission bandwidth. To meet this
goal, different compression methods have been designed
and developed by various researchers [2-7]. Speech
compression is used in digital telephony, in multimedia and in
the security of digital communications. Before the introduction
of packet-based transform techniques, audio coding techniques
used the DFT and DCT with window functions such as rectangular
and sine-taper functions. However, these early coding
techniques failed to fulfil the contradictory requirements
imposed by high-quality audio coding. For example, with a
rectangular window the analysis/synthesis system is critically
sampled, i.e., the overall number of transform-domain
samples is equal to the number of time-domain samples, but the
system suffers from poor frequency resolution and block
effects, which are introduced after quantization or other
manipulation in the frequency domain. Overlapped windows
allow for better frequency response functions but carry the
penalty of additional values in the frequency domain, so the
system is no longer critically sampled. The Discrete Cosine
Packet Transform offers a satisfactory solution to this conflict.
Speech compression is performed either by linear
prediction or by orthogonal transform methods. Building on the
classical papers of Shannon [8] and Kolmogorov [9], a strong
connection has recently been highlighted between the systems
proposed in many lossy compression standards and harmonic
analysis [10]. All these systems use orthogonal transforms. The
algorithm described in this paper belongs to the second category.
Unfortunately, no fast algorithm exists for computing the optimal
orthogonal transform, which is why other orthogonal transforms
are used in practice. The quality of a compression system can be
assessed by its rate-distortion function: one compression
system is better than another if, at equal distortion, it achieves a
higher compression rate. The compression rate can be maximized
by a good choice of orthogonal transform.
This paper is organized as follows. The mathematical model
of the speech signal and a description of the Discrete Cosine
Transform are presented in Section II. The proposed adaptive
algorithm for speech compression is explained, with the necessary
mathematical modeling, in Section III. In Section IV,
the developed algorithm is tested on various speech signal
samples and compared with the Wavelet Transform, the
Cosine Transform and the Wavelet Packet Transform. Finally,
Section V concludes the paper with some discussion.
II. MATHEMATICAL MODEL
Mathematical model of speech signal
Every spoken word is a sequence of tones with different
intensities, frequencies and durations. Every tone is a sinusoidal
signal with a certain amplitude, frequency and duration.
Therefore it is possible to represent any speech signal with a
sinusoidal model. A mathematical description of this model is
given by

1-4244-1355-9/07/$25.00 @2007 IEEE

Authorized licensed use limited to: Jawaharlal Nehru Technological University. Downloaded on November 17, 2008 at 06:25 from IEEE Xplore. Restrictions apply.

$$x(t) = \sum_{i=1}^{Q(t)} A_i \cos\theta_i(t) \qquad (1)$$

where $A_i$, $\theta_i$ and $t$ are the amplitude, frequency and time
duration of the particular component, respectively.
Every term of this sum is a signal with double modulation,
so the signal is not stationary. Speech is nevertheless frequently
regarded as a sequence of stationary signals. Dividing the
speech signal into segments, each shorter than 25 ms, yields
such a sequence of stationary signals. On each segment the speech
model takes the form:
$$x_s(t) = \sum_{i=1}^{n} A_i \cos\omega_i t \qquad (2)$$

This decomposition is very similar to the decomposition of
the signal xs(t) into a cosine packet.
The energy of the signal xs(t) can be computed using the
following relation.
$$E_x = \sum_{i=1}^{n} |A_i|^2 \qquad (3)$$

The Discrete Cosine Transform
The most common DCT [11] definition of a 1-D sequence f(x) of
length N is

$$C_i = \alpha_i \sum_{x=0}^{N-1} f(x) \cos\left[\frac{\pi (2x+1) i}{2N}\right] \qquad (4)$$

for i = 0, 1, 2, ..., N-1, where

$$\alpha_i = \begin{cases} \sqrt{1/N} & \text{for } i = 0 \\ \sqrt{2/N} & \text{for } i \neq 0 \end{cases} \qquad (5)$$

It is clear from (4) that for i = 0,

$$C_0 = \sqrt{\frac{1}{N}} \sum_{x=0}^{N-1} f(x) \qquad (6)$$

Thus, the first transform coefficient is proportional to the
average value of the sample sequence (it equals $\sqrt{N}$ times
the mean). In the literature, this value is referred to as
the DC Coefficient. All other transform coefficients are called
the AC Coefficients.

III. PROPOSED ALGORITHM
The proposed adaptive algorithm for speech compression
using the Cosine Packet Transform is shown in Fig 1. The speech
signal to be compressed is divided into packets of finite
duration. The Discrete Cosine Transform is applied to each
packet and the transform coefficients are computed. The
coefficients are extracted and fed into the adaptive threshold
detector, which nullifies the coefficients below the threshold for
better compression.

[Fig 1 flow: Input speech signal to be compressed -> Packet decomposition -> Computation of DCT -> Extracting the coefficients (Ci) -> Adaptive threshold detector -> Compressed speech signal]
Fig 1. Flow diagram for the proposed adaptive algorithm

Selection of best packets
The main reason for choosing the Cosine Packet Transform is the
cost functional used to select the best packet. The transform is
adaptive: its result in a given application
can be optimized using the best packet selection procedure. This
is a very efficient procedure, which can considerably enhance
the quality of a given signal processing method. Several
cost functions can be minimized for the selection of
the best cosine packet. The most widely used is the entropy, but
it does not maximize the compression
rate. The optimal cost functional for compression is the one
that minimizes the number of coefficients, Ci,
greater than a given threshold t. Because the best packet is
selected with this cost functional, the number of coefficients
above the threshold is minimal, and this is the
reason why this cost functional maximizes
the compression rate. Increasing the threshold value t decreases
the number Ci and increases the compression rate.
Hence, the threshold detector must be adaptive. Another
parameter of the DCPT that must be considered for the
optimization of the compression is its number of iterations.
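The cost functional described under "Selection of best packets" (the number of coefficients whose magnitude exceeds the threshold t) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `dct_packet`, `cost` and `best_packet_length`, the use of a few candidate packet lengths as the search space, and the test signal are all assumptions made for the example. The DCT is evaluated directly from eqn 4.

```python
import math

def dct_packet(f):
    """Orthonormal DCT of one packet, evaluated directly from eqn 4."""
    n = len(f)
    out = []
    for i in range(n):
        alpha = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        s = sum(f[x] * math.cos(math.pi * (2 * x + 1) * i / (2 * n))
                for x in range(n))
        out.append(alpha * s)
    return out

def cost(signal, packet_len, t):
    """Hypothetical helper: number of DCT coefficients with magnitude above t."""
    count = 0
    for start in range(0, len(signal) - packet_len + 1, packet_len):
        coeffs = dct_packet(signal[start:start + packet_len])
        count += sum(1 for c in coeffs if abs(c) > t)
    return count

def best_packet_length(signal, candidates, t):
    """Pick the candidate packet length that minimises the cost functional."""
    return min(candidates, key=lambda n: cost(signal, n, t))

# Assumed test signal: a cosine that coincides with one DCT basis
# function of a 64-sample packet.
signal = [math.cos(math.pi * (2 * x + 1) * 5 / 128) for x in range(64)]
for n in (16, 32, 64):
    print(n, cost(signal, n, t=0.1))
print("best:", best_packet_length(signal, (16, 32, 64), t=0.1))
```

For this assumed signal the 64-sample packet concentrates all the energy in a single coefficient, so it has the lowest cost; shorter packets need at least one significant coefficient each.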


Adaptive Threshold Detector
One of the most important stages of the proposed
compression algorithm is the threshold detector. Its main role
is to nullify all the coefficients obtained from
the Cosine Packet Transform that are smaller than a threshold
value. This is in fact the compression mechanism. The detector is
adaptive: it automatically chooses the threshold
value depending on the transform coefficient values and
repeats the process while a given condition holds.
Let a, a<1, be the distortion parameter of the compression
system, N the number of samples of the signal to be
processed and Ex the energy of the input speech signal. Then
the threshold value is defined as

$$t = \frac{a \cdot E_x}{N} \qquad (7)$$

The constant a can be related to the signal-to-noise ratio b of
the input signal x(t), which is defined as

$$b = -10 \log_{10} a \qquad (8)$$

From the above equation, a is given by

$$a_n = 10^{-b/10} \qquad (9)$$

where an is the lower bound of a.
Using eqns (7) and (9), the lower bound value for the
threshold can be obtained as

$$t_n = 10^{-b/10} \cdot \frac{E_x}{N} \qquad (10)$$

For a threshold value t, superior to tn, an output signal to
noise ratio superior to b will be obtained. Unfortunately, the
exact value of Ex is not known a priori. This is the reason
why an adaptive algorithm for the selection of the threshold
value is recommended. This algorithm can use the value tn
(obtained in the last relation) for initialization.
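As a small numerical check of the relations between a, b and the initial threshold tn, the following Python sketch inverts b = -10 log10(a) and evaluates eqn 10. The function names and the numbers used (Ex = 512, N = 512, b = 20 dB) are illustrative assumptions, not values from the paper.

```python
def distortion_from_snr(b_db):
    """Invert b = -10*log10(a): a = 10^(-b/10)."""
    return 10.0 ** (-b_db / 10.0)

def threshold_lower_bound(energy, n_samples, b_db):
    """Eqn 10: t_n = 10^(-b/10) * Ex / N."""
    return distortion_from_snr(b_db) * energy / n_samples

# Assumed example values, chosen only to make the arithmetic easy to follow.
a_n = distortion_from_snr(20.0)
t_n = threshold_lower_bound(energy=512.0, n_samples=512, b_db=20.0)
print(a_n, t_n)
```

With b = 20 dB the distortion parameter is one hundredth, so the initial threshold is one hundredth of the mean energy per sample.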
The flow diagram of the adaptive threshold detector is shown in
Fig 2. The energy of the input signal to be compressed is
computed and the value of b is initialized. The threshold value
is calculated using eqn 10 and is then increased
starting from this value. At every iteration the value Ex is
computed. If this value is higher than b, each extracted
coefficient is compared with the threshold value t. If the
coefficient is less than the threshold value, it is
replaced with zero; otherwise its value is
kept unchanged. The proposed adaptive algorithm
stops the first time the value Ex becomes smaller
than b.
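The loop described above can be sketched in Python as follows. This is a literal reading of the text and of Fig 2, not the authors' code: the energy is assumed to be the sum of squared coefficients, the comparison uses coefficient magnitudes, the increment of 0.1 is taken from Fig 2, and the function name, the `max_iter` guard and the sample coefficients are hypothetical.

```python
def energy(coeffs):
    """Assumed energy measure: sum of squared coefficients."""
    return sum(c * c for c in coeffs)

def adaptive_threshold(coeffs, b, step=0.1, max_iter=10000):
    """Zero small coefficients, raising t until the energy Ex drops to b or below."""
    n = len(coeffs)
    ex = energy(coeffs)
    t = 10.0 ** (-b / 10.0) * ex / n        # initial threshold, eqn 10
    kept = list(coeffs)
    for _ in range(max_iter):               # guard against non-termination
        if ex <= b:                         # stop condition from the text
            break
        # Nullify coefficients whose magnitude is below the threshold
        kept = [0.0 if abs(c) < t else c for c in kept]
        ex = energy(kept)                   # recompute the energy
        t += step                           # new threshold, t = t + 0.1
    return kept, t

# Assumed coefficients for one packet
coeffs = [8.0, -3.0, 0.5, 0.2, -0.1, 0.05, 2.0, -0.3]
kept, t_final = adaptive_threshold(coeffs, b=70.0)
print(kept, t_final)
```

With these assumed values the loop keeps raising the threshold until only the largest coefficient survives, at which point the remaining energy (64) is below b and the iteration stops.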

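[Fig 2 flow: Coefficients from DCT (Ci) -> Compute the energy of the input signal (Ex) -> Initialize b = -10 log(a), a<1 -> Compute the threshold (t) -> If Ex > b: for each coefficient, if t > Ci then Ci = 0, else Ci = Ci -> Compute the energy (Ex) -> Compute the new threshold t = t + 0.1 -> repeat; otherwise STOP]
Fig 2. Flow diagram for the adaptive threshold detector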

IV. SIMULATION RESULTS AND DISCUSSION


The speech signal samples are simulated using
MATLAB. A generated speech signal sample is shown in Fig
3. The generated speech signal is segmented into 15 packets
of 512 samples each (the duration of each packet being less than
25 ms). The Discrete Cosine Transform is computed
for each packet using eqn 4. The transform coefficients are
extracted for further processing. The energy of the input signal
is computed and the threshold value is calculated using eqn 10.
The value of the input energy is compared with b. If it is higher,
every transform coefficient is compared
with the threshold value and the coefficients below it are
nullified. The new energy of the signal is then calculated and
compared with b. If the energy is still higher than b, the above
process is repeated with a new threshold value; otherwise the
compression process is stopped.
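The segmentation and transform steps of this procedure can be sketched in Python as below. The sketch only covers packetization and the DCT of eqn 4; the two-tone test signal stands in for a speech sample, and the 22.05 kHz sampling rate is an assumption chosen so that 512 samples last about 23 ms. The function names are hypothetical.

```python
import math

PACKET_LEN = 512   # samples per packet, as in the simulation
N_PACKETS = 15     # number of packets, as in the simulation

def packetize(signal, packet_len=PACKET_LEN):
    """Split the signal into consecutive, non-overlapping packets."""
    n_full = len(signal) // packet_len
    return [signal[k * packet_len:(k + 1) * packet_len] for k in range(n_full)]

def dct(f):
    """Orthonormal DCT of one packet, evaluated directly from eqn 4."""
    n = len(f)
    coeffs = []
    for i in range(n):
        alpha = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        acc = 0.0
        for x in range(n):
            acc += f[x] * math.cos(math.pi * (2 * x + 1) * i / (2 * n))
        coeffs.append(alpha * acc)
    return coeffs

# Assumed stand-in for a speech sample: two tones at an assumed 22.05 kHz rate
fs = 22050.0
signal = [0.6 * math.cos(2 * math.pi * 200 * k / fs)
          + 0.3 * math.cos(2 * math.pi * 450 * k / fs)
          for k in range(PACKET_LEN * N_PACKETS)]

packets = packetize(signal)
first = dct(packets[0])                        # transform of the first packet
energy_time = sum(v * v for v in packets[0])   # energy in the time domain
energy_dct = sum(c * c for c in first)         # energy of the coefficients
print(len(packets), len(packets[0]), energy_time, energy_dct)
```

Because the transform is orthonormal, the two energies agree up to rounding error, which is why the energy test of the threshold detector can be carried out on the coefficients. A direct evaluation like this costs N squared operations per packet; a fast DCT routine would normally be used instead.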

[Fig 3: plot of a speech signal sample]
Fig 3. Speech Signal Sample

For 20 different speech signals, compression is performed
using the Discrete Wavelet Transform, the Discrete Cosine
Transform, the Wavelet Packet Transform and the proposed adaptive
algorithm. The compression ratios achieved with these
methods are tabulated for each speech signal sample.

TABLE I COMPARISON OF COMPRESSION RATIO

Speech Signal   DWT       DCT       WPT        Proposed Adaptive
Sample                                         algorithm
1.              6.1229    6.1421    11.8444    11.7985
2.              6.1462    6.1421    12.1766    12.2632
3.              6.1462    6.1421    12.4397    11.1433
4.              6.1473    6.1421    12.3433    12.8633
5.              6.1452    6.1421    42.1520    45.8856
6.              6.1482    6.1421    51.5191    55.7188
7.              6.1473    6.1421    23.8063    23.9820
8.              6.1482    6.1421    40.9006    43.0466
9.              6.1482    6.1421    26.5968    28.5052
10.             6.1482    6.1421    35.1952    36.0922
11.             6.1479    6.1421    19.9917    20.7104
12.             6.1337    6.1421    21.5817    21.7237
13.             6.1477    6.1421    13.8164    13.9029
14.             6.1272    6.1421    30.9609    31.1392
15.             6.1461    6.1421    15.5718    15.8481
16.             6.1468    6.1421    30.4582    31.6369
17.             6.1461    6.1421    22.4461    23.2086
18.             6.1482    6.1421    29.2490    30.8083
19.             6.1452    6.1421    26.8928    27.1709
20.             6.1443    6.1421    60.5990    68.0507

Table I shows the comparison of compression ratio for the
various methods. The table shows the good performance
of the proposed adaptive algorithm. Its
smallest compression ratio, 11.1433, was obtained on the 3rd
sample and its highest compression ratio, 68.0507, was obtained
on the 20th sample. The proposed algorithm gives the best
compression ratio for most of the speech samples. The
compression ratios for speech signal samples
1 to 10 and 11 to 20 are plotted in Fig 4 and Fig 5
respectively for easy understanding.

[Fig 4: chart "Comparison of Compression ratio"; x-axis: Speech Signal Sample; y-axis: Compression Ratio; series: DWT, DCT, WPT, Proposed Algorithm]
Fig 4. Comparison of Compression ratio (Speech signal sample 1-10)

[Fig 5: chart "Comparison of Compression Ratio"; x-axis: Speech Signal Sample; y-axis: Compression Ratio; series: DWT, DCT, WPT, Proposed Algorithm]
Fig 5. Comparison of Compression ratio (Speech signal sample 11-20)

The above figures show that, for only 2 of the 20
signal samples, the proposed algorithm gives a lower compression
ratio than the WPT method, while still giving a higher ratio than
the other two methods. The mean compression ratios for all the
methods are computed and tabulated in Table II.
TABLE II MEAN COMPRESSION RATIO

DWT      DCT      WPT       Proposed Adaptive algorithm
6.144    6.142    27.027    28.275


Table II shows that the mean compression
ratio achieved over the 20 samples with the proposed adaptive
algorithm is 28.275. This is a sufficiently high value, taking
into account the fact that no lossless compression method was
used.
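The means in Table II can be recomputed from the per-sample values of Table I with a few lines of Python. The assignment of the four value columns to DWT, DCT, WPT and the proposed algorithm is inferred from the fact that the column means match Table II; the variable names are arbitrary.

```python
# Compression ratios from Table I, samples 1 to 20
dwt = [6.1229, 6.1462, 6.1462, 6.1473, 6.1452, 6.1482, 6.1473, 6.1482,
       6.1482, 6.1482, 6.1479, 6.1337, 6.1477, 6.1272, 6.1461, 6.1468,
       6.1461, 6.1482, 6.1452, 6.1443]
dct = [6.1421] * 20
wpt = [11.8444, 12.1766, 12.4397, 12.3433, 42.1520, 51.5191, 23.8063,
       40.9006, 26.5968, 35.1952, 19.9917, 21.5817, 13.8164, 30.9609,
       15.5718, 30.4582, 22.4461, 29.2490, 26.8928, 60.5990]
proposed = [11.7985, 12.2632, 11.1433, 12.8633, 45.8856, 55.7188, 23.9820,
            43.0466, 28.5052, 36.0922, 20.7104, 21.7237, 13.9029, 31.1392,
            15.8481, 31.6369, 23.2086, 30.8083, 27.1709, 68.0507]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

for name, column in (("DWT", dwt), ("DCT", dct), ("WPT", wpt),
                     ("Proposed", proposed)):
    print(name, round(mean(column), 3))

# Samples on which the proposed algorithm falls below WPT
below_wpt = [k + 1 for k in range(20) if proposed[k] < wpt[k]]
print("proposed < WPT on samples:", below_wpt)
```

The recomputed means agree with Table II to three decimals, and the proposed algorithm falls below WPT only on samples 1 and 3, in line with the count of 2 samples stated in the text.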
V. CONCLUSION
A new compression method based on an adaptive threshold
detector is proposed and tested. The simulation results show that
the proposed algorithm gives a better mean compression ratio than
the other methods. Using this method, a mean
compression ratio of 28.275 was obtained in the simulations,
which is higher than the mean compression ratios of the
other methods. Using a fast DCT algorithm, the proposed
method can be implemented on a Digital Signal Processor. The
proposed system is a good alternative to speech
compression systems based on linear prediction.

REFERENCES
[1]. R. W. Yeung, A First Course in Information Theory, New York:
Kluwer Academic/Plenum Publishers, 2002.
[2]. A. Gersho, "Advances in Speech and Audio Compression,"
Proceedings of the IEEE, vol. 82, pp. 900-918, June 1994.
[3]. J. L. Flanagan, M. R. Schroeder, B. S. Atal, R. E. Crochiere,
N. S. Jayant and J. M. Tribolet, "Speech Coding," IEEE Transactions
on Communications, vol. 27, pp. 710-737, April 1979.
[4]. P. Noll, "Wideband Speech and Audio Coding," IEEE
Communications Magazine, pp. 34-44, Nov. 1993.
[5]. K. Sayood and J. C. Borkenhagen, "Use of residual redundancy in
the design of joint source/channel coders," IEEE Transactions on
Communications, vol. 39, no. 6, pp. 838-846, June 1991.
[6]. B. Edler, "Coding of Audio Signals with Overlapping Block
Transform and Adaptive Window Functions," (in German),
Frequenz, vol. 43, pp. 252-256, 1989.
[7]. Q. Memon and T. Kasparis, "Transform Coding of Signals Using
Approximate Trigonometric Expansions," Journal of Electronic
Imaging, vol. 6, no. 4, pp. 494-503, October 1997.
[8]. C. E. Shannon, "A mathematical theory of communication," Bell
System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948.
[9]. A. N. Kolmogorov, "On the Shannon theory of information
transmission in the case of continuous signals," Trans. IRE,
vol. IT-2, pp. 102-108, 1956.
[10]. D. L. Donoho, M. Vetterli, R. A. DeVore, and I. Daubechies, "Data
compression and harmonic analysis," IEEE Trans. Inf. Theory, vol.
44, no. 6, pp. 2435-2476, 1998.
[11]. N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine
transform," IEEE Transactions on Computers, vol. C-23, pp. 90-93,
Jan. 1974.
