This action might not be possible to undo. Are you sure you want to continue?

Welcome to Scribd! Start your free trial and access books, documents and more.Find out more

4, July 2010

Lossy audio coding flowchart based on adaptive time- frequency mapping, wavelet coefficients quantization and SNR psychoacoustic output

Khalil Abid

Laboratory of Systems and Signal Processing (LSTS) National Engineering School of Tunis ( ENIT ) BP 37, Le Belvédère 1002, Tunis, Tunisia Khalilabid06@yahoo.fr

**Kais Ouni and Noureddine Ellouze
**

Laboratory of Systems and Signal Processing (LSTS) National Engineering School of Tunis ( ENIT ) BP 37, Le Belvédère 1002, Tunis, Tunisia

Abstract—This paper describes a novel wavelet based audio synthesis and coding method. The adaptive wavelet transform selection and the coefficient bit allocation procedures are designed to take advantage of the masking effect in human hearing. They minimize the number of bits required to represent each frame of audio material at a fixed distortion level. This model incorporates psychoacoustic model into adaptive wavelet packet scheme to achieve perceptually transparent compression of highquality audio signals.

Keywords- D.W.T; Psychoacoustic Model; Signal to Noise Ratio; Quantization

reconstruct non-stationary signals efficiently. The audio signal is non-periodic and it varies temporally. The wavelet transform can be used to represent audio signals [14] by using the translated and scaled mother wavelets, which are capable to provide multi-resolution of the audio signal. This property of wavelet can be used to compress audio signal. The DWT consists of banks of low pass filters, high pass filters and down sampling units. Half of the filter convolution results are discarded because of the down sampling at each DWT decomposition stage [6] [11]. Only the approximation part of the DWT wavelet results is kept so that the number of samples is reduced by half. The level of decomposition is limited by the distortion tolerable from the resulting audio signal. II. STRUCTURE OF THE PROPOSED AUDIO CODEC

I.

INTRODUCTION

The vast majority of audio data on the Internet is compressed using some form of lossy coding, including the extremely popular MPEG1 Layer III (MP3) [1], Windows Media Archive (WMA) and Real Media (RM) formats. These algorithms can generally achieve compression ratios by using a combination of signal processing techniques, psychoacoustics and entropy coding,. most popular attention has been focused on lossy compression schemes like MP3, WMA and Ogg Vorbis. In general, these schemes perform some variant of either the Fast Fourier Transform (FFT) or Discrete Cosine Transformation (DCT) [8] to get a frequencybased representation of the sound waveform. Lossy algorithms generally take advantage of a branch of psychophysiology known as psychoacoustics that describes the ways in which humans perceive sound. By removing tones and frequencies that humans should not be able to hear, lossy algorithms can greatly simplify the nature of the data which they need to encode. By removing excess minor frequencies, the frequency representation of the sound data can now be efficiently compressed using any number of entropy coding techniques. The wavelet transform becomes an emerging signal processing technique [13] and it is used to decompose and

The main goal of this structure is to compress high quality audio maintaining transparent quality at low bit rates. In order to do this, the authors explored the usage of wavelets instead of the traditional Modified Discrete Cosine Transform (MDCT) [1]. Several steps are considered to achieve this goal: Design a wavelet representation for audio signals. Design a psychoacoustic model to perform perceptual coding and adapt it to the wavelet representation. Reduce the number of the non-zero coefficients of the wavelet representation and perform quantization over those coefficients. Perform extra compression to reduce redundancy over that representation Transmit or store the steam of data. Decode and reconstruct. Evaluate the quality of the compressed signal. Consider implementation issues.

49

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 4, July 2010

**The flowchart of the proposed model is based on the following steps :
**

Start

Select the wavelet function

Define the level decomposition Wavelet

Define the audio wave file signal

Devide the audio wave file signal in N frames

Compression in the wavelet domain

Calculate the power spectrum density

Compute the masking thersholds

Calculate the tonality

Calculate the tone energy

Calculate the tone entropy

performance. The algorithm in level I uses a band pass filter bank that devides the audio signal into 32 equal width subbands [4]. This filter bank is also found in level II and III. The design of this filter bank is a compromise between computational efficiency and perceptual performance. The algorithm in level II is a simple enhancement of level I; it improves compression performance by coding the audio data in larger groups. Finally the level III algorithm is much more refined in order to come closer the critical bands [2] [5] . The psychoacoustic model is key component in the encoder. Its function is to analyze the spectral content of the input audio signal by computing the signal to noise ratio for each subband. This information is used by the quantizer-coder to decide the available number of bits to quantize each subband. This dynamic allocation of bits is performed so as to minimize the audibility of quantization noise. Finally frame-packing unit assembles the quantized audio samples into decodable bit stream. The decoder consists of three functional units: the frame unpacking unit, the frequency sample reconstruction and the frequency to time mapping. The decoder simply reverses the signal processing operations performed in the encoder, converting the received stream of encoded bits into time domain audio signal.

Audio signal (.wav)

Calculate the corresponding subband SNR

Define the quantization level Wavelet Decomposition Compute the offset to shift the memory location of the entire partition Psychoacoustic Model Wavelet Compression Bit Allocation Header

Reconstruct the signal based on the multi-level wavelet decomposition structure

Define the wavelet expander scheme in order to reconstruct the signal

Write the expanded audio wave file (compressed file)

Stream of Data Figure 2. The audio waveet encoder

Stop Figure 1. The different steps of the proposed audio wavelet compressed codec

Stream of Data

Header extraction

Wavelet Reconstructio n

The audio wave file is separated into small sections called frames (2048 samples). Each section is compressed using the proposed wavelet encoder and decoder. The encoder is consisting in four functional unit: the time to frequency mapping , the psychoacoustic model, the quantizer & coder and the frame packing unit. The function of the time to frequency mapping is used to decompose the input audio signal into multiple subbands for coding. This mapping is performed in three levels, labeled I ,II & III, which are caracterised with increasing complexity, delay and subjective

**Audio Compressed signal
**

Figure 3. The audio wavelet decoder

III.

THE PSYCHOACOUSTIC MODEL

The psychoacoustic model is a critical part of perceptual audio coding that exploits masking properties of the human auditory system. The psychoacoustic model analyzes signal content and combines induced masking curves to determine

50

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 4, July 2010

Frequency (Bark)

what information below the masking threshold that is perceptually inaudible and should be removed. The psychoacoustic model is based on many studies of human perception. These studies have shown that the average human doesn’t hear all frequencies the same. Effects due to different sounds in the environment and limitations of the human sensory system lead to facts that can be used to cut out unnecessary data in an audio signal. The two main properties of the human auditory system that make up the psychoacoustic model are the absolute threshold of hearing [1] [15] and the auditory masking [1]. Each one provides a way of determining which portions of a signal are inaudible and indiscernible to the average human, and can thus be removed from a signal. A. The Absolute Threshold of Hearing . To determine the effect of frequency on hearing ability, scientists played a sinusoidal tone at a very low power. The power was slowly raised until the subject could hear the tone. This level was the threshold at which the tone could be heard. The process was repeated for many frequencies in the human auditory range and with many subjects. As a result, the following plot was obtained. This experimental data can be modeled by the following equation, where f is frequency in Hertz [2]:

Tq ( f ) = 3.64(

f −0.6( − 3.3) 2 f −0.8 f 4 + 0.001( ) − 6.5e 1000 ) 1000 1000

v( f ) = 13artg (0.00076) + 3.5artg[(

f 2 )] 7500

(2)

25

20

15

10

5

0

0

0.2

0.4

0.6

0.8 1 1.2 Frequency (Hz)

1.4

1.6

1.8 x 10

2

4

Figure 5. Relationship between Hertz and Bark Frequencies

(1)

60 50 40 30 20 10 0 -10 -20 0

C. Tone and Noise Masker Identification Masking curces of tonal and noise maskers [1] have different shapes [1] therefore it is necessary to separate them.. To find tonal components it is necessary to find local maximas and then compare them with their neighbourhood components. This action hints Eq. 3 [1] [3]: S SPL (i ) − S SPL (i ± ∆ i ) ≥ 7 (3) where:

S u d P ssu L ve (d ) o n re re e l B

∆i = +2

for for for for

∆i = +2 , +3

∆i = +2…+6 ∆i = +2 …+12

5000 10000 Frequency (Hz) 15000 20000

i ∈ [127, 255[

i ∈ [ 63,127[

i ∈ ]2, 63[

(4) (5) (6) (7)

i ∈ [ 255,512[

Figure 4. The absolute thershold of hearing

According to ISO/IEC MPEG1, Psychocacoustic Analysis Model1 of MPEG1 audio standard [ 1] sound pressure level of the tonal masker is computed by Eq.8 as a summation of the spectral density of the masker and its neighbours:

B. The Bark Frequency Scale After many studies, scientists found that the frequency range from 20 Hz to 20000 Hz [3] [10] can be broken up into critical bandwidths [12], which are non-uniform, non-linear, and dependent on the heard sound. Signals within one critical bandwidth are hard to separate for a human observer [7]. A more uniform measure of frequency based on critical bandwidths is the Bark. From the earlier discussed observations, one would expect a Bark bandwidth to be smaller at low frequencies (in Hz) and larger at high ones. Indeed, this is the case. The Bark frequency scale can be approximated by the following equation [2]:

X TM (i ) = 10.log10 ( ∑ 10

j =−1

1

S SPL ( i + j ) 10

) [dB]

(8)

Sound Pressure level of the noise maskers is computed according to Eq. 9 as a summation of the sound pressure level of all spectral components in corresponding critical band.

X NM (i ) = 10.log10 ( ∑ 10

j =−1

1

S SPL ( i ) 10

) [dB] ,

y (i) ∈ b (9)

where b represents the critical band, i index spectral components that lies in the corresponding critical band. Noise maskers are placed in the middle of the corresponding critical band.

51

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

**(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 4, July 2010
**

100 Tonal Component 80 Tonal Masker 60 Amplitude (dB)

Where

∆ y = y (i ) − y ( j ) represents bark distance form the

masker in barks. Note : Outside the interval [-3,8[ , MF is equal to −∞ Masking curves of the noise maskers is defined by ISO/IEC MPEG1 Psychoacoustic Analysis Model 1 [1] and it is similar to the tone masker. The noise is defined by the following equation [1]: (15) M NM (i, j ) = X NM (i ) + MF (i, j ) − 0.175. y ( j ) + 2.025 Where X NM is a Sound Pressure Level of the noise masker. The constant 2.025 represents the top of the masking curve.

40

20

0

-20 0

5000

10000 Frequency (Hz)

15000

20000

IV.

BIT ALLOCATION

Figure 6. The Tonal Components and Tonal Maskers

100

In order to determine the number of bits corresponding to each troncatured audio wave signal (2048 samples) we proceeded the following algorithm: We start by listing all the tonal components characterized by the following condition [1] [9] : (16) X (i ) > X (i − 1) & X (i ) > X (i + 1) Where X (i ) is the sound pressure level of the indexing ( i ) tonal component For each tonal masker corresponding to the indexed ( i ) tonal component , we calculate the corresponding tone energy caracterized by the following equation:

80

Noise Component Noise Masker

60 Amplitude (dB)

40

20

0

** X (i−1) 2 X (i) 2 X (i+1) 2 Etm (i) =10.log10 10 10 + 10 10 + 10 10
**

5000 10000 Frequency (Hz) 15000 20000

(17)

-20 0

Then, we calculate the global energy of the all tones energie corresponding to the troncatured audio wave signal (2048 samples).

Figure 7. The Noise Components and Noise Masker

EG = ∑10

i =1

Ntm

Etm ( i ) 10

(18)

D. Masking Thershold Calculation When tonal and noise maskers are identified, the masking thershold for each masker is determined. As defined in ISO/IEC MPEG1 Psychoacoustic Analysis Model 1 of MPEG1 audio standard [ ] tonal masker masking curve can be calculated the following equation 10 [1]:

M TM (i, j ) = X TM (i ) + MF (i, j ) − 0.275. y ( j ) + 6.025 (10 )

Where

X TM is a Sound Pressure Level of the tone masker.

**y ( j ) is the masking curve position on the bark axis. MF (i, j ) is a masking function defined by Eq. 11. The
**

constant 6.025 represents the top of the masking curve

Note: N tm is the total number of tonal maskers All this allows to deduce the entropy using the following equation : Ntm Etm ( i ) ∑ 10 10 (19) E = 10.log10 i =1 N tm SNR is calculated using Eq.20 as a subtraction of the maximum of sound pressure level and the entropy: [dB] (20) SNR = Max ( X ) − E Finally the number of bits corresponging to the troncatured signal is given by the following equation:

SNR nb = 6.02

MF (i, j ) = 17.∆ y − 0.4 X TM (i) + 11

∆ y ∈ [ −3, −1[

(11) (12)

MF (i, j ) = (0.4 X TM (i ) + 6).∆ y

MF (i, j ) = −17.∆ y

MF (i, j ) = (1 − ∆ y ).(17 − 0.15. X TM (i )) − 17

Identify applicable sponsor/s here. (sponsors)

∆ y ∈ [ −1, 0[

∆ y ∈ [1,8[ (14 )

∆ y ∈ [ 0,1[ (13 )

(21)

V.

DIAGRAM OF THE WAVELET ENCODER AND DECODER

The flowchart of the wavelet codec is devided in 5 parts :

52

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

Start

y= wavread( Audio File.wav)

**y Number of ( Troncatured _ Signal )= fix ( ) 1024 _ Samples 1024
**

j=0

Troncatured Signal= y( [1024j+1: 1024(j+1)] )

j=j+1

False

j= fix ( 1024 ) -1

True

y

Troncatured Signal= y( [1024j+1: length(y)] )

[CF1,CF2]= wavedec ( Troncatured Signal, N ,’wname’ )

Find default values [THR,SORH,KEEPAP]= ddencmp ( ‘cmp’, ‘wv’, N , Troncatured Signal )

Performs a compression process of a ‘Troncaured signal’ [XC,CXC,LXC,PERF0,PERFL2]= wdencmp ( ‘gbl’, CF1,CF2, wname, N, THR, SORH,KEEPAPP)

CF1= CXC

CF2=LXC

A

Figure 8. Diagram of the wavelet encoder and decoder (part 1)

53

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

A

X=powernormconst+10.*log10((abs(fft(Troncatured signal,fftlength))).^2)

for i =1:length(X)

False

if ( (2<=i ) &( i <=500))

True

Bool=’0

**False if ( (X(i-1)<X(i) ) &( X(i )<X(i+1) )) True False
**

Bool=’0

**if ((2<i ) & (i <63))
**

True

bool = ( X(i) - X(i-2) > 7 ) & (X(i)- X(i+2)>7 )

False if ((63<=i ) & (i <127))

True bool = ( X(i) - X(i-2) > 7 ) & (X(i)- X(i+2)>7 ) & ( X(i) - X(i-3) > 7 ) & ( X(i) - X(i+3) > 7 )

False if ((127<=i ) & (i <255)) True bool = ( X(i) - X(i-2) > 7 ) & (X(i)- X(i+2)>7 ) & ( X(i) - X(i-3) > 7 ) & ( X(i) - X(i+3) > 7 ) & (X(i) - X(i-4) > 7 ) & (X(i)- X(i+4)>7 ) & ( X(i) - X(i-5) > 7 ) & ( X(i) - X(i+5) > 7 ) & (X(i) - X(i-6) > 7 ) & ( X(i) - X(i+6) > 7 )

**False if ((255<=i ) & (i <=500)) True bool =(X(i) -X(i-2) >7) & (X(i)- X(i+2)>7 ) &( X(i) - X(i-3) >7) &(X(i) - X(i+3)>7)&(X(i)-X(i-4)> 7)&(X(i)-X(i+4)>7 ) & ( X(i) - X(i-5) >7)&( X(i) - X(i+5) > 7 ) &(X(i) - X(i-6) > 7) &( X(i) - X(i+6) >7)&(X(i) - X(i-7) >7) & ( X(i) X(i+7) > 7 ) & (X(i) - X(i-8) >7) & ( X(i) -X(i+8) >7) & (X(i) - X(i-9) > 7) &( X(i) -X(i+9) > 7) &(X(i) - X(i-10) > 7)& ( X(i) - X(i+10) > 7 ) & (X(i) -X(i-11) > 7)&(X(i) -X(i+11) >7)&(X(i) -X(i-12) > 7 ) & ( X(i) - X(i+12) > 7)
**

Bool=’0

False Bool=’1’ True

B

E

Figure 9. Diagram of the wavelet encoder and decoder (part 2)

54

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

B

Etm = zero (1, Ntm)

E

Etm(i )= 10*log10 ((10.^ (0.1.* X(i-1))) .^2+(10.^ (0.1.* X(i))) .^2+(10.^ (0.1.* X(i-1))) .^2)

For i= 1:1:Ntm

Find the Entropy E=10*log10(10.^(Etm(i)/10)/Ntm))

Find Signal to Noise Ratio: SNR=max (X)-E

Number of bits required for quantization :

nb=fix(0.166*SNR)

False

**if Quantization= ‘ON’
**

True

Implementation of A-law compressor: C=command (C, 87.6 , max(C), ‘ A /compressor’)

Z=C

[index, quant, distor] = quantiz(Z, min(Z):((max(Z)-min(Z))/2^nb):max(Z) , min(Z):((max(Z)-min(Z))/2^nb):max(Z))

For k = 0 : fix(y/1024) -1

False

if Z(k+1)=0

True

vector= - quant(k+1)

quant=vector + quant

C=quant

E C1

Figure 10. Diagram of the wavelet encoder and decoder (part 3)

55

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

C1

E

Avar=[Avar C]

Cvar=[Cvar length(C)]

Bvar=[Bvar L]

S= length(L)

W=length (Avar) )

W= length(Bvar)

Avar=Avar(1,w)

Cvar=[Cvar(2) Cvar( fix( y /1024 )+2)

Bvar=[Bvar (2 :S+1) Bvar ( 1+(S*fix(y/1024)) : Z]

t=0

Decsig= Avar([((1+(Cvar(2)*t) : (t+1)*Cvar(2)] )

t=t+1

Max(Decsig) == 0

False

False

t= fix ( 1024 ) -1

True

y

Decsig= Avar([((1+(Cvar(2)*t) : t*Cvar(2)+ Cvar(3)] )

False Max(Decsig) == 0

Decsig=compand( Decsig, 255, Max(Decsig), ‘mu/ expander’) Decsig= compand( Decsig, 255, Max(Decsig), ‘mu/ expander’)

D1 D2

Figure 11. Diagram of the wavelet encoder and decoder (part 4)

E

56

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

D1

D2

E

Xr= wavrec(Decsig, Bvar(1 :S), ‘wname’)

Xr= wavrec(Decsig, Bvar(S+1 : Z), ‘wname’)

Xrf=[Xrf Xr]

R= Length (Xrf)

Xrf= Xrf(2 : R)

Wavwrite( Xrf)

Stop

Figure 12. Diagram of the wavelet encoder and decoder (part 5)

VI.

IMPLEMENTATION AND RESULTS

Size _(Original _ File) CR = Size _(Compressed _ File)

(23)

The proposed wavelet−packet audio codec is realized as m files and simulated using MATLAB software. We adjust parameters such as structure of the decomposition tree, frame size, number of wavelet coefficients, etc. The suitable set of parameters is selected to optimize among decoded audio quality, encoded bit rate and computation complexity. A number of quantitative parameters can be used to evaluate the performance of the proposed audio wavelet codec, in terms of reconstructed signal quality after decoding. The used quantitative parameters are the Signal to Noise Ratio (SNR) and the compression ratio wich are calculated for different types of wavelet A. The signal to noise ratio

C. Results In order to evaluate the proposed codec, we used for various wavelet (‘haar’, ‘coif’, ‘morl’, ‘meyr’, ‘dB’) some types of sound such as Soul, Slow, and Rock. The evaluation is based on the SNR and the compression ratio.

The original signal:"Rock Sound.wav" 2 Amplitude 1.5 1 0.5 0 0 1 2 4 5 6 7 Time (Seconde) The 'Rock Sound' compressed signal using the proposed wavelet codec 3 8 9 10

**σ SNR = 10.log10 σ
**

2 x 2 e

(22)

1.6 Amplitude

is the mean square of the speech signal and σ is the mean square difference between the original and reconstructed signals.

2 x 2 e

σ

1.2 0.8 0.4 0 0 1 2 3 4 5 6 Time (Seconde) 7 8 9 10

B. Compression ratio The compression ratio is defined as the quotion between the original audio size file and the compressed one.

Figure 13. The original signal and the wavelet compressed signal using (bitrate=128Kbits/s wavelet= ‘db’ ‘Rock sound.wav’)

57

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

The original signal:"Soul Sound.wav" 1

TABLE III. Wavelet name SNR CR

EVALUATION OF THE PROPOSED CODEC USING DIFFERENT WAVELET (ROCK MUSIC. WAV ) haar 30.091 5.918 coif 30.411 6.628 morl 31.903 6.992 meyr 30.104 5.758 db 31.515 7.735

A m plitude

0.5

**VII. AUDIO QUALITY MEASURE USING MEAN OPINION
**

5 0 4 5 6 7 8 9 Time (Second) The 'Soul Sound' compressed signal using the proposed wavelet codec 1 2 3

SCORE

0.8 0.6 A m plitude 0.4 0.2 0 0

1

2

3

4 5 Time (Second)

6

7

8

9

Figure 14. The original signal and the wavelet compressed signal using (bitrate=128Kbits/s wavelet= ‘db’ ‘soul sound.wav’)

The original signal:"Slow Sound." 0.8 Amplitude 0.6 0.4 0.2 0 0 1 2 3 Time (Second) 4 5 6

The 'Slow Sound' compressed signal using the proposed wavelet codec' 0.1 Amplitude

0.05

It is hard to objectively measure the performance of audio compression in the realm of perceptual media, due to the variation in human senses, and the qualitative nature of such a process. However, some attempt has been made to do this. As a measure of quality, the most popular subjective assessment method is the mean opinion scoring where subjects classify the quality of coders on an N-point quality scale. The final result of such tests is an averaged judgement called the mean opinion score (MOS). 5-point adjectival grading scales are in use, one for signal quality, and the other one for signal impairment, and an associated numbering. The 5-point ITU-R impairment scale of Table 4 is extremely useful if coders with only small impairments have to be graded. For this purpose, we invited several subjects to hear some wavelet compressed files resulting from the proposed codec based on wavelet analysis. The protocol of evaluation consists in listening to the wavelet compressed sound file. Then, the listeners can listen to it as long as they wish. The listeners are 12: 6 men and 6 women between 15 and 30 years old. Our aim is to determine the best wavelet compression sound quality for each type of sound in a statistic card as shown in Figure 16. Note: The sound quality histogram amplitude represent the sum of the integers scores given by the 12 listeners.

0

0

1

2

3 Time (Second)

4

5

6

Mos Score

Figure 15. The original signal and the wavelet compressed signal using (bitrate=128Kbits/s wavelet= ‘db’ ‘slow sound.wav’)

TABLE I. Wavelet name SNR CR TABLE II. Wavelet name SNR CR

EVALUATION OF THE PROPOSED CODEC USING DIFFERENT WAVELET (SOUL MUSIC.WAV) haar 30.514 5.762 coif 31.379 6.342 morl 30.282 6.117 meyr 30.461 5.451 db 31.012 7.249

50 45 40 35 30 25 20 15 10 5 0 Haar Morl meyr db Wavelet

Rock Soul Slow

EVALUATION OF THE PROPOSED CODEC USING DIFFERENT WAVELET (SLOW MUSIC.WAV) haar 30.631 5.663 coif 30.233 6.817 morl 31.681 6.219 meyr 30.298 5.358 db 30.767 7.471

Figure 16. The MOS diagram wavelet listening test

58

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

TABLE IV. 5-POINTS MOS IMPAIRMENT SCALE Impairment scale Perceptible Perceptible, but not annoying Slightly annoying Annoying Very annoying [8] Mean opinion score 5 4 3 2 1 M. Bosi and R. E. Goldberg, Introduction to Digital Audio Coding and Standards, Kluwer Academic Publishers, New York, NY, USA,

2003.

[9] D. Sinha and A. H. Tewfik, “Low bit rate transparent audio compression using adapted wavelets,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3463–3479, 1993. [10] M. R. Zurera, F. L. Ferreras, M. P. J. Amores, S.M. Basc´ on, and N. R. Reyes, “A new algorithm for translating psycho-acoustic information to the wavelet domain,” Signal Processing, vol. 81, no.

VIII. CONCLUSION AND FUTURE WORK Audio compression coding is currently an active topic for research in the areas of circuit technologies and Digital Signal Processing (DSP). The Wavelet Transform performs very well in the compression of recorded audio signals. Point of view compression ratio, using wavelets can be easily varied, while most other compression techniques have fixed compression ratios. Further data compaction is possible by exploiting the redundancy in the encoded transform coefficients. A bit encoding scheme could be used to represent the data more efficiently. A common loss-less coding technique is Entropy coding. Two common entropy coding schemes are Prefix coding and tree-structured Huffman coding.

REFERENCES [1] ISO/IEC 11172-3, “Information technology—coding of moving picture and associated audio for digital storage media at up to about 1.5 Mbits—part 3: audio,” 1993. Z. Hajayej, Etude, mise en oeuvre et évaluation des techniques de paramétrisation perceptive des signaux de parole. Application à la reconnaissance de la parole par les modèles de Markov cachés, PhD Thesis on Electrical Engineering, National Engineering School of Tunis, October 2009 T. Painter and A. Spanias, “Perceptual coding of digital audio,” Proceedings of the IEEE, vol. 88, no. 4, pp. 451–512, 2000. M. D. Swanson, B. Zhu, A. H. Tewfik, and L. Boney, “Robust audio watermarking using perceptual masking,” Signal Processing, vol. 66, no. 3, pp. 337–355, 1998. P.R. Deshmukh, Multi-wavelet Decomposition for Audio Compression, IE(I) Journal –ET, Vol 87, July 2006 Q. Liu, “Digital audio watermarking utilizing discrete wavelet packet transform,” M.S. thesis, Institute of Networking and Communication, Chaoyang University of Technology, Taichung, Taiwan, 2004. J. D. Johnston, “Transform coding of audio signals using perceptual noise criteria,” IEEE Journal on Selected Areas in Communications,

3, pp. 519–531, 2001.

[11] B. Carnero and A. Drygajlo, “Perceptual speech coding and enhancement using frame-synchronized fast wavelet packet transform algorithms,” IEEE Transactions on Signal Processing, vol. 47, no. 6, pp. 1622–1635, 1999. [12] C. Wang, Y. C. Tong, An improved critical-band transform processor for speech applications, Circuits and Systems, May 2004, pp 461–464 [13] I. Daubechies, Ten Lectures on Wavelets, vol. 61 of CBMSNSF Regional Conference Series in AppliedMathematics, SIAM, Philadelphia, Pa, USA, 1992. [14] P. Rajmic, J. Vlach, Real-time Audio Processing Via Segmented wavelet Transform, 10th International Conference on Digital Audio Effect , Bordeaux, France, Sept. 2007 [15] B. Lincoln, “An experimental high fidelity perceptual audio coder,”

**Project in MUS420 Win97, March 1998.
**

AUTHORS PROFILE K. Abid received the B.S. degree in Electrical Engineering from the National School of Engineering of Tunis, (ENIT), Tunisia, in 2005, and the M.S degree in Automatic and Signal Processing in 2006 from the same school. He started preparing his Ph.D. degree in Electrical Engineering in 2007. His research interesets in Audio Compression Using Multiresolution Analysis K. Ouni received the M.Sc. from Ecole Nationale d’Ingénieurs de Sfax in 1998, the Ph.D. from Ecole Nationale d’Ingénieurs de Tunis, (ENIT), in 2003, and the HDR in 2007 from the same institute. He has published more than 70 papers in Journals and Proceedings. Professor Kaïs Ouni is currently the Electrical Engineering Department Head at Institut Supérieur des Technologies Médicales de Tunis (ISTMT), Tunisia. He is also a researcher at Systems and Signal Processing Laboratory (LSTS), ENIT, Tunisia. His researches concern speech and biomedical signal processing. He is Member of the Acoustical Society of America and ISCA (International Speech Communication Association). N. Ell ouze was born in 19 December, 1945. He received a Ph.D. degree in 1977 at INP (Toulouse- France), and Electronic Engineering Diploma from ENSEEIHT in 1968 University P. Sabatier. in 1978. Pr. Ellouze joined the Electrical Engineering Department at ENIT (Tunisia). In 1990, he became Professor in signal processing, digital signal processing and stochastic process. He was the head of the Electrical Department from 1978 to 1983 and General Manager and President of IRSIT from 1987-1994. He is now Director of Research

[2]

[3] [4]

[5] [6]

[7]

vol. 6, no. 2, pp. 314–323, 1988.

59

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

- Journal of Computer Science IJCSIS March 2016 Part II
- Journal of Computer Science IJCSIS March 2016 Part I
- Journal of Computer Science IJCSIS April 2016 Part II
- Journal of Computer Science IJCSIS April 2016 Part I
- Journal of Computer Science IJCSIS February 2016
- Journal of Computer Science IJCSIS Special Issue February 2016
- Journal of Computer Science IJCSIS January 2016
- Journal of Computer Science IJCSIS December 2015
- Journal of Computer Science IJCSIS November 2015
- Journal of Computer Science IJCSIS October 2015
- Journal of Computer Science IJCSIS June 2015
- Journal of Computer Science IJCSIS July 2015
- International Journal of Computer Science IJCSIS September 2015
- Journal of Computer Science IJCSIS August 2015
- Journal of Computer Science IJCSIS April 2015
- Journal of Computer Science IJCSIS March 2015
- Fraudulent Electronic Transaction Detection Using Dynamic KDA Model
- Embedded Mobile Agent (EMA) for Distributed Information Retrieval
- A Survey
- Security Architecture with NAC using Crescent University as Case study
- An Analysis of Various Algorithms For Text Spam Classification and Clustering Using RapidMiner and Weka
- Unweighted Class Specific Soft Voting based ensemble of Extreme Learning Machine and its variant
- An Efficient Model to Automatically Find Index in Databases
- Base Station Radiation’s Optimization using Two Phase Shifting Dipoles
- Low Footprint Hybrid Finite Field Multiplier for Embedded Cryptography

by ijcsis

This paper describes a novel wavelet based audio synthesis and coding method. The adaptive wavelet transform selection and the coefficient bit allocation procedures are designed to take advantage o...

This paper describes a novel wavelet based audio synthesis and coding method. The adaptive wavelet transform selection and the coefficient bit allocation procedures are designed to take advantage of the masking effect in human hearing. They minimize the number of bits required to represent each frame of audio material at a fixed distortion level. This model incorporates psychoacoustic model into adaptive wavelet packet scheme to achieve perceptually transparent compression of high quality audio signals.

- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Tdgransitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> <TITLE>ERROR: The requested URL could not be retrieved</TITLE> <STYLE type="text/css"><!--BODY{background-color:#ffffff;font-family:verdana,sans-serif}PRE{font-family:sans-serif}--></STYLE> </HEAD><BODY> <H1>ERROR</H1> <H2>The requested URL could not be retrieved</H2> <HR noshade size="1px"> <P> While trying to process the request: <PRE> TEXT http://www.scribd.com/titlecleaner?title=Robust+speed+control.docx HTTP/1.1 Host: www.scribd.com Proxy-Connection: keep-alive Accept: */* Origin: http://www.scribd.com X-CSRF-Token: 14399c013c57cff05f8306d93b3155670fbfda4b User-Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31 X-Requested-With: XMLHttpRequest Referer: http://www.scribd.com/upload-document?archive_doc=33297239&metaby Naeemrind

- 21ms414
- SmartAnt LMS RLSalgo
- Overland Conveyor Bulk Flow Analyst 15
- Session I 1 Software Defined Radio Zhou
- Aarti_Sharma_601161010
- balancing and navigation.pdf
- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Tdgransitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> <TITLE>ERROR
- 2
- 3
- ANC
- EEE Syllabus 22 June 2013
- jhkk ioh
- Implementation of a Real-Time Spectrum Analyzer.pdf
- 03 28 Heat Ex Changer Fan
- 182
- Design Triangular Microstrip Patch Antenna
- pdf
- Amps a Design Philosophy
- Kanakidis Et Al. - JLT March 2003
- ME2404_Computer Aided Simulation and Analysis Lab Manual
- Genes Ys
- Compressor
- ase_valve
- Le Thanh2011-Course Report Simulation of a Communications System Using Matlab
- www.ijerd.com
- 4_Measurement in a DWDM System
- Paper Presentation for Mullana.doc
- 804_1
- IRJET-POWER OPTIMIZATION USING BODY BIASING METHOD FOR DUAL VOLTAGE FPGA
- antenna

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue listening from where you left off, or restart the preview.

scribd