(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 4, July 2010
Lossy audio coding flowchart based on adaptive time-frequency mapping, wavelet coefficients quantization and SNR psychoacoustic output
Khalil Abid
Laboratory of Systems and Signal Processing (LSTS), National Engineering School of Tunis (ENIT), BP 37, Le Belvédère 1002, Tunis, Tunisia
Khalilabid06@yahoo.fr
Kais Ouni and Noureddine Ellouze
Laboratory of Systems and Signal Processing (LSTS), National Engineering School of Tunis (ENIT), BP 37, Le Belvédère 1002, Tunis, Tunisia
Abstract— This paper describes a novel wavelet based audio synthesis and coding method. The adaptive wavelet transform selection and the coefficient bit allocation procedures are designed to take advantage of the masking effect in human hearing. They minimize the number of bits required to represent each frame of audio material at a fixed distortion level. This model incorporates a psychoacoustic model into an adaptive wavelet packet scheme to achieve perceptually transparent compression of high-quality audio signals.
Keywords— DWT; Psychoacoustic Model; Signal to Noise Ratio; Quantization
I. INTRODUCTION
The vast majority of audio data on the Internet is compressed using some form of lossy coding, including the extremely popular MPEG-1 Layer III (MP3) [1], Windows Media Audio (WMA) and Real Media (RM) formats. These algorithms generally achieve their compression ratios by using a combination of signal processing techniques, psychoacoustics and entropy coding. Most attention has been focused on lossy compression schemes like MP3, WMA and Ogg Vorbis. In general, these schemes perform some variant of either the Fast Fourier Transform (FFT) or the Discrete Cosine Transform (DCT) [8] to get a frequency-based representation of the sound waveform. Lossy algorithms generally take advantage of a branch of psychophysiology known as psychoacoustics that describes the ways in which humans perceive sound. By removing tones and frequencies that humans should not be able to hear, lossy algorithms can greatly simplify the nature of the data they need to encode. Once these excess minor frequencies are removed, the frequency representation of the sound data can be efficiently compressed using any number of entropy coding techniques. The wavelet transform has emerged as a signal processing technique [13] that can decompose and reconstruct non-stationary signals efficiently. An audio signal is non-periodic and varies over time. The wavelet transform can represent audio signals [14] using translated and scaled mother wavelets, which provide a multi-resolution representation of the audio signal. This property makes wavelets well suited to audio compression. The DWT consists of banks of low-pass filters, high-pass filters and downsampling units. Half of the filter convolution results are discarded because of the downsampling at each DWT decomposition stage [6][11]. Only the approximation part of the DWT output is kept, so the number of samples is reduced by half. The level of decomposition is limited by the distortion tolerable in the resulting audio signal.
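As a concrete illustration of this decomposition stage, the sketch below shows a single DWT level in Python, assuming the NumPy and PyWavelets (pywt) libraries; the paper does not specify an implementation, and the 'db8' wavelet and 44.1 kHz rate are arbitrary placeholder choices.

```python
# One DWT stage: low-pass/high-pass filtering followed by downsampling
# by two; only the approximation branch feeds the next stage.
# Assumes NumPy and PyWavelets; 'db8' is an arbitrary wavelet choice.
import numpy as np
import pywt

fs = 44100                                  # assumed sampling rate (Hz)
t = np.arange(2048) / fs
frame = np.sin(2 * np.pi * 440 * t)         # a 2048-sample test frame

approx, detail = pywt.dwt(frame, 'db8')     # one decomposition stage
# Each branch holds roughly half the input samples (plus a few boundary
# samples from the filter padding).
print(len(frame), len(approx), len(detail))
```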
 
II. STRUCTURE OF THE PROPOSED AUDIO CODEC
The main goal of this structure is to compress high quality audio while maintaining transparent quality at low bit rates. To this end, the authors explored the use of wavelets instead of the traditional Modified Discrete Cosine Transform (MDCT) [1]. Several steps are considered to achieve this goal (a code sketch of the resulting pipeline follows the list):
 
• Design a wavelet representation for audio signals.
• Design a psychoacoustic model to perform perceptual coding and adapt it to the wavelet representation.
• Reduce the number of non-zero coefficients of the wavelet representation and quantize those coefficients.
• Perform extra compression to reduce redundancy in that representation.
• Transmit or store the stream of data. Decode and reconstruct.
• Evaluate the quality of the compressed signal.
• Consider implementation issues.
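A minimal sketch of how these steps could fit together is shown below. Every function name and parameter is a hypothetical placeholder rather than the authors' code, PyWavelets is assumed for the transform, and the uniform quantizer stands in for the psychoacoustically driven bit allocation described later.

```python
# Hypothetical skeleton of the encode/decode pipeline outlined above.
# Assumes NumPy and PyWavelets; the quantizer is a crude placeholder.
import numpy as np
import pywt

def encode_frame(frame, wavelet='db8', level=5, bits=8):
    # Step 1: wavelet representation of one audio frame
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    # Steps 2-3: a real codec derives per-subband bit budgets from the
    # psychoacoustic model; here every subband gets the same `bits` bits.
    step = 2.0 ** -bits
    return [np.round(c / step) * step for c in coeffs]

def decode_frame(coeffs, wavelet='db8'):
    # Decode/reconstruct: invert the wavelet transform
    return pywt.waverec(coeffs, wavelet)
```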
The flowchart of the proposed model is based on the following steps:
Figure 1. The different steps of the proposed wavelet audio compression codec
The audio wave file is separated into small sections called frames (2048 samples each). Each frame is compressed using the proposed wavelet encoder and decoder. The encoder consists of four functional units: the time-to-frequency mapping, the psychoacoustic model, the quantizer and coder, and the frame-packing unit. The time-to-frequency mapping decomposes the input audio signal into multiple subbands for coding. This mapping is performed at three levels, labeled I, II and III, which are characterized by increasing complexity, delay and subjective performance. The level I algorithm uses a band-pass filter bank that divides the audio signal into 32 equal-width subbands [4]. This filter bank is also found in levels II and III. Its design is a compromise between computational efficiency and perceptual performance. The level II algorithm is a simple enhancement of level I; it improves compression performance by coding the audio data in larger groups. Finally, the level III algorithm is much more refined in order to approximate the critical bands more closely [2][5]. The psychoacoustic model is a key component of the encoder. Its function is to analyze the spectral content of the input audio signal by computing the signal-to-noise ratio for each subband. This information is used by the quantizer-coder to decide the number of bits available to quantize each subband. This dynamic bit allocation is performed so as to minimize the audibility of quantization noise. Finally, the frame-packing unit assembles the quantized audio samples into a decodable bit stream. The decoder consists of three functional units: the frame-unpacking unit, the frequency sample reconstruction and the frequency-to-time mapping. The decoder simply reverses the signal processing operations performed in the encoder, converting the received stream of encoded bits into a time-domain audio signal.
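The framing step described above can be sketched as a small helper (hypothetical, assuming NumPy; the 2048-sample frame length is the one stated in the text):

```python
# Cut the input signal into 2048-sample frames, zero-padding the tail
# so that every frame is full length. Each row is one frame.
import numpy as np

def split_into_frames(signal, frame_len=2048):
    n_frames = -(-len(signal) // frame_len)     # ceiling division
    padded = np.zeros(n_frames * frame_len)
    padded[:len(signal)] = np.asarray(signal, dtype=float)
    return padded.reshape(n_frames, frame_len)
```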
Figure 2. The audio wavelet encoder
Figure 3. The audio wavelet decoder
III. THE PSYCHOACOUSTIC MODEL
The psychoacoustic model is a critical part of perceptual audio coding that exploits the masking properties of the human auditory system. It analyzes the signal content and combines the induced masking curves to determine
which information below the masking threshold is perceptually inaudible and should be removed. The psychoacoustic model is based on many studies of human perception. These studies have shown that the average human does not hear all frequencies equally. Effects due to different sounds in the environment and limitations of the human sensory system lead to facts that can be used to cut out unnecessary data in an audio signal. The two main properties of the human auditory system that make up the psychoacoustic model are the absolute threshold of hearing [1][15] and auditory masking [1]. Each provides a way of determining which portions of a signal are inaudible and indiscernible to the average human, and can thus be removed from the signal.
A. The Absolute Threshold of Hearing
To determine the effect of frequency on hearing ability, scientists played a sinusoidal tone at a very low power. The power was slowly raised until the subject could hear the tone. This level was the threshold at which the tone could be heard. The process was repeated for many frequencies across the human auditory range and with many subjects. As a result, the plot shown in Figure 4 was obtained. The experimental data can be modeled by the following equation, where f is the frequency in Hertz [2]:
$T_q(f) = 3.64\,(f/1000)^{-0.8} - 6.5\,e^{-0.6\,(f/1000 - 3.3)^2} + 10^{-3}\,(f/1000)^4$ [dB SPL]  (1)
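Eq. (1) transcribes directly into code; the sketch below assumes NumPy and is only a restatement of the formula, not the authors' implementation.

```python
# Absolute threshold of hearing, Eq. (1); f is frequency in Hz.
import numpy as np

def absolute_threshold(f):
    f = np.asarray(f, dtype=float) / 1000.0     # express f in kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)                    # result in dB SPL
```

The curve is high at low frequencies and dips below 0 dB SPL near 3-4 kHz, where hearing is most sensitive, matching the shape plotted in Figure 4.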
Figure 4. The absolute threshold of hearing
B. The Bark Frequency Scale
After many studies, scientists found that the frequency range from 20 Hz to 20000 Hz [3][10] can be broken up into critical bandwidths [12], which are non-uniform, non-linear, and dependent on the heard sound. Signals within one critical bandwidth are hard for a human observer to separate [7]. A more uniform measure of frequency based on critical bandwidths is the Bark. From the observations discussed earlier, one would expect a Bark bandwidth to be smaller at low frequencies (in Hz) and larger at high ones. Indeed, this is the case. The Bark frequency scale can be approximated by the following equation [2]:
$v(f) = 13\,\arctan(0.00076\,f) + 3.5\,\arctan\!\left[(f/7500)^2\right]$  (2)
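Eq. (2) likewise transcribes directly (NumPy assumed):

```python
# Hertz-to-Bark conversion, Eq. (2).
import numpy as np

def hz_to_bark(f):
    f = np.asarray(f, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)
```

For example, hz_to_bark(20000.0) is roughly 24.6 Bark, consistent with the roughly 25 critical bands covering the 20 Hz to 20 kHz hearing range, as plotted in Figure 5.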
Figure 5. Relationship between Hertz and Bark Frequencies
C. Tone and Noise Masker Identification
Masking curves of tonal and noise maskers [1] have different shapes, so it is necessary to separate the two. To find tonal components, one must locate local maxima and compare them with their neighbouring components, as expressed by Eq. 3 [1][3]:
$S_{SPL}(i) - S_{SPL}(i \pm \Delta_i) \geq 7$ dB  (3)

where:

$\Delta_i = +2$ for $i \in \,]2, 63[$  (4)

$\Delta_i = +2, +3$ for $i \in [63, 127[$  (5)

$\Delta_i = +2, \ldots, +6$ for $i \in [127, 255[$  (6)

$\Delta_i = +2, \ldots, +12$ for $i \in [255, 512[$  (7)
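A sketch of this tonal test follows; the helper names are hypothetical, S is the power spectrum in dB indexed by spectral line, and bounds checking at the spectrum edges is omitted for brevity.

```python
# Tonal-masker test of Eqs. (3)-(7): line i is tonal if it is a local
# maximum and exceeds its neighbours at the offsets below by >= 7 dB.
def neighbour_offsets(i):
    if 2 < i < 63:
        return [2]
    if 63 <= i < 127:
        return [2, 3]
    if 127 <= i < 255:
        return list(range(2, 7))        # +2 ... +6
    if 255 <= i < 512:
        return list(range(2, 13))       # +2 ... +12
    return []

def is_tonal(S, i):
    if not (S[i] > S[i - 1] and S[i] > S[i + 1]):   # local maximum
        return False
    return all(S[i] - S[i + d] >= 7 and S[i] - S[i - d] >= 7
               for d in neighbour_offsets(i))
```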
According to the Psychoacoustic Analysis Model 1 of the ISO/IEC MPEG-1 audio standard [1], the sound pressure level of a tonal masker is computed by Eq. 8 as a summation of the spectral density of the masker and its neighbours:
$X_{TM}(i) = 10\,\log_{10} \sum_{j=-1}^{+1} 10^{S_{SPL}(i+j)/10}$ [dB]  (8)

The sound pressure level of a noise masker is computed according to Eq. 9 as a summation of the sound pressure levels of all spectral components in the corresponding critical band:
$X_{NM} = 10\,\log_{10} \sum_{i} 10^{S_{SPL}(i)/10}$ [dB], $\; i \in b$  (9)

where b represents the critical band and i indexes the spectral components that lie in the corresponding critical band. Noise maskers are placed in the middle of the corresponding critical band.
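Both masker levels translate directly into code: the powers are summed in the linear domain and converted back to decibels (NumPy assumed; S_spl is the dB spectrum as an array, and the band index array is a hypothetical input).

```python
# Masker sound pressure levels, Eqs. (8) and (9).
import numpy as np

def tonal_masker_spl(S_spl, i):
    # Eq. (8): the tonal line plus its two immediate neighbours
    return 10.0 * np.log10(np.sum(10.0 ** (S_spl[i - 1:i + 2] / 10.0)))

def noise_masker_spl(S_spl, band):
    # Eq. (9): all components inside one critical band b, given as an
    # array of spectral-line indices
    return 10.0 * np.log10(np.sum(10.0 ** (S_spl[band] / 10.0)))
```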