International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 6, June 2012)
Speech Compression with Voice Excited Linear
¹Arpana Mishra, ²Javed Ashraf
¹M. Tech. Scholar, AFSET Faridabad
²Assistant Professor, AFSET Faridabad
The digital filter and its slowly changing parameters are
usually encoded to achieve compression of the speech
signal. There are many other characteristics of speech
production that can be exploited by speech coding
algorithms. One fact that is often used is that periods of
silence take up more than 50% of conversations. An easy
way to save bandwidth and reduce the amount of
information needed to represent the speech signal is to not
transmit the silence.
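The idea of not transmitting silence can be sketched with a simple frame-energy detector. This is only an illustration under stated assumptions: the 160-sample frame (20 ms at 8 kHz), the energy threshold, and the function names `frame_energies` and `active_frames` are hypothetical choices, not details taken from the paper.

```python
import math

def frame_energies(samples, frame_len=160):
    """Return the mean-square energy of each full frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def active_frames(samples, frame_len=160, threshold=1e-4):
    """Indices of frames whose energy exceeds the assumed silence threshold."""
    energies = frame_energies(samples, frame_len)
    return [i for i, e in enumerate(energies) if e > threshold]

# Toy signal: 3 silent frames, 2 frames of a 200 Hz tone, 3 silent frames.
fs = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(2 * 160)]
signal = [0.0] * (3 * 160) + tone + [0.0] * (3 * 160)

kept = active_frames(signal)
print("frames to transmit:", kept)
print("fraction of frames skipped:", 1 - len(kept) / 8)
```

A practical coder would need a noise-adaptive threshold and some hangover time so that word endings are not clipped; the fixed threshold here is only for illustration.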
All vocoders, including LPC vocoders, have four main
attributes: bit rate, delay, complexity, and quality. Any voice
coder, regardless of the algorithm it uses, will have to make
trade-offs between these different attributes.
The first attribute of vocoders, the bit rate, is used to
determine the degree of compression that a vocoder
achieves. Uncompressed speech is usually transmitted at 64
kb/s using 8 bits/sample and a sampling rate of 8 kHz.
Any bit rate below 64 kb/s is considered compression. The
linear predictive coder transmits speech at a bit rate of 2.4
kb/s, an excellent rate of compression.
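The bit rates quoted for uncompressed speech and for LPC can be checked with a little arithmetic. The sketch below uses only the figures given in the text, plus a 20 ms frame length (the frame duration the paper mentions for LPC analysis) to estimate the bit budget per frame; the variable names are illustrative.

```python
BITS_PER_SAMPLE = 8      # from the text
SAMPLE_RATE_HZ = 8000    # from the text
LPC_BIT_RATE_BPS = 2400  # from the text (2.4 kb/s)
FRAME_MS = 20            # frame duration assumed for the per-frame budget

# Uncompressed bit rate: bits per sample times samples per second.
uncompressed_bps = BITS_PER_SAMPLE * SAMPLE_RATE_HZ

# Compression ratio of the 2.4 kb/s coder relative to uncompressed speech.
compression_ratio = uncompressed_bps / LPC_BIT_RATE_BPS

# Bits available to describe one frame at the LPC bit rate.
bits_per_frame = LPC_BIT_RATE_BPS * FRAME_MS // 1000

print("uncompressed bit rate:", uncompressed_bps, "b/s")
print("compression ratio: %.1f : 1" % compression_ratio)
print("bits per 20 ms frame:", bits_per_frame)
```

At 2.4 kb/s the coder has only 48 bits for each 20 ms frame, which is why it must send model parameters rather than waveform samples.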
Delay is another important attribute for vocoders that are
involved with the transmission of an encoded speech
signal. Vocoders which are involved with the storage of the
compressed speech, as opposed to transmission, are not as
concerned with delay. The general delay standard for
transmitted speech conversations is that any delay that is
greater than 300 ms is considered unacceptable.
The third attribute of voice coders is the complexity of
the algorithm used. The complexity affects both the cost
and the power of the vocoder. Linear predictive coding,
because of its high compression rate, is very complex and
involves executing millions of instructions per second. LPC
often requires more than one processor to run in real time.
The final attribute of vocoders is quality. Quality is a
subjective attribute and it depends on how the speech
sounds to a given listener.
Abstract—The aim of the project is to develop a system for
encoding good quality speech at a low bit rate. To implement
this we have used a very efficient speech analysis technique,
Linear Predictive Coding (LPC). It provides accurate
estimation of speech parameters. An alternate explanation is
that the linear prediction filters attempt to predict future
values of the input signal based on past signals. The speech
signals of males and females were coded. The encoding
process of LPC involves determining a set of accurate
parameters for modeling the vocal tract during the
production of a given speech signal. Decoding involves using
the parameters acquired in the encoding and analysis to build
a synthesized version of the original speech signal. The
conclusion indicates that the project was successful in coding the
speech signal at relatively low bit rates with good quality.
Keywords—ACR, CODER, LPC, RESIDUAL SIGNAL,
Speech coding has been and still is a major issue in the
area of digital speech processing. There exist many different
types of speech compression that make use of a variety of
different techniques. However, most methods of speech
compression exploit the fact that speech production occurs
through slow anatomical movements and that the speech
produced has a limited frequency range. The frequency of
human speech production ranges from around 300 Hz to
3400 Hz. Most forms of speech coding are usually based on
a lossy algorithm. Lossy algorithms are considered
acceptable when encoding speech because the loss of
quality is often undetectable to the human ear.
Another fact about speech production that can be taken
advantage of is that mechanically there is a high correlation
between adjacent samples of speech. Most forms of speech
compression are achieved by modeling the process of
speech production as a linear digital filter.
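The link between sample-to-sample correlation and a linear filter model can be made concrete with the autocorrelation method and the Levinson-Durbin recursion. The sketch below is a minimal illustration, not the authors' implementation. It assumes the predictor form x_hat[n] = sum over k of a_k * x[n-k], and it uses a synthetic second-order test signal (coefficients 1.3 and -0.6, chosen arbitrarily) in place of real speech. The function names are hypothetical.

```python
import random

def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no normalization)."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (a, reflection, error), where a[k-1] multiplies x[n-k] in the
    assumed predictor x_hat[n] = sum_k a_k * x[n-k], reflection holds the
    intermediate reflection coefficients, and error is the final
    prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        reflection.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], reflection, error

# Synthetic correlated signal: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + noise.
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(2000):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
x = x[2:]

r = autocorrelation(x, 2)
coeffs, refl, err = levinson_durbin(r, 2)
print("estimated predictor coefficients:", coeffs)
print("reflection coefficients:", refl)
print("prediction gain (signal energy / error energy):", r[0] / err)
```

The estimated coefficients should land close to the values used to generate the signal, and a prediction gain well above 1 shows how much of each sample is already determined by its predecessors. The test signal is longer than a single 20 ms frame only so that the estimate is stable in this toy setting.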
One of the most common tests for speech quality is the absolute category rating (ACR) test. This test involves subjects being given pairs of sentences and asked to rate them as excellent, good, fair, poor, or bad. The quality of played-back speech will be solely based on the opinion of the listener.

The encoding process of LPC involves determining a set of accurate parameters for modeling the vocal tract during the production of a given speech signal. Decoding involves using the parameters acquired in the encoding and analysis to build a synthesized version of the original speech signal. The speech coder that is developed is analyzed using both subjective and objective analysis. Subjective analysis will consist of listening to the encoded speech signal and making judgments on its quality. An objective analysis will be performed by computing the Segmental Signal to Noise Ratio (SEGSNR) between the original and the coded speech signal. The report concludes with a summary and some ideas for future work.

II. HUMAN SPEECH PRODUCTION

The process of speech production in humans can be summarized as air being pushed from the lungs through the vocal tract and out through the mouth to generate speech. In this type of description the lungs can be thought of as the source of the sound and the vocal tract can be thought of as a filter that produces the various types of sounds that make up speech. The idea of the air from the lungs as a source and the vocal tract as a filter is called the source-filter model for sound production. It is based on the idea of separating the source from the filter in the production of sound. The excitation of the air travelling through the vocal tract is the source. This air can be periodic, when producing voiced sounds through vibrating vocal cords, or it can be turbulent and random when producing unvoiced sounds.

FIGURE 1. PATH OF HUMAN SPEECH PRODUCTION

The source-filter model is the model that is used in linear predictive coding. This model is used in both the encoding and the decoding of LPC and is derived from a mathematical approximation of the vocal tract represented as a varying diameter tube. LPC never transmits any estimates of speech to the receiver; it only sends the model to produce the speech and some indications about what type of sound is being produced.

FIGURE 2. HUMAN VS. VOICE CODER SPEECH PRODUCTION
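The source-filter idea can be sketched in a few lines: a periodic impulse train stands in for voiced excitation, random noise for unvoiced excitation, and an all-pole filter stands in for the vocal tract. Everything specific here is an assumption for illustration: the difference equation y[n] = G*e[n] + sum over k of a_k * y[n-k], the 80-sample pitch period (100 Hz at 8 kHz), the filter coefficients, and the function names.

```python
import random

def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Source signal: impulse train if voiced, Gaussian noise if unvoiced."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def all_pole_filter(source, a, gain=1.0):
    """Vocal-tract stand-in: y[n] = gain * source[n] + sum_k a[k-1] * y[n-k]."""
    out = []
    for n, s in enumerate(source):
        y = gain * s
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y += ak * out[n - k]
        out.append(y)
    return out

# Hypothetical second-order "vocal tract"; real LPC coders use higher orders.
tract = [1.3, -0.6]

voiced_frame = all_pole_filter(excitation(160, voiced=True), tract)
unvoiced_frame = all_pole_filter(excitation(160, voiced=False), tract)

print("first voiced samples:", voiced_frame[:3])
print("frame lengths:", len(voiced_frame), len(unvoiced_frame))
```

The receiver in such a scheme needs only the filter coefficients, the gain, the voiced/unvoiced decision, and the pitch period to regenerate a frame, which matches the description of LPC sending a model rather than the speech itself.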
III. LPC MODEL

The particular source-filter model used in LPC is known as the linear predictive coding model. It has two key components: analysis or encoding and synthesis or decoding. The analysis part of LPC involves examining the speech signal and breaking it down into segments or blocks. The transfer function of the time-varying digital filter is expressed in terms of the predictor coefficients and a gain term. The principle behind the use of LPC is to minimize the sum of the squared differences between the original speech signal and the estimated speech signal over a finite duration. This could be used to give a unique set of predictor coefficients. These predictor coefficients are normally estimated every frame, which is normally 20 ms long. The predictor coefficients are represented by a_k. Another important parameter is the gain (G).

FIGURE 3. BLOCK DIAGRAM OF AN LPC VOCODER

IV. QUANTIZATION OF LPC COEFFICIENTS

Usually direct quantization of the predictor coefficients is not considered. To ensure stability of the coefficients (the poles and zeros must lie within the unit circle in the z-plane) a relatively high accuracy (8-10 bits per coefficient) is required. This comes from the effect that small changes in the predictor coefficients lead to relatively large changes in the pole positions. A common alternative is to quantize the reflection coefficients instead. These are intermediate values during the calculation of the well-known Levinson-Durbin recursion. Quantizing the intermediate values is less problematic than quantizing the predictor coefficients directly.

FIGURE 4. BLOCK DIAGRAM OF A VOICE EXCITED LPC VOCODER

The main idea behind the voice-excitation is to avoid the imprecise detection of the pitch and the use of an impulse train while synthesizing the speech. Thus the input speech signal in each frame is filtered with the estimated transfer function of the LPC analyzer. The filtered signal is called the residual. If this signal is transmitted to the receiver one can achieve a very good quality. For a good reconstruction of the excitation only the low frequencies of the residual signal are coded. To achieve a high compression rate we employed the discrete cosine transform (DCT) of the residual signal. Thus one way to compress the signal is to transfer only the coefficients which contain most of the energy.

V. COMPARATIVE ANALYSIS OF LPC METHOD

A comparison of the original sentences against the LPC reconstructed and the voice-excited LPC method is done. In both cases, the reconstructed speech has a lower quality than the input speech sentences. The LPC reconstructed speech has a lower pitch than the original sound. The sound seems to be whispered. The voice-excited LPC reconstructed file sounds more spoken and less whispered.

Power Signal to Noise Ratio: the ratio is computed from A, the samples of the original signal, and the mean square error (MSE) between the original and the reconstructed signal.
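The DCT-based compression of the residual and the SNR-based comparison can be sketched together. This illustration makes several assumptions: an orthonormal DCT-II written out directly, a synthetic 160-sample frame (a strong 150 Hz component plus a weak 3 kHz component) in place of a real residual, 40 retained coefficients, and plain SNR in dB as the objective measure rather than the paper's exact power signal-to-noise expression. All function names are hypothetical.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a frame."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n)
                 * math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n))
        out.append(s)
    return out

def keep_low(c, n_keep):
    """Keep the first n_keep (lowest-frequency) coefficients, zero the rest."""
    return c[:n_keep] + [0.0] * (len(c) - n_keep)

def snr_db(original, decoded):
    """Signal-to-noise ratio in dB between a frame and its reconstruction."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    return float("inf") if noise == 0 else 10.0 * math.log10(signal / noise)

# Synthetic stand-in for one 20 ms residual frame at 8 kHz.
fs, n = 8000, 160
frame = [math.sin(2 * math.pi * 150 * i / fs)
         + 0.1 * math.sin(2 * math.pi * 3000 * i / fs) for i in range(n)]

coeffs = dct(frame)
snr_full = snr_db(frame, idct(coeffs))                # all 160 coefficients
snr_low = snr_db(frame, idct(keep_low(coeffs, 40)))   # lowest 40 only

print("SNR with all coefficients: %.1f dB" % snr_full)
print("SNR with 40 of 160 coefficients: %.1f dB" % snr_low)
```

Keeping a quarter of the coefficients discards the weak high-frequency component but retains most of the frame's energy, which is the trade-off the voice-excited coder relies on. How many coefficients to keep, and how to quantize them, determines the extra bit rate relative to plain LPC.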
VI. WAVEFORMS ANALYSIS

A. Original Speech Signal

FIGURE 5. WAVEFORM OF ORIGINAL SPEECH SIGNAL

B. LPC reconstructed Speech Signal

FIGURE 6. WAVEFORM OF LPC RECONSTRUCTED SPEECH SIGNAL

C. Voice Excited LPC reconstructed Speech Signal

FIGURE 7. WAVEFORM OF VOICE EXCITED LPC RECONSTRUCTED SPEECH SIGNAL

VII. CONCLUSION

Waveforms of the LPC reconstructed and voice-excited LPC reconstructed speech signals give an idea of the quality of the signals. We can see that the voice-excited waveform looks closer to the original sound than the plain LPC reconstructed one. At the same time, when the SNR for both cases was compared, it was observed that the sound due to plain LPC was noisier, having a negative SNR, whereas the voice-excited LPC had a positive SNR. Voice-discrimination tests indicate that voice identity is well preserved. Crucial factors influencing the remade speech quality are the accuracy of spectral flattening and the impulse response of the analyzer low-pass filters. Though there is an improvement in quality when we use the voice-excited method, the bits per sample increase, causing an increase in the bandwidth of the signal.