Presented by Srijan Silwal (108671) and Siddharth Sah (108672)
• Introduction to speech
• Compression and its need
• Polynomial approximation
• Frame parameters
• Interpolation
• Encoding
• Compression of spectral component, gain, pitch
Speech Signal
Human speech is an acoustic signal. It is converted to an electrical signal by transducers.
Properties of electrical signals
• 1. It is a one-dimensional signal, with time as its independent variable.
• 2. It is random in nature.
• 3. It is non-stationary, i.e. the frequency spectrum is not constant in time.
• 4. Although human beings have an audible frequency range of 20 Hz to 20 kHz, human speech has significant frequency components only up to 4 kHz, a property that is exploited in the compression of speech.
Digital Representation of Speech
• With the advent of digital computers, it was proposed to exploit their power for the processing of speech signals.
• The analog signal is sampled at some frequency and then quantized to discrete levels.
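The two steps named above, sampling and quantization, can be sketched in a few lines of Python. This is only an illustration: the test tone, the 8 kHz rate, the 8-bit resolution and the function name `sample_and_quantize` are assumptions made for the example, not values taken from the slides.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bits):
    """Sample a unit-amplitude sine wave and quantize it uniformly."""
    levels = 2 ** bits                # number of discrete levels
    step = 2.0 / (levels - 1)         # spacing of the levels across [-1, 1]
    codes, values = [], []
    for n in range(n_samples):
        # Sampling: evaluate the "analog" signal at time n / sample_rate_hz.
        x = math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        # Quantization: index of the nearest discrete level.
        code = round((x + 1.0) / step)
        codes.append(code)
        values.append(code * step - 1.0)   # level value the code stands for
    return codes, values

# Hypothetical settings: 440 Hz tone, 8 kHz sampling, 8 bits per sample.
codes, values = sample_and_quantize(440.0, 8000.0, 16, bits=8)
```

Each entry of `codes` is an integer that fits in the chosen number of bits, and the matching entry of `values` differs from the true sample by at most half a quantization step.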
Parameters of Digital Speech
1. Sampling rate
2. Bits per second
3. Number of channels
The sound files can be stored and played in digital computers. Various formats have been proposed by different manufacturers, for example '.wav' and '.au'.
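As a rough illustration, these parameters can be read from a '.wav' file with Python's standard `wave` module. The sketch builds a short silent file in memory so that it needs no external data. The helper names and the 8 kHz, 16-bit, mono settings are assumptions chosen for the example, and the bits-per-second figure is computed here as sampling rate × bits per sample × channels.

```python
import io
import wave

def make_silent_wav(sample_rate=8000, sample_width=2, channels=1, n_frames=800):
    """Build a small silent WAV file in memory (hypothetical test data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)      # bytes per sample
        w.setframerate(sample_rate)
        w.writeframes(b"\x00" * (n_frames * sample_width * channels))
    return buf.getvalue()

def describe_wav(wav_bytes):
    """Read the basic parameters of a WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        rate = w.getframerate()
        bits = 8 * w.getsampwidth()
        channels = w.getnchannels()
    return {
        "sample_rate": rate,
        "bits_per_sample": bits,
        "channels": channels,
        "bits_per_second": rate * bits * channels,  # uncompressed bit rate
    }

info = describe_wav(make_silent_wav())
```

With these assumed settings the uncompressed stream costs 128000 bits per second, which is the figure a speech coder tries to reduce.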
What is COMPRESSION?
• Compression is a process of converting an input data stream into another data stream that has a smaller size.
• Compression is possible only because input data has some amount of redundancy associated with it.
• The main objective of compression systems is to eliminate this redundancy.
Why Compression?
• Multimedia files in general need plenty of disk space for storage, and sound files are no exception. Hence compression of these files has become a necessity.
• When compression is used to reduce storage requirements, overall program execution time may be reduced. This is because a reduction in storage results in fewer disk accesses.
Applications of Compression
• 1. The use of compression in recording applications is extremely powerful. The playing time of the medium is extended in proportion to the compression factor.
• 2. In the case of tapes, the access time is improved because the length of tape needed for a given recording is reduced and so it can be rewound more quickly.
• 3. In digital audio broadcasting and in digital television transmission, compression is used to reduce the bandwidth needed.
• 4. The time required for a web page to be displayed, and the download time for files, is greatly reduced by compression.
Polynomial Approximation: Introduction
• Methods for speech compression aim at reducing the transmission bit rate while preserving the quality and intelligibility of speech.
• A method for compressing speech is based on polynomial approximations of the trajectories in time of various speech features (i.e. spectrum, gain, and pitch).
Continued.
• One method of compression, called segmental coding, uses polynomial functions to approximate trajectories of speech features present in successive time frames.
• Useful compression results if the number of bits per second needed to transmit the polynomial coefficients is smaller than the number of bits per second needed to transmit the original feature frames for the segment.
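The criterion above can be turned into a small bit-count comparison. The sketch assumes that a segment holds `n_frames` feature frames, that a polynomial of order `order` needs `order + 1` coefficient vectors, and that each vector costs the same `bits_per_vector` as one original frame. These names and the equal-cost assumption are hypothetical. Because both versions cover the same stretch of time, comparing bits per segment is equivalent to comparing bits per second.

```python
def segment_bits(n_frames, order, bits_per_vector):
    """Bits for one segment: original frames vs. polynomial description."""
    original = n_frames * bits_per_vector        # one vector per frame
    polynomial = (order + 1) * bits_per_vector   # one vector per coefficient
    return original, polynomial

def is_useful(n_frames, order, bits_per_vector):
    """True when the polynomial description costs fewer bits."""
    original, polynomial = segment_bits(n_frames, order, bits_per_vector)
    return polynomial < original

# Hypothetical numbers: 10 frames, order 3, 24 bits per vector.
original, polynomial = segment_bits(n_frames=10, order=3, bits_per_vector=24)
```

Here the original frames cost 240 bits and the polynomial description 96 bits, so this segment would be worth compressing under the stated assumptions.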
POLYNOMIAL SPEECH COMPRESSION: FRAMES
• The input signal is analyzed in brief windows (frames) that usually span a few tens of milliseconds.
• The speech samples contained in a frame are processed by a spectral analyzer that provides a relatively small number of spectral features for each frame.
• Other speech features, such as voicing, gain (energy), and pitch, are also obtained and assigned to frames. This process is repeated for successive windows and results in a discrete stream of frame parameters. These frame parameters already represent a compressed form of the original signal.
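The framing step can be sketched as follows. The example cuts a signal into non-overlapping 20 ms frames and computes one gain value (root-mean-square energy) per frame. The non-overlapping windows, the 8 kHz rate, the ramped test tone and the function names are assumptions made for illustration, and the spectral analysis is left out.

```python
import math

def split_into_frames(samples, frame_len):
    """Cut the signal into consecutive, non-overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def frame_gain(frame):
    """Gain of one frame, taken here as its root-mean-square value."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

sample_rate = 8000
frame_len = 160   # 20 ms at 8 kHz
# Hypothetical input: a 200 Hz tone whose amplitude ramps up over 0.1 s.
signal = [math.sin(2 * math.pi * 200 * n / sample_rate) * (n / 800)
          for n in range(800)]

frames = split_into_frames(signal, frame_len)
gains = [frame_gain(f) for f in frames]   # one gain parameter per frame
```

The 800 input samples become 5 gain values, one per frame, and their rise over time is the kind of feature trajectory that the later slides approximate with a polynomial.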
• The frame parameters are transmitted to the receiver (decoder) where the signal is synthesized to resemble the original signal.
• Linear interpolation is usually employed in the decoder between successive frames to smooth the transitions of the parameters across frames.
• Successive frames are analyzed independently of each other and they usually contain some redundant information. Additional compression can be obtained by exploiting this redundancy across successive frames.
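The decoder-side smoothing mentioned above can be illustrated with a minimal linear interpolation between two frame parameter vectors. The function name, the number of intermediate steps and the sample values are assumptions made for this sketch.

```python
def interpolate_frames(prev_params, next_params, steps):
    """Linearly interpolate between two frame parameter vectors.

    Returns `steps` vectors, starting at prev_params and moving
    toward (but not including) next_params.
    """
    out = []
    for s in range(steps):
        t = s / steps   # interpolation weight in [0, 1)
        out.append([(1 - t) * a + t * b
                    for a, b in zip(prev_params, next_params)])
    return out

# Hypothetical two-element parameter vectors for two successive frames.
smoothed = interpolate_frames([1.0, 0.0], [2.0, 4.0], steps=4)
```

The first returned vector equals the earlier frame and the middle one lies halfway between the two frames, so the synthesized parameters change gradually instead of jumping at frame boundaries.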
Coding of Frame Parameters
• An efficient method to perform coding of frame parameters is based on matrix quantization. A whole block of frame vectors is constituted as a matrix.
• Such methods are suitable for the quantization of fixed-length segments. Matrix quantization techniques require a larger amount of data. This poses a problem for longer segments of speech due to the higher spectral-temporal variability encountered in such segments and the sparseness of the data.
• A method to alleviate this problem uses polynomial approximation of the speech features included in such segments.
Continued.
• There are two main advantages to using polynomial functions for approximation:
• First, they can approximate various shapes of feature trajectories with arbitrary accuracy, depending on the polynomial order.
• Second, they are described by a relatively small number of parameters, which is necessary to achieve a significant compression.
Approximation
• One of the most popular approaches to function approximation is the least squares method.
• Thus, for any arbitrary function f(x), continuous on a closed interval [α, β], there exists an algebraic polynomial p(x), of order d, that can best approximate the function on that interval in the L2 norm.
• It is assumed that there is a bidirectional transform between the original signal and its vector-space representation.
• Such a bidirectional transformation is also required in order to reconstruct the original signal from the vector-space representation.
Interpolation
• Interpolation is done in the least-squares sense by a polynomial function defined as follows:
$F_{i,P}(n) = a_{i,0} + a_{i,1} n + a_{i,2} n^2 + \dots + a_{i,P} n^P$
where $a_{i,0}, \dots, a_{i,P}$ are the polynomial coefficients, and P is the polynomial order for feature element i.
• The maximum order of the polynomial is limited to P = N-1 because there are only N data points available to estimate the coefficients in the least-squares sense.
• The lower the polynomial order P in the range [0, …, N-1], the higher the approximation error for an arbitrary trajectory.
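Both statements can be checked numerically with a small least-squares fit. The sketch below solves the normal equations in plain Python. The trajectory values are made up, the helper names are hypothetical, and the frame index is rescaled to [0, 1] purely for numerical conditioning, which is a choice made for this example.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def polyfit_ls(ts, ys, order):
    """Least-squares coefficients a_0..a_P via the normal equations."""
    size = order + 1
    ata = [[sum(t ** (j + k) for t in ts) for k in range(size)]
           for j in range(size)]
    aty = [sum(y * t ** j for t, y in zip(ts, ys)) for j in range(size)]
    return solve_linear(ata, aty)

def polyval(coeffs, t):
    """Evaluate a_0 + a_1 t + ... + a_P t^P."""
    return sum(a * t ** p for p, a in enumerate(coeffs))

def fit_error(ts, ys, order):
    """Sum of squared errors of the order-P least-squares fit."""
    coeffs = polyfit_ls(ts, ys, order)
    return sum((polyval(coeffs, t) - y) ** 2 for t, y in zip(ts, ys))

N = 6
ts = [n / (N - 1) for n in range(N)]            # rescaled frame index
trajectory = [0.2, 0.9, 1.1, 0.7, 0.4, 0.8]     # hypothetical feature values
errors = [fit_error(ts, trajectory, P) for P in range(N)]
```

The list `errors` does not increase as P grows, and it reaches (numerically) zero at P = N-1, where the polynomial passes through all N points and no compression is left.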
Compression
• Thus, the condition for achieving compression is P+1 < N, which presupposes some approximation errors.
• A feature compression factor is defined as follows:
. these coefficients can be uniquely represented by sampling the polynomial function at P+1 arbitrary points and encoding these P+1 trajectory feature samples.Encoding of Polynomial Coefficients • Instead of encoding the P+1 coefficients. • These new P+1 feature samples can be encoded using the original VQ codebook because they are vectors in the original D dimensional feature space.
Polynomial Compression of Spectral Parameters
• Among the popularly used spectral representations in speech coding are the LPC (linear predictive coding) and LSF (line spectral frequency) features.
Polynomial Compression of Gain Parameters
• For good quality and intelligibility of the encoded speech, other parameters such as gain and pitch are also important.
• The trajectory of the gain feature is approximated by a polynomial function of order P on a segment S containing N frames.
• Then, instead of encoding the P+1 polynomial coefficients, they are transformed into P+1 gain feature samples by sampling the polynomial function at P+1 points.
• These gain feature samples can be encoded using the coder's original codebook for gain.
Polynomial Compression of Pitch
• Because pitch is measurable only in the voiced frames, an arbitrary speech segment S containing N frames can contain frames with no pitch measurements.
• A way of adapting the method to this special case is to build the polynomial function based on a possibly reduced number of voiced frames (Nv) in the speech segment (Nv < N).
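A minimal sketch of this adaptation fits the polynomial only to the frames that carry a pitch measurement. Unvoiced frames are marked with `None`. The pitch values, the order P = 1 and the helper names are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def polyfit_ls(ts, ys, order):
    """Least-squares polynomial coefficients a_0..a_P."""
    size = order + 1
    ata = [[sum(t ** (j + k) for t in ts) for k in range(size)]
           for j in range(size)]
    aty = [sum(y * t ** j for t, y in zip(ts, ys)) for j in range(size)]
    return solve_linear(ata, aty)

# Hypothetical pitch track in Hz for N = 8 frames; None marks unvoiced frames.
pitch = [None, 110.0, 112.0, None, 116.0, 118.0, None, 122.0]

# Keep only the Nv voiced frames, remembering their frame indices.
voiced = [(n, p) for n, p in enumerate(pitch) if p is not None]
frame_idx = [n for n, _ in voiced]
values = [p for _, p in voiced]

order = 1                                   # needs order + 1 <= Nv
coeffs = polyfit_ls(frame_idx, values, order)
```

With this made-up track the Nv = 5 voiced frames lie on the line 108 + 2n, which the fit recovers. The unvoiced frames simply do not take part in the estimate.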
Conclusion
• Polynomial approximation has proved to be a useful and efficient method for compression of the speech parameters. Such a method can be applied to both coding and storing of speech.
• The spectral parameters, especially those with low dynamics such as LSF parameters, are particularly suitable to supplementary compression by polynomial approximation.
• In addition, the gain and pitch parameters can also be compressed by polynomial approximation methods.