
COURSE SGN-1650 AND SGN-1656, 2010–2011

In this work, we implement a simple speech bandwidth extension system that converts a narrowband speech signal into a wideband signal. It is recommended to pass the course SGN-4010 Speech Processing Methods before selecting this exercise. You should be familiar with basic (speech) signal processing methods such as windowing, the Fourier transform, and linear prediction.

1 Introduction

1.1 Artificial bandwidth extension

In digital signal processing, signals are bandlimited with respect to the used sampling frequency. For instance, if a sampling frequency of 8kHz is used, the highest possible frequency component in the signal is 4kHz (the Nyquist frequency). In analog telephone speech, the speech bandwidth has traditionally been limited to 300–3400Hz. Most of the information in speech is below the upper boundary of 3.4kHz, and even though the very low frequencies are not transmitted, the human hearing system can detect the speech fundamental frequency based on the harmonic components present in the signal. For simplicity, the terms narrowband speech and wideband speech are used here to refer to speech signals with bandwidths of 4kHz (sampling frequency of 8kHz) and 8kHz (sampling frequency of 16kHz), respectively.

The amount of information in narrowband speech is smaller compared to wideband speech and the perceived speech quality is thus lower. To achieve wideband speech quality without actually transmitting wideband signals, algorithms for artificial bandwidth extension (ABE, BWE) have been developed. These algorithms convert original narrowband signals into artificial wideband signals by estimating the missing high-frequency content based on the existing low-frequency content.

In this work we implement a simple ABE system that utilizes a source-filter model of speech. Each narrowband signal frame is decomposed into a source part and a filter part, and the parts are extended separately. The vocal tract is modeled as an all-pole filter and the filter coefficients are estimated using linear prediction (LP). The model residual is used as a source signal. The vocal tract model is extended using the most suitable wideband model taken from a codebook, and the residual signal by time domain zero-insertion. The created signal is added to a resampled and delayed version of the original narrowband signal to form an artificial wideband signal.

1.2 Linear prediction of speech

Linear prediction (LP) is one of the most important tools used in digital speech processing. Speech production can be modeled as a source-filter system where a source signal produced by the vocal cords is filtered by a vocal tract filter with resonances at formant frequencies. For a recap, check www.cs.tut.fi/kurssit/SGN-4010/LP_en.pdf. The vocal tract can be considered to be a pth-order all-pole filter 1/A(z):

1/A(z) = 1 / (1 + a1 z^-1 + ... + ap z^-p)    (1)

where the filter coefficients a1, ..., ap are estimated using linear prediction. Figure 1 illustrates the spectrum of the all-pole filter 1/A(z) estimated from a short speech frame. The thin line represents the amplitude spectrum (absolute value of the discrete Fourier transform, DFT) of the frame. Speech frame Y(z) (now in the frequency domain) is formed by filtering the residual signal X(z) by the vocal tract all-pole filter 1/A(z) (remember that convolution/filtering in the time domain corresponds to multiplication in the frequency domain):

Y(z) = X(z)/A(z)
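The source-filter relations above can also be prototyped outside Matlab. Below is a minimal NumPy sketch (the function names are mine, not part of the exercise) that estimates the LP coefficients with the autocorrelation method and the Levinson-Durbin recursion, and forms the residual by inverse filtering:

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP: return [1, a1, ..., ap] via Levinson-Durbin."""
    n = len(frame)
    # Autocorrelation lags r[0..order]
    r = np.array([frame[:n - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current prediction error
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def lp_residual(frame, a):
    """Inverse filtering X(z) = Y(z)A(z): convolve the frame with A(z)."""
    return np.convolve(frame, a)[:len(frame)]
```

Filtering the residual back through the all-pole filter 1/A(z) reconstructs the frame, which is exactly the synthesis step used later in the exercise.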


In Matlab, use function lpc to compute LP coefficients of a given order:

a = lpc(frame,order); % Estimate LP coefficients

Residual signal X(z) is formed by filtering the frame Y(z) by the vocal tract inverse filter A(z):

X(z) = Y(z)A(z)

Figure 1: Frame LP amplitude spectrum (thick line) and amplitude spectrum (thin line) for a Finnish vowel 'y'.

For speech coding purposes, the LP polynomial A(z) can be decomposed into line spectral frequencies (LSF). The idea is to decompose A(z) into two polynomials that have their roots on the unit circle. An LSF coefficient vector Ω corresponding to A(z) of order p consists of p root angle (frequency) values ωi: Ω = (ω1, ω2, ..., ωp). The LSF representation of the previous LP spectrum is given in Figure 2. The thick line represents the LP spectrum and the frequency values of the thin lines represent the LSF coefficient values. LSFs have good quantization and interpolation properties and are thus widely used in speech coding. A more detailed derivation of LSFs can be found in http://www.cs.tut.fi/sgn/arg/8003102/syn_en.pdf.

Figure 2: Frame LP spectrum (thick line) and corresponding LSF values (thin lines) for a Finnish vowel 'y'.

In Matlab, LP-to-LSF and LSF-to-LP conversions can be computed using functions poly2lsf and lsf2poly:

w = poly2lsf(a); % Convert LP coefficients into LSF coefficients
a = lsf2poly(w); % Convert LSF coefficients into LP coefficients

Note that the values of vector w are between 0 and π (whereas in Figure 2 the LSF values are scaled to be between 0 and 8kHz).
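The poly2lsf decomposition can be written out by hand as well. The sketch below (my own code, assuming a real, stable A(z)) forms the sum and difference polynomials P(z) = A(z) + z^-(p+1)A(z^-1) and Q(z) = A(z) - z^-(p+1)A(z^-1), whose unit-circle root angles in (0, π) are the LSFs:

```python
import numpy as np

def poly2lsf(a):
    """Return the p line spectral frequencies (radians in (0, pi)) of A(z)."""
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])
    # P and Q have all their roots on the unit circle; the trivial roots at
    # z = +/-1 and the conjugate duplicates are discarded below.
    P = a_ext + a_ext[::-1]
    Q = a_ext - a_ext[::-1]
    w = np.concatenate([np.angle(np.roots(P)), np.angle(np.roots(Q))])
    eps = 1e-4  # tolerance for dropping the roots at angles 0 and pi
    return np.sort(w[(w > eps) & (w < np.pi - eps)])
```

For A(z) = 1 - 1.5z^-1 + 0.9z^-2 this returns angles of approximately (0.6435, 0.7954) radians; the strictly increasing order is exactly the stability condition ω1 < ω2 < ... < ωp used later in the codebook construction.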

2 Work assignment

In this work, we build a simple speech bandwidth extension system. The system should read a narrowband speech signal and process it framewise so that each frame is first decomposed into an all-pole filter and a source signal. The filter and the source signal parts are extended separately. The filter part is extended using a codebook built in Section 2.3. The extended parts are combined using filtering to form a frame that contains the artificial high-frequency components missing from the narrowband frame. Extended time domain frames are catenated using overlap-add (see http://www.cs.tut.fi/kurssit/SGN-4010/ikkunointi_en.pdf). The extended signal is added to an interpolated and delayed version of the original narrowband signal.

2.1 What to include in the report?

The report of the work should contain the commented Matlab codes of your bandwidth extension system and answers to the questions, including the plotted figures. Write the whole report in a single file and send it to hanna.silen@tut.fi. Include the names and student numbers of the group members!

2.2 Getting started

The speech files can be downloaded from:

http://www.speech.cs.cmu.edu/cmu_arctic/packed/cmu_us_slt_arctic-0.95-release.tar.bz2

The package is extracted in Lintula by typing:

tar -xjf cmu_us_slt_arctic-0.95-release.tar.bz2

The wideband speech files should now be in folder cmu_us_slt_arctic/wav/. Wavefiles arctic_a0001.wav – arctic_a0100.wav are used as LSF codebook training data. As test data, any of the wavefiles excluded from the training data can be used, e.g. arctic_a0501.wav.

Before starting to build the ABE system, let's go through some basic speech processing Matlab functions. Read the selected test wideband speech signal into Matlab:

[ywb,fs] = wavread('cmu_us_slt_arctic/wav/arctic_b0001.wav');

and plot the time domain signal and spectrogram:

figure, plot(ywb)
figure, specgram(ywb,512,fs,kaiser(500,5),475)

Question 1: What is the sampling frequency in the downloaded wideband signal? What is the highest frequency component of this band-limited signal?

Create a narrowband signal by downsampling the wideband signal:

ynb = decimate(ywb,2); % Downsample (wideband -> narrowband)

By default, function decimate filters out the high-frequency content before downsampling, thus preventing aliasing. Therefore, in this case you do not need to take care of the anti-aliasing filtering.

Question 2: Plot the created time domain signal and its spectrogram (note the new sampling frequency). What can you say about the frequency content of the narrowband signal compared to the wideband signal? Listen to the signals (soundsc); are there any audible differences between them?

Increase the sampling frequency by upsampling the narrowband signal:

yus = resample(ynb,2,1); % Upsample signal
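If you prefer to reproduce this experiment outside Matlab, SciPy offers close analogues of decimate and resample (a sketch under the assumption that SciPy's default anti-aliasing filters are acceptable; like Matlab's decimate, scipy.signal.decimate filters before downsampling):

```python
import numpy as np
from scipy.signal import decimate, resample_poly

fs_wb = 16000
t = np.arange(fs_wb) / fs_wb
ywb = np.sin(2 * np.pi * 1000 * t)   # 1 kHz test tone, 1 s at 16 kHz

ynb = decimate(ywb, 2)               # anti-alias filter + downsample -> 8 kHz
yus = resample_poly(ynb, 2, 1)       # upsample back to 16 kHz

# The tone survives both operations because it lies below the 4 kHz
# Nyquist frequency of the narrowband signal.
f = np.fft.rfftfreq(len(yus), 1 / 16000)
peak = f[np.argmax(np.abs(np.fft.rfft(yus)))]
```

A 6 kHz tone, by contrast, would simply be removed by the anti-aliasing filter rather than folding back into the narrowband signal.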

Question 3: Plot the time domain signal and spectrogram. Compare the spectrogram to the narrowband spectrogram. How do they differ?

Add a zero after each sample of the narrowband signal:

yf = zeros(length(ynb)*2,1);
yf(1:2:end) = ynb;

Question 4: Plot the signal spectrogram and listen to the signal. How has the signal changed?

2.3 LSF codebook construction

The first step in building our ABE system is the construction of the LSF codebook. The codebook stores narrowband-wideband representation pairs and it is used in the extension phase of Section 2.4 for finding a suitable wideband representation for the spectral envelope based on the known narrowband representation. In the bandwidth extension, a suitable wideband LSF vector is found based on the corresponding narrowband representation.

Figure 3: The LSF codebook consisting of narrowband-wideband representation pairs (NB LSF vector 1 – WB LSF vector 1, ..., NB LSF vector N – WB LSF vector N).

To construct the LSF codebook, we are going to need both narrowband and wideband LP spectra. The narrowband signals can be formed by decimating the existing wideband signals (decimate). The following processing should be repeated for every training data wavefile.

Pre-emphasis

Before LP analysis, filter the signals using a pre-emphasis filter. For wideband signals, use a FIR filter H(z) = 1 - 0.95z^-1:

ywb = filter([1 -0.95],1,ywb); % Filter signal ywb

The frequency response of the wideband pre-emphasis filter is illustrated in Figure 4. For the narrowband signal, use a pre-emphasis filter whose DFT on the frequency band 0–4kHz (sampling frequency is 8kHz) is identical to the DFT of the wideband filter on the band 0–4kHz (sampling frequency is 16kHz). This can be done easily in the frequency domain:
Figure 4: Frequency response (magnitude and phase) of the wideband pre-emphasis filter.

pdf.’symmetric’).nfft).% FFT length for the wideband signal nfft = 2^nextpow2(length(ywb)).200). 5 . The resulting matrix centroids of matrix clcentr are used as wideband entries of the codebook.005).25*nfft 0. Vector clidx contains the cluster index for every original wideband LSF vector. The wideband vectors are clustered using k-means clustering and the resulting cluster centroids are used as wideband codebook entries. A detailed description about k-means clustering can be found at http://www. % Filtering in time domain corresponds % to multiplication in frequency domain Ynb = Ynb(:).fi/∼jupeto/jht lectnotes eng.tut.clcentr] = kmeans(codevec_wb. The number of clusters was now set to 200 and may be varied. Note that the sampling frequency (fs) for the narrowband signal is 8kHz and for the wideband signal 16kHz. Use ﬁlter order 10 for narrowband and 18 for wideband signals. % Inverse DFT: frequency domain -> time domain ynb = ifft(Ynb. tmp = hanning(fs*0. . form a mean vector (of size 1x10) of the corresponding narrowband LSF vectors. Windowing Window the signals using the following 25ms window and no overlapping between adjacent frames: awinlen = round(fs*0.5*nfft). Cluster the wideband LSF vectors using Matlab function kmeans: [clidx. % Wideband filter DFT (length nfft) % Note: DFT includes frequencies 0-16kHz (sampling frequency 16kHz) Hwb = fft([1 -0. % Narrowband signal DFT (length nfft/2) Ynb = fft(ynb. Start by estimating of the LP coeﬃcients (lpc) and convert them further into LSF coeﬃcients (poly2lsf).025). LSF computing Compute framewise LSF coeﬃcients for the narrowband and wideband signals.75*nfft+1:nfft]).cs. For each cluster.95]. % Narrowband filter DFT (length nfft/2) % Note: DFT includes frequencies 0-8kHz (sampling frequency 8kHz) Hnb = Hwb([1:0. Use this mean vector as a key for the corresponding cluster.0. < ωp ). where the matrix rows correspond to LSF vectors of individual frames N being the total number of frames in the training data.1). 
Check that the clustered LSF vectors result in stable LP ﬁlters (ω1 < ω2 < . LSF clustering The collected LSF vectors are not used as codebook entries as such. Repeat the framewise processing for each training waveﬁle and store the results in two matrices of size Nx10 and Nx18. awinfun = [tmp(1:length(tmp)/2). ones(awinlen-length(tmp).*Hnb(:). . tmp(length(tmp)/2+1:end)].
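The clustering step can be prototyped in Python with scipy.cluster.vq.kmeans2, which returns the centroids and per-vector cluster indices in one call. Below is a sketch on synthetic stand-in data (the variable names mirror the Matlab snippet above; the real exercise uses 200 clusters on actual Nx18 and Nx10 LSF matrices):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(1)
n_clusters = 8
# Toy stand-in data: 8 well-separated blobs of "wideband LSF vectors" (Nx18)
# with matching "narrowband LSF vectors" (Nx10).
centers_wb = rng.random((n_clusters, 18)) * 10
centers_nb = rng.random((n_clusters, 10)) * 10
labels_true = np.repeat(np.arange(n_clusters), 125)
codevec_wb = centers_wb[labels_true] + 0.05 * rng.standard_normal((1000, 18))
codevec_nb = centers_nb[labels_true] + 0.05 * rng.standard_normal((1000, 10))

# Cluster the wideband LSF vectors: clcentr ~ Matlab's cluster centroids,
# clidx ~ the per-vector cluster index (note the reversed output order).
clcentr, clidx = kmeans2(codevec_wb, n_clusters, minit='++', seed=1)

# For each cluster, the mean of the corresponding narrowband LSF vectors
# becomes the narrowband key of that codebook entry.
keys_nb = np.vstack([codevec_nb[clidx == c].mean(axis=0)
                     for c in range(n_clusters)])
```

The resulting codebook is the pair (keys_nb, clcentr): narrowband keys for searching, wideband centroids for synthesis.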

2.4 Extension of a narrowband test signal

Write a Matlab code that extends a given narrowband test signal artificially into a wideband signal. The basic idea is to create a signal that contains the frequencies that are missing from the original narrowband signal. The frames are decomposed into source signal and filter parts using LP analysis and the parts are extended separately. The frame waveform is reconstructed by filtering the extended source signal using the extended filter. This signal, having its energy mainly on the frequency band 4–8kHz, is then added to an interpolated version of the original narrowband signal that has most of its energy on the band 0–4kHz. A block diagram of the system is given below.

Figure 5: Bandwidth extension procedure (LP analysis, source signal extension, spectral envelope extension, and waveform generation; the result is added to a resampled and delayed narrowband signal to form the artificial wideband signal).

Pre-emphasis

As in the codebook construction, the input signal is filtered using a pre-emphasis filter. Use the same narrowband pre-emphasis filter as before.

Overlap-add

The filtered signal is extended framewise and the extended frames are catenated using the overlap-add technique. A 25ms analysis window and a 10ms synthesis window are used. The time difference between adjacent frames is 5ms in both analysis and synthesis. Start the processing by windowing the incoming signal. In analysis, use the same window function as earlier (awinfun). In synthesis, reconstruct a 25ms time domain speech frame and use a 10ms Hanning window to extract a segment around the center of the frame. Join the windowed segments using an overlap of 5ms. An example code for overlap-add is available at http://www.cs.tut.fi/kurssit/SGN-4010/ikkunointi_en.pdf. You can modify this code or write your own implementation. Note that the sampling frequency of our ABE system is 8kHz in analysis and 16kHz in synthesis!

Extension of the source and filter parts

Decompose each narrowband frame using LP analysis into source and filter parts. Use the same model order as in the codebook training. First, compute the all-pole filter coefficients a1, ..., ap for the frame and then form the model residual signal as was explained in Section 1.2. You can operate either in the time domain (use filtering) or in the frequency domain (use multiplication/division).

Create a wideband source signal that has its energy mainly on the frequency band 4–8kHz. Use the narrowband source signal as a basis for the wideband source signal:
1. Increase the sampling rate of the narrowband source signal using time domain zero-insertion (or spectral mirroring in the frequency domain). This will create a signal whose spectrum on the band 4–8kHz is a mirrored copy of the spectrum on the band 0–4kHz (check the spectrogram).
2. Using the signal in (1), create a signal whose energy is mainly on the band 4–8kHz (check the spectrogram).
3. Use this signal as a wideband source signal.

Use the codebook to extend the spectral envelope:
1. Convert the LP coefficients of the current narrowband frame into LSFs.
2. Find the narrowband codebook entry with the minimum Euclidean distance to the current LSF vector.
3. Select the corresponding wideband entry to be used as a wideband spectral envelope representation. Convert the selected wideband LSF coefficients into LP coefficients for waveform synthesis.

Waveform synthesis

Reconstruct the wideband frame by filtering the created wideband source signal with the selected wideband all-pole filter. Scale the frame energy according to the energy of the original narrowband frame. First, compute the energy Ecb of the frequency band 0–4kHz for the frame that results from filtering an interpolated version of the narrowband source signal (sampling frequency of the interpolated signal is 16kHz) by the selected wideband all-pole filter. Then compute the energy Enb of the original narrowband frame (sampling frequency is 8kHz). Multiply each sample of the frame by the scaling factor sqrt(Enb/Ecb). Use overlap-add to join the scaled time domain frames.

Remove the effect of pre-emphasis filtering by filtering the signal with the inverse filter of the wideband pre-emphasis filter H(z): 1/H(z) = 1/(1 - 0.95z^-1). In Matlab:

sig = filter(1,[1 -0.95],sig);

Note that the extension causes a delay in the signal, and therefore also the resampled narrowband signal must be delayed to synchronize the signals:
1. Increase the sampling frequency of the original narrowband signal from 8kHz to 16kHz using command resample.
2. Delay the signal with the total delay of your ABE system.
3. Add the resampled and delayed signal to the artificially extended signal.

Question 5: Plot the spectrogram of the resulting signal. How does it differ from the spectrograms of the original wideband and narrowband signals? Listen to the resulting signal. Are there audible differences compared to the original narrowband and wideband signals?

Question 6: What are the most dominant artifacts caused by the extension? Could they be avoided somehow?
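The per-frame logic of this section (mirror the source spectrum by zero-insertion, pick the wideband envelope by nearest-neighbour codebook lookup, rescale the frame energy) can be condensed into a small sketch. The function names and toy data below are mine, not part of the handout:

```python
import numpy as np

def extend_frame(src_nb, lsf_nb, keys_nb, entries_wb):
    """One frame extension step: mirrored source + codebook envelope."""
    # 1) Zero-insertion doubles the sampling rate; the 0-4 kHz spectrum
    #    is mirrored onto the 4-8 kHz band.
    src_wb = np.zeros(2 * len(src_nb))
    src_wb[::2] = src_nb
    # 2) The narrowband key with minimum Euclidean distance selects the
    #    wideband LSF entry for the spectral envelope.
    idx = np.argmin(np.linalg.norm(keys_nb - lsf_nb, axis=1))
    return src_wb, entries_wb[idx]

def scale_energy(frame_wb, e_nb, e_cb):
    """Multiply each sample by sqrt(Enb/Ecb) to match the narrowband energy."""
    return frame_wb * np.sqrt(e_nb / e_cb)
```

After this step, the scaled frames would be joined with overlap-add and de-emphasized exactly as described above.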
