
Student ID: 102061144

Name:

Introduction

In this lab, we implement the linear prediction that the teacher mentioned in the course. The main formula of linear prediction is:

x[n] = Σ_{k=1}^{P} a_k x[n−k] + e[n]

In this formula, e[n] is called the prediction error, or the glottal source; clearly, the smaller the error, the better the result. The parameters a_k are the linear prediction coefficients.

Through this procedure, we can recover the configuration of the vocal tract, or even the reflection coefficients.
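The formula above can be sketched in a few lines. This is a NumPy sketch rather than the MATLAB code used in the lab, and the autocorrelation (Yule-Walker) method shown here is an assumption on our part, one standard way to obtain the coefficients a_k and the error e[n]:

```python
import numpy as np

def lp_residual(x, order):
    """Fit a_k by the autocorrelation (Yule-Walker) method and return
    (a, e) where e[n] = x[n] - sum_k a_k * x[n-k] is the prediction error."""
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:n + order]   # r[0..order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])                 # normal equations
    pred = np.zeros_like(x)
    for k in range(1, order + 1):                          # sum_k a_k x[n-k]
        pred[k:] += a[k - 1] * x[:-k]
    return a, x - pred
```

A smaller mean |e[n]| then indicates a better linear-prediction fit, which is the criterion we use in the experiments.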

Adjust the frame length and the order of LP. Listen to the resulting excitation signal excitat. Find out what range of these two parameters results in successful removal of the original vowel quality.

Our method is to fix the frame length first and observe the mean of the absolute value of the excitation. In the beginning, we tried to judge by ear whether the removal was successful, but this method is too subjective. We therefore compute the mean instead and try to make it as small as possible. Some of our results are below:

(1) Frame length fixed at 0.016 s (pronunciation of /a/):

P      mean of |e[n]|
16     0.12955
30     0.1237
45     0.12428
60     0.12204
70     0.11888
80     0.11785
150    0.09912

In this case, we found that if we keep increasing the linear prediction order (P), the mean of the excitation error keeps decreasing too. However, if we listen to the excitation, beyond P = 60 only a buzzing sound remains, as the teacher mentioned in the video.

(2) Frame length fixed at 0.032 s (pronunciation of /a/):

P      mean of |e[n]|
16     0.062
30     0.058
45     0.059
60     0.058
70     0.0569
80     0.0567
150    0.04

In this case, we also find that increasing the frame length decreases the mean of the excitation. To check this, we decreased the frame length to 0.008 s as well, and the result was as we expected. Therefore, to get the most successful result, we simply need to increase both the frame length and the linear prediction order. However, we suspect there must be a limit: both parameters cannot be increased indefinitely.
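The sweep above can be sketched as follows. This is a NumPy sketch (the lab itself uses MATLAB); the non-overlapping framing and the per-frame autocorrelation method are assumptions rather than the exact lab setup:

```python
import numpy as np

def mean_abs_excitation(x, fs, frame_len, order):
    """Mean |e[n]| of the LP residual, averaged over non-overlapping
    frames of frame_len seconds, with LP order `order` per frame."""
    N = int(frame_len * fs)
    errs = []
    for start in range(0, len(x) - N + 1, N):
        f = x[start:start + N]
        r = np.correlate(f, f, mode="full")[N - 1:N + order]
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        a = np.linalg.solve(R, r[1:order + 1])     # predictor coefficients
        pred = np.zeros_like(f)
        for k in range(1, order + 1):
            pred[k:] += a[k - 1] * f[:-k]          # sum_k a_k f[n-k]
        errs.append(np.mean(np.abs(f - pred)))     # mean |e[n]| of this frame
    return float(np.mean(errs))
```

Sweeping `order` (and `frame_len`) with this function reproduces the kind of table shown above; for a recorded vowel, increasing either tends to shrink the residual.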

Investigate how stationary the estimated K parameters are, as they inevitably vary across frames. Since the parameters are intimately related to the shape of the vocal tract, do they remain more consistent if you try to sustain your vowel quality by holding your articulators still, such as your jaw position and your tongue position? How about your pitch? Does it help to keep Kcoeff stationary by keeping your voice at the same pitch?

In this question, I try several methods to check the stationarity of the K parameters. The first is to keep my voice low and at the same pitch; I then calculate the mean of the K-parameter matrix across frames as the comparison standard. After getting this, we subtract the mean from the K-parameter matrix column by column (frame by frame), and calculate the average of the differences at the end. The code is below:

%%
% How stationary are the K parameters?
Kcoeffs_mean = mean(Kcoeffs, 2);        % per-coefficient mean across frames
Kcoeffs_diff = zeros(size(Kcoeffs));
for i = 1:size(Kcoeffs, 2)
    Kcoeffs_diff(:, i) = Kcoeffs(:, i) - Kcoeffs_mean;   % deviation of frame i
end
% Note: averaging these deviations across frames gives zero by construction,
% so this plot mostly shows numerical noise; std(Kcoeffs, 0, 2) would measure
% the spread across frames directly.
Kcoeffs_diff = mean(Kcoeffs_diff, 2);
figure
plot(Kcoeffs_diff);
hold on;
plot(zeros(size(Kcoeffs_diff)), 'r');   % red reference line at zero

The result is below: (red line is zero)

The next experiment is to increase the pitch of the /a/ pronunciation, using the same method as above. The figure is below:

(Figure: increasing pitch)

In these experiments, we find that if we keep our pitch steady, whether high or low, the K parameters do not vary much. The most notable difference is that when we increase the pitch during the recording, the difference appears symmetric about zero compared with the former case. We do not know how to explain this symmetry, but we believe that holding the tongue and the jaw still in the same position helps keep the K parameters stationary!

Write your own code to estimate the frequencies of the first three formants of each of the vowels. Perform pairwise comparisons between:

/a/ and /i/

/i/ and /u/

and

In this question, we write code to find the peaks, but if we just use the findpeaks function in MATLAB, there are problems with small peaks and false peaks.

Therefore, after discussing with our classmates, we decided to group nearby peaks together and find the true peak of each group. The main idea of our code is to use the mean spectrum over frames as the curve on which we search for peaks. First we use the MATLAB function to find the peaks, but this returns many false peaks, so we have to filter them ourselves. We choose a frequency distance for grouping peaks: in the code, the flag check is zero when consecutive peaks do not belong to the same group, and one when they do.

Within a group, we keep updating the true peak to be the maximum one. The code is below:

H_mean = mean(Htmp, 2);                  % average spectrum over frames
[pks, locs] = findpeaks(H_mean);         % raw peaks (contains false peaks)
check = 0;                               % 1 while we are inside a peak group
pks_new  = [];
locs_new = [];
for i = 1:size(pks, 1) - 1
    if ff(locs(i+1)) - ff(locs(i)) < 200     % closer than 200 Hz: same group
        if check == 0
            check = 1;                       % start a new group
            [p_tmp, index] = max([pks(i), pks(i+1)]);
            i_tmp = i - 1 + index;
        else                                 % still in the same group
            [p_tmp, index] = max([p_tmp, pks(i+1)]);
            if index == 2
                i_tmp = i + 1;               % the new peak is the group maximum
            end
        end
    else
        if check == 0
            pks_new  = [pks_new;  pks(i)];   % isolated peak: keep it
            locs_new = [locs_new; locs(i)];
        else
            check = 0;                       % group ended: keep its maximum
            pks_new  = [pks_new;  p_tmp];
            locs_new = [locs_new; locs(i_tmp)];
        end
    end
end
peaks(:, 1) = pks_new;
peaks(:, 2) = locs_new;
[~, index_sort] = sort(peaks(:, 1), 'descend');  % sort by peak amplitude
[i_final, ~] = sort(index_sort(1:3));            % three largest, in frequency order
f = ff(peaks(i_final(:), 2));                    % formant frequencies

After we find the true peaks, we sort them by amplitude, pick the three largest, and report them in frequency order as the first three formants, though this method is not robust enough. If there is a better solution, we will fix this part in the future.
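A common, more robust alternative (an assumption on our part, not the method used above) is to take the formants from the roots of the LP polynomial A(z) instead of from spectral peaks: poles near the unit circle correspond to vocal-tract resonances, and their angles give the formant frequencies. A NumPy sketch:

```python
import numpy as np

def lpc_root_formants(x, order, fs, n_formants=3):
    """Estimate formant frequencies from the angles of the roots of the
    LP polynomial A(z) = 1 - sum_k a_k z^-k (autocorrelation method)."""
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:n + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    roots = np.roots(np.concatenate(([1.0], -a)))    # poles of 1/A(z)
    roots = roots[np.imag(roots) > 0]                # one of each conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    return freqs[:n_formants]
```

Sorting the pole angles and keeping the lowest three gives F1-F3 directly, with no peak grouping needed; for real speech one would usually also discard roots far inside the unit circle, since they do not correspond to sharp resonances.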

The following are the average spectra of the pronunciations /a/, /i/, /u/, and ; in these figures we can see that the detected peaks are almost all true peaks.

(Figure panels: /a/, /i/, /u/)

After getting this result, we also compare with reference formant frequencies for the pronunciations /a/, /i/, /u/; the information we found is below:

In this result, we can easily see that the pronunciations /a/ and /i/ are similar to the reference frequencies we found. However, the pronunciation /u/ is a little different from the reference; we think this is due to person-to-person variation, and if we look at the trend of the spectrum, it is still similar to the reference as well.

Then we use the peaks we found to calculate f2 − f1 vs. f1 and f3 − f1 vs. f1, to see the differences between the vowels. The reason we also look at f3 is that f1 and f2 are usually close together; we want to see whether, when f1 and f2 of two different vocal tracts are close, the f3 of these two vocal tracts is close as well.

(Figure panels: /a/ f3 − f1 vs. f1; /i/ f3 − f1 vs. f1; /i/ f2 − f1 vs. f1; f2 − f1 vs. f1)

After getting these results, we can compare them with the figure in the teacher's slides from the course, below. Though the results are not exactly the same, the relative positions match. If we plot them in one figure, the result is below:

We think this result is very similar to the figure above; the positions are relatively correct.

Discussion

In this homework, I think we have learned how to compute the linear prediction and how to obtain its parameters. However, we are still a little curious about the implementation, and we hope to implement it successfully in our final project. What we learned most from this homework is how to analyze the results and the meaning of each parameter!

If we can combine it with emotion and apply it in our final project, we think the effect will be fun and useful!
