Published by: Anand Kumar on May 03, 2012
Copyright: Attribution Non-commercial


6. Introduction

6.1 Purpose
The purpose of this document is to present a detailed description of the Speaker Recognition System. The report discusses each stage of the project: the Requirements Specification Phase, the System Design Phase, the Implementation Phase, and the Testing Phase. It also discusses recommendations for the project.

6.2 Intended Audience
The document is intended for developers, project managers, testers, documentation writers, and users, where users include the faculty members, institute staff, students, and alumni of the institute. All audiences except end users are advised to read the complete document for a better understanding of this software product.

6.3 Scope of the Project
The Speaker Recognition System is a standalone application. It can be used to restrict access to confidential information, and it can be integrated into other systems to provide security.

6.4 References
IEEE. IEEE Std 830-1998, IEEE Recommended Practice for Software Requirements Specifications. IEEE Computer Society, 1998.


7. Requirement Model

A requirement model is used in systems and software engineering to refer to the process of building up a specification of the system. In this phase we identify actors and produce use cases, interface descriptions, and problem domain objects. UML (Unified Modeling Language) is used to draw the relevant diagrams and notations.

7.1 User Requirements

7.1.1 Functional Requirements
a. The user should be able to enter records.
b. Each record represents information about a person and contains his/her voice sample.
c. Records may consist of:
   i. First name
   ii. Last name
   iii. Phone
   iv. Address (city, street address)
   v. Voiceprint
   vi. ID number
d. The system can filter noise from the voice signal, whether it comes from environmental noise or from the sensitivity of the microphone.
e. The system must be able to take a voiceprint and a user-id (in the case of speaker verification) as input, search the database for a match, and then show the result.
f. The result should be viewed by showing the user-ids matching the input.
g. The user should be able to see his/her full information upon successful identification/verification.

7.1.2 Non-Functional Requirements
a. Records are maintained in a database.
b. Every record shall be allocated a unique identifier (ID number).


c. The user should be able to retrieve data by entering an ID and voiceprint on successful identification/verification.
d. To improve performance, the database should store a compressed codebook for each user instead of the voiceprint. The user's voiceprint is discarded after the codebook has been calculated.
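Requirement (d) can be sketched briefly. The following is illustrative Python (the project itself is implemented in Matlab), and the `UserRecord` class and `make_codebook` helper are hypothetical names invented for this sketch; the point is only that the record persists a compressed codebook while the raw voiceprint is discarded after enrollment.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UserRecord:
    # Fields from requirement 7.1.1(c); note there is no voiceprint field.
    user_id: int
    first_name: str
    last_name: str
    phone: str
    address: str
    codebook: List[List[float]] = field(default_factory=list)

def make_codebook(voiceprint: List[float], size: int = 2) -> List[List[float]]:
    # Stand-in for the real feature-extraction + vector-quantization step:
    # split the samples into `size` chunks and average each one, so the
    # "codebook" is far smaller than the raw voiceprint.
    n = len(voiceprint) // size
    return [[sum(voiceprint[i * n:(i + 1) * n]) / n] for i in range(size)]

def enroll(user_id, first, last, phone, address, voiceprint):
    record = UserRecord(user_id, first, last, phone, address,
                        make_codebook(voiceprint))
    # the voiceprint is discarded here; only the record (with codebook) is kept
    return record

rec = enroll(1, "A", "Kumar", "555-0100", "City, Street", [0.1, 0.2, 0.3, 0.4])
print(len(rec.codebook))  # → 2
```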

7.2 System Requirements

7.2.1 Actors with their description
Users: Provide the system with a voice sample and expect the system to show a match and details about the user.
Administrator: Manages the entire speaker recognition system.


Figure 1: Use Case Diagram (User: Enroll User, Request Match, Edit Information, Remove; Administrator: Add Users, Remove Users, View Statistics)

7.2.2 Use Cases with their description

Add Records: Administrator adds new users to the system. The user must provide his/her details and voice sample to the system during enrollment.
Request Match: The user requests that a voice sample be matched with a voiceprint in the database and retrieves details about it, such as name, phone number, ID, etc. (on successful verification).
Update Records: Allows the user to add or update (remove) the records in the system.
Remove User/Records: The system allows the administrator to remove a user.
View Statistics: The administrator can view the performance of the system.

7.2.2.1 Administrator Use Case
Figure 12: Administrator Use Case Diagram (Add Users, Remove Users, View Statistics)

Use Case Name: Add Users
Brief Description: Administrator enrolls the users into the system.
Preconditions:
a. Administrator should be logged into the system.
b. The system must be fully functional and connected to the Database.
Main flow of events:
1. The administrator inputs user details into the system.
2. The administrator inputs the user's voice sample.
3. User is enrolled.
4. Notification appears that the user is enrolled.
Post conditions: User is enrolled.
Special Requirements: none

Use Case Name: Remove Users
Brief Description: Administrator removes the users from the system.
Preconditions:
a. Administrator should be logged into the system.
b. The system must be fully functional and connected to the Database.
Main flow of events:
1. The administrator inputs a user-id into the system.
2. A user-id with relevant details is displayed.
3. Notification appears for confirmation.
4. On confirmation for removal, the user is removed.
Alternate Flow:
a. The user-id given by the administrator is not found in the system; the administrator once again enters a user-id.
b. On confirmation for removal of the user, the administrator selects no; the user is not removed from the system.
Post conditions: The system no longer contains any information about the user.
Special Requirements: none

Use Case Name: View Statistics
Brief Description: Administrator views the performance statistics of the system.
Preconditions:
a. Administrator should be logged into the system.
b. The system must be fully functional and connected to the Database.
Main flow of events:
1. Administrator selects to see the performance statistics.
2. Statistics are shown.
Alternate Flow: None
Post conditions: None
Special Requirements: none

7.2.2.2 User Use Case
Figure 13: User Use Case Diagram (Enroll, Request Match, Edit Information, Remove)

Use Case Name: Enroll
Brief Description: User enrolls into the system.
Preconditions: The system must be fully functional and connected to the Database.
Main flow of events:
1. The user inputs his/her details into the system.
2. The user inputs his/her voice sample.
3. User is enrolled.
4. Notification appears that the user is enrolled.
Post conditions:
a. User is enrolled.
b. A user-id with relevant details is displayed.
Special Requirements: none

Use Case Name: Remove
Brief Description: User removes himself/herself from the system.
Preconditions: The system must be fully functional and connected to the Database.
Main flow of events:
1. The user inputs his/her user-id into the system.
2. Notification appears for confirmation.
3. On confirmation for removal, the user is removed.
Alternate Flow:
a. The user-id given by the user is not found in the system; the user once again enters the user-id.
b. On confirmation for removal, the user selects no; the user is not removed from the system.
Post conditions: The system no longer contains any information about the user.
Special Requirements: none

Use Case Name: Request Match
Brief Description: User enters his/her voice sample and runs the test phase.
Preconditions: The system must be fully functional and connected to the Database.
Main flow of events:
1. User selects to test.
2. System asks the user to enter his/her user-id and voice sample.
3. Matching is done and the result is shown to the user.
Alternate Flow: None
Post conditions: User is allowed to log into the system.
Special Requirements: none

Use Case Name: Edit Information
Brief Description: User edits his/her information stored in the system.
Preconditions:
a. The system must be fully functional and connected to the Database.
b. User must be logged into the system.
Main flow of events:
1. User selects to edit.
2. System displays the full details of the user.
3. User edits his/her information and selects save.
Alternate Flow: None
Post conditions: The system contains the updated user information.
Special Requirements: The user is not allowed to edit his/her already stored voice sample.

7.3 Safety Requirements
There are no safety requirements of concern, such as possible loss, damage, or harm that could result from the use of the Speaker Recognition System.

7.4 User Interfaces
In the main menu, the user will be presented with several buttons: Enroll and Voiceprint Test. After the user logs in, he/she is also presented with Edit Information and Remove buttons. The administrator is presented with Add Users, Remove Users, and View Statistics buttons.

7.4.1 Enroll
Clicking on the New User button will cause a dialog box to open with the title New User. The dialog box will have the following fields:

i. First name
ii. Last name
iii. Phone
iv. Address (city, street address)

It will also contain two buttons, Enroll and Cancel. Cancel will return the user to the main menu with no user created. Enroll will prompt the user to speak so as to record his/her voice and give a countdown starting from 2 seconds. After the countdown is complete, the system will begin recording from the microphone for 10 seconds. At the end of the recording, the program will respond with a success, fail, or error. If the recording was successful, the user will be returned to the main menu. If an error occurred during recording (for example, silence), a descriptive message will be displayed (for example, "no sound recorded") and the dialogue box will remain.
Note: the New User dialogue box will remain in the foreground when recording begins.

7.4.2 Voiceprint Match
Clicking on Test will allow users to test their voiceprint with the implemented verification algorithm. A dialogue box will pop up with the title Voiceprint Test. Recording will be carried out in the same way specified for enrollment, but the responses at the end will be different. Two buttons will give the option to return to the main menu (OK button) or perform the test.
Note: the Voiceprint Test dialogue box will remain in the foreground when recording begins.

7.4.3 Remove
The Remove option will delete a user's profile (user-id and voiceprint). Upon clicking Remove, a dialog box will pop up with the title Remove User for confirmation. There will also be the buttons Cancel and Delete. Cancel will bring the user back to the main menu. If Delete is clicked, the user will be prompted with "Are you sure?" The user will then have to hit either 'y' for yes or 'n' for no. Hitting 'y' and then Enter will delete the profile and return the user to the main menu; hitting 'n' will return the user to the dialogue box without deleting the profile.

7.4.4 Statistics
The display of statistics is an element that will be given flexibility. At this time, the only requirement is that performance statistics be available.

7.5 Hardware Interfaces
The Speaker Recognition System requires access to the system's microphone to capture the user's voice.

7.6 Software Interfaces
The Speaker Recognition System is built for the Windows operating system and requires Windows XP Service Pack 3 or above to run. Since the software is built on Matlab, it requires the Matlab runtime to function properly.

Figure 14: Problem Domain Objects (User, Administrator, VoicePrint)

8. Analysis Model

The analysis model describes the structure of the system or application that you are modeling, but not how it will be implemented. It aims to structure the system independently of the actual implementation environment; the focus is on the logical structure of the system. It can be used as the foundation of the design model, since it describes the logical structure of the system. The analysis model identifies the main classes in the system and contains a set of use case realizations that describe how the system will be built. Sequence diagrams realize the use cases by describing the flow of events in the use cases when they are executed. These use case realizations model how the parts of the system interact within the context of a specific use case.

The following are the three types of analysis objects into which the use cases can be classified (Figure 15: Types of Analysis Objects):

Entity objects: Model information to be held for a longer time, with all behavior naturally coupled to that information. Example: a person with the associated data and behavior.

Interface objects: Model behavior and information that is dependent on the interface to the system. Example: user interface functionality for requesting information about a person.

Control objects: Model functionality that is not naturally tied to any other object. Behavior consists of operating on several different entity objects, doing some computations, and then returning the result to an interface object. Example: calculating taxes using several different factors.

Figure 16: Interface Objects (Microphone IF)

Figure 17: Entity Objects (User Information, Voice Sample)
Figure 18: Control Objects

Figure 19: Analysis Model (interfaces: Start Panel, User Panel, Admin Interface; controls: Receive Information, Generate Result, Request Match, Add/Remove/Edit User, View Statistics; entities: User Information, Voice Sample)

9. Sequence Diagrams

A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of the system under development. Sequence diagrams are sometimes called event diagrams, event scenarios, or timing diagrams.

Sequence Diagram: User Enrollment
Figure 20: Sequence diagram for user enrollment (participants: User, Enrollment, Profile, Feature Extract, Codebook Calculation, Users; messages: Create, Request for Voice Sample, Voice Sample/Training Speech, Acoustic Vectors, Codebook, Add To User List, Return User Id)

Sequence Diagram: Voice Match
Figure 21: Sequence diagram for Voice Match (participants: User, Match Voice, Feature Extractor, Feature Comparator, Codebook; messages: Request to initiate match, Requests for voice and user-id input, VoicePrint and User Id, Acoustic Vector, Request for user's codebook (input: UserId), Codebook, Result is returned)

Sequence Diagram: Edit Information
Figure 22: Sequence Diagram for Editing Information (participants: User, Authenticator, Edit User, Database; messages: Voice Sample and UserId, Successfully logged in, User Id & Request to Retrieve Information, User Information, Updated Information, Success)

10. Activity Diagrams

An activity diagram is like a state diagram, except that it has a few additional symbols and is used in a different context. In a state diagram, most transitions are caused by external events; in an activity diagram, however, most transitions are caused by internal events, such as the completion of an activity. An activity diagram is used to understand the flow of work that an object or component performs. It can also be used to visualize the interaction between different use cases.

1. Enroll new user
Figure 23: Activity Diagram for enrolling a new user

2. Request Matching
Figure 24: Activity Diagram for voice matching

3. Remove User
Figure 25: Activity Diagram for removing a user

4. Update user information
Figure 26: Activity Diagram for updating user information

Figure 27: High-Level Block Diagram of Speaker Verification System

11. Design Model

11.1 High Level Design
There are two main modules in this speaker recognition system: the User Enrollment Module and the User Verification Module.

11.1.1 User Enrollment Module
The User Enrollment Module is used when a new user is added to the system. This module essentially "teaches" the system the new user's voice. The input of this module is a voiceprint of the user along with other details. By analyzing this training speech, the module outputs a model that parameterizes the user's voice. This model will be used later in the User Verification Module.

11.1.1.1 Signal Preprocessing Subsystem
The signal preprocessing subsystem conditions the raw speech signal and prepares it for subsequent manipulation and analysis. This subsystem performs analog-to-digital conversion and any signal conditioning necessary.

11.1.1.2 Feature Extraction Subsystem
The feature extraction subsystem analyzes the user's digitized voice signal and creates a series of values to use as a model of the user's speech pattern.

11.1.1.3 Feature Data Compression Subsystem
The disk space required for the model created in the Feature Extraction Subsystem will be significant when many users are enrolled in the system, so a form of data compression is used. After the model is compressed, it will be stored for later use in the User Verification Module.

11.1.2 User Verification Module
The User Verification Module is used when the system tries to verify a user. The user informs the system that he or she is a certain user, and the system then prompts the user to say something. This utterance is referred to as the "testing speech." The module performs the same signal preprocessing and feature extraction as the User Enrollment Module. The extracted speech parameterization data is then compared to the stored model. Based on the similarity, a verdict is given to indicate whether the user has passed or failed the voice verification test.

11.2 Threshold Generation Module
This module is used to set the sensitivity level of the system for each user enrolled in the system. This sensitivity value is called the threshold and needs to be generated whenever a new user is enrolled. After a user enrolls with the system, running this module will essentially invoke a user verification session. However, instead of receiving a pass or fail verdict, the system will take the similarity factor found in the Feature Comparison Subsystem and use it to determine the threshold value. This module can also be invoked when a user feels they are receiving too many false rejections and wants to recalculate an appropriate sensitivity level. This module is required for speaker verification functionality. As of now, implementation of this module is suspended due to timing constraints.

11.2.1 Threshold Generation Subsystem
This subsystem will set the user threshold to a scaled-up version of the similarity factor determined in the Feature Comparison Subsystem. Scaling the value up will hopefully account for any variances in future verification sessions.
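The threshold mechanism described above amounts to scaling the enrollment-time similarity (distortion) factor and comparing later test distortions against it. A minimal sketch, in Python for illustration (the scale factor 1.2 is an assumed value, not one specified in this report):

```python
def generate_threshold(enrollment_distortion: float, scale: float = 1.2) -> float:
    # Scale up the distortion measured in a verification session run right
    # after enrollment; the head-room absorbs session-to-session variance.
    return enrollment_distortion * scale

def verdict(test_distortion: float, threshold: float) -> bool:
    # Lower distortion means a closer match, so the user passes when the
    # measured distortion does not exceed his/her stored threshold.
    return test_distortion <= threshold

threshold = generate_threshold(0.50)  # stored per user at enrollment time
print(verdict(0.55, threshold))       # small variance → True (passes)
print(verdict(0.90, threshold))       # far from the enrolled model → False
```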

11.1.2.1 Feature Comparison Subsystem
After the Feature Extraction Subsystem parameterizes the testing speech, this data is compared to the model of the user stored on disk. After comparing all the data, a similarity factor will be produced.

11.1.2.2 Decision Subsystem
Based on the similarity factor produced by the Feature Comparison Subsystem and the user's threshold value, a verdict will be given by this subsystem to indicate whether the user has passed or failed the voice verification test.

11.2 Low Level Design
The following section describes the information used for the implementation of each subsystem.

11.2.1 Signal Preprocessing Subsystem
Input: Raw speech signal
Output: Digitized and conditioned speech signal (one vector containing all sampled values)
Figure 28: Signal Preprocessing Subsystem Low-Level Block Diagram
The sampling will produce a digital signal in the form of a vector or array. The silence at the beginning and end of the speech sample will be removed.

11.2.2 Feature Extraction Subsystem
Input: Digital speech signal (one vector containing all sampled values)
Output: A set of acoustic vectors
Figure 29: Feature Extraction Subsystem Low-Level Block Diagram
Mel-Cepstral Coefficients will be used to parameterize the speech sample and voice.
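The silence removal performed in the Signal Preprocessing Subsystem can be sketched as below. Amplitude thresholding is an assumed method here, since the report only states that silence at both ends is removed; the sketch is Python for illustration, while the project code is Matlab.

```python
def trim_silence(samples, threshold=0.01):
    # Drop leading and trailing samples whose absolute amplitude falls
    # below the (assumed) silence threshold.
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

signal = [0.0, 0.001, 0.2, -0.4, 0.3, 0.002, 0.0]
print(trim_silence(signal))  # → [0.2, -0.4, 0.3]
```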

The original vector of sampled values will be framed into overlapping blocks. Each block will be windowed to minimize spectral distortion and discontinuities; a Hamming window will be used. The Fast Fourier Transform will then be applied to each windowed block as the beginning of the Mel-Cepstral Transform. After this stage, the spectral coefficients of each block are generated. The Mel Frequency Transform will then be applied to each spectral block to convert the scale to a mel scale; the mel scale is a logarithmic scale similar to the way the human ear perceives sound. Finally, the Discrete Cosine Transform will be applied to each Mel Spectrum to convert the values back to real values in the time domain.

11.2.3 Feature Compression Subsystem
Inputs: A set of acoustic vectors
Output: Codebook
Figure 30: Feature Data Compression Subsystem Low-Level Block Diagram
The K-Means Vector Quantization Algorithm will be used.

11.2.4 Feature Data Comparison Subsystem
Inputs: Set of acoustic vectors from testing speech, codebook
Outputs: Average distortion factor
Figure 31: Comparison Subsystem Low-Level Block Diagram
The acoustic vectors generated by the testing voice signal will be individually compared to the codebook:
1. The codeword closest to each test vector is found based on Euclidean distance. This minimum Euclidean distance, or Distortion Factor, is then stored until the Distortion Factor for each test vector has been calculated.
2. The Average Distortion Factor is then found and normalized.
Figure 32: Distortion Calculation Algorithm Flow Chart

11.2.5 Decision Subsystem
Inputs: Average distortion factor, user-specific threshold
Outputs: Verdict
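The codebook construction (11.2.3) and distortion measurement (11.2.4) can be sketched end to end. This is an illustrative Python sketch using toy one-dimensional "acoustic vectors"; the actual system uses Matlab with the VOICEBOX routines, and a production vector quantizer (e.g. the LBG variant of K-means) would seed and split centroids more carefully.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kmeans_codebook(vectors, k, iters=10):
    # Tiny K-means: the first k vectors seed the centroids, then we
    # alternate assignment and centroid update for a fixed iteration count.
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            best = min(range(k), key=lambda i: euclidean(v, centroids[i]))
            clusters[best].append(v)
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def average_distortion(test_vectors, codebook):
    # Step 1: keep the distance from each test vector to its closest codeword.
    # Step 2: average (normalize) these minimum distances.
    total = sum(min(euclidean(v, c) for c in codebook) for v in test_vectors)
    return total / len(test_vectors)

training = [[0.0], [0.1], [1.0], [1.1]]
codebook = kmeans_codebook(training, k=2)
print(average_distortion([[0.05], [1.05]], codebook))  # → 0.0 (matches codewords)
print(average_distortion([[5.0]], codebook))           # large → likely another speaker
```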

Figure 33: Comparison Subsystem Low-Level Block Diagram

12. Alternative Options

There is more than one way to perform speaker recognition. The methods chosen for this project were chosen mostly for their implementability and low complexity. The list of alternatives below is in no way complete.

12.1 Feature Extraction Alternatives
Linear Prediction Cepstrum: identifies the vocal tract parameters.
Discrete Wavelet Transform
Delta-Cepstrum: analyses changing tones.

12.2 Feature Matching Alternatives
Dynamic Time Warping: accounts for inconsistencies in the rate of speech by stretching or compressing parts of the signal in the time domain.
AI-based: Hidden Markov Models, Gaussian Mixture Models, and Neural Networks; used for text-independent recognition.

13. Implementation

13.1 Platform
Matlab was chosen as the platform for ease of implementation. A third-party GNU Matlab toolbox, Voicebox, was used. This toolbox provides functions that calculate Mel-Frequency Coefficients and perform vector quantization.
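As a sketch of the Dynamic Time Warping alternative named above (Python for illustration; the project itself uses the VQ/codebook approach, not DTW), the classic dynamic-programming recurrence aligns two utterances spoken at different rates:

```python
def dtw_distance(a, b):
    # Classic DTW between two 1-D sequences: d[i][j] is the cheapest
    # alignment cost of a[:i] with b[:j].
    n, m = len(a), len(b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # stretching/compressing in time = choosing among the three moves
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

fast = [1, 2, 3]
slow = [1, 1, 2, 2, 3, 3]        # same utterance spoken more slowly
print(dtw_distance(fast, slow))  # → 0.0: DTW absorbs the rate difference
```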

Conclusion

In this project, we have developed a text-independent speaker identification system, that is, a system that identifies a person who speaks regardless of what he/she is saying. Our speaker verification system consists of two sections: (i) an enrollment section to build a database of known speakers and (ii) an unknown-speaker identification section. The enrollment session is also referred to as the training phase, while the unknown-speaker identification system is also referred to as the operation session or testing phase. In the training phase, each registered speaker has to provide samples of their speech so that the system can build or train a reference model for that speaker. It consists of two main parts. The first part consists of processing each person's input voice sample to condense and summarize the characteristics of their vocal tract. The second part involves pulling each person's data together into a single, easily manipulated matrix. In the testing phase, the above calculated matrix is used for recognition.

Future Work

Currently, this application lacks an easy-to-use user interface. The application can be extended to provide a user interface, and it can be fine-tuned to meet real-time constraints. Other techniques may be used for implementing this application to minimize the false-acceptance and false-rejection rates.

Snapshots
Figure 34: Matlab Command Window

Figure 35: Matlab Editor


Code

speakerTest.m

function speakerTest(a)
% A speaker recognition program. a is a string of the filename to be tested
% against the database of sampled voices and it will be evaluated whose voice it is.
% disteusq.m, enframe.m, kmeans.m, melbankm.m, melcepst.m, rdct.m and rfft.m
% from VOICEBOX are used in this program.
% Mike Brooks, VOICEBOX. Free toolbox for MATLab.
% www.ncl.ac.uk/CPACTsoftware/MatlabLinks.html

voiceboxpath = 'C:/Users/test/voicebox';
addpath(voiceboxpath);

name = ['vimal'; 'anand'];  % name of people in the database
fs = 16000;                 % sampling frequency
C = 8;                      % number of centroids

% Load data
disp('Reading data for training:')
[train.data] = Load_data(name);

% Read the test file
test.data = wavread(a);

% Calculate mel-frequency cepstral coefficients for training set
fprintf('\nCalculating mel-frequency cepstral coefficients for training set:\n')
[train.cc] = mfcc(train.data, fs);

% Perform K-means algorithm for clustering (Vector Quantization)
fprintf('\nApplying Vector Quantization (K-means) for feature extraction:\n')
[train.kmeans] = kmean(train.cc, C);

% Calculate mel-frequency cepstral coefficients for test set
test.cc = melcepst(test.data, fs, 'x');

% Compute average distances between the features of the unknown voice (test.cc)
% and all the codebooks in the database, and find the lowest distortion
[result, index] = distmeasure(train.kmeans, test.cc);

% Display results - identify the person with the lowest distance
fprintf('\nDisplaying the result:\n')
dispresult(name, result, index)

Load_data.m

function [data] = Load_data(name)
% Training mode - load all the wave files into the database (codebooks)
data = cell(size(name,1), 1);
for i = 1:size(name,1)
    temp = [name(i,:) '.wav'];
    tempwav = wavread(temp);
    data{i} = tempwav;
end

distmeasure.m

function [result, index] = distmeasure(x, y)
result = cell(size(x,1), 1);
dist = cell(size(x,1), 1);
k = size(y,2);
mins = inf;
index = 1;
for i = 1:size(x,1)
    dist{i} = disteusq(x{i}(:,1:k), y(:,1:k), 'x');
    temp = sum(min(dist{i})) / size(dist{i},1);
    result{i} = temp;
    if temp < mins
        mins = temp;
        index = i;
    end
end

dispresult.m

function dispresult(x, y, z)
disp('The average of Euclidean distances between database and test wave file')
color = ['r'; 'b'; 'g'; 'c'; 'm'; 'k'];
for i = 1:size(x,1)
    disp(x(i,:))
    disp(y{i})
end
disp('The test voice is most likely from')
disp(x(z,:))

mfcc.m

function [cepstral] = mfcc(x, fs)
% Calculate mfcc's with a frequency (fs) and store in the cepstral cell
cepstral = cell(size(x,1), 1);
for i = 1:size(x,1)
    cepstral{i} = melcepst(x{i}, fs, 'x');
end

kmean.m

function [data] = kmean(x, C)
% Calculate k-means for x with C number of centroids
train.x = cell(size(x,1), 1);
train.esql = cell(size(x,1), 1);
train.j = cell(size(x,1), 1);
for i = 1:size(x,1)
    [train.x{i}, train.esql{i}, train.j{i}] = kmeans(x{i}(:,1:12), C);
end
data = train.x;

melcepst.m

function c = melcepst(s,fs,w,nc,n,p,inc,fl,fh)
%MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,N,P,INC,FL,FH)
%
% Simple use: c=melcepst(s,fs)        % calculate mel cepstrum with 12 coefs, 256 sample frames
%             c=melcepst(s,fs,'e0dD') % include log energy, 0th cepstral coef,
%                                     % delta and delta-delta coefs
%
% Inputs:
%   s   speech signal
%   fs  sample rate in Hz (default 11025)
%   nc  number of cepstral coefficients excluding 0'th coefficient (default 12)
%   n   length of frame in samples (default power of 2 < (0.03*fs))
%   p   number of filters in filterbank (default: floor(3*log(fs)) = approx 2.1 per octave)
%   inc frame increment (default n/2)
%   fl  low end of the lowest filter as a fraction of fs (default = 0)
%   fh  high end of highest filter as a fraction of fs (default = 0.5)
%   w   any sensible combination of the following:
%       'R' rectangular window in time domain
%       'N' Hanning window in time domain
%       'M' Hamming window in time domain (default)
%       't' triangular shaped filters in mel domain (default)
%       'n' hanning shaped filters in mel domain
%       'm' hamming shaped filters in mel domain
%       'p' filters act in the power domain
%       'a' filters act in the absolute magnitude domain (default)
%       '0' include 0'th order cepstral coefficient
%       'E' include log energy
%       'd' include delta coefficients (dc/dt)
%       'D' include delta-delta coefficients (d^2c/dt^2)
%       'z' highest and lowest filters taper down to zero (default)
%       'y' lowest filter remains at 1 down to 0 frequency and
%           highest filter remains at 1 up to nyquist frequency
%       If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs: c mel cepstrum output: one frame per row. Log energy, if requested,
%          is the first element of each row followed by the delta and then the
%          delta-delta coefficients.

if nargin<2 fs=11025; end
if nargin<3 w='M'; end
if nargin<4 nc=12; end
if nargin<5 p=floor(3*log(fs)); end
if nargin<6 n=pow2(floor(log2(0.03*fs))); end
if nargin<9
    fh=0.5;
    if nargin<8
        fl=0;
        if nargin<7
            inc=floor(n/2);
        end
    end
end
if isempty(w) w='M'; end

% frame and window the signal
if any(w=='R')
    z=enframe(s,n,inc);
elseif any(w=='N')
    z=enframe(s,hanning(n),inc);
else
    z=enframe(s,hamming(n),inc);
end

% fft and mel filterbank
f=rfft(z.');
[m,a,b]=melbankm(p,n,fs,fl,fh,w);
pw=f(a:b,:).*conj(f(a:b,:));
pth=max(pw(:))*1E-20;
if any(w=='p')
    y=log(max(m*pw,pth));
else
    ath=sqrt(pth);
    y=log(max(m*abs(f(a:b,:)),ath));
end
c=rdct(y).';
nf=size(c,1);
nc=nc+1;
if p>nc
    c(:,nc+1:end)=[];
elseif p<nc
    c=[c zeros(nf,nc-p)];
end
if ~any(w=='0')
    c(:,1)=[];
    nc=nc-1;
end
if any(w=='E')
    c=[log(sum(pw)).' c];
    nc=nc+1;
end

% calculate derivatives
if any(w=='D')
    vf=(4:-1:-4)/60;
    af=(1:-1:-1)/2;
    ww=ones(5,1);
    cx=[c(ww,:); c; c(nf*ww,:)];
    vx=reshape(filter(vf,1,cx(:)),nf+10,nc);
    vx(1:8,:)=[];
    ax=reshape(filter(af,1,vx(:)),nf+2,nc);
    ax(1:2,:)=[];
    vx([1 nf+2],:)=[];
    if any(w=='d')
        c=[c vx ax];
    else
        c=[c ax];
    end
elseif any(w=='d')
    vf=(4:-1:-4)/60;
    ww=ones(4,1);
    cx=[c(ww,:); c; c(nf*ww,:)];
    vx=reshape(filter(vf,1,cx(:)),nf+8,nc);
    vx(1:8,:)=[];
    c=[c vx];
end

% plot the result if no output argument is requested
if nargout<1
    [nf,nc]=size(c);
    t=((0:nf-1)*inc+(n-1)/2)/fs;           % time axis
    ci=(1:nc)-any(w=='0')-any(w=='E');
    imh=imagesc(t,ci,c.');
    axis('xy');
    xlabel('Time (s)');
    ylabel('Mel-cepstrum coefficient');
    map=(0:63)'/63;
    colormap([map map map]);
    colorbar;
end

