SOFTWARE REQUIREMENT SPECIFICATION


6. Introduction
6.1 Purpose
The purpose of this document is to present a detailed description of the Speaker Recognition System. It discusses each stage of the project: the Requirements Specification Phase, the System Design Phase, the Implementation Phase, and the Testing Phase. It also discusses recommendations for the project.

6.2 Intended Audience
This document is intended for developers, project managers, testers, documentation writers and users, who include the faculty members, the institute staff, students and the alumni of the institute. All audiences except users are advised to read the complete document for a better understanding of this software product.

6.3 Scope of the Project
The Speaker Recognition System is a standalone application. It can be used to restrict access to confidential information. It can also be integrated into other systems to provide security.

6.4 References
IEEE. IEEE Std 830-1998, IEEE Recommended Practice for Software Requirements Specifications. IEEE Computer Society, 1998.


7. Requirement Model
A requirement model is used in systems and software engineering to refer to the process of building up a specification of the system. In it we identify actors and define use cases, interface descriptions and problem domain objects. To draw all the relevant diagrams and notations we use UML (Unified Modeling Language).

7.1 User Requirements
7.1.1 Functional Requirements
a. The user should be able to enter records.
b. Each record represents information about a person and contains his/her voice sample.
c. Records may consist of:
   i. First name.
   ii. Last name.
   iii. Phone.
   iv. Address (city-street address).
   v. Voiceprint.
   vi. ID-number.

d. The system can filter noise from the voice signal, which may come from the environment or from the sensitivity of the microphones.
e. The system must be able to take a voiceprint and user-id (in case of speaker verification) as input, search for a match in the database, and then show the result.
f. The result should be shown as the user-ids matching the input.
g. The user should be able to see his/her full information upon successful identification/verification.

7.1.2 Non Functional Requirements
a. Records are maintained in a database.
b. Every record shall be allocated a unique identifier (id-number).


c. The user should be able to retrieve data by entering an id and voiceprint, on successful identification/verification.
d. To improve performance, the database should store the compressed codebook for each user instead of the voiceprint. The voiceprint of the user is discarded after the codebook is calculated.
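As a rough illustration of the record described in 7.1.1 and 7.1.2, the sketch below builds one record as a MATLAB struct and keeps only the codebook, not the voiceprint. The field names, the placeholder values and the 8-by-12 codebook size are assumptions made for illustration; the specification does not prescribe a storage layout.

```matlab
% Sketch only: field names, values and sizes are assumptions for illustration.
voiceprint = randn(16000, 1);          % stand-in for a recorded voice sample
codebook   = zeros(8, 12);             % stand-in for the compressed codebook
                                       % (8 codewords x 12 coefficients)

record = struct( ...
    'idNumber',  1, ...                % unique identifier (7.1.2 b)
    'firstName', 'First', ...
    'lastName',  'Last', ...
    'phone',     '000-0000', ...
    'address',   'City, Street', ...
    'codebook',  codebook);            % stored instead of the voiceprint (7.1.2 d)

clear voiceprint                       % the raw voiceprint is discarded
disp(fieldnames(record))
```

Storing the codebook rather than the raw samples keeps each record small, which is the motivation given in requirement 7.1.2 d.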

7.2 System Requirements
7.2.1 Actors with their description
Users: Provide the system with a voice sample and expect the system to show the match and details about the user.
Administrator: Manages the entire speaker recognition system.


Figure 1: Use Case Diagram (User: Enroll, Request Match, Edit Information, Remove; Administrator: Add Users, Remove Users, View Statistics)

7.2.2 Use Cases with their description

Use Case: Add Records
Description: Administrator adds the new users to the system. The user must provide his/her details and voice sample to the system during enrollment.

Use Case: Request Match
Description: The user requests a voice sample to be matched with a voiceprint in the database and retrieves details about it (on successful verification).

Use Case: Update Records
Description: Allows the user to add or update (remove) the records in the system, such as name, ID, phone number, etc. (on successful verification).

Use Case: Remove User/Records
Description: System allows the administrator to remove a user.

Use Case: View Statistics
Description: Administrator can view the performance of the system.

7.2.2.1 Administrator Use case
Figure 12: Administrator Use Case Diagram (Administrator: Add Users, Remove Users, View Statistics)

Use Case Name: Add Users
Brief Description: Administrator enrolls the users into the system.
1. Preconditions:
   a. Administrator should be logged into the system.
   b. The system must be fully functional and connected to the Database.
2. Main flow of events:
   a. The administrator inputs user details into the system.
   b. The administrator inputs the user's voice sample.
   c. Notification appears that user is enrolled. A user-id with relevant details is displayed.
3. Post conditions: User is enrolled.
4. Special Requirements: none

Use Case Name: Remove Users
Brief Description: Administrator removes the users from the system.
1. Preconditions:
   a. Administrator should be logged into the system.
   b. The system must be fully functional and connected to the Database.
2. Main flow of events:
   a. The administrator inputs user-id into the system.
   b. Notification appears for confirmation.
   c. User is removed.
3. Post conditions: System no longer contains any information about the user.
4. Alternate Flow:
   a. User-id given by administrator is not found in the system. Administrator once again enters user-id.
   b. On confirmation for removal of user, administrator selects no. User is not removed from the system.
5. Special Requirements: none

Use Case Name: View Statistics
Brief Description: Administrator views the performance statistics of the system.
1. Preconditions:
   a. Administrator should be logged into the system.
   b. The system must be fully functional and connected to the Database.
2. Main flow of events:
   a. Administrator selects to see the performance statistics.
   b. Statistics are shown.
3. Post conditions: None.
4. Alternate Flow: None
5. Special Requirements: none

7.2.2.2 User Use case
Figure 13: User use case diagram (User: Enroll, Request Match, Edit Information, Remove)

Use Case Name: Enroll
Brief Description: User enrolls into the system.
1. Preconditions: The system must be fully functional and connected to the Database.
2. Main flow of events:
   a. The user inputs his/her details into the system.
   b. The user inputs his/her voice sample.
   c. Notification appears that user is enrolled.
   d. A user-id with relevant details is displayed.
3. Post conditions: User is enrolled.
4. Special Requirements: none

Use Case Name: Remove
Brief Description: User removes himself/herself from the system.
1. Preconditions: The system must be fully functional and connected to the Database.
2. Main flow of events:
   a. The user inputs his/her user-id into the system.
   b. Notification appears for confirmation.
   c. User is removed.
3. Post conditions: System no longer contains any information about the user.
4. Alternate Flow:
   a. User-id given by user is not found in the system. User once again enters user-id.
   b. On confirmation for removal of user, user selects no. User is not removed from the system.
5. Special Requirements: none

Use Case Name: Request Match
Brief Description: User enters his/her voice sample and runs the test phase.
1. Preconditions: The system must be fully functional and connected to the Database.
2. Main flow of events:
   a. User selects to test.
   b. System asks user to enter his/her user-id and voice sample.
   c. Matching is done and result is shown to the user.
3. Post conditions: User is allowed to log in to the system.
4. Alternate Flow: None

5. Special Requirements: none

Use Case Name: Edit Information
Brief Description: User edits his/her information stored in the system.
1. Preconditions: The system must be fully functional and connected to the Database. User must be logged into the system.
2. Main flow of events:
   a. User selects to edit.
   b. System displays full detail of the user.
   c. User edits his/her information and selects save. User is not allowed to edit his/her already stored voice sample.
3. Post conditions: System contains updated user information.
4. Alternate Flow: None
5. Special Requirements: none

7.3 Safety Requirements
There are no safety requirements of concern, such as possible loss, damage, or harm that could result from the use of the Speaker Recognition System.

7.4 User Interfaces
In the main menu, the user will be presented with several buttons. These will be Enroll and Voiceprint Test. After the user logs in, he/she is also presented with Edit Information and Remove buttons. The administrator is presented with Add Users, Remove Users and View Statistics buttons.

7.4.1 Enroll
Clicking on the new user button will cause a dialog box to open with the title New User. The dialog box will have the following fields:
i. First name.
ii. Last name.
iii. Phone.
iv. Address (city-street address).
It will also contain two buttons, Enroll and Cancel. Cancel will return the user to the main menu with no user created. Enroll will prompt the user to speak so as to record his/her voice and give a countdown starting from 2 seconds. After the countdown is complete, the system will begin recording from the microphone for 10 seconds. Note: the New User dialogue box will remain in the foreground when recording begins. If the recording was successful then the user will be returned to the main menu. If an error occurred during recording (for example silence), a descriptive message will be displayed (for example no sound recorded) and the dialogue box will remain.

7.4.2 Voiceprint Match
Clicking on Test will allow users to test their voiceprint with the implemented verification algorithm. A dialogue box will pop up with the title Voiceprint Test. Two buttons will give the option to return to the main menu (OK button) or perform the test. Recording will be carried out in the same way specified for enrollment but the responses at the end will be different. Note: the Voiceprint Test dialogue box will remain in the foreground when recording begins. At the end of the recording, the program will respond with a success, fail, or error.

7.4.3 Remove
The Remove option will delete a user's profile (user-id and voiceprint). Upon clicking on Remove, a dialog box will pop up with the title Remove User for confirmation. There will also be the buttons Cancel and Delete. Cancel will bring the user back to the main menu. If Delete is clicked, the user will be prompted with "Are you sure?" The user will then have to either hit 'y' for yes or 'n' for no. Hitting 'y' and then enter will delete the profile and return the user to the main menu. Hitting 'n' will return the user to the dialogue box without deleting the profile.

7.4.4 Statistics
The display of statistics is an element that will be given flexibility. At this time, the only requirement is that performance statistics be available.

7.5 Hardware Interfaces
The Speaker Recognition System requires access to the system's microphone to capture the user's voice.

7.6 Software Interfaces
The Speaker Recognition System is built for the Windows operating system. It requires Microsoft Windows XP Service Pack 3 or above to run. Since the software is built on Matlab, it requires the Matlab runtime to function properly.

7.7 Problem Domain Object (PDO)
Figure 14: Problem Domain Object (User, Administrator, VoicePrint)

8. Analysis Model
The analysis model describes the structure of the system or application that you are modeling. It aims to structure the system independently of the actual implementation environment. Focus is on the logical structure of the system. The analysis model identifies the main classes in the system and contains a set of use case realizations that describe how the system will be built. These use case realizations model how the parts of the system interact within the context of a specific use case. Sequence diagrams realize the use cases by describing the flow of events in the use cases when they are executed. It can be used as the foundation of the design model since it describes the logical structure of the system, but not how it will be implemented.

The following are the three types of analysis objects into which the use cases can be classified:

Figure 15: Types of Analysis Object (Interface Objects, Entity Objects, Control Objects)

Entity objects: Information to be held for a longer time; all behavior naturally coupled to information. Example: A person with the associated data and behavior.
Interface objects: Models behavior and information that is dependent on the interface to the system. Example: User interface functionality for requesting information about a person.
Control objects: Models functionality that is not naturally tied to any other object. Behavior consists of operating on several different entity objects, doing some computations and then returning the result to an interface object. Example: Calculate taxes using several different factors.

Interface Objects
Figure 16: Interface Objects (MICROPHONE IF)

Entity Objects
Figure 17: Entity Objects (User Information, Voice Sample)

Control Objects
Figure 18: Control Objects

Figure 19: Analysis Model (Start Panel, User Panel, Admin Interface, Receive Information, Request Match, Add/Remove/Edit User, View Statistics, Generate Result <includes>, User Information, Voice Sample)

9. Sequence Diagrams
A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams typically are associated with use case realizations in the Logical View of the system under development. Sequence diagrams are sometimes called event diagrams, event scenarios, and timing diagrams.

Sequence Diagram: User Enrollment
Figure 20: Sequence diagram for user enrollment (participants: Users, Enrollment, Profile, Feature Extract, Codebook Calculation; messages: Create, Request for Voice Sample, Voice Sample/Training Speech, Voice Sample, Acoustic Vectors, Codebook, Add To User List, Return User Id)

Sequence Diagram: Voice Match
Figure 21: Sequence diagram for Voice Match (participants: User, Match Voice, Feature Extractor, Feature Comparator, Codebook; messages: Request to initiate match, Requests for voice and user id input, VoicePrint and User Id, VoicePrint, Acoustic Vector, Requests for user's codebook with Input: UserId, Codebook, Result, Result is returned)

Sequence Diagram: Edit Information
Figure 22: Sequence Diagram for Editing Information (participants: User, Authenticator, Edit User, Database; messages: Voice Sample and UserId, Successfully logged in, User Id & Request to Retrieve Information, UserId, User Information, Information, Updated Information, Success)

10. Activity Diagrams
An activity diagram is like a state diagram, except that it has a few additional symbols and is used in a different context. In a state diagram, most transitions are caused by external events; however, in an activity diagram, most transitions are caused by internal events, such as the completion of an activity. An activity diagram is used to understand the flow of work that an object or component performs. It can also be used to visualize the interaction between different use cases.

1. Enroll new user
Figure 23: Activity Diagram for enrolling new user

2. Request Matching
Figure 24: Activity Diagram for voice matching

3. Remove User
Figure 25: Activity Diagram for removing user

4. Update user information
Figure 26: Activity Diagram for updating user information

11. Design Model
11.1 High Level Design
There are two main modules in this speaker recognition system: the User Enrollment Module and the User Verification Module.

Figure 27: High Level Block Diagram of Speaker Verification System

11.1.1 User Enrollment Module
The User Enrollment Module is used when a new user is added to the system. This module is used to essentially "teach" the system the new user's voice. The input of this module is a voiceprint of the user along with other details. By analyzing this training speech, the module outputs a model that parameterizes the user's voice. This model will be used later in the User Verification Module.

11.1.1.1 Signal Preprocessing Subsystem
The signal preprocessing subsystem conditions the raw speech signal and prepares it for subsequent manipulations and analysis. This subsystem performs analog-to-digital conversion and any signal conditioning necessary.

11.1.1.2 Feature Extraction Subsystem
The feature extraction subsystem analyzes the user's digitized voice signal and creates a series of values to use as a model for the user's speech pattern.

11.1.1.3 Feature Data Compression Subsystem
The disk size required for the model created in the Feature Extraction subsystem will be significant when many users are enrolled in the system. In order to store this data effectively, a form of data compression is used. After the model is compressed, it will be stored for later use in the User Verification Module.

11.1.2 Threshold Generation Module
This module is used to set the sensitivity level of the system for each user enrolled in the system. This sensitivity value is called the threshold and needs to be generated whenever a new user is enrolled. This module can also be invoked when a user feels they are receiving too many false rejections and wants to re-calculate an appropriate sensitivity level. After a user enrolls with the system, running this module will essentially invoke a user verification session. However, instead of receiving a pass or fail verdict, the system will take the similarity factor found in the Feature Comparison Subsystem, and use it to determine the threshold value. As of now, implementation of this module is suspended due to time constraints.

11.1.2.1 Threshold Generation Subsystem
This subsystem will set the user threshold to a scaled-up version of the similarity factor determined in the Feature Comparison Subsystem. This similarity factor will be scaled up and then saved as the threshold value. Scaling the value up is intended to account for any variances in future verification sessions.

11.1.3 Verification Module
The User Verification Module is used when the system tries to verify a user. This module is required for speaker verification functionality. The user informs the system that he or she is a certain user. The system will then prompt the user to say something. This utterance is referred to as the "testing speech." The module performs the same signal pre-processing and feature extraction as the User Enrollment Module. The extracted speech parameterization data is then compared to the stored model. Based on the similarity, a verdict will be given to indicate whether the user has passed or failed the voice verification test.
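The threshold generation of 11.1.2.1 and the pass or fail verdict of 11.1.3 can be sketched together in a few lines. This is only a sketch under stated assumptions: the scale factor of 1.5 and the numeric values are hypothetical, and the similarity factor is assumed to be a distortion measure, so that a lower value means a closer match.

```matlab
% Sketch only: the scale factor and distortion values are hypothetical.
scaleFactor      = 1.5;                            % assumed scaling applied at enrollment
enrollDistortion = 4.0;                            % similarity (distortion) factor measured
                                                   % in a session right after enrollment
threshold        = scaleFactor * enrollDistortion; % scaled-up value saved as the threshold

testDistortion = 5.2;                              % factor from a later verification attempt
passed = testDistortion <= threshold;              % lower distortion means a closer match
if passed
    disp('Verdict: pass');
else
    disp('Verdict: fail');
end
```

A larger scale factor would reduce false rejections at the cost of more false acceptances, which is the trade-off the sensitivity level controls.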

11.1.3.1 Feature Comparison Subsystem
After the Feature Extraction Subsystem parameterizes the testing speech, this data is compared to the model of the user stored on disk. After comparing all the data, a similarity factor will be produced.

11.1.3.2 Decision Subsystem
Based on the similarity factor produced by the Feature Comparison Subsystem, and the user's threshold value, a verdict will be given by this subsystem to indicate whether the user has passed or failed the voice verification test.

11.2 Low Level Design
The following section describes the information used for implementation of each subsystem.

11.2.1 Signal Preprocessing Subsystem
Input: Raw speech signal
Output: Digitized and conditioned speech signal (one vector containing all sampled values)
Figure 28: Signal Preprocessing Subsystem Low-Level Block Diagram
The sampling will produce a digital signal in the form of a vector or array. The silence at the beginning and end of the speech sample will be removed.

11.2.2 Feature Extraction Subsystem
Input: Digital speech signal (one vector containing all sampled values)
Output: A set of acoustic vectors
Figure 29: Feature Extraction Subsystem Low-Level Block Diagram
Mel-Cepstral Coefficients will be used to parameterize the speech sample and voice.
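Before feature extraction, the silence removal mentioned for the Signal Preprocessing Subsystem (11.2.1) could look roughly like the sketch below. The document does not state how silence is detected; the short-time energy method, the 10 ms frame length and the 1% energy threshold used here are assumptions chosen for illustration.

```matlab
% Sketch only: energy-based trimming with an assumed frame length and threshold.
fs = 8000;                                         % assumed sampling rate in Hz
speech = 0.5 * sin(2*pi*200*(0:fs/2-1)'/fs);       % 0.5 s stand-in for voiced speech
signal = [zeros(fs/4, 1); speech; zeros(fs/4, 1)]; % 0.25 s of silence on each side

frameLen = 80;                                     % 10 ms frames at 8 kHz
nFrames  = floor(length(signal) / frameLen);
frames   = reshape(signal(1:nFrames*frameLen), frameLen, nFrames);
energy   = sum(frames.^2, 1);                      % short-time energy per frame

active      = find(energy > 0.01 * max(energy));   % frames above 1% of peak energy
firstSample = (active(1) - 1) * frameLen + 1;
lastSample  = active(end) * frameLen;
trimmed     = signal(firstSample:lastSample);      % leading and trailing silence removed

fprintf('Kept %d of %d samples\n', length(trimmed), length(signal));
```

Only the outer silence is cut; pauses inside the utterance are kept, which matches the description of removing silence at the beginning and end of the sample.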

The original vector of sampled values will be framed into overlapping blocks. Each block will be windowed to minimize spectral distortion and discontinuities. A Hamming window will be used. The Fast Fourier Transform will then be applied to each windowed block as the beginning of the Mel-Cepstral Transform. After this stage, the spectral coefficients of each block are generated. The Mel Frequency Transform will then be applied to each spectral block to convert the scale to a mel-scale. The mel-scale is a logarithmic scale similar to the way the human ear perceives sound. Finally, the Discrete Cosine Transform will be applied to each Mel Spectrum to convert the values back to real values in the time domain.

11.2.3 Feature Compression Subsystem
Inputs: A set of acoustic vectors
Output: Codebook
Figure 30: Feature Data Compression Subsystem Low-Level Block Diagram
The K Means Vector Quantization Algorithm will be used.

11.2.4 Feature Data Comparison Subsystem
Inputs: Set of acoustic vectors from testing speech, codebook
Outputs: Average distortion factor
Figure 31: Comparison Subsystem Low-Level Block Diagram
The acoustic vectors generated by the testing voice signal will be individually compared to the codebook. The codeword closest to each test vector is found based on Euclidean distance. This minimum Euclidean Distance, or Distortion Factor, is then stored until the Distortion Factor for each test vector has been calculated. The Average Distortion Factor is then found and normalized.
Figure 32: Distortion Calculation Algorithm Flow Chart

11.2.5 Decision Subsystem
Inputs: Average distortion factor, user specific threshold
Outputs: Verdict

Figure 33: Comparison Subsystem Low-Level Block Diagram
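The compression and comparison steps of 11.2.3 and 11.2.4 can be illustrated with a small base-MATLAB sketch: a codebook is trained with plain K-means iterations, and the average distortion of a few test vectors is then computed against it. The toy 2-D vectors, the fixed initial codewords and the variable names are assumptions for illustration; the project code itself relies on the kmeans and disteusq functions rather than on these loops, and the normalization step is not shown because the document does not define it.

```matlab
% Sketch only: toy 2-D vectors stand in for 12-coefficient acoustic vectors.
train = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];  % enrollment vectors, one per row
codebook = train([1 4], :);                     % assumed initial codewords (C = 2)

% K-means iterations to build the codebook
for iter = 1:20
    label = zeros(size(train, 1), 1);
    for v = 1:size(train, 1)
        d = sum((codebook - repmat(train(v, :), size(codebook, 1), 1)).^2, 2);
        [~, label(v)] = min(d);                 % nearest codeword for this vector
    end
    newCodebook = codebook;
    for c = 1:size(codebook, 1)
        if any(label == c)
            newCodebook(c, :) = mean(train(label == c, :), 1);
        end
    end
    if isequal(newCodebook, codebook)
        break;                                  % codewords stopped moving
    end
    codebook = newCodebook;
end

% Average distortion of test vectors against the trained codebook
test = [0 0; 1 1; 10 10];
distortion = zeros(size(test, 1), 1);
for v = 1:size(test, 1)
    d = sqrt(sum((codebook - repmat(test(v, :), size(codebook, 1), 1)).^2, 2));
    distortion(v) = min(d);                     % distance to the closest codeword
end
avgDistortion = mean(distortion);
fprintf('Average distortion: %.4f\n', avgDistortion);
```

In a verification session, the resulting average distortion is the value the Decision Subsystem compares with the user specific threshold.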

12. Alternative Options
There is more than one way to perform speaker recognition. The methods chosen for this project were mostly chosen because of their implementability and low complexity. The list of alternatives below is in no way a complete listing.

12.1 Feature Extraction Alternatives
Linear Prediction Cepstrum: Identifies the vocal tract parameters. Used for text-independent recognition.
Discrete Wavelet Transform
Delta-Cepstrum: Analyses changing tones

12.2 Feature Matching Alternatives
Dynamic Time Warping: Accounts for inconsistencies in the rate of speech by stretching or compressing parts of the signal in the time domain.
AI based: Hidden Markov Models, Gaussian Mixture Models, and Neural Networks.

13. Implementation
13.1 Platform
Matlab was chosen as the platform for ease of implementation. A third party GNU Matlab toolbox, Voicebox, was used. This toolbox provides functions that calculate Mel-Frequency Coefficients and perform vector quantization.
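For comparison with the method that was implemented, the Dynamic Time Warping alternative listed in 12.2 can be sketched as follows. This is a generic textbook formulation on two toy 1-D sequences with an absolute-difference cost; it was not part of this project, and the sequences are made up for illustration.

```matlab
% Sketch only: classic DTW on two 1-D sequences with absolute-difference cost.
a = [1 2 3 4];
b = [1 2 2 3 4];            % same shape as a, but "spoken" more slowly

n = length(a);
m = length(b);
D = inf(n + 1, m + 1);      % accumulated cost matrix with a padded border
D(1, 1) = 0;
for i = 1:n
    for j = 1:m
        cost = abs(a(i) - b(j));
        D(i + 1, j + 1) = cost + min([D(i, j + 1), D(i + 1, j), D(i, j)]);
    end
end
dtwDistance = D(n + 1, m + 1);
fprintf('DTW distance: %g\n', dtwDistance);
```

Because the repeated value in b can be aligned with a single element of a, the two sequences match with zero accumulated cost, which is the stretching and compressing behaviour described for this alternative.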

Conclusion
In this project, we have developed a text-independent speaker identification system, that is, a system that identifies the person speaking regardless of what he/she is saying. Our speaker verification system consists of two sections: (i) an enrollment section to build a database of known speakers and (ii) an unknown speaker identification system. The enrollment session is also referred to as the training phase, while the unknown speaker identification system is also referred to as the operation session or testing phase. In the training phase, each registered speaker has to provide samples of their speech so that the system can build or train a reference model for that speaker. It consists of two main parts. The first part consists of processing each person's input voice sample to condense and summarize the characteristics of their vocal tract. The second part involves pulling each person's data together into a single, easily manipulated matrix. In the testing phase, the matrix calculated above is used for recognition.

Future Work
Currently, this application lacks an easy-to-use user interface. It can be extended to provide a user interface, and it can be fine-tuned to meet real-time constraints. Other techniques may be used for implementing this application to minimize the false-acceptance and false-rejection rates.

Snapshots
Figure 34: Matlab Command Window

Figure 35: Matlab Editor


REFERENCES
1. Manfred U. A. Bromba, "Biometrics: Frequently Asked Questions", http://www.bromba.com/faq/biofaqe.htm
2. Wikipedia, the free encyclopedia, "Biometrics", http://en.wikipedia.org/wiki/Biometrics
3. Douglas A. Reynolds, "Automatic Speaker Recognition Using Gaussian Mixture Speaker Models", http://www.ll.mit.edu/publications/journal/pdf/vol08_no2/8.2.4.speakerrecognition.pdf
4. Patricia Melin, Jerica Urias, Daniel Solano, Miguel Soto, Miguel Lopez, and Oscar Castillo, "Voice Recognition with Neural Networks, Type-2 Fuzzy Logic and Genetic Algorithms", Engineering Letters, 13:2, EL_13_2_9, http://www.engineeringletters.com/issues_v13/issue_2/EL_13_2_9.pdf

Code

speakerTest.m

function speakerTest(a)
% A speaker recognition program. a is a string of the filename to be tested against the
% database of sampled voices and it will be evaluated whose voice it is.
% Mike Brooks, VOICEBOX, Free toolbox for MATLab,
% www.ncl.ac.uk/CPACTsoftware/MatlabLinks.html
% disteusq.m, enframe.m, kmeans.m, melbankm.m, melcepst.m, rdct.m and rfft.m from
% VOICEBOX are used in this program.
voiceboxpath='C:/Users/test/voicebox';
addpath(voiceboxpath);
test.data = wavread(a);          % read the test file
name = ['vimal';'anand'];        % name of people in the database
fs = 16000;                      % sampling frequency
C = 8;                           % number of centroids
% Load data
disp('Reading data for training:')
[train.data] = Load_data(name);
% Calculate mel-frequency cepstral coefficients for training set
fprintf('\nCalculating mel-frequency cepstral coefficients for training set:\n')
[train.cc] = mfcc(train.data,name,fs);
% Perform K-means algorithm for clustering (Vector Quantization)
fprintf('\nApplying Vector Quantization(K-means) for feature extraction:\n')
[train.kmeans] = kmean(train.cc,C);
% Calculate mel-frequency cepstral coefficients for test set
%fprintf('\nCalculating mel-frequency cepstral coefficients for test set:\n')
test.cc = melcepst(test.data,fs,'x');
% Compute average distances between test.cc with all the codebooks in
% database, and find the lowest distortion
%fprintf('\nComputing a distance measure for each codebook.\n')
[result index] = distmeasure(train.kmeans,test.cc);
% Display results - average distances between the features of unknown voice
% (test.cc) with all the codebooks in database and identify the person with
% the lowest distance
fprintf('\nDisplaying the result:\n')
dispresult(name,result,index)

Load_data.m

function [data] = Load_data(name)
% Training mode - Load all the wave files to database (codebooks)
% Note: wavread was removed in later MATLAB releases; audioread is its replacement.
data = cell(size(name,1),1);
for i=1:size(name,1)
    temp = [name(i,:) '.wav'];
    tempwav = wavread(temp);
    data{i} = tempwav;
end

distmeasure.m

function [result,index] = distmeasure(x,y)
% x: cell array of codebooks (one per speaker), y: test acoustic vectors
result = cell(size(x,1),1);
dist = cell(size(x,1),1);
k = size(x{1},2);    % number of coefficients per codeword
mins = inf;
for i = 1:size(x,1)
    dist{i} = disteusq(x{i}(:,1:k),y(:,1:k),'x');
    temp = sum(min(dist{i}))/size(dist{i},2);
    result{i} = temp;
    if temp < mins
        mins = temp;
        index = i;
    end
end

dispresult.m

function dispresult(x,y,z)
disp('The average of Euclidean distances between database and test wave file')
color = ['r'; 'g'; 'b'; 'c'; 'm'; 'k'];
for i = 1:size(x,1)
    disp(x(i,:))
    disp(y{i})
end
disp('The test voice is most likely from')
disp(x(z,:))

mfcc.m

function [cepstral] = mfcc(x,y,fs)
% Calculate mfcc's with a frequency(fs) and store in cepstral cell. Display
% y at a time when x is calculated
cepstral = cell(size(x,1),1);
for i = 1:size(x,1)
    disp(y(i,:))
    cepstral{i} = melcepst(x{i},fs,'x');
end

kmean.m

function [data] = kmean(x,C)
% Calculate k-means for x with C number of centroids
% Note: the [indices, centroids] output order used below matches kmeans from the
% Statistics Toolbox; the VOICEBOX kmeans returns the centroids first.
train.kmeans.x = cell(size(x,1),1);
train.kmeans.esql = cell(size(x,1),1);
train.kmeans.j = cell(size(x,1),1);
for i = 1:size(x,1)
    [train.kmeans.j{i} train.kmeans.x{i}] = kmeans(x{i}(:,1:12),C);
end
data = train.kmeans.x;

melcepst.m

function c=melcepst(s,fs,w,nc,p,n,inc,fl,fh)
%MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
%
% Simple use: c=melcepst(s,fs)          % calculate mel cepstrum with 12 coefs, 256 sample frames
%             c=melcepst(s,fs,'e0dD')   % include log energy, 0th cepstral coef, delta and delta-delta coefs
%
% Inputs:
%     s    speech signal

%     fs   sample rate in Hz (default 11025)
%     nc   number of cepstral coefficients excluding 0'th coefficient (default 12)
%     n    length of frame in samples (default power of 2 < (0.03*fs))
%     p    number of filters in filterbank (default: floor(3*log(fs)) = approx 2.1 per octave)
%     inc  frame increment (default n/2)
%     fl   low end of the lowest filter as a fraction of fs (default = 0)
%     fh   high end of highest filter as a fraction of fs (default = 0.5)
%
%     w    any sensible combination of the following:
%
%          'R'  rectangular window in time domain
%          'N'  Hanning window in time domain
%          'M'  Hamming window in time domain (default)
%
%          't'  triangular shaped filters in mel domain (default)
%          'n'  hanning shaped filters in mel domain
%          'm'  hamming shaped filters in mel domain
%
%          'p'  filters act in the power domain
%          'a'  filters act in the absolute magnitude domain (default)
%
%          '0'  include 0'th order cepstral coefficient
%          'E'  include log energy
%          'd'  include delta coefficients (dc/dt)
%          'D'  include delta-delta coefficients (d^2c/dt^2)
%
%          'z'  highest and lowest filters taper down to zero (default)
%          'y'  lowest filter remains at 1 down to 0 frequency and
%               highest filter remains at 1 up to nyquist frequency
%
%          If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs: c    mel cepstrum output: one frame per row. Log energy, if requested, is the
%               first element of each row followed by the delta and then the delta-delta
%               coefficients.
%
if nargin<2 fs=11025; end
if nargin<3 w='M'; end
if nargin<4 nc=12; end
if nargin<5 p=floor(3*log(fs)); end
if nargin<6 n=pow2(floor(log2(0.03*fs))); end
if nargin<9
    fh=0.5;
    if nargin<8
        fl=0;
        if nargin<7
            inc=floor(n/2);
        end
    end
end
if isempty(w)
    w='M';
end
if any(w=='R')
    z=enframe(s,n,inc);
elseif any (w=='N')
    z=enframe(s,hanning(n),inc);
else
    z=enframe(s,hamming(n),inc);
end
f=rfft(z.');
[m,a,b]=melbankm(p,n,fs,fl,fh,w);
pw=f(a:b,:).*conj(f(a:b,:));
pth=max(pw(:))*1E-20;
if any(w=='p')
    y=log(max(m*pw,pth));
else
    ath=sqrt(pth);
    y=log(max(m*abs(f(a:b,:)),ath));
end
c=rdct(y).';
nf=size(c,1);
nc=nc+1;
if p>nc
    c(:,nc+1:end)=[];
elseif p<nc
    c=[c zeros(nf,nc-p)];
end
if ~any(w=='0')
    c(:,1)=[];
    nc=nc-1;
end
if any(w=='E')
    c=[log(sum(pw)).' c];
    nc=nc+1;
end
% calculate derivative

if any(w=='D')
    vf=(4:-1:-4)/60;
    af=(1:-1:-1)/2;
    ww=ones(5,1);
    cx=[c(ww,:); c; c(nf*ww,:)];
    vx=reshape(filter(vf,1,cx(:)),nf+10,nc);
    vx(1:8,:)=[];
    ax=reshape(filter(af,1,vx(:)),nf+2,nc);
    ax(1:2,:)=[];
    vx([1 nf+2],:)=[];
    if any(w=='d')
        c=[c vx ax];
    else
        c=[c ax];
    end
elseif any(w=='d')
    vf=(4:-1:-4)/60;
    ww=ones(4,1);
    cx=[c(ww,:); c; c(nf*ww,:)];
    vx=reshape(filter(vf,1,cx(:)),nf+8,nc);
    vx(1:8,:)=[];
    c=[c vx];
end
if nargout<1
    [nf,nc]=size(c);
    t=((0:nf-1)*inc+(n-1)/2)/fs;
    ci=(1:nc)-any(w=='0')-any(w=='E');
    imh = imagesc(t,ci,c.');
    axis('xy');
    xlabel('Time (s)');
    ylabel('Mel-cepstrum coefficient');
    map = (0:63)'/63;
    colormap([map map map]);
    colorbar;
end
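For readers without VOICEBOX, the feature extraction steps of 11.2.2 (framing, Hamming window, FFT, mel filterbank, logarithm, DCT) can be sketched in base MATLAB as below. This is a simplified stand-in and not the VOICEBOX implementation listed above: the test tone, the frame and filter counts, and the commonly used mel formula 2595*log10(1 + f/700) are all assumptions, so the numbers it produces will differ from those of melcepst.

```matlab
% Sketch only: simplified MFCC pipeline in base MATLAB; parameter values are assumptions.
fs = 8000;                                   % sampling rate in Hz
s  = sin(2*pi*440*(0:fs-1)'/fs);             % 1 s test tone standing in for speech
n = 256; inc = 128; nFilt = 20; nCep = 12;   % frame length, hop, filters, coefficients

% 1. Frame into overlapping blocks and apply a Hamming window
nFrames = floor((length(s) - n) / inc) + 1;
win = 0.54 - 0.46 * cos(2*pi*(0:n-1)' / (n-1));
frames = zeros(n, nFrames);
for k = 1:nFrames
    frames(:, k) = s((k-1)*inc + (1:n)') .* win;
end

% 2. FFT magnitude spectrum (non-negative frequencies only)
spec = abs(fft(frames));
spec = spec(1:n/2+1, :);

% 3. Triangular mel filterbank
hz2mel = @(f) 2595 * log10(1 + f/700);
mel2hz = @(m) 700 * (10.^(m/2595) - 1);
edges = mel2hz(linspace(0, hz2mel(fs/2), nFilt + 2));  % filter edge frequencies in Hz
binHz = (0:n/2) * fs / n;                               % frequency of each FFT bin
fbank = zeros(nFilt, n/2 + 1);
for f = 1:nFilt
    rising  = (binHz - edges(f))   / (edges(f+1) - edges(f));
    falling = (edges(f+2) - binHz) / (edges(f+2) - edges(f+1));
    fbank(f, :) = max(0, min(rising, falling));
end

% 4. Log mel spectrum, then DCT-II to obtain cepstral coefficients
logMel = log(max(fbank * spec, 1e-10));
dctMat = cos(pi * (0:nCep)' * ((0:nFilt-1) + 0.5) / nFilt);  % rows for orders 0..nCep
cep = dctMat * logMel;
cep = cep(2:end, :)';                        % drop the 0th coefficient; one frame per row
disp(size(cep))
```

The result has one row per frame and twelve columns, the same layout that melcepst returns and that the codebook training expects.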
