Topics
1. Introduction
2. Methodology
3. Identification Background
4. Speech Feature Extraction
5. Vector Quantization
6. K-Means
7. Euclidean Distance
8. Speaker Matching
9. Applications
10. Limitations
Introduction
Voice is a basic identification parameter for human beings. Speaker identification compares a speech signal from an unknown speaker against a database of known speakers to recognize who is speaking.
Methodology
Identification Background
DSP fundamentals
Speaker Identification
Speaker Enrollment: a speaker database is created by collecting speech samples from known speakers.
Speaker Identification: a test sample is compared against the enrolled speaker models.
Enrollment Phase
Identification Phase
What is Clustering?
Clustering deals with finding a structure in a collection of unlabeled data: the process of organizing objects into groups whose members are similar in some way.
A cluster is therefore a collection of objects which are similar to one another and dissimilar to the objects belonging to other clusters.
Conceptual clustering
Two or more objects belong to the same cluster if they fit a common descriptive concept, not according to simple similarity measures. K-means, in contrast, is an exclusive clustering algorithm: the data are divided so that each object belongs to exactly one cluster, and the similarity criterion is distance. Two or more objects belong to the same cluster if they are close according to a given distance (in this case, geometrical distance).
This is called distance-based clustering.
Vector Quantization
Producing a small set of representative vectors (a codebook) from a large set of feature vectors.
K-means
A clustering algorithm
Chooses M cluster centroids among T feature vectors; each feature vector is then assigned to its nearest centroid.
K means clustering
The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed in a cunning way, because a different location causes a different result; the better choice is to place them as far away from each other as possible.
Each point of the data set is then associated with the nearest centroid. When no point is pending, the k centroids are recalculated as the barycenters of the resulting clusters, and a new binding has to be done between the same data set points and the nearest new centroid. A loop has been generated: the k centroids change their location step by step until no more changes are done, in other words the centroids do not move any more. Finally, this algorithm aims at minimizing an objective function, in this case a squared error function J = Σ_{j=1..k} Σ_{i=1..n} ||x_i^(j) − c_j||², where ||x_i^(j) − c_j||² is the squared distance between data point x_i^(j) and its cluster centre c_j.
Distance Measure
An important component of a clustering algorithm is the distance measure between data points. If the components of the data instance vectors are all in the same physical units, the simple Euclidean distance metric may be sufficient to successfully group similar data instances. However, even in this case the Euclidean distance can sometimes be misleading.
MINKOWSKI METRIC
For higher dimensional data, a popular measure is the
Minkowski metric,
d_p(x_i, x_j) = (Σ_{k=1}^{d} |x_{i,k} − x_{j,k}|^p)^(1/p),
where d is the dimensionality of the data. The Euclidean distance is the special case p = 2,
while the Manhattan metric has p = 1. However, there are no general theoretical guidelines for selecting a measure for any given application.
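As a quick sketch, the metric for a few values of p (the function name and sample points are mine, not from the slides):

```python
def minkowski(x, y, p):
    """Minkowski distance between two d-dimensional points x and y."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# Euclidean distance (p = 2) and Manhattan distance (p = 1) as special cases:
euclidean = minkowski((0, 0), (3, 4), p=2)   # 5.0
manhattan = minkowski((0, 0), (3, 4), p=1)   # 7.0
```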
1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
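The four steps above can be sketched in Python as a minimal illustration (the variable names and toy data set are mine; real inputs would be speech feature vectors):

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Minimal k-means over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                      # step 1: initial centroids
    for _ in range(max_iters):
        # step 2: assign each point to the group with the closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # step 3: recalculate each centroid as the mean of its cluster
        new_centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:                     # step 4: stop when stable
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

On this toy data the loop converges to one centroid near each of the two obvious groups.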
Speaker Matching
A matching score is computed between the feature vectors extracted from the test speech and each enrolled speaker's codebook
representative (centroid), using the k-means
algorithm. Find the centroids of the sample files stored in the database so that codebooks can be generated. Calculate the Euclidean distance between the test file and each individual sample of the database. Find the sample having the minimum distance to the test file. The sample corresponding to the minimum distance is most likely the speaker of the test sound.
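A sketch of this matching step (toy 2-D "feature vectors" and codebooks stand in for real MFCC data; the function and speaker names are mine):

```python
import math

def distortion(features, codebook):
    """Average distance from each test feature vector to its nearest codeword."""
    return sum(min(math.dist(f, c) for c in codebook) for f in features) / len(features)

def identify(test_features, speaker_codebooks):
    """Return the enrolled speaker whose codebook gives the minimum distortion."""
    return min(speaker_codebooks,
               key=lambda s: distortion(test_features, speaker_codebooks[s]))

codebooks = {                       # would come from k-means over enrollment samples
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(9.0, 9.0), (10.0, 10.0)],
}
best = identify([(0.2, 0.1), (0.9, 1.1)], codebooks)   # "speaker_a"
```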
Mel-frequency cepstral coefficients (MFCCs) are coefficients
that collectively make up the mel-frequency cepstrum (MFC): a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum").
The difference between the cepstrum and the mel-frequency
cepstrum is that in the MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum.
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
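Step 2 relies on the mel scale; a common formula (one of several variants in use) maps frequency in Hz to mels as m = 2595·log10(1 + f/700):

```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz onto the mel scale (2595 * log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used e.g. when placing the triangular filter edges."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# By construction, 1000 Hz falls at roughly 1000 mels:
m1000 = hz_to_mel(1000.0)
```

The filterbank in step 2 is typically built by spacing filter centres uniformly in mels and converting back to Hz with the inverse mapping.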
MFCC
MFCC values are not very robust in the presence of additive noise, so it is common to normalise their values in speech recognition systems to lessen the influence of noise. Some researchers propose modifications to the basic MFCC algorithm to improve robustness, e.g. raising the log-mel amplitudes to a suitable power (around 2 or 3) before taking the DCT, which reduces the influence of low-energy components.[7]
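One simple normalisation of this kind is cepstral mean normalisation, i.e. subtracting each coefficient's mean across frames. The slides do not name a specific method, so this is only an illustrative sketch:

```python
def cepstral_mean_normalize(frames):
    """Subtract each coefficient's mean across all frames (CMN).

    frames: list of equal-length MFCC vectors, one per analysis frame.
    """
    n = len(frames)
    means = [sum(frame[i] for frame in frames) / n for i in range(len(frames[0]))]
    return [[v - m for v, m in zip(frame, means)] for frame in frames]

normalized = cepstral_mean_normalize([[1.0, 4.0], [3.0, 8.0]])
# each column of the result now has zero mean: [[-1.0, -2.0], [1.0, 2.0]]
```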
Applications
Security systems for lockers
Proof of identity
Security of confidential data
Passwords for computers
PIN codes for ATM cards
Limitations
Large speaker database required
Time constraints
Not always accurate