Yesno Classification - Info

YES OR NO CLASSIFICATION
The customer service department of a company wants to build an automatic telephone operator
system that asks callers questions which can be answered with either yes or no, and then
classifies each caller's spoken answer as "YES" or "NO".
The main steps are as follows:
• Look at the big picture and state the problem
• Get the data
• Load the data with the scipy library
• Add labels and split the data into train and test sets
• Implement classifiers and evaluate the models
Before going further, let's install any required packages and libraries that are not already on
our system. Simply execute the cell below; it will do the rest for you!
************
If the cell does not work, install the packages manually from the command prompt (cmd) using pip.
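A minimal sketch of what such an install cell could look like. The package list (numpy, scipy, matplotlib, scikit-learn) is an assumption based on the libraries used later in this notebook, since the original cell is not shown, and the helper names missing_packages and install are hypothetical. The sketch checks which modules cannot be imported and installs only those, using the same Python interpreter that runs the notebook.

```python
import importlib.util
import subprocess
import sys

# Assumed package list (import name -> pip name); the original cell is not shown.
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "sklearn": "scikit-learn",
}


def missing_packages(required):
    """Return the pip names of the packages whose modules cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


def install(packages):
    """Install the given packages with pip, using the interpreter running this code."""
    if packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
```

In the notebook, the cell would finish with install(missing_packages(REQUIRED)). The manual route from cmd is the equivalent command pip install numpy scipy matplotlib scikit-learn.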
Get the data

We are given a dataset of prerecorded "YES"/"NO" wave files and asked to build a classifier
that detects whether a person answered yes or no.
The dataset is stored as .wav files with a 16 kHz sampling rate, so we first read each instance
into an array and then compute its Fast Fourier Transform (FFT), which serves as the model
input.
Dataset: extract the .ZIP file so that the data sits in the directories <.\dyes> and <.\dno>.
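The extraction step can be sketched with the standard-library zipfile module. The archive name is not given in the original, so it is a parameter here, and the function name extract_dataset is hypothetical. The function unpacks everything and returns the .wav files found under the dyes and dno folders.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest="."):
    """Extract the archive into `dest` and return the sorted yes/no .wav paths."""
    dest = Path(dest)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # rglob also finds dyes/ and dno/ if the archive wraps them in a top-level folder
    yes_files = sorted(dest.rglob("dyes/*.wav"))
    no_files = sorted(dest.rglob("dno/*.wav"))
    return yes_files, no_files
```

For example, extract_dataset("yesno.zip") would unpack a (hypothetically named) yesno.zip into the current directory and list the recordings of each class.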
********
Import the libraries:
********
Load the Data

The function prepare_the_data collects the wave files in the given directory and reads each of
them (a recorded "yes" or "no"). Reading a file returns two values: the sampling frequency and
the signal. The function then takes the Fourier transform of each signal individually.
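Reading a single file with scipy.io.wavfile.read is sketched here; note that this function returns the sampling rate first and the signal second. To keep the example self-contained, it first writes a one-second 440 Hz test tone at 16 kHz (the file name example.wav is made up); with the real data, point wavfile.read at a file in .\dyes or .\dno instead.

```python
import numpy as np
from scipy.io import wavfile

# Write a 1-second, 16 kHz test tone so the example runs without the dataset
rate = 16000
t = np.arange(rate) / rate
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
wavfile.write("example.wav", rate, tone)

# wavfile.read returns (sampling_rate, signal), in that order
fs, signal = wavfile.read("example.wav")
print(fs, signal.shape, signal.dtype)  # 16000 (16000,) int16
```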
In mathematics, the Fourier transform decomposes a function of space or time into a function of
spatial or temporal frequency, much like expressing a musical chord in terms of the volumes and
frequencies of its constituent notes. Put simply, the Fourier transform converts a signal from
the time domain to the frequency domain.
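The chord analogy can be made concrete with a small sketch: a synthetic two-note "chord" (440 Hz plus a quieter 660 Hz, both frequencies chosen for illustration) is transformed, and the largest peaks of the magnitude spectrum land on exactly those two frequencies. It uses numpy.fft for brevity; scipy.fftpack.fft, used in the notebook, computes the same transform.

```python
import numpy as np

fs = 16000                        # sampling rate in Hz, as in the dataset
t = np.arange(fs) / fs            # one second of sample times
# A two-note "chord": 440 Hz at full volume plus 660 Hz at half volume
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)

X = np.fft.fft(x)                 # complex frequency-domain representation
half = len(x) // 2
magnitude = np.abs(X[:half])
freqs = np.fft.fftfreq(len(x), d=1 / fs)[:half]   # bin k sits at k * fs / N Hz

# The two largest peaks sit at the two note frequencies
top_two = freqs[np.argsort(magnitude)[-2:]]
print(np.sort(top_two))           # [440. 660.]
```

With one second of audio at 16 kHz the bins are 1 Hz apart, so each note falls exactly on a bin and the louder note produces the taller peak.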
It is not necessary to understand how the Fourier transform works in detail. We only need to know two things:
• First, the Fourier transform gives us an array of complex numbers, each with a real and an
imaginary part. We only need the magnitude of these values, not their phase.
• Second, for a real-valued signal such as audio, the output is symmetric: the second half of
the array mirrors the first half (each value is the complex conjugate of its counterpart, so the
magnitudes are identical). The second half therefore adds no information and can be discarded.
Each output value has the form

$$X = a + bi$$

where a and b are real numbers and $i = \sqrt{-1}$.
For such a value, the magnitude is:

$$|X| = \sqrt{a^2 + b^2}$$

and the phase is:

$$\varphi = \operatorname{atan2}(b, a)$$

which equals $\arctan(b/a)$ when $a > 0$.
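A small numeric sketch of the two facts above, using numpy: for X = 3 + 4i the magnitude formula gives 5 (matching np.abs) and atan2 gives the same phase as np.angle; for a random real-valued signal (a stand-in for a recording), bin N - k is the complex conjugate of bin k, so only the first half of the magnitudes needs to be kept.

```python
import numpy as np

# A single complex value X = a + bi with a = 3, b = 4
X = 3 + 4j
magnitude = np.sqrt(X.real**2 + X.imag**2)   # sqrt(a^2 + b^2)
phase = np.arctan2(X.imag, X.real)           # atan2(b, a), in radians
print(magnitude, np.abs(X))                  # 5.0 5.0
print(np.isclose(phase, np.angle(X)))        # True

# Symmetry: for a real-valued signal, X[N - k] is the complex conjugate of X[k]
rng = np.random.default_rng(0)
signal = rng.standard_normal(16)
spectrum = np.fft.fft(signal)
N = len(signal)
k = np.arange(1, N)
print(np.allclose(spectrum[N - k], np.conj(spectrum[k])))  # True
half = np.abs(spectrum[: N // 2])            # keep the magnitudes of the first half
print(half.shape)                            # (8,)
```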
First, let's see how it works on a single file and look at a plot of the signal.
Using scipy, we can compute the Fourier transform with the function fft in scipy.fftpack. FFT
stands for Fast Fourier Transform, an algorithm that computes the discrete Fourier transform
(DFT) of a sequence, or its inverse (IDFT).
We can read the documentation here:
• fft documentation
Execute the cell below to see the shape of the signal and of its Fourier transform.
*********
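A sketch of what the shape check could look like with scipy.fftpack.fft. It assumes a recording holds 16000 samples (one second at 16 kHz), which would match the 8000 features printed in the outputs; a random signal stands in for a real file, which would be loaded with wavfile.read.

```python
import numpy as np
from scipy.fftpack import fft

fs = 16000
# Stand-in for one recording: 1 s of noise at 16 kHz (assumed recording length)
signal = np.random.default_rng(0).standard_normal(fs)

spectrum = fft(signal)
print(signal.shape, spectrum.shape, spectrum.dtype)  # (16000,) (16000,) complex128
print(np.abs(spectrum[: len(spectrum) // 2]).shape)  # (8000,)
```

The transform has as many (complex) values as the signal has samples; keeping the magnitudes of the first half leaves 8000 values per recording.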
We can plot the signal by executing the cell below:
*********
We can plot the signal's Fourier Transform by executing the cell below:
*********
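Both plots can be produced by one small matplotlib helper, sketched here. The function name plot_signal_and_spectrum and the 300 Hz stand-in tone are illustrative assumptions; in the notebook you would pass a signal read from the dataset together with its sampling rate.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.fftpack import fft


def plot_signal_and_spectrum(signal, fs):
    """Plot a signal against time and the magnitude of its FFT (first half only)."""
    t = np.arange(len(signal)) / fs
    half = len(signal) // 2
    magnitude = np.abs(fft(signal))[:half]
    freqs = np.arange(half) * fs / len(signal)    # bin k is at k * fs / N Hz

    fig, (ax_time, ax_freq) = plt.subplots(2, 1, figsize=(10, 6))
    ax_time.plot(t, signal)
    ax_time.set_xlabel("Time [s]")
    ax_time.set_ylabel("Amplitude")
    ax_freq.plot(freqs, magnitude)
    ax_freq.set_xlabel("Frequency [Hz]")
    ax_freq.set_ylabel("|FFT|")
    fig.tight_layout()
    return fig


fs = 16000
t = np.arange(fs) / fs
fig = plot_signal_and_spectrum(np.sin(2 * np.pi * 300 * t), fs)  # stand-in signal
```

For the stand-in tone, the frequency panel shows a single spike at 300 Hz.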
Function
This is the function that loads the data as explained earlier.
To obtain the magnitude of the Fourier transform, we apply the abs() function as the last step of
the function.
********
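A minimal sketch of what prepare_the_data could look like; the original implementation is hidden, so some details are assumptions. The directory is scanned for *.wav files, every signal is cut or zero-padded to n_samples = 16000 so that all rows have the same length (consistent with the 8000 features printed in the outputs, but not confirmed by the source), and only the magnitudes of the first half of each spectrum are kept, as described above.

```python
from pathlib import Path

import numpy as np
from scipy.fftpack import fft
from scipy.io import wavfile


def prepare_the_data(directory, n_samples=16000):
    """Read every .wav file in `directory` and return one row of |FFT| values per file.

    Assumption: signals are truncated or zero-padded to n_samples, so every row
    has n_samples // 2 features.
    """
    rows = []
    for path in sorted(Path(directory).glob("*.wav")):
        fs, signal = wavfile.read(str(path))      # (sampling rate, signal)
        signal = signal.astype(np.float64)
        # Truncate long signals, zero-pad short ones to exactly n_samples
        signal = np.pad(signal[:n_samples], (0, max(0, n_samples - len(signal))))
        spectrum = fft(signal)
        rows.append(np.abs(spectrum[: n_samples // 2]))  # magnitude, first half
    return np.array(rows)
```

With the dataset in place, it would be called as prepare_the_data(r".\dyes") and prepare_the_data(r".\dno").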
Add Labels and Split into Train and Test Sets

Using the function provided above, we load the yes and no recordings into their own variables.
We also need a second array holding the labels. Since this is a binary classification problem,
we can use 1 for "YES" and 0 for "NO", building the label arrays with the functions below:
• np.ones()
• np.zeros()
*********
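A sketch of the labelling step with np.ones and np.zeros. Small random arrays stand in for the outputs of prepare_the_data on the yes and no directories; the variable names X_yes and X_no are illustrative.

```python
import numpy as np

# Stand-ins for prepare_the_data(r".\dyes") and prepare_the_data(r".\dno")
X_yes = np.random.default_rng(0).random((5, 8000))
X_no = np.random.default_rng(1).random((3, 8000))

X = np.concatenate([X_yes, X_no])                       # features, yes rows first
y = np.concatenate([np.ones(len(X_yes), dtype=int),     # 1 = "YES"
                    np.zeros(len(X_no), dtype=int)])    # 0 = "NO"
print(X.shape, y)  # (8, 8000) [1 1 1 1 1 0 0 0]
```

Because the yes rows all come first, the instances must be shuffled before splitting.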
Shuffle the Data Instances and Then Split Them into Train and Test Sets
Now that our features and labels are ready, we split them into train and test sets before looking
any further into the data.
*********
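One way to shuffle and split, assuming scikit-learn's train_test_split is used (the original cell is hidden). The test_size of 0.2 and random_state=42 are assumptions; a test size of 0.2 does reproduce the printed shapes, since 851 of the 4255 instances go to the test set. Random stand-in data with only 10 features keeps the example light.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with as many rows as the real set (3404 + 851 = 4255)
rng = np.random.default_rng(0)
X = rng.random((4255, 10))
y = rng.integers(0, 2, size=4255)

# shuffle=True (the default) shuffles the instances before splitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# (3404, 10) (851, 10) (3404,) (851,)
```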
Implement Models and Evaluate them

To complete the project, we carry out the following tasks:
1. Train the classifiers on the data (Logistic Regression, SGDClassifier, SVM, KNN)
2. Evaluate the models (using cross-validation with 3 folds):
o Confusion matrix for each model
o Precision-recall curve for each model
o F1 score for each model
o ROC curves of all models in one figure
3. Write a function that finds the decision threshold for a specified recall level
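A compact sketch of the three tasks with scikit-learn. The model settings (for example max_iter=1000 for logistic regression and random_state=42 for SGD) are assumptions rather than the original notebook's settings, the helper names cv_scores, evaluate, threshold_for_recall and plot_curves are hypothetical, and make_classification data stands in for X_train and y_train so the block runs on its own. cross_val_predict with cv=3 gives out-of-fold predictions for the confusion matrices and F1 scores, while out-of-fold decision scores (or class-1 probabilities for KNN, which has no decision_function) feed the precision-recall and ROC curves.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import (confusion_matrix, f1_score, precision_recall_curve,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Assumed settings, not the original notebook's hyperparameters
MODELS = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SGD": SGDClassifier(random_state=42),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}


def cv_scores(model, X, y, cv=3):
    """Out-of-fold scores: decision_function if the model has one, else P(y = 1)."""
    if hasattr(model, "decision_function"):
        return cross_val_predict(model, X, y, cv=cv, method="decision_function")
    return cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]


def evaluate(models, X, y, cv=3):
    """Cross-validated confusion matrix, F1 score and continuous scores per model."""
    results = {}
    for name, model in models.items():
        y_pred = cross_val_predict(model, X, y, cv=cv)
        results[name] = {
            "confusion_matrix": confusion_matrix(y, y_pred),
            "f1": f1_score(y, y_pred),
            "scores": cv_scores(model, X, y, cv),
        }
    return results


def threshold_for_recall(y_true, scores, target_recall):
    """Highest threshold whose recall is still at least target_recall."""
    _, recalls, thresholds = precision_recall_curve(y_true, scores)
    # recalls[:-1] line up with the ascending thresholds and never increase,
    # so the last index meeting the target gives the highest such threshold.
    idx = np.flatnonzero(recalls[:-1] >= target_recall)[-1]
    return thresholds[idx]


def plot_curves(results, y):
    """Precision-recall curves (left) and ROC curves (right) of all models."""
    fig, (ax_pr, ax_roc) = plt.subplots(1, 2, figsize=(12, 5))
    for name, res in results.items():
        precisions, recalls, _ = precision_recall_curve(y, res["scores"])
        fpr, tpr, _ = roc_curve(y, res["scores"])
        auc = roc_auc_score(y, res["scores"])
        ax_pr.plot(recalls, precisions, label=name)
        ax_roc.plot(fpr, tpr, label=f"{name} (AUC = {auc:.3f})")
    ax_roc.plot([0, 1], [0, 1], "k--")  # chance level
    ax_pr.set(xlabel="Recall", ylabel="Precision")
    ax_roc.set(xlabel="False positive rate", ylabel="True positive rate")
    ax_pr.legend()
    ax_roc.legend()
    return fig


# Stand-in data; in the notebook these would be X_train and y_train
X_demo, y_demo = make_classification(n_samples=600, n_features=20, random_state=0)
results = evaluate(MODELS, X_demo, y_demo)
for name, res in results.items():
    print(name, round(res["f1"], 3))
    print(res["confusion_matrix"])
fig = plot_curves(results, y_demo)
t90 = threshold_for_recall(y_demo, results["SGD"]["scores"], 0.90)
```

With the real data, evaluate(MODELS, X_train, y_train) followed by threshold_for_recall(y_train, results["SGD"]["scores"], 0.90) would give the highest SGD threshold that still keeps recall at 90% or above; predicting "YES" whenever the score is at least that threshold trades some precision for the required recall.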
OUTPUTS:
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(3404, 8000) (851, 8000) (3404,) (851,)
Knn score:
0.8296122209165687
Confusion Matrix:
array([[1318,  384],
       [ 294, 1408]], dtype=int64)
f1_score:
0.8059530623926732
Svm score:
0.881316098707403
Confusion Matrix:
array([[1627,   75],
       [ 306, 1396]], dtype=int64)
f1_score:
0.8799243618027105
Logistic Regression score:
0.8918918918918919
Confusion Matrix:
array([[1519,  183],
       [ 159, 1543]], dtype=int64)
f1_score:
0.9002333722287048
Sgd score:
0.9024676850763808
Confusion Matrix:
array([[1541,  161],
       [ 147, 1555]], dtype=int64)
f1_score:
0.9098888238736103
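As a sanity check, the F1 scores can be recomputed by hand from the confusion matrices. scikit-learn's confusion_matrix lays out a binary result as [[TN, FP], [FN, TP]], with class 1 ("YES") as the positive class, so F1 = 2TP / (2TP + FP + FN). The matrices below are copied from the outputs; each one sums to 3404, the size of the training set, and the recomputed values agree with the printed f1_score lines.

```python
# Confusion matrices copied from the outputs, layout [[TN, FP], [FN, TP]]
matrices = {
    "Knn": [[1318, 384], [294, 1408]],
    "Svm": [[1627, 75], [306, 1396]],
    "Logistic Regression": [[1519, 183], [159, 1543]],
    "Sgd": [[1541, 161], [147, 1555]],
}


def f1_from_confusion(cm):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    (tn, fp), (fn, tp) = cm
    return 2 * tp / (2 * tp + fp + fn)


for name, cm in matrices.items():
    print(f"{name}: F1 = {f1_from_confusion(cm):.4f}")
# Knn: F1 = 0.8060
# Svm: F1 = 0.8799
# Logistic Regression: F1 = 0.9002
# Sgd: F1 = 0.9099
```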
