YES OR NO CLASSIFICATION

The company's customer service department wants to build an automated telephone operator
that asks callers questions answerable with "yes" or "no" and then classifies each spoken
answer (the caller's voice over the telephone) as "YES" or "NO".

The main steps are as follows:

• Look at the problem in general terms and state it
• Get the data
• Load the data with the scipy library
• Add labels and split into train and test sets
• Implement classifiers and evaluate the models

Before going further, let's install any required packages and libraries that are not already
installed on our system. Simply execute the cell below. It will do the rest for you!

************

If the cell does not work, install the packages manually with the pip command from the command prompt (cmd).
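
Before installing anything by hand, it can help to see which packages are actually missing. The sketch below is one way to check. The package list is an assumption based on the libraries this walkthrough mentions (matplotlib is assumed for the plots), and `sklearn` is the import name of the package that pip installs as `scikit-learn`.

```python
import importlib.util

# Import names assumed from the libraries used in this walkthrough.
required = ["numpy", "scipy", "matplotlib", "sklearn"]

# find_spec returns None when a top-level package cannot be imported.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Any name that is reported as missing can then be installed with pip.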

Get the data


We are given a dataset containing prerecorded "YES"/"NO" wave files, and we are asked to
build a classifier that detects whether a person is saying yes or no.

The dataset consists of .wav files sampled at 16 kHz. We first read each file into an array
of samples and then compute its Fast Fourier Transform (FFT), which serves as the model
input.

Dataset: Extract the .ZIP file and point the code to the directories <.\dyes> and <.\dno>.

********
Import the libraries:
********

Load the Data


The function prepare_the_data finds the wave files in the given directory and reads each one
(a "yes" or "no" recording), which yields two values: the signal and the sampling frequency.
The function then takes the Fourier transform of each signal individually.

In mathematics, a Fourier transform is a mathematical transform that decomposes functions
depending on space or time into functions depending on spatial or temporal frequency, such as
the expression of a musical chord in terms of the volumes and frequencies of its constituent
notes. Put simply, the Fourier transform converts the signal from the time domain to the
frequency domain.

It is not necessary to understand how the Fourier transform works. We only need to know two things:

• First, the Fourier transform gives us an array of complex numbers (values with a real and
an imaginary part). We only need to work with the magnitude of each value, not with its
phase.
• Second, for a real-valued signal such as a recording, the magnitude of the Fourier transform
is symmetric: the second half of the array mirrors the first half, so we can delete the
second half.

Each value in the output has the form:

$$z = a + bi$$

• a and b are real numbers and i equals √(−1)

The magnitude and phase are computed as follows. Assume we have $z = a + bi$.

The magnitude is:

$$|z| = \sqrt{a^2 + b^2}$$

and the phase is:

$$\varphi = \arctan\left(\frac{b}{a}\right)$$

where the quadrant of the angle is determined by the signs of a and b.
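
As a small worked example of the magnitude and phase, the sketch below uses Python's built-in complex type with the illustrative value a = 3, b = 4 (numbers chosen for convenience, not taken from the dataset).

```python
import cmath
import math

z = 3 + 4j  # a = 3, b = 4

magnitude = abs(z)        # sqrt(a**2 + b**2)
phase = cmath.phase(z)    # angle in radians, computed as atan2(b, a)

print(magnitude)            # 5.0
print(math.degrees(phase))  # angle in degrees
```

Only the magnitude is kept as a feature in this project; the phase is shown for completeness.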

First, let's see how it works and see the plot of the signal:

Using scipy, we can compute the Fourier transform with the fft function in the fftpack module.
fft is the Fast Fourier Transform, an algorithm that computes the discrete Fourier transform
(DFT) of a sequence, or its inverse (IDFT).

We can read the documentation here:

• fft documentation
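
To make the DFT less of a black box, the sketch below computes it directly from its definition on a toy signal. This is a deliberately naive O(n²) version written for illustration, not scipy's fft, and the 16-sample, 2 Hz sine wave is an invented example rather than real data.

```python
import cmath
import math


def naive_dft(signal):
    """Discrete Fourier transform computed directly from its definition."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


# One second of a 2 Hz sine wave sampled at 16 Hz (toy numbers).
sampling_rate = 16
signal = [math.sin(2 * math.pi * 2 * t / sampling_rate) for t in range(sampling_rate)]

magnitudes = [abs(value) for value in naive_dft(signal)]

# The second half mirrors the first, so only the first half is kept.
half = magnitudes[: len(magnitudes) // 2]
peak_bin = max(range(len(half)), key=half.__getitem__)
print(len(signal), len(magnitudes), peak_bin)
```

The output has as many values as the input, its magnitudes satisfy |X[k]| = |X[n − k]|, and the largest magnitude in the first half sits at the bin that matches the 2 Hz tone.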

Execute the cell below to see the shape of the signal and its Fourier transform.

*********

We can plot the signal by executing the cell below:

*********

We can plot the signal's Fourier Transform by executing the cell below:

*********

Function

This is the function that loads the data according to what we explained earlier.

To calculate the magnitude of the Fourier transform, we apply the abs() function at the end
of the function.

********
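
A minimal sketch of how such a loader might look is given below. It assumes mono recordings, scipy's `wavfile.read` and `fftpack.fft`, and a fixed length of 16000 samples per clip (one second at 16 kHz). The `n_samples` parameter is an assumption introduced here so that every row has the same length; it agrees with the 8000 features per row reported in the outputs.

```python
import glob
import os

import numpy as np
from scipy.fftpack import fft
from scipy.io import wavfile


def prepare_the_data(directory, n_samples=16000):
    """Return one row of FFT magnitudes per .wav file found in `directory`."""
    features = []
    for path in sorted(glob.glob(os.path.join(directory, "*.wav"))):
        # scipy returns the sampling frequency first, then the signal.
        sampling_rate, signal = wavfile.read(path)
        # Pad or truncate to n_samples so all rows have equal length (assumption).
        spectrum = fft(signal.astype(np.float64), n=n_samples)
        # Keep the magnitude only and drop the mirrored second half.
        features.append(np.abs(spectrum[: n_samples // 2]))
    return np.array(features)
```

Calling it once on the `.\dyes` directory and once on the `.\dno` directory would give the two feature arrays used in the next step.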

Add Labels and Split to train-test set


Using the function provided above, we load the "yes" and the "no" recordings into separate
variables. We also have to create another array that holds the labels. Since this is a binary
classification problem, we can assign 1 to "YES" and 0 to "NO".

We can create the labels using the functions below:

• np.ones()
• np.zeros()

*********
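
The labelling step might look like the sketch below. The arrays `X_yes` and `X_no` are random stand-ins for the features returned by the loading function, and their sizes are invented for illustration.

```python
import numpy as np

# Random stand-ins for the loaded features (5 "yes" rows, 3 "no" rows).
rng = np.random.default_rng(0)
X_yes = rng.random((5, 8))
X_no = rng.random((3, 8))

y_yes = np.ones(len(X_yes))   # label 1 for every "YES" row
y_no = np.zeros(len(X_no))    # label 0 for every "NO" row

# Stack features and labels in the same order so that rows stay aligned.
X = np.concatenate([X_yes, X_no])
y = np.concatenate([y_yes, y_no])
print(X.shape, y.shape)
```

At this point all "YES" rows come first, which is why the data has to be shuffled before it is split.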

Shuffle the Data Instances and Split Them Into Train and Test Sets

Now that our features and labels are ready, we have to split them into train and test sets
before looking any further into the data.

*********
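
One way to shuffle and split is sketched below with plain NumPy. The helper name `shuffle_and_split`, the seed, and the 20% test share are assumptions; the 3404 training rows and 851 test rows reported in the outputs are consistent with a 20% test share. scikit-learn's `train_test_split` is a common alternative.

```python
import numpy as np


def shuffle_and_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle rows with a fixed seed, then split off the test share."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))           # shuffled row indices
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Toy data: 10 rows with 4 features each.
X = np.arange(40, dtype=float).reshape(10, 4)
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

X_train, X_test, y_train, y_test = shuffle_and_split(X, y)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Using the same index arrays for features and labels keeps each row paired with its label after shuffling.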

Implement Models and Evaluate Them


From now on, we have several tasks to complete the project:

1. Implement the classifiers on the data (Logistic Regression, SGDClassifier, SVM, KNN)
2. Evaluate the models (we use cross-validation with 3 folds)
o Confusion matrix for each model
o Plot the precision-recall curve for each model
o Calculate the F1 score for each model
o Plot the ROC curves for all models in one figure
3. Write a function to find the threshold for a specified recall level
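
The tasks above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the original notebook code: the data is a synthetic stand-in generated with `make_classification`, the hyperparameters are defaults apart from `max_iter` and `random_state`, the helper `threshold_for_recall` is a hypothetical name, and the plots are left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import confusion_matrix, f1_score, precision_recall_curve
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the FFT features and yes/no labels (assumption).
X_train, y_train = make_classification(n_samples=300, n_features=20, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SGD": SGDClassifier(random_state=42),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    # Out-of-fold predictions from 3-fold cross-validation.
    y_pred = cross_val_predict(model, X_train, y_train, cv=3)
    results[name] = (confusion_matrix(y_train, y_pred), f1_score(y_train, y_pred))
    print(name, results[name][0].tolist(), round(results[name][1], 3))


def threshold_for_recall(y_true, scores, target_recall):
    """Highest decision threshold whose recall is at least target_recall."""
    precisions, recalls, thresholds = precision_recall_curve(y_true, scores)
    # recalls has one more entry than thresholds; skip the final recall = 0 point.
    reachable = np.where(recalls[:-1] >= target_recall)[0]
    return thresholds[reachable[-1]]


# Decision scores (not class labels) are needed to move the threshold.
scores = cross_val_predict(SGDClassifier(random_state=42), X_train, y_train,
                           cv=3, method="decision_function")
threshold = threshold_for_recall(y_train, scores, target_recall=0.90)
print("threshold for 90% recall:", threshold)
```

Predicting "YES" whenever a score is at or above the returned threshold then gives at least the requested recall on the same data, usually at the cost of lower precision.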

OUTPUTS:

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)


(3404, 8000) (851, 8000) (3404,) (851,)

Knn score:
0.8296122209165687

Confusion Matrix:
array([[1318,  384],
       [ 294, 1408]], dtype=int64)

f1_score:
0.8059530623926732

Svm score:
0.881316098707403

Confusion Matrix:
array([[1627,   75],
       [ 306, 1396]], dtype=int64)

f1_score:
0.8799243618027105

Logistic Regression score:
0.8918918918918919

Confusion Matrix:
array([[1519,  183],
       [ 159, 1543]], dtype=int64)

f1_score:
0.9002333722287048

Sgd score:
0.9024676850763808

Confusion Matrix:
array([[1541,  161],
       [ 147, 1555]], dtype=int64)

f1_score:
0.9098888238736103
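
The F1 scores can be cross-checked against the confusion matrices by hand. The sketch below assumes the scikit-learn layout, where rows are the actual classes and columns the predicted classes, so each matrix reads [[TN, FP], [FN, TP]] with "YES" = 1 as the positive class. The third matrix is labelled as the logistic regression model from the classifier list.

```python
# Confusion matrices as reported above, laid out as [[TN, FP], [FN, TP]].
reported = {
    "KNN": ((1318, 384), (294, 1408)),
    "SVM": ((1627, 75), (306, 1396)),
    "Logistic Regression": ((1519, 183), (159, 1543)),
    "SGD": ((1541, 161), (147, 1555)),
}


def f1_from_confusion(matrix):
    """F1 is the harmonic mean of precision and recall."""
    (tn, fp), (fn, tp) = matrix
    precision = tp / (tp + fp)   # share of predicted "YES" that is correct
    recall = tp / (tp + fn)      # share of actual "YES" that is found
    return 2 * precision * recall / (precision + recall)


f1_scores = {name: f1_from_confusion(m) for name, m in reported.items()}
for name, score in f1_scores.items():
    print(f"{name}: {score:.4f}")
```

Each recomputed value agrees with the f1_score reported for the corresponding model. Every matrix also sums to 3404, the size of the training set.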
