Yesno Classification - Info
The customer service department of a company wants to build an automatic telephone operator
that asks questions which can be answered with either yes or no, and then classifies each
caller's spoken answer over the telephone as "YES" or "NO".
Before going further, let's install the required packages and libraries if they are not already
installed on the system. Simply execute the cell below and it will do the rest.
************
If the cell does not work, install the packages manually from the command prompt (cmd) using pip.
The dataset consists of .wav files sampled at 16 kHz. We first read each recording into an
array and then compute its Fast Fourier Transform (FFT), which serves as the input to the
model.
Dataset: Extract the .ZIP file and provide the paths to the directories <.\dyes> and <.\dno>.
YES OR NO CLASSIFICATION
********
Import the libraries:
********
It is not necessary to understand how the Fourier transform works. We only need to know two things:
• First, the Fourier transform gives us an array of complex numbers (values with real and
imaginary parts). We only need the magnitude of each value, not its phase. For a complex
value a + bi, the magnitude is sqrt(a^2 + b^2).
• Second, for a real-valued signal such as audio, the magnitude of the Fourier transform is
symmetric: the second half of the array mirrors the first half, so we can discard the second half.
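The two points above can be checked on a small synthetic signal. This is a minimal sketch, not the notebook's own cell: the 440 Hz sine and its length are made-up test values, and np.fft.fft is used here only because it needs nothing beyond NumPy.

```python
import numpy as np

# Hypothetical test signal: a 440 Hz sine sampled at 16 kHz, 1000 samples long.
rate = 16000
t = np.arange(1000) / rate
signal = np.sin(2 * np.pi * 440 * t)

spectrum = np.fft.fft(signal)      # complex array, same length as the signal
magnitude = np.abs(spectrum)       # sqrt(real**2 + imag**2); the phase is discarded

# For real input, bin k and bin N-k have the same magnitude (mirror symmetry).
is_symmetric = np.allclose(magnitude[1:], magnitude[1:][::-1])

# Keep only the first half; the second half adds no new magnitude information.
half = magnitude[: len(magnitude) // 2]
```

Note that the second half is a mirror image of the first rather than a literal copy, which is why the comparison reverses the array.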
Assume we have
First, let's see how it works by plotting the signal:
Using scipy, we can compute the Fourier transform with the fft function in fftpack. FFT stands for
Fast Fourier Transform, an algorithm that computes the discrete Fourier transform (DFT) of
a sequence, or its inverse (IDFT).
• fft documentation
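As a rough illustration of reading a recording and transforming it with scipy.fftpack.fft, the sketch below first writes a stand-in .wav file so that it runs without the dataset. The file name example.wav and the 1000 Hz tone are assumptions made for the example, not part of the original material.

```python
import os
import tempfile

import numpy as np
from scipy.fftpack import fft
from scipy.io import wavfile

# Write a stand-in recording (hypothetical file) so the sketch runs anywhere.
rate = 16000
t = np.arange(rate) / rate                                   # one second of samples
tone = (0.5 * np.sin(2 * np.pi * 1000 * t) * 32767).astype(np.int16)
path = os.path.join(tempfile.mkdtemp(), "example.wav")
wavfile.write(path, rate, tone)

# Read it back the way a dataset file would be read.
read_rate, samples = wavfile.read(path)

# Transform, take the magnitude, and keep the first half of the spectrum.
transform = fft(samples.astype(float))
magnitude = np.abs(transform)[: len(samples) // 2]

# Bin k corresponds to k * rate / N Hz, so the strongest bin gives the tone frequency.
peak_hz = np.argmax(magnitude) * read_rate / len(samples)
```

With one second of audio at 16 kHz each bin is 1 Hz wide, so the peak should land on the frequency of the generated tone.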
Execute the cell below to see the shape of the signal and its Fourier transform.
*********
*********
We can plot the signal's Fourier Transform by executing the cell below:
*********
Function
This is the function that loads the data according to what we have explained earlier.
To calculate the magnitude of the Fourier transform, we apply the abs() function at the end of
the function.
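One possible shape for such a loader is sketched below. It is not the notebook's original function: the name load_features and the fixed length n_samples are hypothetical, and padding or truncating every recording to one second is an assumption made so that all rows have the same width.

```python
import glob
import os

import numpy as np
from scipy.fftpack import fft
from scipy.io import wavfile


def load_features(directory, n_samples=16000):
    """Return one row of half-spectrum magnitudes per .wav file in `directory`.

    n_samples is a hypothetical fixed length (one second at 16 kHz); each
    recording is zero-padded or truncated to it so all rows share one shape.
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(directory, "*.wav"))):
        _, samples = wavfile.read(path)
        samples = np.asarray(samples, dtype=float)
        if samples.ndim > 1:                  # keep the first channel of stereo files
            samples = samples[:, 0]
        fixed = np.zeros(n_samples)
        n = min(n_samples, len(samples))
        fixed[:n] = samples[:n]
        # abs() at the end turns the complex spectrum into magnitudes.
        rows.append(np.abs(fft(fixed))[: n_samples // 2])
    return np.array(rows)
```

Under these assumptions, calling load_features on the "yes" directory and again on the "no" directory would give two feature matrices with 8000 columns each.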
********
• np.ones()
• np.zeros()
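A likely use of these two functions is building the label vector. The sketch below assumes that "yes" recordings are labeled 1 and "no" recordings 0; the source does not state the convention, and the small random matrices are placeholders for the real features.

```python
import numpy as np

# Hypothetical stand-ins for the two feature matrices (one row per recording).
yes_features = np.random.rand(5, 8)
no_features = np.random.rand(3, 8)

# Stack the features and build matching labels: 1 for "yes", 0 for "no" (assumed).
X = np.concatenate([yes_features, no_features])
y = np.concatenate([np.ones(len(yes_features)), np.zeros(len(no_features))])
```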
*********
Shuffling the Data Instances and Splitting Them Into Train and Test Sets
Now that our features and labels are ready, we have to split them into train and test sets before
looking any further into the data.
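A minimal sketch of this step with scikit-learn's train_test_split follows. The 80/20 ratio, the stratification, and the random seed are assumptions, and the random feature matrix stands in for the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 instances, 8 features, balanced labels.
rng = np.random.default_rng(0)
X = rng.random((100, 8))
y = np.concatenate([np.ones(50), np.zeros(50)])

# shuffle=True mixes the "yes" and "no" rows; stratify keeps the class balance
# identical in both parts. test_size=0.2 is an assumed ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=42
)
```

Shuffling matters here because the labels were built as a block of ones followed by a block of zeros, so an unshuffled split would put a single class in the test set.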
*********
1. Implement the classifiers on the data (Logistic Regression, SGDClassifier, SVM, KNN)
2. Evaluate the models (we use cross-validation with 3 folds)
o Confusion matrix for each model
o Plot the precision-recall for the performance of each model
o Calculate the F1 score for each model
o Plot the ROC curve for all models in one figure
3. Function to find threshold for a specified recall level
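The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the original cells: the data comes from make_classification instead of the audio features, all hyperparameters are library defaults apart from the ones shown, the helper name threshold_for_recall is hypothetical, and the plots are left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import (confusion_matrix, f1_score,
                             precision_recall_curve, recall_score)
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the FFT features and yes/no labels.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SGDClassifier": SGDClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

# 3-fold cross-validated predictions, then a confusion matrix and F1 per model.
results = {}
for name, model in models.items():
    y_pred = cross_val_predict(model, X, y, cv=3)
    results[name] = {
        "confusion": confusion_matrix(y, y_pred),
        "f1": f1_score(y, y_pred),
    }


def threshold_for_recall(y_true, scores, target_recall):
    """Return the highest score threshold whose recall is >= target_recall."""
    _, recall, thresholds = precision_recall_curve(y_true, scores)
    # recall[i] belongs to thresholds[i]; the final recall entry has no threshold.
    ok = np.where(recall[:-1] >= target_recall)[0]
    return thresholds[ok[-1]]


# Decision scores (not hard labels) are needed to move the threshold.
scores = cross_val_predict(models["SGDClassifier"], X, y, cv=3,
                           method="decision_function")
thr = threshold_for_recall(y, scores, 0.9)
achieved_recall = recall_score(y, scores >= thr)
```

Predicting "yes" whenever the score is at least thr then yields a recall no lower than the requested level, usually at some cost in precision.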
OUTPUTS:
Knn score:
0.8296122209165687
Confusion Matrix:
array([[1318, 384],
[ 294, 1408]], dtype=int64)
f1_score:
0.8059530623926732
Svm score:
0.881316098707403
Confusion Matrix:
array([[1627, 75],
[ 306, 1396]], dtype=int64)
f1_score:
0.8799243618027105
Confusion Matrix:
array([[1519, 183],
[ 159, 1543]], dtype=int64)
f1_score:
0.9002333722287048
Sgd score:
0.9024676850763808
Confusion Matrix:
array([[1541, 161],
[ 147, 1555]], dtype=int64)
f1_score:
0.9098888238736103