Abstract:
Nowadays, a person's gender has become very important in economic markets, for example in targeted advertisements. The objective of this project is to design a system that determines a speaker's gender using the properties of the speaker's voice. Identifying gender from the properties of a voice dataset, i.e., pitch, median, frequency, etc., is possible using machine learning. In this project we classify gender as male or female based on a dataset containing various voice-related attributes such as pitch and frequency. Data pre-processing steps are performed before applying machine-learning algorithms to classify the voice data. The proposed system is used to find the best algorithm among K-nearest neighbors (KNN), Random Forest, Logistic Regression, Decision Tree, Support Vector Machine, and Gradient Boosting for detecting the gender of the speaker with the maximum possible efficiency and accuracy.
Keywords: Data pre-processing, gender classification, K-nearest neighbors (KNN), Random Forest, Logistic Regression, Decision Tree, Support Vector Machine, Gradient Boosting.
© 2020 by Advance Scientific Research. This is an open-access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
DOI: http://dx.doi.org/10.31838/jcr.07.09.222
recognize the voice of a gender. The model shows that acoustic properties of voice and speech can be used to detect gender. An MLP deep-learning algorithm was used to obtain the classification model from a dataset of voice-sample parameters, and the proposed model achieves 96.74% accuracy.

Chiu Ying Hay, Ng Hian James et al. [8] proposed a gender classification system that identifies gender by analyzing voice samples with a pitch detection algorithm. Time-domain or frequency-domain approaches are used to process the voice signal, and the gender of a voice sample is determined using a simple weighted scoring algorithm.

Kavitha Yadav, Moresh Mukhedkar et al. [9] proposed MFCC-based speaker recognition using MATLAB. The proposed system comprises speaker identification and speaker verification. A pitch detection algorithm (PDA) is a set of steps used to detect the pitch of a speech signal. Feature extraction and feature matching are the two important modules in gender identification: MFCC and PLP methods are used for feature extraction, and dynamic time warping is used for feature matching.

Bhagya Laxmi Jena, Beda Prakash Panigrahi et al. [10] have done their research on gender classification by pitch analysis, concentrating mainly on pitch analysis of speech signals. The average pitch values of male and female voices differ when compared, and the analysis includes a comparison of the pitch values of male and female voice samples.

T. Jayasankar, K. Vinothkumar, Arputha Vijayaselvi et al. [11] proposed an automatic gender classification system that determines gender from the speech signal. The gender recognition system is generated at two levels, a front end and a back end. The front end represents a set of feature vectors: energy entropy (EE), zero crossing rate (ZCR), and short-time energy (STE). The back end is a classifier.

Priyanka Makwana et al. [12] observed that a human can easily identify a gender, male or female, by voice, but it is very difficult for a computer to identify gender by voice; a computer therefore needs special learning or training, such as inputs and a methodology for the task.

Jerzy Sas, Aleksander et al. [13] proposed gender recognition using ASR techniques and neural networks, presenting a technique that identifies gender from MFCC features. The speech signal is divided into 20 ms frames, Mel-frequency cepstral coefficients are extracted for each frame, and the resulting feature vector is fed into a neural-network classifier, which classifies each frame as male or female.

Soonil Kwon, Guiyoung Son, and Neungsoo et al. [14] proposed a recurrent-neural-network technique to classify gender based on the non-lexical cues of emergency calls. Much research has been performed in the last two decades, but improvement is still needed. Recurrent neural networks and SVMs are the two machine-learning methods used to classify gender.

Hadi Harb, Liming Chen et al. [15] have done research on gender identification based on the speech signal. Different parameters of voice samples are analyzed to predict the gender of the speaker, and the fusion of features and classifiers performs better than any individual classifier.

SYSTEM ARCHITECTURE
System architecture describes the overall structure of the system and the way in which that structure provides integrity. As shown in Figure 1, the voice dataset is first pre-processed using techniques like normalization. The pre-processed data is then split into training and testing data in the ratio of 80% and 20% respectively. The trained model is tested using the testing dataset to calculate evaluation metrics such as accuracy, precision, recall, F1 score, and Cohen's kappa score. Finally, the accuracies produced by the different trained models are compared, and the model producing the best accuracy is chosen for classification.
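The comparison pipeline described above can be sketched as follows. This is a minimal illustration rather than the project's exact code: the synthetic dataset from make_classification stands in for the voice dataset (which the paper loads from "voicedataset.csv"), and every classifier is left at its default hyperparameters.

```python
# Sketch of the Figure 1 pipeline: preprocess, split 80/20, train each
# candidate model, and choose the one with the best test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Stand-in for the pre-processed voice features and the 0/1 gender label.
X, y = make_classification(n_samples=400, n_features=20, random_state=30)

# Min-max normalization of the feature columns, as described in the paper.
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# 80% training data, 20% testing data.
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=30)

models = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=30),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=30),
    "Gradient Boosting": GradientBoostingClassifier(random_state=30),
}

# Train each model and record its accuracy on the testing data.
accuracies = {}
for name, model in models.items():
    model.fit(xtrain, ytrain)
    accuracies[name] = accuracy_score(ytest, model.predict(xtest))

# The model producing the best accuracy is chosen for classification.
best = max(accuracies, key=accuracies.get)
```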
[Figure 1: System architecture: the voice dataset is pre-processed and passed to the algorithms (SVM, KNN, Decision Tree, Logistic Regression, Random Forest, Gradient Boosting) to produce a trained model.]
METHODOLOGIES
Data Set: The voice dataset contains different properties which are used to classify gender as male or female. The training dataset consists of nearly 3000 voice samples with different parameters, collected from kaggle.com. It has 21 attributes: sd, meanfreq, median, Q25, Q75, IQR, skew, kurt, sp.ent, sfm, mode, centroid, meanfun, minfun, maxfun, meandom, maxdom, dfrange, modindx, and label. The target attribute is label, which has the values male and female.

[Figure 2: Voice dataset, data pre-processing, and the algorithms (SVM, KNN, Decision Tree, Random Forest, Logistic Regression, Gradient Boosting).]

Data Pre-processing: As shown in Fig 2, the dataset was pre-processed using techniques like data cleaning, data transformation, and data reduction. The pre-processed dataset was fed into the different machine-learning algorithms, each of which outputs a class (either male or female); the most accurate output is taken as the final result. Data pre-processing is performed before storing the data in the database. It is a process in which missing values are filled and noisy (irrelevant) data is removed. Scaling is also required on the numeric columns; normalization is used to scale the feature columns.

Normalization: A process that rescales the values in the dataset into the range zero to one:

x = (xdata - ns.min(xdata)) / (ns.max(xdata) - ns.min(xdata))

where xdata denotes the feature columns in the dataset, ns is the numpy object, ns.min(xdata) is the minimum value of a feature column, and ns.max(xdata) is the maximum value of a feature column.

Model Testing: After pre-processing, the dataset is split into 80% used for training and 20% used for testing. The trained model is then tested against the testing dataset. The results from the different trained models are compared and the best model is chosen. The present work intends to find the best solution for predicting gender using machine-learning techniques.

Histogram: This project contains 21 attributes such as meanfreq, sd, median, Q25, Q75, IQR, skew, kurt, sp.ent, sfm, mode, centroid, meanfun, minfun, maxfun, meandom, maxdom, dfrange, and modindx. Each attribute maintains its own histogram. A histogram consists of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval, as represented in Figure 3.
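The normalization formula above can be verified column-wise with NumPy. The sample matrix below is illustrative only, not taken from the voice dataset:

```python
import numpy as ns

# xdata stands for the feature columns; each column is rescaled to [0, 1].
xdata = ns.array([[2.0, 10.0],
                  [4.0, 20.0],
                  [6.0, 40.0]])

# x = (xdata - min) / (max - min), computed per column (axis=0).
x = (xdata - ns.min(xdata, axis=0)) / (ns.max(xdata, axis=0) - ns.min(xdata, axis=0))
# First column: (2, 4, 6) -> (0.0, 0.5, 1.0)
```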
IMPLEMENTATION
Algorithms Used:
SVM: In SVM the data points are plotted on the x-y axes, and the best hyperplane is the one with maximum margin, i.e., the distance to the closest data point in each group should be maximum. As shown in Fig 4, the line splits the data into two different groups. The attribute values are plotted in the x-y plane by taking IQR as the x-axis and meanfun as the y-axis, and the trained SVM model then finds the best hyperplane. Here the model is trained on the training dataset, and the trained model is evaluated against the testing data to calculate the evaluation metrics.
Step 1: In step 1 the packages needed for numerical computation, data handling, the SVM classifier, data splitting, and the evaluation metrics are imported.
import numpy as ns
import pandas as ps
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, cohen_kappa_score
Step 2: In step 2 the pandas object ps reads the dataset into a variable named data. The function read_csv is used to read the dataset.
data = ps.read_csv("voicedataset.csv")
Step 3: In step 3 the categorical values male and female in the label attribute are transformed to the numerical values '0' and '1' respectively.
data.label = [1 if each == "female" else 0 for each in data.label]
Step 3.1: In step 3.1 divide the data into feature and target columns: y is the target column and xdata contains the feature columns.
y = data.label.values
xdata = data.drop(["label"], axis=1)
Step 3.2: In step 3.2 perform normalization on the feature columns.
x = (xdata - ns.min(xdata)) / (ns.max(xdata) - ns.min(xdata))
Step 4: In step 4 split the data into training data and testing data, with the testing data being 20% of the dataset. The function train_test_split() divides the dataset into training data and testing data.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=30)
Step 5: In step 5 create the SVM classifier; svm.fit() trains the model using the training dataset.
svm = SVC()
svm.fit(xtrain, ytrain)
Step 6: Evaluate the performance of the model. Calculate accuracy and the other evaluation metrics.
f = svm.predict(xtest)
accscore = accuracy_score(ytest, f)
prescore = precision_score(ytest, f)
recscore = recall_score(ytest, f)
f1score = f1_score(ytest, f)
kappascore = cohen_kappa_score(ytest, f)
Step 1: In step 1 the packages needed for numerical computation, data handling, the decision tree classifier, data splitting, and the evaluation metrics are imported.
import numpy as ns
import pandas as ps
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, cohen_kappa_score
Step 2: In step 2 the pandas object ps reads the dataset into a variable named data. The function read_csv is used to read the dataset.
data = ps.read_csv("voicedataset.csv")
Step 3: In step 3 the categorical values male and female in the label attribute are transformed to the numerical values '0' and '1' respectively.
data.label = [1 if each == "female" else 0 for each in data.label]
Step 3.1: In step 3.1 divide the data into feature and target columns: y is the target column and xdata contains the feature columns.
y = data.label.values
xdata = data.drop(["label"], axis=1)
Step 3.2: In step 3.2 perform normalization on the feature columns.
x = (xdata - ns.min(xdata)) / (ns.max(xdata) - ns.min(xdata))
Step 4: In step 4 split the data into training data and testing data, with the testing data being 20% of the dataset. The function train_test_split() divides the dataset into training data and testing data.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=30)
Step 5: In step 5 create the decision tree classifier; dectree.fit() trains the model using the training dataset.
dectree = DecisionTreeClassifier()
dectree.fit(xtrain, ytrain)
Step 6: Evaluate the performance of the model. Calculate accuracy and the other evaluation metrics.
f = dectree.predict(xtest)
accscore = accuracy_score(ytest, f)
prescore = precision_score(ytest, f)
recscore = recall_score(ytest, f)
f1score = f1_score(ytest, f)
kappascore = cohen_kappa_score(ytest, f)
In these steps the packages for numerical computation are imported, and the pandas object ps is used to read the dataset. The categorical values male and female are transformed to the numerical values 0 and 1 respectively. The dataset is split into training and test datasets, an object is created for the decision tree algorithm, and a trained model is generated. This trained model produces the various scores when its predictions are compared with the target values of the test data.

LOGISTIC REGRESSION:
Logistic regression is a classification algorithm used to predict the probability of a categorical dependent variable that contains 0 or 1. Here the x-axis represents IQR and the y-axis represents meanfun. First the dataset is loaded into a variable, then the dataset is split. The trained model is evaluated against the testing data to get the metrics.

Algorithm for Logistic Regression
Input: Voice dataset in CSV format.
Output: Trained model with evaluation metrics like accuracy, precision, F1 score, recall, and kappa score.
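The probability view of logistic regression can be illustrated on just the two plotted features. The values below are hypothetical stand-ins for IQR and meanfun, not rows from the voice dataset:

```python
import numpy as ns
from sklearn.linear_model import LogisticRegression

# Two feature columns (stand-ins for IQR and meanfun) and a 0/1 label,
# where 0 denotes male and 1 denotes female as in the paper.
X = ns.array([[0.10, 0.12], [0.12, 0.11], [0.11, 0.13],   # label 0
              [0.05, 0.18], [0.06, 0.19], [0.04, 0.20]])  # label 1
y = ns.array([0, 0, 0, 1, 1, 1])

logreg = LogisticRegression()
logreg.fit(X, y)

# predict_proba returns the probability of each class for a new sample;
# predict applies a 0.5 threshold to that probability.
probs = logreg.predict_proba([[0.05, 0.19]])
label = logreg.predict([[0.05, 0.19]])[0]
```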
Step 1: In step 1 the packages needed for numerical computation, data handling, data splitting, the logistic regression classifier, and the evaluation metrics are imported.
import numpy as ns
import pandas as ps
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, cohen_kappa_score
Step 2: In step 2 the pandas object ps reads the dataset into a variable named data. The function read_csv is used to read the dataset.
data = ps.read_csv("voicedataset.csv")
Step 3: In step 3 the categorical values male and female in the label attribute are transformed to the numerical values '0' and '1' respectively.
data.label = [1 if each == "female" else 0 for each in data.label]
Step 3.1: In step 3.1 divide the data into feature and target columns: y is the target column and xdata contains the feature columns.
y = data.label.values
xdata = data.drop(["label"], axis=1)
Step 3.2: In step 3.2 perform normalization on the feature columns.
x = (xdata - ns.min(xdata)) / (ns.max(xdata) - ns.min(xdata))
Step 4: In step 4 split the data into training data and testing data, with the testing data being 20% of the dataset. The function train_test_split() divides the dataset into training data and testing data.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=30)
Step 5: In step 5 create the logistic regression classifier; logreg.fit() trains the model using the training dataset.
logreg = LogisticRegression()
logreg.fit(xtrain, ytrain)
Step 6: Evaluate the performance of the model. Calculate accuracy and the other evaluation metrics.
f = logreg.predict(xtest)
accscore = accuracy_score(ytest, f)
prescore = precision_score(ytest, f)
recscore = recall_score(ytest, f)
f1score = f1_score(ytest, f)
kappascore = cohen_kappa_score(ytest, f)
Step 1: In step 1 the packages needed for numerical computation, data handling, data splitting, the random forest classifier, and the evaluation metrics are imported.
import numpy as ns
import pandas as ps
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, cohen_kappa_score
Step 2: In step 2 the pandas object ps reads the dataset into a variable named data. The function read_csv is used to read the dataset.
data = ps.read_csv("voicedataset.csv")
Step 3: In step 3 the categorical values male and female in the label attribute are transformed to the numerical values '0' and '1' respectively.
data.label = [1 if each == "female" else 0 for each in data.label]
Step 3.1: In step 3.1 divide the data into feature and target columns: y is the target column and xdata contains the feature columns.
y = data.label.values
xdata = data.drop(["label"], axis=1)
Step 3.2: In step 3.2 perform normalization on the feature columns.
x = (xdata - ns.min(xdata)) / (ns.max(xdata) - ns.min(xdata))
Step 4: In step 4 split the data into training data and testing data, with the testing data being 20% of the dataset. The function train_test_split() divides the dataset into training data and testing data.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=30)
Step 5: In step 5 create the random forest classifier; rand_forest.fit() trains the model using the training dataset.
rand_forest = RandomForestClassifier()
rand_forest.fit(xtrain, ytrain)
Step 6: Evaluate the performance of the model. Calculate accuracy and the other evaluation metrics.
f = rand_forest.predict(xtest)
accscore = accuracy_score(ytest, f)
prescore = precision_score(ytest, f)
recscore = recall_score(ytest, f)
f1score = f1_score(ytest, f)
kappascore = cohen_kappa_score(ytest, f)
Step 1: In step 1 the packages needed for numerical computation, data handling, the KNN classifier, data splitting, and the evaluation metrics are imported.
import numpy as ns
import pandas as ps
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, cohen_kappa_score
Step 2: In step 2 the pandas object ps reads the dataset into a variable named data. The function read_csv is used to read the dataset.
data = ps.read_csv("voicedataset.csv")
Step 3: In step 3 the categorical values male and female in the label attribute are transformed to the numerical values '0' and '1' respectively.
data.label = [1 if each == "female" else 0 for each in data.label]
Step 3.1: In step 3.1 divide the data into feature and target columns: y is the target column and xdata contains the feature columns.
y = data.label.values
xdata = data.drop(["label"], axis=1)
Step 3.2: In step 3.2 perform normalization on the feature columns.
x = (xdata - ns.min(xdata)) / (ns.max(xdata) - ns.min(xdata))
Step 4: In step 4 the function train_test_split() divides the dataset into training data and testing data, with the testing data being 20% of the dataset.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=30)
Step 5: In step 5 create the KNN classifier; knn.fit() trains the model using the training dataset.
knn = KNeighborsClassifier()
knn.fit(xtrain, ytrain)
Step 6: Evaluate the performance of the model. Calculate accuracy and the other evaluation metrics.
f = knn.predict(xtest)
accscore = accuracy_score(ytest, f)
prescore = precision_score(ytest, f)
recscore = recall_score(ytest, f)
f1score = f1_score(ytest, f)
kappascore = cohen_kappa_score(ytest, f)
Step 1: In step 1 the packages needed for numerical computation, data handling, data splitting, the gradient boosting classifier, and the evaluation metrics are imported.
import numpy as ns
import pandas as ps
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, cohen_kappa_score
Step 2: In step 2 the pandas object ps reads the dataset into a variable named data. The function read_csv is used to read the dataset.
data = ps.read_csv("voicedataset.csv")
Step 3: In step 3 the categorical values male and female in the label attribute are transformed to the numerical values '0' and '1' respectively.
data.label = [1 if each == "female" else 0 for each in data.label]
Step 3.1: In step 3.1 divide the data into feature and target columns: y is the target column and xdata contains the feature columns.
y = data.label.values
xdata = data.drop(["label"], axis=1)
Step 3.2: In step 3.2 perform normalization on the feature columns.
x = (xdata - ns.min(xdata)) / (ns.max(xdata) - ns.min(xdata))
Step 4: In step 4 split the data into training data and testing data, with the testing data being 20% of the dataset. The function train_test_split() divides the dataset into training data and testing data.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=30)
Step 5: In step 5 create the gradient boosting classifier; gbc.fit() trains the model using the training dataset.
gbc = GradientBoostingClassifier()
gbc.fit(xtrain, ytrain)
Step 6: Evaluate the performance of the model. Calculate accuracy and the other evaluation metrics.
f = gbc.predict(xtest)
accscore = accuracy_score(ytest, f)
prescore = precision_score(ytest, f)
recscore = recall_score(ytest, f)
f1score = f1_score(ytest, f)
kappascore = cohen_kappa_score(ytest, f)
In these steps the packages for numerical computation are imported, and the pandas object ps is used to read the dataset. The categorical values male and female are transformed to the numerical values 0 and 1 respectively. An object is created for the gradient boosting algorithm to generate a trained model. This trained model produces the various scores when its predictions are compared with the target values of the test data.

RESULTS & ANALYSIS
The trained models are tested using the testing data and give accuracy, precision, kappa score, F1 score, and recall as output; the best model is chosen from them based on accuracy.

Confusion Matrix: A confusion matrix is a table with n rows and n columns, where n is the number of target classes. It is used to evaluate the performance of a classification model using the true positive (P), false positive (Q), true negative (R), and false negative (S) values.
Accuracy: the ratio of the number of correct predictions to the total number of input samples. Accuracy = (P + R) / (P + R + S + Q)
Precision: indicates how relevant the positive detections are. Precision = P / (P + Q)
Recall: the number of correct results divided by the number of results that should have been returned. Recall = P / (P + S)
F1 score: a measure of test accuracy, defined as the weighted harmonic mean of the test's precision and recall. F1 score = 2P / (2P + Q + S)
Kappa score: a statistic used to measure inter-rater reliability. Kappa score = (Total Accuracy - Random Accuracy) / (1 - Random Accuracy)
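The metric formulas above can be computed directly from confusion-matrix counts. The counts below are the KNN values reported for Figure 8 (P = 331, Q = 37, R = 212, S = 54); the expansion of Random Accuracy in terms of P, Q, R, S is the standard chance-agreement formula, added here as an assumption since the paper does not spell it out:

```python
# True positive (P), false positive (Q), true negative (R), false negative (S).
P, Q, R, S = 331, 37, 212, 54

accuracy  = (P + R) / (P + R + S + Q)
precision = P / (P + Q)
recall    = P / (P + S)
f1        = 2 * P / (2 * P + Q + S)

# Kappa = (total accuracy - random accuracy) / (1 - random accuracy),
# where random accuracy is the agreement expected by chance from the
# marginal totals of the confusion matrix.
total = P + Q + R + S
random_accuracy = ((P + Q) * (P + S) + (R + S) * (R + Q)) / total ** 2
kappa = (accuracy - random_accuracy) / (1 - random_accuracy)
```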
Fig. 8. Confusion matrix for KNN Fig. 9. Confusion matrix for Logistic regression
In Figure 8, the true positives are 331, false negatives 54, false positives 37, and true negatives 212; in Figure 9, the true positives are 341, false negatives 44, false positives 22, and true negatives 227.
Fig. 10. Confusion matrix for SVM Fig .11. Confusion matrix for Decision tree
In Figure 10, the true positives are 335, false negatives 50, false positives 9, and true negatives 240; in Figure 11, the true positives are 328, false negatives 57, false positives 14, and true negatives 235.
Fig 12. Confusion matrix for Random forest Fig 13. Confusion matrix for Gradient Boosting
In Figure 12, the true positives are 332, false negatives 53, false positives 11, and true negatives 238; in Figure 13, the true positives are 334, false negatives 51, false positives 16, and true negatives 233.

As shown in Table 1, the gradient boosting algorithm performs better than all the other algorithms, so ensemble algorithms work better for classification.
Scale: on the x-axis 1 cm = 1 unit; on the y-axis 1 cm = 0.05 unit.
In Figure 15, accuracy is plotted in red, precision in green, recall in blue, F1 score in yellow, and kappa in black. Training and testing the six algorithms KNN, Random Forest, Gradient Boosting, SVM, Decision Tree, and Logistic Regression with the voice dataset shows that Gradient Boosting performs better than the other algorithms on the evaluation metric accuracy. Hence ensemble methods are best for classification problems.

CONCLUSION
Classifying gender using a voice dataset is among the foremost uses of machine-learning algorithms. In this project we proposed a model that classifies gender using a voice dataset accurately and efficiently. We attempted to classify gender using six trained models, among which the gradient boosting model performs better than the others. The proposed model has the best accuracy and performance. Models with good performance will help develop and use voice-based gender recognition systems more effectively in a wide range of applications.

REFERENCES
1. Ghazaala Yasmin, Suchibrota Dutt, Arijit Ghosal (2017) "Discrimination of Male and Female Voice Using Occurrence Pattern of Spectral Flux", 2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), pp. 576-581.
2. Rami S. Alkhawaldeh (2019) "Gender Recognition of Human Speech Using One-Dimensional Conventional Neural Network", Hindawi Scientific Programming, Volume 2019, pp. 1-11.
3. Sarah Ita Levitan, Taniya Mishra, Srinivas Bangalore (2016) "Automatic Identification of Gender from Speech", Speech Prosody 2016, pp. 84-88.
4. Ioannis E. Livieris, Emmanuel Pintelas and Panagiotis Pintelas (2019) "Gender Recognition by Voice Using an Improved Self-Labeled Algorithm", Machine Learning and Knowledge Extraction 2019, pp. 492-503.
5. Mansour Alsulaiman, Zulfiqar Ali and Ghulam Muhammad (2011) "Gender Classification with Voice Intensity Speech Processing", 2011 IEEE, pp. 205-210.
6. Igor Bisio, Alessandro Delfino, Andrea Sciarrone (2013) "Recognizing a person's emotional state starting from audio signal", 2013 IEEE, volume 1, pp. 244-257.
7. Mucahit Buyukyilmaz, Ali Osman Cibikdiken (2016) "Voice Gender Recognition Using Deep Learning", 2016 International Conference on Modelling, Simulation and Optimization Technologies and Applications, volume 58, pp. 409-411.
8. Chiu Ying Hay, Ng Hian James (2015) "Gender Classification from Speech", International Journal of Science and Research (IJSR), pp. 2109-2112.
9. Kavitha Yadav, Moresh Mukhedkar (2014) "MFCC Based Speaker Recognition using MATLAB", International Journal of VLSI and Embedded Systems (IJVES), Volume 05, pp. 1011-1015.
10. Bhagya Laxmi Jena, Beda Prakash Panigrahi (2014) "Gender Classification by Pitch Analysis", International Journal on Advanced Computer Theory and Engineering (IJACTE), volume 1, pp. 106-108.
11. T. Jayasankar, K. Vinothkumar, Arputha Vijayaselvi (2017) "Automatic Gender Identification in Speech Recognition Using Genetic Algorithm", Applied Mathematics and Information Sciences: An International Journal, volume 11, pp. 907-913.
12. Priyanka Makwana (2016) "Gender Recognition by Voice", International Research Journal of Engineering and Technology, volume 6, pp. 1-5.
13. Jerzy Sas, Aleksander (2013) "Gender Recognition using neural networks and ASR techniques", Journal of Medical Informatics and Technologies, Volume 22, pp. 179-187.
14. Guiyoung Son, Soonil Kwon and Neungsoo Park (2019) "Gender Classification Based on the Non-Lexical Cues of Emergency Calls with Recurrent Neural Networks", Symmetry 2019, volume 525, pp. 1-14.
15. Hadi Harb, Liming Chen (2015) "Gender identification in Multimedia applications", Journal of Intelligent Information Systems, volume 24, pp. 1-17.