Gender Ut Report

A gender voice recognition using Linear regression
model of machine learning

MADANGOPAL, UTKARSH, VISMIT, NAVEEN,
Department Of Computer Science,
IIIT-NAGPUR,MAHARASHTRA,440006,
Email:madangopalsuthar2010@gmail.com
utkarshpatel500@gmail.com
vkumbhalkar20@gmail.com
nbnayak2408@gamil.com
Abstract-Gender Voice recognition is actually a very important
branch of research in the field of acoustics and speech processing
and also Gender identification is one of the major problem speech
analysis today.This paper is about an investigation on speech
signals to devise a gender classifier.Gender classification by
speech analysis basically aims to predict the gender of the speaker
by analyzing different parameters of the voice sample. Agenda is to
find the voice recognition using regression. Linear regression is a
machine learning algorithm based on supervised learning. The
main parameter in evaluating any algorithms is its
performance.Misclassification rate must be less in classification
problems, which says that the accuracy rate must be high.Here
with this comparative model algorithm, we are trying to assess the
different ML algorithms and find the best fit for gender
classification of acoustic data.
Keyword:- Kernelspec,Metadata,Colab,Codemirror_mode.
I. INTRODUCTION
Dimorphism is the property of voice that is highly observed in
human beings. Intonation, speech rate, and duration are certain
characteristics that distinguish human voices, mainly male and
female voices [1].In speech communication, the listener does not only
decode the linguistic message from the speech signal, but at the same time
he or she also infers paralinguistic information such as the age gender and
other properties of the speaker.
This type of information is termed as voice information. This gender
detection has application in many fields like
● Sorting telephone calls by gender for gender sensitive surveys.

● Identifying the gender and removing the gender specific components
gives higher compression rate and enhanced the bandwidth.

The perceived dimorphism accounts for 98.8% which consists of the gender
of the speaker and the respective frequencies. Variation in gender, however,
cannot be predicted by vocal speech. Some voice pitch may vary between
male and female so it is difficult to predict male and female accurately.
In terms of acoustic properties, voice information has been described by
several acoustic parameters. Two of these parameters are of perceptual
relevance:
● The fundamental frequency (f0) and

● The spectral formant frequencies.
Speech recognition helps in extracting the information about the gender,
their age and the dialect in which they speak. A large amount of work has
been done in this field. Certain speech statistics are being applied which
uses over time span, mean and maximum value for the detection of
gender.The voice data set is transformed into different parameters like
vocal strength, pitch, frequency, q21, q25, etc., these are then trained and
tested with different algorithms to predict the gender based on the
algorithms mentioned above. Male has both the frequencies lower than the
female. But formant frequency is something related to the vowels and hence
it is text dependent and this paper deals with text independency therefore
gender classification is done by using pitch/fundamental frequency
extracted from different methods. Pitch is a very important feature which
can be obtained from different methods in both time domain as well as
frequency domain and also by the combination of both time and frequency
domain. Time domain methods include all those methods in which we work
directly on the speech samples. The speech waveform is directly analyzed
and the methods like short time autocorrelation, modified autocorrelation
through clipping technique, normalized cross correlation function, average
magnitude difference function, square difference function etc. Similarly, in
frequency domain methods the frequency content of the signal is initially
computed and then the information is extracted from the spectrum. These
methods include harmonic product spectrum analysis, Harmonic to sub
harmonic ratio etc. There are also some methods which do not come under
either time domain or frequency domain like wavelets, LPC analysis etc.
II. RELATED WORK

Becker [2] used a frequency-based baseline model, linear regression model
[3], classification and regression tree (CART) model [4], random forest
model [5], boosted tree model [6], Support Vector Machine (SVM) model
[7], XGBoost model [8], stacked model [9] for recognition of voice data set
[10].Tin KamHo(Bell Labs) in 1995 proposed the concept of a random
forest. It is an ensemble classifier consisting many decision trees, further
providing the output by means of class's output by individual trees outputs
of the class. The method couples the bagging idea and the random selection.
Each tree is build up using a variant bootstrap data samples. In addition to
this random forest changes it also constructs a classification of regression
trees. In case of standard trees, the best node is split using its variables.
Moreover, each node is divided using the best subset of predictors which
are randomly chosen at that node. This method seems to be robust and
stable among the other algorithms including discriminate analysis, support
vector machines and neural networks and is robust against over
fitting.Supervised learning is the machine learning task for deducing
functions from a labeled training data that can be occupied for both
classification and regression. Support vector machines are a binary
classification algorithm. Support vectors are the data points adjacent to the
hyperplanes if the dataset is removed it will change the position of the
dividing hyperplane. In SVM, each example set is a pair having an input
object and a preferred output value. Supervised learning algorithm
analyses the data and result out the inferred function, which result in
mapping new outcomes.We can say, SVM is a machine learning tool used
for classification, approximation, etc.., it can be used in a generalized
manner that leads to the success in many fields. Metrics of SVM is to
minimize and generalization of upper bound error upon maximizing the
margin which is separating hyperplane and data set. Advantages of model
selection by means of both optimal number as well as the location of
functions are obtained automatically during training.
III. DATA SET AND SOFTWARE LIBRARIES
A.Data Set
Each voice sample format is a .WAV file. The .WAV format files have
been pre-processed for acoustic analysis using the specan function by
the WarbleR R package [11]. A specan function measures 22 acoustic
parameters on acoustic signals. These parameters are showed in
“Table I”.
TABLE I. MEASURED ACOUSTIC PROPERTIES.
Properties Description
duration length of signal
meanfreq mean frequency (in kHz)
sd standard deviation of frequency

median median frequency (in kHz)
mode mode frequency
centroid centroid frequency
meandom average of dominant frequency measured across
acoustic signal
Signal mindom minimum of dominant frequency measured across
acoustic signal

maxdom maximum of dominant frequency measured across
acoustic signal
dfrange range of dominant frequency
measured across acoustic signal

B. Software Libraries
Python; is an interpreted, interactive, object-oriented, dynamic type, easy

to learn and open source programming language. Python combines
remarkable power with very clear syntax [12]. Keras; “is a high-level
neural networks library, written in Python and capable of running on top
of either TensorFlow or Theano” [13]. TensorFlow™ is an open source
software library for numerical computation using data flow graphs. Nodes
in the graph represent mathematical operations, while the graph edges
represent the multidimensional data arrays (tensors) communicated
between them [14]. NumPy is the open source fundamental package for
scientific computing with Python. It contains powerful capabilities such as
N-dimensional array objects, sophisticated (broadcasting) functions, tools
for integrating C/C++ and Fortran code, useful linear algebra, Fourier
transform, and random number capabilities [15].
IV References
[1] Ericsdotter C and Ericsson A M 2001 Gender differences in vowel

duration in read Swedish: Preliminary results Working Papers - Lund
University Department of Linguistics 34 – 37.
[2] K. Becker, “Identifying the Gender of a Voice using Machine
Learning”,2016, unpublished.
[3] J. M. Hilbe, Logistic Regression Models, CRC Press, 2009.
[4] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and
Regression Trees, CRC Press, 1984.
[5] L. Breiman, “Random forests”, Machine Learning, Springer US,
45:5–32, 2001.
[6] J.H. Friedman, Stochastic Gradient Boosting, 1999.
[7] C. Cortes, V. Vapnik, “Support-vector networks”, Machine Learning, 20
(3): 273–297, 1995.
[8] J.H. Friedman, Greedy Function Approximation: A Gradient Boosting

Machine, 1999.
[9] L. Breiman, “Stacked regressions”, Machine Learning, Springer US,

45:5–32, 2001.
[10]Dataset,https://raw.githubusercontent.com/primaryobjects/voice-gende
r/master/voice.csv
[11] M. Araya-Salas, G. Smith-Vidaurre, warbleR: an R package to

streamline analysis of animal acoustic signals. Methods Ecol Evolution,
2016, doi:10.1111/2041-210X.12624.
[12] Python, https://docs.python.org/3/faq/general.html
[13] Keras, Chollet, François, 2015, https://github.com/fchollet/keras
[14] M. Abadi, A. Agarwal, TensorFlow: Large-scale machine learning on
heterogeneous systems, 2015. Software available from tensorflow.org.
[15] S. van der Walt, S.C. Colbert, G. Varoquaux. The NumPy Array: A
Structure for Efficient Numerical Computation, Computing in Science &
Engineering, 13, 22-30, 2011, doi:10.1109/MCSE.2011.37.

Gender Ut Report

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Gender Ut Report

Uploaded by

Copyright:

Available Formats

A gender voice recognition using Linear regression

model of machine learning

● Sorting telephone calls by gender for gender sensitive surveys.

● The fundamental frequency (f0) and

II. RELATED WORK

III. DATA SET AND SOFTWARE LIBRARIES

TABLE I. MEASURED ACOUSTIC PROPERTIES.

Python; is an interpreted, interactive, object-oriented, dynamic type, easy

[1] Ericsdotter C and Ericsson A M 2001 Gender differences in vowel

[3] J. M. Hilbe, Logistic Regression Models, CRC Press, 2009.

[6] J.H. Friedman, Stochastic Gradient Boosting, 1999.

[8] J.H. Friedman, Greedy Function Approximation: A Gradient Boosting

[9] L. Breiman, “Stacked regressions”, Machine Learning, Springer US,

[11] M. Araya-Salas, G. Smith-Vidaurre, warbleR: an R package to

[12] Python, https://docs.python.org/3/faq/general.html

[13] Keras, Chollet, François, 2015, https://github.com/fchollet/keras

You might also like