Gender Ut Report
Perceived dimorphism, comprising the gender of the speaker and the corresponding frequencies, accounts for 98.8% of the variation. Gender, however, cannot be fully predicted from vocal speech alone: voice pitch ranges overlap between males and females, which makes accurate discrimination difficult. In terms of acoustic properties, voice information has been described by several acoustic parameters, two of which are of perceptual relevance: the fundamental frequency (pitch) and the formant frequencies.
Speech recognition helps in extracting information about a speaker's gender, age, and dialect, and a large amount of work has been done in this field. Speech statistics such as the mean and maximum value over a time span are applied for gender detection. The voice data set is transformed into parameters such as vocal strength, pitch, frequency, q25, q75, etc.; these are then used to train and test the algorithms mentioned above to predict gender.

Males have both the fundamental and the formant frequencies lower than females. Formant frequency, however, is related to the vowels being spoken and is therefore text dependent; since this paper deals with text-independent classification, gender is classified using the pitch (fundamental frequency) extracted by different methods.

Pitch is a very important feature that can be obtained by methods in the time domain, the frequency domain, or a combination of both. Time-domain methods are those that operate directly on the speech samples: the waveform is analyzed with techniques such as short-time autocorrelation, modified autocorrelation with a clipping technique, the normalized cross-correlation function, the average magnitude difference function, and the squared difference function. In frequency-domain methods, the frequency content of the signal is first computed and the information is then extracted from the spectrum; these methods include harmonic product spectrum analysis and the harmonic-to-subharmonic ratio. There are also methods, such as wavelets and LPC analysis, that fall under neither the time domain nor the frequency domain.
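The time-domain approach above can be sketched in a few lines. The snippet below estimates the fundamental frequency of a frame by short-time autocorrelation and then applies a simple pitch threshold; the 165 Hz cutoff and the synthetic 120 Hz test tone are illustrative assumptions, not values from the paper.

```python
import numpy as np

def estimate_pitch_autocorr(frame, sr, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency of one frame via short-time autocorrelation."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")
    corr = corr[len(corr) // 2:]           # keep non-negative lags only
    lag_min = int(sr / fmax)               # smallest plausible pitch period
    lag_max = int(sr / fmin)               # largest plausible pitch period
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sr / lag                        # period in samples -> frequency in Hz

def classify_gender(f0, threshold_hz=165.0):
    """Toy threshold rule: male F0 is typically lower than female F0."""
    return "male" if f0 < threshold_hz else "female"

sr = 16000
t = np.arange(sr) / sr
male_like = np.sin(2 * np.pi * 120 * t)    # a 120 Hz tone stands in for a male voice
f0 = estimate_pitch_autocorr(male_like[:2048], sr)
print(f0, classify_gender(f0))
```

Restricting the lag search to the plausible pitch range avoids the trivial peak at lag zero; real speech would additionally need framing, voicing detection, and smoothing of the per-frame estimates.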
A. Data Set
Each voice sample is a .WAV file. The .WAV files have been pre-processed for acoustic analysis using the specan function of the warbleR R package [11]. The specan function measures 22 acoustic parameters on acoustic signals. These parameters are shown in Table I.
Property    Description
duration    length of signal
meanfreq    mean frequency (in kHz)
sd          standard deviation of frequency
median      median frequency (in kHz)
mode        mode frequency
centroid    centroid frequency
meandom     average of dominant frequency measured across acoustic signal
mindom      minimum of dominant frequency measured across acoustic signal
maxdom      maximum of dominant frequency measured across acoustic signal
dfrange     range of dominant frequency measured across acoustic signal
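Since the rest of the pipeline is in Python, the table's spectral summaries can be approximated there as well. The sketch below computes meanfreq, sd, and median from a signal's magnitude spectrum; it is a rough Python analogue of what warbleR's specan reports, not the R implementation itself.

```python
import numpy as np

def spectral_stats(signal, sr):
    """Amplitude-weighted spectral statistics (in kHz), roughly analogous to
    the meanfreq / sd / median columns produced by warbleR's specan."""
    mag = np.abs(np.fft.rfft(signal))
    freqs_khz = np.fft.rfftfreq(len(signal), d=1.0 / sr) / 1000.0
    p = mag / mag.sum()                            # magnitude as a probability weight
    meanfreq = np.sum(freqs_khz * p)               # weighted mean frequency
    sd = np.sqrt(np.sum(p * (freqs_khz - meanfreq) ** 2))
    median = freqs_khz[np.searchsorted(np.cumsum(p), 0.5)]  # splits spectral mass in half
    return {"meanfreq": meanfreq, "sd": sd, "median": median}

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 200 * t)                 # pure 200 Hz tone: stats near 0.2 kHz
stats = spectral_stats(tone, sr)
print(stats)
```

A pure tone concentrates all spectral mass in one bin, so meanfreq and median both land at the tone's frequency and sd is close to zero; real voice samples spread the mass across harmonics.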
B. Software Libraries
IV. References
[2] K. Becker, “Identifying the Gender of a Voice using Machine Learning”, 2016, unpublished.
[4] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and
Regression Trees, CRC Press, 1984.
[5] L. Breiman, “Random forests”, Machine Learning, Springer US,
45:5–32, 2001.
[7] C. Cortes, V. Vapnik, “Support-vector networks”, Machine Learning, 20
(3): 273–297, 1995.
[10] Dataset, https://raw.githubusercontent.com/primaryobjects/voice-gender/master/voice.csv
[14] M. Abadi, A. Agarwal, TensorFlow: Large-scale machine learning on
heterogeneous systems, 2015. Software available from tensorflow.org.
[15] S. van der Walt, S.C. Colbert, G. Varoquaux. The NumPy Array: A
Structure for Efficient Numerical Computation, Computing in Science &
Engineering, 13, 22-30, 2011, doi:10.1109/MCSE.2011.37.