Gender Recognition from Audio Signal by Using MATLAB
CHAPTER 1
PREAMBLE
This chapter provides an overview of the project “Gender Recognition from Audio Signals by Using
MATLAB”. It also presents the literature survey, objectives, methodology, advantages, and the
resources and references required for this project.
1.1 General Introduction
Gender recognition from audio signals is a task that involves identifying the gender of a
person based on their voice. This can be accomplished using a variety of techniques, including
machine learning algorithms, signal processing methods, and feature extraction techniques. In this
project, we focus on using MATLAB to perform gender recognition from audio signals. MATLAB is
a powerful programming language and toolset that is widely used in the fields of signal
processing, machine learning, and audio analysis.
To get started with gender recognition from audio signals using MATLAB, we will first
need to acquire an audio dataset that includes recordings of male and female voices. Once we
have our dataset, we will need to preprocess the audio signals to remove any noise or artifacts
that might interfere with our analysis.
Next, we will need to extract relevant features from the audio signals. There are many
different features that can be extracted from audio signals, including pitch, energy, formants,
and cepstral coefficients. We will need to experiment with different feature sets to determine
which features are most informative for gender recognition.
Once we have extracted our features, we can use machine learning algorithms to train a
gender recognition model. There are many different machine learning algorithms that can be
used for gender recognition, including support vector machines, decision trees, and neural
networks. We will need to experiment with different algorithms and parameter settings to
determine which approach works best for our dataset.
Finally, we will need to evaluate our gender recognition model using a separate test dataset
to determine its accuracy and performance. We can use various metrics such as accuracy,
precision, recall, and F1 score to evaluate the performance of our model.
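The evaluation metrics named above can be illustrated with a short sketch. It is written in Python rather than MATLAB purely for illustration; the function name `binary_metrics`, the label strings, and the choice of "female" as the positive class are assumptions and are not part of the project code.

```python
def binary_metrics(y_true, y_pred, positive="female"):
    """Compute accuracy, precision, recall and F1 for two-class labels.

    `positive` selects which label counts as the positive class
    (an arbitrary choice made for this illustration).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test-set labels, used only to exercise the function.
y_true = ["male", "male", "female", "female", "female"]
y_pred = ["male", "female", "female", "female", "male"]
print(binary_metrics(y_true, y_pred))
```

In this made-up example three of the five predictions are correct, so the accuracy is 0.6, while precision, recall, and F1 for the "female" class are each 2/3.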
Dept. of ECE, RIT Hassan 2022-2023 Page 1

1.2 Literature Survey

Gender recognition from audio signals is an area of research that has gained a lot of
attention in recent years. In this field, researchers use various techniques and algorithms to
identify the gender of a speaker from their speech or audio signal.
Kanishk-K-U (2022) [1], “Gender recognition system using deep learning techniques”: this work
used deep learning techniques such as convolutional neural networks (CNN) and long short-term
memory (LSTM) networks. The system was trained and tested on a database of speech recordings
and achieved a reported accuracy of 98%.
B. C. Kim et al. (2021) [2], “Gender recognition using deep neural network with transfer
learning and data augmentation”: this paper proposed a deep learning-based approach for
gender recognition using transfer learning with a VGG network and data augmentation.
Ehsan et al. (2020) [3], “Gender recognition Using Transfer Learning from Large-scale
Pre-trained Convolutional Neural Networks (CNNs)”: this paper proposed a gender recognition
system using CNNs such as VGG16 and ResNet50. The system was trained and tested on a
database of speech recordings and achieved a reported accuracy of 99%.
Yildirim et al. (2019) [4], “Gender recognition from Speech using Deep Learning
Techniques”: this paper proposed a gender recognition system using deep learning techniques
such as convolutional neural networks (CNN) and long short-term memory (LSTM) networks.
The system was trained and tested on a database of speech recordings and achieved a reported
accuracy of 98%.
Hemant A. Patil et al. (2016) [5], “Gender recognition using pitch and formants”: this
paper proposed a method for gender recognition using pitch and formants extracted from
speech signals. The authors reported an accuracy of 89.16% using this method.
Jyoti Agrawal et al. (2015) [6], “Gender recognition from speech using Mel frequency
cepstral coefficients and support vector machines”: this paper proposed a method for gender
recognition using Mel frequency cepstral coefficients (MFCC) and support vector machines
(SVM). The authors reported an accuracy of 92.8% using this method.

1.3 Problem Formulation

Various algorithms and techniques for gender recognition from audio signals using MATLAB are
already in use or under development. What key issues and challenges do these algorithms and
techniques face? The major ones are pitch analysis that is insufficiently accurate across a wide
range of speakers, the difficulty of extracting voice features, and the difficulty of filtering
background noise, which causes many of these approaches to fail. Our current work addresses
these issues.
The project is designed to develop a module that can accurately predict the gender of a
speaker based on their audio signal.
1.4 Scope & Objectives

1.4.1 Scope:
Gender recognition from audio signals has various applications in fields such as speech
recognition, forensics, and human-computer interaction. Here are some examples of its scope:
• Speech recognition: Gender recognition can be used as a pre-processing step in speech
recognition systems to improve accuracy. By identifying the gender of the speaker, the
system can adapt its parameters and models accordingly.
• Forensics: Gender recognition can be used in forensic investigations to identify the
gender of a speaker in recorded conversations. This can help in criminal
investigations and court proceedings.
• Human-computer interaction: Gender recognition can be used to personalize user
interfaces and improve the user experience. For example, a voice assistant could
adapt its responses based on the gender of the user.
• Healthcare: Gender recognition can be used in healthcare applications such as
telemedicine to identify the gender of the patient or caregiver. This information can
be used to provide more personalized care.

1.4.2 Objectives:
• Accurate identification of the gender of a speaker: The primary objective of gender
recognition is to accurately identify the gender of a speaker from an audio signal.
This can be done using various machine learning techniques and feature extraction
methods.
• Improving speech recognition accuracy: By identifying the gender of the speaker,
speech recognition systems can adapt their models and parameters to improve
accuracy.
• Personalizing user interfaces: Gender recognition can be used to personalize user
interfaces and provide a more tailored experience for users.
• Supporting forensic investigations: Gender recognition can be used in forensic
investigations to identify the gender of a speaker in recorded conversations, which
can be useful in criminal investigations and court proceedings.
• Improving healthcare applications: Gender recognition can be used in healthcare
applications to identify the gender of the patient or caregiver, which can be used to
provide more personalized care.

1.5 Methodology
[Flow chart: Speech → Derive data from recorded file → FFT → Power Spectrum → Energy →
comparison with Threshold (Yes: Male; No: Female)]
Fig 1.1: Flow chart of proposed method

The energy-based thresholding technique is used as a classifier:
• First, voice samples of both male and female speakers are recorded and stored in a
file.
• The feature (energy) is extracted from these voice samples and is referred to as the
known value. An unknown voice sample is then taken for analysis and its feature
is extracted.
• The feature extracted from the unknown sample is referred to as the unknown value.
The unknown value is compared with the known value, and based on this comparison
we conclude whether the speaker is male or female.
• Feature extraction: Energy is used as the feature and is extracted by estimating the
power spectrum of the recorded male and female voice samples.
• The voice samples are given as input to the recognition system.
• The FFT is applied to the voice samples, and the power spectrum is estimated
from the FFT output.
• The energy is then extracted from the power spectrum.
• From this, the threshold energy is calculated. For an unknown voice sample, the
energy is extracted by the same method and compared with the estimated
threshold energy. The procedure for extracting the energy is shown in
Fig 1.1.
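The energy-based procedure described above can be sketched as follows. This is a minimal Python illustration, not the project's MATLAB code: a naive DFT stands in for MATLAB's FFT, all function names are hypothetical, and the midpoint rule used to obtain the threshold is an assumption, since the report does not state how the threshold energy is calculated. The decision direction (energy above the threshold is classified as male) follows the flow chart in Fig 1.1.

```python
import cmath
import math


def power_spectrum(samples):
    """Power spectrum |X[k]|^2 / N from a naive O(N^2) DFT (short signals only)."""
    n = len(samples)
    spectrum = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def spectral_energy(samples):
    """Energy feature: sum of all power-spectrum bins."""
    return sum(power_spectrum(samples))


def threshold_from_references(male_energies, female_energies):
    """ASSUMED rule: midpoint between the mean male and mean female energies."""
    male_mean = sum(male_energies) / len(male_energies)
    female_mean = sum(female_energies) / len(female_energies)
    return (male_mean + female_mean) / 2.0


def classify_by_energy(samples, threshold):
    """Energy above the threshold -> male, otherwise female (as in Fig 1.1)."""
    return "male" if spectral_energy(samples) > threshold else "female"


# Toy signals standing in for recorded voice samples (not real speech).
threshold = threshold_from_references([4.0, 6.0], [1.0, 3.0])
print(classify_by_energy([2.0, 0.0, -2.0, 0.0], threshold))
print(classify_by_energy([1.0, 0.0, -1.0, 0.0], threshold))
```

With this normalisation, the sum of the power-spectrum bins equals the sum of the squared samples (Parseval's relation), so the energy feature depends on the overall signal level. Recordings would therefore need comparable gain for a single threshold to be meaningful.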
1.6 Advantages & Application

1.6.1 Advantages:
There are several advantages to developing a gender recognition system using audio
signals in MATLAB, including:
• Accuracy: With the use of advanced signal processing techniques and machine
learning algorithms, gender recognition systems can achieve high accuracy in
identifying the gender of a speaker from an audio signal.
• Non-intrusive: Audio-based gender recognition systems are non-intrusive and can
be used without requiring any physical contact with the speaker, making them more
comfortable and convenient for users.
• Real-time processing: MATLAB provides tools and functions for real-time
processing of audio signals, which allows gender recognition systems to operate in
real-time applications such as speech recognition or personalizing user interfaces.
• Cost-effective: Although MATLAB is proprietary software rather than an open-source
platform, its built-in signal processing and machine learning functions reduce
development effort, which can make it a cost-effective option where a licence is
already available (for example, through an academic institution).
• Versatility: Gender recognition systems developed using MATLAB can be applied
in various domains such as healthcare, security, and entertainment, to provide more
personalized and customized services based on the gender of the user.
Overall, developing gender recognition systems using audio signals in MATLAB
offers several advantages, such as accuracy, non-intrusiveness, real-time
processing, cost-effectiveness, and versatility, making them useful for a wide range
of applications.
1.6.2 Applications:
Gender recognition systems developed using audio signals in MATLAB have a wide
range of applications, including:
• Speech recognition: Gender recognition can be used to adapt speech recognition
systems to the speaker's gender, which can improve their accuracy.
• User interface personalization: Gender recognition can be used to personalize user
interfaces based on the user's gender, providing a more tailored and customized
experience.
• Healthcare applications: Gender recognition can be used to identify the gender of
patients or caregivers, which can help to provide more personalized care in
healthcare applications.
• Security and forensics: Gender recognition can be used in security and forensic
applications to identify the gender of a speaker in recorded conversations, which
can be useful in criminal investigations and court proceedings.
• Entertainment: Gender recognition can be used in entertainment applications such
as video games and virtual reality systems to provide a more immersive and
personalized experience.
• Marketing and advertising: Gender recognition can be used in marketing and
advertising applications to target specific products and services to male or female
audiences.
Overall, gender recognition systems developed using audio signals in MATLAB
have a wide range of applications spanning various domains, including
speech recognition, user interface personalization, healthcare, security,
entertainment, marketing, and advertising.
1.7 Software requirements

• MATLAB R2023a

CHAPTER 2
BACKGROUND THEORY
2.1 History of Gender recognition from audio signal
The main historical developments in gender recognition from audio signals are outlined
below:
• Early Years (1970s-1990s):
- In the early years, research on gender recognition from audio signals primarily focused on
fundamental frequency analysis.
- Studies found that fundamental frequency exhibited distinct differences between male and
female voices.
- Techniques like autocorrelation and pitch tracking algorithms were used to estimate
fundamental frequency and classify voices based on gender.
• Feature Extraction and Classification (1990s-2000s):
- During this period, researchers expanded the analysis to include additional acoustic features
beyond fundamental frequency.
- Features like formant frequencies, voice quality, energy distribution, and spectral/temporal
characteristics were explored.
- Machine learning algorithms such as Gaussian Mixture Models (GMMs) and Hidden
Markov Models (HMMs) were introduced for gender classification based on these features.
- Research also investigated the impact of language and dialect on gender recognition.
• Advancements in Machine Learning (2000s-2010s):
- The 2000s witnessed significant advancements in machine learning techniques, particularly
with the rise of deep learning algorithms.
- Neural network models, including feedforward neural networks and recurrent neural
networks, were applied to gender recognition.
- These models could automatically learn complex patterns and representations from the
extracted acoustic features, improving accuracy.
• Large Datasets and Deep Learning (2010s-Present):
- The availability of large labeled datasets, such as the VoxCeleb dataset, played a crucial
role in advancing gender recognition.
- Deep learning models like Convolutional Neural Networks (CNNs) and Recurrent Neural
Networks (RNNs) gained prominence.
- Models such as Long Short-Term Memory (LSTM) networks were utilized to extract
features and make gender predictions.
- Transfer learning techniques, in which models pre-trained on large audio datasets are
fine-tuned, also became popular.
• Ongoing Research and Challenges:
- Gender recognition from audio signals continues to be an active area of research.
- Challenges include variations in speech patterns across languages and dialects, background
noise, and the mitigation of bias.
- Efforts focus on improving robustness and accuracy and on addressing ethical considerations.
- Research aims to develop fair and responsible gender recognition systems that respect
privacy and consent and minimize bias.

CHAPTER 3
DESIGN AND IMPLEMENTATION
3.1 Design Requirements
• Method 1 - Using GUI-based MATLAB App
Fig 3.1.1: Using GUI-based MATLAB App
The app directly returns the fundamental frequency, f0, of any input audio file with
sampling frequency fs. We can therefore compare f0 with a threshold of 165 Hz and report
whether the voice is that of a male or a female.
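The comparison of f0 with 165 Hz can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's MATLAB code: f0 is estimated here with a plain autocorrelation peak search, which may differ from both the proposed and the built-in algorithm used in the app, and the function names, the 50-400 Hz search range, and the synthetic sine-wave inputs are all hypothetical. The sketch assumes the usual convention that a lower f0 indicates a male voice.

```python
import math


def estimate_f0(samples, fs, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    The search is limited to lags corresponding to fmin..fmax (assumed range).
    """
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]  # remove any DC offset
    min_lag = max(1, int(fs / fmax))
    max_lag = min(int(fs / fmin), len(x) - 1)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag


def classify_by_f0(f0_hz, threshold_hz=165.0):
    """f0 below the 165 Hz threshold -> male, otherwise female."""
    return "male" if f0_hz < threshold_hz else "female"


def sine(freq_hz, fs, n):
    """Synthetic test tone standing in for a voiced speech segment."""
    return [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]


fs = 8000
for tone_hz in (120.0, 220.0):
    f0 = estimate_f0(sine(tone_hz, fs, 2000), fs)
    print(round(f0, 1), classify_by_f0(f0))
```

Because the lag is an integer number of samples, the estimate is quantised (at 8 kHz, adjacent lags near 120 Hz are about 2 Hz apart). That is coarse but sufficient for a single-threshold decision when f0 is not close to 165 Hz.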
• Source button: Used to select the source file. By default, it accepts only
.wav files (this can be changed in the source file).
• Go button: After a file has been chosen with the Source button and its name is
shown in the box, press this button to confirm the selection.
• Alternatively, type the path of the file directly and press the Go button.
• The message box, which initially shows “Upload New File”, provides the step-by-step
procedure and reports errors made by the user while selecting and processing the
file.
• Generate Graph: Generates the FFT graph of the audio file.
• Calculate: This button calculates the frequency using both the proposed algorithm
and the built-in algorithm. It also reports the gender.
• Note: Even when the two frequency estimates do not match, the predicted gender is
the same in almost all test cases.
• Method 2 - Using MATLAB Simulink
Fig 3.1.2: Flow chart of proposed method

• Simulink Implementation
Fig 3.1.3: Simulink implementation

• Using the From Multimedia File block, the sample audio is taken as input with 3500
samples per audio channel. This is passed to the next block, where the frequency of
each frame is calculated.
• A block with persistent state is used to save the previous result for use in the next
step. This is essential for the calculation of the mean frequency.
• In the final block, the values are compared with the threshold values.
• As the final result, the frequency is displayed together with the class label:
0 – Male, 1 – Female.
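The role of the persistent state in the Simulink model can be sketched as a running mean that is updated once per frame. This Python illustration is an assumption-laden stand-in for the Simulink blocks: the class name `RunningMeanClassifier` is hypothetical, the per-frame frequencies are supplied directly rather than computed from audio, and the 165 Hz threshold is borrowed from Method 1 because the threshold used in the Simulink model is not stated.

```python
class RunningMeanClassifier:
    """Keeps a running mean of per-frame frequencies between calls.

    The instance attributes play the role of the persistent variables that
    carry the previous result from one frame to the next.
    """

    def __init__(self, threshold_hz=165.0):  # threshold assumed from Method 1
        self.threshold_hz = threshold_hz
        self.count = 0
        self.mean_hz = 0.0

    def update(self, frame_freq_hz):
        """Fold one frame's frequency into the mean; return (mean, label).

        Label convention from the report: 0 = male, 1 = female.
        """
        self.count += 1
        # Incremental mean: no need to store the earlier frame values.
        self.mean_hz += (frame_freq_hz - self.mean_hz) / self.count
        label = 0 if self.mean_hz < self.threshold_hz else 1
        return self.mean_hz, label


# Hypothetical per-frame frequency estimates for one recording.
clf = RunningMeanClassifier()
for freq in (100.0, 120.0, 140.0):
    mean_hz, label = clf.update(freq)
print(mean_hz, label)
```

Averaging over frames makes the decision less sensitive to a single badly estimated frame than classifying each frame independently.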

3.2 Implementation Steps

• Method 1 - Using GUI-based MATLAB App
Step 1: Create a GUI layout
Fig 3.2.1: GUI layout
Step 2: Select the .wav file
Fig 3.2.2: .wav file

Step 3: .wav file uploaded
Fig 3.2.3: Upload the .wav file
Step 4: Generate the graph from the audio signal
Fig 3.2.5: Generation of FFT waveform

Step 5: Calculate the output from the user-defined function and the output from the in-built function
Fig 3.2.6: Calculate the output
• Method 2 - Using MATLAB Simulink

Step 1: Create Simulink blocks
Fig 3.2.7: Simulink block

Step 2: Double-click on “From Multimedia File”
Fig 3.2.8: Browsing .wav file
Step 3: Browse to the .wav audio file
Fig 3.2.9: Selecting .wav file

Step 4: Simulate the model
Fig 3.3.1:
Step 5: As the final result, the frequency is displayed (0 – Male, 1 – Female)
Fig 3.3.2: Final result

CHAPTER 4
RESULT AND DISCUSSIONS
4.1 Result
Method-1 output
Fig 4.1 : Output
Method-2 Output
Fig 4.2 : Output

CHAPTER 5
CONCLUSION
Gender recognition from voice is the process of determining a speaker's gender from a
speech signal. It can support the authentication or verification of a speaker's identity as part
of a security measure. In this project, the speaker's gender is identified by using energy
estimation with a threshold value: the energy calculated for a voice sample is compared with the
threshold energy. If the energy is greater than the threshold, the voice sample is classified as
male. If it is less than the threshold, the voice sample is classified as female.
FUTURE ENHANCEMENT:
• Emotional analysis
• Sentiment analysis
• Speech synthesis
• Speech recognition
• Customer segmentation
• Recommendation of products to customers in online shopping

References
[1]. "Gender Identification from Speech Signal using MATLAB" by Anurag Saxena and
Preethi Gupta, International Journal of Scientific & Engineering Research, Volume 3, Issue
10, October-2012.
[2]. "Speech-Based Gender Recognition Using MATLAB" by Varsha S. Patil and Lata R.
Raghu, International Journal of Engineering Research and Technology, Volume 5, Issue 5,
May-2016.
[3]. "Gender Recognition from Speech Using MFCC and SVM with Different Kernel
Functions" by Mohamed R. El-Melegy and Hossam M. Zawbaa, Journal of Applied Sciences,
Volume 18, Issue 3, March-2018.
[4]. J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence, "Dataset
Shift in Machine Learning," The MIT Press, 2009.
[5]. P. Burget, L. Ferrer, and J. Cernocky, "Voice activity detection in adverse conditions using
the generalised likelihood ratio test," Speech Communication, vol. 49, no. 3, pp. 160-172,
2007.
[6]. D. Talkin, "A robust algorithm for pitch tracking (RAPT)," Speech Coding and Synthesis,
pp. 495-518, 1995.
[7]. P. Lanchantin and A. Laptev, "Formant analysis using weighted linear prediction with a
relative spectral distortion measure," IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), pp. 2391-2395, 2015.
[8]. A. de Cheveigné and H. Kawahara, "YIN, a fundamental frequency estimator for speech
and music," The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917-1930,
2002.
[9]. J. Han, M. Kamber, and J. Pei, "Data Mining: Concepts and Techniques," Morgan
Kaufmann Publishers, 2011.

PROJECT ASSOCIATE
Name: Dhamini S L
USN: 4RA20EC006
Email: dhaminigowda74@gmail.com
Mobile No: 7619496008
Address: D/O Lakshmana S.R, S M Krishna Nagar,
near Karnataka state open university
Hassan -573118
Name: Bhoomika N
USN:4RA21ECC400
Email: bhoomikagowdan2000@gmail.com
Mobile No:7090944731
Address: D/O Ningarajegowda, narasinakuppe village,
Arkalgud(T), Hassan(D)-573201
Name: Shashikala M K
USN: 4RA21EC405
Email: shashikala.m.k.000@gmail.com
Mobile No:9632674820
Address: D/O Kempananjegowda, Channarayapatna,
Hassan-573201
UNDER THE GUIDANCE OF
Name: Ravi L S
Asst. Professor,
Department of E & C,
R.I.T, Hassan
