
PROGRESS REPORT OF INTERNSHIP

IN
CSDT IT SOLUTION PVT. LTD.

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING
BIRLA INSTITUTE OF TECHNOLOGY, MESRA, PATNA CAMPUS, BIHAR-800014, INDIA

Submitted by:

Md Shams Alam (BTECH/15206/19)

Dr. Rajeev Ranjan (Project Guide)
Dr. M. Dadel (Project Co-ordinator)
Objective
My current task is to create a speech recognition model for the company
website. Speech recognition enables computers or machines to accurately
transcribe spoken words into text or to interpret spoken commands given by
humans. The technology involves capturing and processing spoken words using
algorithms that can recognize patterns and convert them into written text or
machine commands. Speech recognition
has a wide range of applications, including virtual assistants, voice-controlled
devices, automated customer service systems, transcription software, and
language translation services. The goal of speech recognition is to make
human-computer interaction more natural, intuitive, and efficient by enabling
users to communicate with machines using their voices instead of typing or
clicking.

Motivation
Speech recognition technology has a wide range of applications across
various industries, including healthcare, finance, telecommunications, and
entertainment. The primary motivation for developing speech recognition
technology is to provide a more natural and efficient way for humans to
interact with computers and other devices. This project makes heavy use of
NumPy, Pandas, and data visualization libraries.

Literature review
Speech recognition technology has been an active area of research and
development for several decades. Here is a brief literature review of some of
the important research papers and trends in the field:

1. "Statistical Methods for Speech Recognition" by Fred Jelinek (1997): This
book consolidated the statistical framework, built around hidden Markov
models (HMMs), that became a cornerstone of modern speech recognition
systems.

2. "Deep Speech: Scaling up end-to-end speech recognition" by Awni Hannun
et al. (2014): This paper presented a novel approach to speech recognition
that uses deep neural networks to map audio signals directly to text. This
end-to-end approach has since become increasingly popular and has achieved
state-of-the-art results.

3. "Attention-Based Models for Speech Recognition" by Jan Chorowski et al.
(2015): This paper introduced the use of attention mechanisms in speech
recognition, allowing the model to focus on the relevant parts of the audio
signal. This has led to improved performance on challenging speech
recognition tasks.

4. "Speech Recognition with Deep Recurrent Neural Networks" by Alex Graves
et al. (2013): This paper proposed the use of recurrent neural networks
(RNNs) for speech recognition, allowing the model to account for the
temporal nature of speech. This approach has since become a popular choice
for speech recognition.

5. "Transformer-Based End-to-End Speech Recognition" by Shiyu Zhou et al.
(2020): This paper applied transformer-based models, which had become
popular in natural language processing, to speech recognition, leading to
improved performance and more efficient training.

Proposed Work
Real-time speech recognition: Many speech recognition systems require a
significant amount of processing time, which is a challenge for real-time
applications such as live transcription or virtual assistants. One potential
area of research is to develop techniques for real-time speech recognition
that process speech with minimal delay.
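As an illustrative sketch only (not the company's actual pipeline), the core idea of real-time processing is to consume the audio stream in small fixed-size frames and run cheap checks, such as energy-based voice activity detection, before invoking an expensive recognizer. The frame size, threshold, and synthetic signal below are assumptions chosen for the example.

```python
import numpy as np

def stream_frames(signal, frame_size=1600):
    """Yield fixed-size frames, simulating a real-time audio stream
    (1600 samples = 100 ms at a 16 kHz sampling rate)."""
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        yield signal[start:start + frame_size]

def is_speech(frame, threshold=0.01):
    """Crude energy-based voice activity detection: a real-time system would
    only run the expensive recognizer on frames flagged as speech."""
    return float(np.mean(frame ** 2)) > threshold

# Synthetic example: 0.5 s of silence followed by 0.5 s of a 440 Hz tone.
rate = 16000
t = np.arange(rate // 2) / rate
audio = np.concatenate([np.zeros(rate // 2), 0.5 * np.sin(2 * np.pi * 440 * t)])

flags = [is_speech(f) for f in stream_frames(audio)]
print(flags)  # first five frames (silence) are False, last five (tone) are True
```

In a production system, the per-frame work must finish faster than the frame duration (here, 100 ms) to keep the overall delay bounded.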

Methodology
The methodology of speech recognition involves several key steps:

1. Data collection: The first step in speech recognition is to collect a large
dataset of audio recordings and their corresponding transcriptions. This
dataset is typically used to train and evaluate the speech recognition model.

2. Feature extraction: The audio signals in the dataset are then preprocessed to
extract features that are relevant for speech recognition. This typically
involves using techniques such as Fourier transforms or Mel-frequency
cepstral coefficients (MFCCs) to convert the audio signals into a set of
features that can be used by the speech recognition model.

3. Acoustic modeling: The next step is to develop an acoustic model, which is
a mathematical representation of the relationship between the audio features
and the corresponding text transcriptions. This is typically done using
machine learning techniques such as hidden Markov models (HMMs) or deep
neural networks (DNNs).

4. Language modeling: In addition to the acoustic model, a language model is
also developed to capture the probability of different word sequences in the
transcriptions. This is typically done using statistical language models, such
as n-gram models or recurrent neural networks (RNNs).

5. Decoding: Once the acoustic and language models are trained, the speech
recognition system uses a decoding algorithm to find the most likely
transcription for a given audio input. This typically involves searching for the
sequence of words that has the highest probability given the audio features
and language model.

6. Evaluation: Finally, the performance of the speech recognition system is
evaluated using metrics such as word error rate (WER) or sentence error rate
(SER), which measure the accuracy of the transcriptions compared to the
ground truth.
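Step 2 above can be sketched in a few lines. The example below computes a log-magnitude spectrogram via framing and the FFT; this is a simplified stand-in for the full MFCC pipeline (MFCCs additionally apply a mel filter bank and a discrete cosine transform). The frame size, hop length, and test tone are assumptions for illustration.

```python
import numpy as np

def log_spectrogram(signal, frame_size=400, hop=160):
    """Split the signal into overlapping frames, window each frame, and take
    the log-magnitude of its FFT (25 ms frames with a 10 ms hop at 16 kHz)."""
    window = np.hanning(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size] * window
        spectrum = np.abs(np.fft.rfft(frame))
        frames.append(np.log(spectrum + 1e-10))  # log compresses the dynamic range
    return np.array(frames)  # shape: (num_frames, frame_size // 2 + 1)

# One second of a synthetic 440 Hz tone sampled at 16 kHz.
rate = 16000
t = np.arange(rate) / rate
audio = np.sin(2 * np.pi * 440 * t)
features = log_spectrogram(audio)
print(features.shape)  # (98, 201)
```

Each row of `features` is one feature vector of the kind the acoustic model in step 3 consumes; with a 40 Hz bin spacing (16000 / 400), the 440 Hz tone peaks in bin 11.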
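The WER metric in step 6 is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch, using a standard dynamic-programming edit distance:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via the Levenshtein edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / len(ref)

# One substitution ("the" -> "a") over six reference words: WER = 1/6.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions relative to a short reference.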
