Automatic Speech Recognition Using Deep Learning-1

AUTOMATIC SPEECH RECOGNITION
USING DEEP LEARNING

NEHA ACHARYA (20CS051)
PRATHUASHA K B (20CS058)
GUIDANCE: DR. CHANNAKRISHNARAJU
INTRODUCTION
■ Nowadays, many modern devices and text-focused programs include speech recognition
functions that allow easier or hands-free use of the device.
■ Speech recognition draws on a broad array of research in Computer Science and Computer
Engineering.
■ Automatic speech recognition (ASR), also called a speech-to-text system, is a technique that
converts human speech into readable text.
■ A deep learning approach is used to process and interpret speech, allowing users to
customize the existing features to their needs.
■ Given the demand for this technology, a simple, task-specific speech recognition system
can be built with Facebook's Wav2Vec2.0 model and the Transformers library by Hugging
Face.
LITERATURE SURVEY
■ Speech recognition models have become extremely important as voice control has become
an important way of interacting with devices.
■ Over the past decade, ASR systems have been used in a wide range of applications,
from finance to healthcare.
■ To illustrate this application of deep learning, we will use PyTorch, an open-source
machine learning framework.
■ We will use Transformers, a state-of-the-art Natural Language Processing library by
Hugging Face.
■ We will use Wav2Vec2.0, a speech model by Facebook for self-supervised learning of
speech representations. It masks the speech input in the latent space and solves a
contrastive task defined over a quantization of the jointly learnt latent representations.
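The contrastive task can be illustrated with a minimal pure-Python sketch of the per-time-step loss described in the Wav2Vec2.0 paper: for a masked time step, the model's context vector should be more similar (cosine similarity divided by a temperature) to the true quantized latent than to distractors sampled from other masked time steps. The function names, the toy vectors and the temperature of 0.1 are illustrative assumptions; the real model computes this on batched PyTorch tensors and adds a codebook diversity term that is not shown here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def contrastive_loss(
    context: Sequence[float],
    true_quantized: Sequence[float],
    distractors: Sequence[Sequence[float]],
    temperature: float = 0.1,  # illustrative value, not taken from this project
) -> float:
    """Negative log-probability of picking the true quantized latent among all candidates.

    Hypothetical helper: a softmax over scaled cosine similarities, where index 0
    is the true target and the remaining candidates are distractors.
    """
    candidates = [true_quantized, *distractors]
    logits = [cosine_similarity(context, q) / temperature for q in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return -(logits[0] - log_denominator)
```

For example, `contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])` returns a value close to zero because the context already points at the true target, while swapping the target and the distractor gives a loss of roughly 10. Minimizing this quantity is what pushes the context network to identify the correct quantized latent.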
OBJECTIVE
■ The main objective of the project on Automatic Speech Recognition is to allow
machines to recognize sounds and act on them.
■ It enables machines to receive and interpret speech and translate it into readable
text.
■ Specifically, using the Wav2Vec2.0 model should allow speech recognition with 100x less
labelled training data while surpassing the best semi-supervised approaches.
■ This approach should lead to advancements in speech recognition technology for both
current and upcoming systems.
PROBLEM DEFINITION
■ Automatic Speech Recognition (ASR) input data usually contains noise, which can cause
machines to misrecognize specific words or phrases.
■ In addition, a sentence may contain many unnecessary words, which affect its
interpretation.
■ Misspelled or misused words can create problems for text analysis. Autocorrect and
grammar correction applications can handle common mistakes, but they do not always
understand the writer's intention.
■ However, as language databases grow and smart assistants are trained by their
individual users, these issues can be minimized.
■ We therefore use deep learning, which allows a multitude of NLP techniques, algorithms,
and models to improve progressively, much like the human mind does.
PLANNING
■ Install the required packages (such as PyTorch and Librosa) inside a virtual environment.
■ Import the necessary libraries.
■ Load the pre-trained model (Wav2Vec2.0) and the corresponding tokenizer.
■ Create a function that makes sure the speech input has a sampling rate of 16 kHz.
■ Feed the input audio to the tokenizer for processing.
■ Store the result in a transcription variable.
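The planned steps can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the project's final code: it assumes `torch`, `transformers` and `librosa` are installed in the virtual environment, uses the public `facebook/wav2vec2-base-960h` checkpoint (an English Wav2Vec2.0 model fine-tuned on 960 hours of LibriSpeech) as the pre-trained model, and uses `Wav2Vec2Processor`, which bundles the CTC tokenizer with the audio feature extractor, in place of a standalone tokenizer. The helper names `load_speech_16k` and `transcribe` are hypothetical.

```python
# Assumed setup inside a virtual environment: pip install torch transformers librosa

TARGET_SAMPLING_RATE = 16_000  # the assumed checkpoint was trained on 16 kHz audio
MODEL_NAME = "facebook/wav2vec2-base-960h"  # assumed pre-trained Wav2Vec2.0 checkpoint


def load_speech_16k(path: str):
    """Load an audio file as mono float samples resampled to 16 kHz."""
    # Heavy libraries are imported inside the functions so this sketch can be
    # imported (e.g. for inspection) without them installed.
    import librosa

    # librosa.load resamples to `sr` and downmixes to mono; it returns (samples, rate).
    speech, _ = librosa.load(path, sr=TARGET_SAMPLING_RATE, mono=True)
    return speech


def transcribe(path: str, model_name: str = MODEL_NAME) -> str:
    """Transcribe one audio file with a Wav2Vec2.0 CTC model."""
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Load the pre-trained model and its processor (feature extractor + tokenizer).
    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)
    model.eval()

    # Make sure the input is 16 kHz, then convert it to model input values.
    speech = load_speech_16k(path)
    inputs = processor(speech, sampling_rate=TARGET_SAMPLING_RATE, return_tensors="pt")

    with torch.no_grad():
        logits = model(inputs.input_values).logits  # shape: (batch, frames, vocab)

    # Greedy decoding: pick the most likely token per frame; batch_decode then
    # merges repeated tokens and drops the CTC padding/blank token.
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)[0]
    return transcription
```

Calling `transcribe("sample.wav")`, where `sample.wav` stands for any local recording, downloads the checkpoint on first use and returns the transcription string. Resampling explicitly with `librosa.load(..., sr=16000)` covers the 16 kHz step, since the feature extractor does not resample audio itself.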
PROPOSED CONCLUSION
■ Automatic speech recognition is the method of translating a speech signal into a series
of words using a computer program and its algorithms.
■ Wav2Vec2.0 shows great potential when it comes to creating speech recognition models
for settings where there is very little labelled training data.
■ The project's results show that deep neural networks are able to solve speech
recognition challenges.
