Professional Documents
Culture Documents
Automatic Speech Recognition Using Deep Learning-1
Automatic Speech Recognition Using Deep Learning-1
■ The use of a speech recognition model has become extremely important as Speech
Control has become an important type.
■ Over the past decade, the ASR systems are used in a wide range of applications
extending from finance to healthcare.
■ To illustrate the application, via Deep Learning, we will be using PyTorch, an open-
source machine learning framework for this operation.
■ We will use Transformers, a state-of-the-art Natural Language Processing library by
Hugging Face.
■ We will be using Wav2Vec2.0, a speech model by Facebook for self-supervised learning
of speech representations that masks the speech input in the latent space and solves a
contrastive task defined over a quantization of the jointly learnt latent representations.
OBJECTIVE
■ The main objective of the project on Automatic Speech Recognition is to allow
machines to recognize sounds and act on them.
■ It allows machines to attain the ability to identify “receive and interpret” speech and
translate it into readable form or text.
■ Exclusively, using Wav2Vec2.0 model must address speech recognition with 100x less
labelled training data and surpass the best-semi supervised approaches.
■ This approach should lead to advancements in speech recognition technology for
upcoming, as well as current systems.
PROBLEM DEFINITION
■ Automatic Speech Recognition (ASR) data usually contains noise, causing machines to
misunderstand specific words or phrases.
■ Hence, there are many unnecessary words in a sentence, which will affect the
interpretation.
■ Misspelled or misused words can create problems for text analysis. Autocorrect and
grammar correction applications can handle common mistakes, but don’t always
understand the writer’s intention.
■ However, as language databases grow and smart assistants are trained by their
individual users, these issues can be minimized.
■ Hence, we use deep learning to allow a multitude of NLP techniques, algorithms, and
models to work progressively, much like the human mind does.
PLANNING
■ Automatic speech recognition is the method of translating a speech signal into a series
of words using a computer program and its algorithms.
■ Wav2Vec2.0 shows great potential when it comes to creating speech recognition models
for settings where there is very little labelled training data.
■ The projects results show that deep neural networks have ability to solve speech
recognition challenges.
REFERENCES