PROJECT PROPOSAL

RISHI GUPTA
M.TECH AI (19729)
PROJECT TITLE: SPEECH TO TEXT CONVERSION
PROBLEM STATEMENT:
In this project we will develop an automatic speech recognition
(ASR) model that takes a speech signal as input and converts it
into text.

IMPLEMENTATION DETAIL:
We will build a deep learning based ASR model on wav2vec2 to
convert the speech signal into text. The wav2vec2 model will be
trained to output text directly from the audio. To make the text
more accurate, we will combine it with a language model such as
an n-gram model (for example, one built with KenLM). Decoding
the wav2vec2 output with a language model should increase the
probability of getting the correct text.
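
As a rough illustration of how a CTC-based model such as wav2vec2 turns per-frame predictions into text, the sketch below applies greedy CTC decoding: merge consecutive repeated labels, drop the blank label, and map the word delimiter to a space. The function name ctc_greedy_decode is hypothetical, and the blank symbol "<pad>" and word delimiter "|" are assumptions about the vocabulary, not something fixed by this proposal.

```python
from itertools import groupby

# Assumed vocabulary conventions (illustrative only):
BLANK = "<pad>"   # label used as the CTC blank
WORD_SEP = "|"    # label used as the word delimiter


def ctc_greedy_decode(frame_labels, blank=BLANK, word_sep=WORD_SEP):
    """Collapse a sequence of per-frame labels into a text string."""
    # Step 1: merge runs of the same label into a single label.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove the blank label, which only separates repeats.
    chars = [label for label in collapsed if label != blank]
    # Step 3: join characters and turn word delimiters into spaces.
    return "".join(chars).replace(word_sep, " ").strip()


frames = ["H", "H", "<pad>", "E", "L", "L", "<pad>", "L",
          "O", "O", "|", "|", "H", "I", "<pad>"]
print(ctc_greedy_decode(frames))
```

The blank between the two runs of "L" is what lets the decoder keep a genuine double letter while still merging repeated frames of the same sound.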

EXPERIMENTS:
We will try various language models, both through built-in library
functions and through our own implementation, to make the output
text as accurate as possible.
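
A minimal sketch of what the self-implemented variant could look like: a bigram language model with add-one smoothing, used to rescore candidate transcriptions. The names BigramLM and rescore, the tiny training corpus, the acoustic scores, and the weight lm_weight are all hypothetical and chosen only for illustration; a real experiment would train on a large text corpus and tune the weight.

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each word starts a bigram
        self.bigram_counts = Counter()   # how often each (prev, word) pair occurs
        vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            vocab.update(tokens)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        # One extra slot stands in for words never seen in training.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, sentence):
        """Natural-log probability of a sentence under the model."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def rescore(candidates, lm, lm_weight=0.5):
    """Pick the candidate with the best combined acoustic and LM score.

    candidates is a list of (text, acoustic_log_score) pairs.
    """
    best = max(candidates, key=lambda c: c[1] + lm_weight * lm.log_prob(c[0]))
    return best[0]


# Toy training text and made-up acoustic scores, for illustration only.
corpus = [
    "i want to recognize speech",
    "speech recognition is fun",
    "we recognize speech",
]
lm = BigramLM(corpus)
candidates = [
    ("i want to wreck a nice beach", -4.0),
    ("i want to recognize speech", -4.2),
]
print(rescore(candidates, lm))
```

Here the acoustically preferred candidate loses after rescoring because its word sequence is much less likely under the language model, which is the effect the language model is meant to have.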
DATASET DETAIL:
Here we use only one accent of the language out of the many accents
present in the Common Voice dataset, available at:
https://commonvoice.mozilla.org/en/datasets
Alternatively, we can use the pretrained Facebook AI wav2vec2 model
available on Hugging Face, which is already fine-tuned on 960 hours
of LibriSpeech audio:

"facebook/wav2vec2-base-960h".

TOOLS AND PROGRAM:


• Built-in library classes for wav2vec2, namely Wav2Vec2Processor
and Wav2Vec2ForCTC, imported from transformers.
• N-gram library functions from NLTK.
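
The two transformers classes listed above are typically used together as in the sketch below. The function name transcribe is hypothetical, and the code assumes that torch and transformers are installed, that the checkpoint can be downloaded, and that the input is a mono waveform sampled at 16 kHz. The heavy imports sit inside the function so that defining it does not require the libraries.

```python
def transcribe(waveform, sampling_rate=16000,
               model_name="facebook/wav2vec2-base-960h"):
    """Return the greedy CTC transcription of a 1-D float waveform."""
    # Imported lazily: both packages are assumed to be installed.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # The processor normalizes audio and maps label ids back to characters.
    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)

    inputs = processor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt")
    with torch.no_grad():
        # Shape: (batch, frames, vocabulary size)
        logits = model(inputs.input_values).logits

    # Greedy decoding: take the most likely label in every frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

A call would look like transcribe(samples), where samples is a list or NumPy array of audio samples. The result is the raw greedy output before any language model is applied.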
