B. Test Data
For testing, we used recorded audio sentences from a
similar group of friends, who recorded the test data.
C. Implementation stages
The implementation has three stages, which are as follows:
1. We give the raw input to our wav2vec2 model, which
converts the raw audio into latent speech representations and
then performs quantization.
2. Some vectors of the quantized speech are removed before
it is given to any language model; this step is called masking.
The masked input is then given to the language model.
3. The language model predicts the vectors that were masked in
the previous step and produces the output text, whose accuracy
determines the word error rate (WER).
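
Since the final stage is evaluated by WER, a minimal sketch of how that metric is commonly computed may help. It assumes whitespace tokenization and the standard definition of WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative and do not come from the original work.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (assumed, not taken from the test data):
# one word is missing from a five-word reference.
score = word_error_rate("the patient has a fever", "the patient has fever")
```

A lower value is better, and the value can exceed 1.0 when the hypothesis contains many inserted words.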
D. Language model
We have integrated a 5-gram language model on top of
wav2vec2. To train our language model, we used the
Hugging Face medical dialog dataset with the KenLM
toolkit. This medical dialog data consists of millions of
dialogues related to questions and answers between doctors and
patients.
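
To illustrate what a 5-gram model estimates, the following is a toy sketch in plain Python. It counts 5-grams and returns an unsmoothed maximum-likelihood probability of a word given up to four preceding words. This is not how KenLM itself works internally: KenLM applies modified Kneser-Ney smoothing and uses compact data structures, both of which are omitted here. The helper names and the two sample sentences are assumptions made only for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a sentence padded with boundary markers."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def train_counts(sentences, n=5):
    """Count n-grams and their (n-1)-word contexts over a corpus."""
    ngram_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        for gram in ngrams(sentence.lower().split(), n):
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    return ngram_counts, context_counts


def cond_prob(word, context, ngram_counts, context_counts, n=5):
    """Unsmoothed P(word | last n-1 context words); 0.0 if context unseen."""
    ctx = tuple((["<s>"] * (n - 1) + list(context))[-(n - 1):])
    total = context_counts[ctx]
    if total == 0:
        return 0.0
    return ngram_counts[ctx + (word,)] / total


# Hypothetical two-sentence corpus standing in for the dialogue data.
corpus = ["the doctor said rest", "the doctor said sleep"]
ngram_counts, context_counts = train_counts(corpus)
p_rest = cond_prob("rest", ["the", "doctor", "said"],
                   ngram_counts, context_counts)
```

In a real setup, a smoothed model would assign non-zero probability to unseen word sequences, which matters when rescoring speech recognition hypotheses.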