
SPEECH QUALITY ASSESSMENT

PRESENTED BY:
CHINKU KUMAR VEHERA (CRF202575)
CONTENTS

• Introduction
• Model description
• Databases
• Model training and results
• Conclusion
• References
INTRODUCTION

• The speech quality of voice communication services has improved rapidly over the last
decades.
• One reason for the improved quality is the extension of the transmission bandwidth from
narrowband (NB), with a bandwidth of 300 - 3400 Hz, to wideband (WB), with 100 -
7000 Hz.
• Recently, the quality was further improved with the introduction of super-wideband
(SWB) transmission to speech communication networks, with a bandwidth of 50 - 14000
Hz.
• Here we present a non-intrusive speech quality assessment model, NISQA, which, in
contrast to current state-of-the-art non-intrusive models, can predict the quality of
super-wideband speech transmission.
• Signal-based models can be divided into two groups: (a) intrusive models and
(b) non-intrusive models.
• Intrusive models require the degraded output signal of the transmission system and the
clean original input signal.
• Non-intrusive or single-ended models rely only on the degraded output signal of the
transmission system.
• The long-standing standards for NB and WB speech quality assessment by the International
Telecommunication Union (ITU-T) have been PESQ and WB-PESQ.
• They have now been replaced by POLQA, the current ITU-T recommendation, which
also considers SWB transmission.
MODEL DESCRIPTION
• The SWB speech quality estimator NISQA is based on a convolutional neural network
(CNN) that estimates the speech quality for each frame of the input signal.
• The estimated per-frame quality values are then aggregated over time by using a recurrent
neural network (RNN).
• An advantage of RNNs is that they accept time sequences of different lengths as input.
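
The idea that a recurrent network folds per-frame estimates of any length into one value can be illustrated with a toy sketch. This is not the NISQA network: it is a single scalar recurrent cell with hypothetical, untrained weights (`w_in`, `w_rec`, `bias`), and the per-frame scores are made-up numbers standing in for the CNN output.

```python
import math

def aggregate_quality(frame_scores, w_in=0.5, w_rec=0.5, bias=0.0):
    """Fold a sequence of per-frame scores into one value.

    Toy scalar recurrent cell; the weights are illustrative assumptions,
    not trained parameters.
    """
    h = 0.0  # hidden state carried from frame to frame
    for x in frame_scores:
        # The new state depends on the current frame and the previous state.
        h = math.tanh(w_in * x + w_rec * h + bias)
    return h

# The same cell handles clips with different numbers of frames.
short_clip = [0.9, 0.8, 0.85]
long_clip = [0.9, 0.8, 0.85, 0.2, 0.1, 0.3, 0.7]
print(aggregate_quality(short_clip))
print(aggregate_quality(long_clip))
```

Because the loop runs once per frame, no padding or fixed input length is needed, which is the property the bullet above refers to.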
NISQA MODEL
• The advantage of the CNN-LSTM approach is twofold: Firstly, the per-frame quality
gives some insight into the cause of a quality degradation.
• Secondly, this approach helps to regularize the training of the RNN.
• Because a large amount of training data is available on a per-frame basis but only
limited data on a per-file basis, it is important to minimize the input feature size of the RNN.
DATABASES

• Overall, 29 different databases with typical P.800 double sentences with a duration of 6 -
12 s were available.
• All SWB test set databases from the POLQA pool were chosen for our test set, and all
SWB training sets were included in our training set.
• Many of the databases use the same reference signals.
MODEL TRAINING AND RESULTS

• To train the model, we first calculated the per-frame similarity between the
degraded and the original signal with POLQA v2 in the SQuadAnalyzer
implementation.
• Then we aligned the per-frame similarity with the spectrogram segments,
using nearest-neighbor interpolation.
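
The alignment step can be sketched as follows. This is a minimal illustration, assuming that each per-frame similarity value and each spectrogram segment carries a timestamp in seconds; the function name `align_nearest` and all numbers are hypothetical and not taken from the NISQA implementation.

```python
def align_nearest(values, value_times, target_times):
    """For each target time, pick the value whose timestamp is closest."""
    aligned = []
    for t in target_times:
        # Index of the source frame nearest in time to this segment.
        nearest = min(range(len(values)), key=lambda i: abs(value_times[i] - t))
        aligned.append(values[nearest])
    return aligned

# Hypothetical per-frame similarity values on a 20 ms grid.
similarity = [0.9, 0.4, 0.7]
similarity_times = [0.00, 0.02, 0.04]

# Hypothetical spectrogram segment centers on a different grid.
segment_times = [0.005, 0.015, 0.025, 0.035]

print(align_nearest(similarity, similarity_times, segment_times))
# → [0.9, 0.4, 0.4, 0.7]
```

Nearest-neighbor interpolation copies existing values rather than averaging them, so every segment receives a similarity value that was actually computed for some frame.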
CORRELATION DIAGRAM OF THE BEST- AND WORST-CASE RESULTS
CONCLUSION
• Here we presented a new non-intrusive speech quality assessment model, NISQA, for SWB
transmission.
• We showed that the proposed model gives good prediction results on the same
test set that was used for the POLQA validation, with an average RMSE*3rd of 0.29 and
a worst-case RMSE*3rd of 0.37.
• NISQA is able to predict the speech quality of packet loss concealment conditions of
modern speech codecs.
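
The RMSE* figures quoted in the conclusion can be made concrete with a short sketch. It assumes that RMSE* denotes the epsilon-insensitive RMSE described in ITU-T P.1401, where prediction errors inside the 95 % confidence interval of the subjective score count as zero, and that "3rd" refers to a third-order polynomial mapping applied to the predictions beforehand (hence `d=4` mapping coefficients). The mapping step itself is omitted here and all data are made up.

```python
import math

def rmse_star(subjective, predicted, ci95, d=4):
    """Epsilon-insensitive RMSE over per-condition scores.

    subjective: subjective MOS per condition
    predicted:  model predictions, assumed to be already mapped
    ci95:       95 % confidence interval of each subjective MOS
    d:          degrees of freedom of the mapping (assumed 4 for third order)
    """
    n = len(subjective)
    if n <= d:
        raise ValueError("need more conditions than mapping coefficients")
    total = 0.0
    for mos, pred, ci in zip(subjective, predicted, ci95):
        # Errors smaller than the confidence interval are ignored.
        err = max(0.0, abs(mos - pred) - ci)
        total += err * err
    return math.sqrt(total / (n - d))

# Made-up example with six conditions.
mos = [4.0, 3.0, 2.0, 1.5, 4.5, 3.5]
pred = [4.1, 3.5, 2.0, 1.0, 4.4, 3.6]
ci = [0.2] * 6
print(round(rmse_star(mos, pred, ci), 3))
# → 0.3
```

Only the two conditions with an error of 0.5 exceed the confidence interval, so only they contribute to the result.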
REFERENCES

• ITU-T Rec. P.863, “Perceptual objective listening quality assessment.”
• D. Kim and A. Tarraf, “ANIQUE+: A new American national standard for non-intrusive
estimation of narrowband speech quality,” Bell Labs Technical Journal, vol. 12, no. 1, pp.
221–236, Spring 2007.
• S.-W. Fu, Y. Tsao, H.-T. Hwang, and H.-M. Wang, “Quality-Net: An end-to-end
non-intrusive speech quality assessment model based on BLSTM,” in Proc. Interspeech
2018, 2018, pp. 1873–1877.
