Recent advances in neural-network-based modeling of speech have shown great potential
for speech coding.
However, the performance of such models drops when the input is not clean speech, e.g.,
in the presence of background noise.
Placing a denoising preprocessing stage in front of a model trained on clean speech is
shown to be the best-performing strategy.
INTRODUCTION
Coding real-world speech signals with a neural vocoder requires solving a number of
problems simultaneously.
The difficulty of coding noisy signals can perhaps be explained by the signal structure:
the signal to be coded is the sum of a clean speech signal and an interfering signal.
The autoregressive architecture is a good match for the structure of clean speech, but
the addition of a second signal destroys this match.
Resisting the urge to simply increase the model size, we have to find a strategy that
improves robustness without changing the network configuration.
NOISE HANDLING STRATEGIES
To establish best practices for handling noise in a neural vocoder system, we compare
several different strategies.
It is common practice to apply augmentations to the training data.
This exposes the model to a wider variety of realistic scenarios, improving its
robustness at inference time.
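A minimal sketch of such an augmentation, assuming the common approach of mixing a noise recording into clean speech at a randomly chosen signal-to-noise ratio; the function name and the use of NumPy are illustrative, not from the paper:

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Mix a noise signal into clean speech at a target SNR in dB.

    Hypothetical augmentation helper: the noise is tiled to the speech
    length and scaled so that the mixture reaches the requested SNR.
    """
    # Tile or trim the noise to match the speech length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    # Scale the noise so the mixture hits the requested SNR.
    speech_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise
```

During training, one would typically draw a fresh noise clip and a random SNR (say, uniformly over a range such as 0–20 dB) for every utterance, so the model never sees the same mixture twice.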
As an alternative, we can include a denoising model in the encoder.
We denote the different training methods as X2Y, where X describes the conditioning input
and Y describes the autoregressive input. The regimes we consider are:
• c2c: ‘Clean-to-Clean:’ Trained with clean audio for both the conditioning and the
autoregressive inputs.
• n2n: ‘Noisy-to-Noisy:’ Trained with noisy inputs. This noisy data is used both for
conditioning and the autoregressive inputs.
• dc2c: ‘Denoised Clean-to-Clean:’ The same training method as c2c, but applies a
denoiser during inference.
• dn2n: ‘Denoised Noisy-to-Noisy:’ The same training method as n2n, but applies a
denoiser during inference.
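The selection of training inputs implied by the X2Y notation can be sketched as follows; the function and signal names are illustrative, and note that dc2c and dn2n differ from c2c and n2n only by the denoiser applied at inference, not by their training data:

```python
def training_pair(regime, clean, noisy):
    """Return the (conditioning, autoregressive) training inputs
    for a given X2Y regime."""
    if regime in ("c2c", "dc2c"):   # dc2c trains exactly like c2c
        return clean, clean
    if regime in ("n2n", "dn2n"):   # dn2n trains exactly like n2n
        return noisy, noisy
    raise ValueError(f"unknown regime: {regime}")
```

At inference time, the d-prefixed regimes would pass the noisy input through a denoiser before computing the conditioning features, while the others condition on the input as received.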