
Group 17

EURECOM AML 2022: Challenge 2


Anomalous Sound Detection
CHALLENGE:

How to solve it?


1. Data exploration
Introduction
Waveform
Spectrograms
Chromagram
Mel scale
2. Model selection: Autoencoder
CNN
Batch consideration
3. Model performance
Data exploration: introduction
Each recording is a single-channel audio clip of approximately 10 seconds that
includes both a target machine's operating sound and environmental noise.

The data come from ToyADMOS and the MIMII Dataset, which consist of the
normal/anomalous operating sounds of six types of toy/real machines.

We will work only on the Slide rail.


Data exploration: introduction
Other types of toy/real machines:
Toy-car (ToyADMOS)
Toy-conveyor (ToyADMOS)
Valve (MIMII Dataset)
Pump (MIMII Dataset)
Fan (MIMII Dataset)
Slide rail (MIMII Dataset)
Data exploration: introduction
The audio files are time-series data, so they give information about the
amplitude of the sound over time.

To visualize the sequence, the waveform of these signals is plotted in the
next slide.
Data exploration: waveform
However, in deep learning models the common practice is to convert the
audio into a spectrogram, which is:
a concise snapshot of an audio wave;
an image → well suited as input to CNN-based architectures
developed for handling images.

The spectrograms of the two signals (normal and anomalous) are shown in the
next two slides.
Data exploration: spectrogram
Data exploration: chromagram
Another way to get information about the differences between the two kinds of
signals is the chromagram.

It describes perceptual 'differences'/'distances' between pitches within an
octave, and the perceptual sameness of pitches separated by one or more full
octaves.

Pitch can be understood as the relative highness/lowness of a sound: the
higher the sound, the higher the pitch, and the lower the sound, the lower
the pitch.
Data exploration: chromagram
While normal sounds produce a chromagram that spans all notes, the
chromagrams of anomalous sounds focus almost exclusively on notes B and G.

This confirms that anomalous sounds are characterized by higher frequencies.
Data exploration: Mel scale
Humans are better at detecting differences in lower frequencies than in higher
frequencies.
For example, we can easily tell the difference between 500 and 1,000 Hz,
but we will hardly be able to tell the difference between 10,000 and 10,500 Hz.

The mel scale is a scale of pitches judged by listeners to be equally distant
from one another.

We can use the librosa library to produce a linear transformation matrix (a
mel filter bank), then use it to plot a new spectrogram (next two slides).
Data exploration: Mel scale
Data exploration: test vs train
In this challenge it is important to take into consideration that the training
set is composed only of normal sounds, while the test set also contains
anomalous sounds, as can be seen from the spectrograms.
Model selection: Autoencoder
Two flavours:
Autoencoder
CNN Autoencoder
Model selection: Autoencoder
In this challenge the difficulty lies in learning from unlabeled data.

Fortunately, this can be solved through the use of an autoencoder.

An autoencoder is an unsupervised artificial neural network that learns
how to efficiently compress and encode data,
then learns how to reconstruct the data back from the reduced
encoded representation to a representation that is as close to the
original input as possible.
Model selection: Autoencoder
The autoencoder method works very well for anomaly detection because the
encoding operation relies on the correlated features of normal data to
compress it; an anomalous input breaks these correlations, so it is
reconstructed poorly and yields a high reconstruction error.
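This scoring idea can be sketched as follows; the tiny dense network and the 64-dimensional input frames are purely illustrative assumptions, not the model used in the challenge:

```python
import torch
import torch.nn as nn

# Minimal sketch (not the team's actual network): an autoencoder trained on
# normal frames; at test time, a high reconstruction error flags an anomaly.
model = nn.Sequential(
    nn.Linear(64, 16), nn.ReLU(),   # encoder: compress to a small code
    nn.Linear(16, 64),              # decoder: reconstruct the input
)

x = torch.randn(8, 64)                  # a batch of hypothetical 64-dim frames
recon = model(x)
score = ((x - recon) ** 2).mean(dim=1)  # per-sample anomaly score (MSE)
print(score.shape)  # torch.Size([8])
```

In practice one thresholds `score`: samples above the threshold are declared anomalous.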
Model selection: Autoencoder
As mentioned before, we will use spectrograms in our CNN AE.

To get the CNN to process them, we defined and fixed some parameters of
the spectrogram:
n_mels;
n_fft;
hop_length.

That way, we were able to get as much information as possible.


Model selection: Autoencoder
But why are convolutional autoencoders suitable for image data?

Instead of stacking the data, convolutional autoencoders keep the spatial
information of the input image data as it is,
and extract information in what is called the convolution layer.


Model selection: Autoencoder
Noise reduction
The idea is to train a model with noisy data as the inputs and the
corresponding clean data as the outputs. Here we can see a normal signal:
Model selection: Autoencoder
Noise reduction
While here we have an anomalous signal:
Model selection: Autoencoder
It involves the:
Encoder
Conv2d (Convolutional layer)
BatchNorm2d (Batch normalization)
ReLU (Activation layer)
Latent space
Decoder
ConvTranspose2d (Transposed Convolutional layer)
BatchNorm2d (Batch normalization)
ReLU (Activation layer)
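The layer recipe above can be sketched in PyTorch as follows; the channel counts, kernel sizes, and input resolution are illustrative assumptions, not the team's exact configuration:

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: Conv2d -> BatchNorm2d -> ReLU, downsampling with stride 2.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),  # the encoder output is the latent space
        )
        # Decoder: ConvTranspose2d -> BatchNorm2d -> ReLU, mirroring the encoder.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.randn(4, 1, 64, 64)   # batch of single-channel spectrogram patches
out = ConvAE()(x)
print(out.shape)  # same spatial size as the input
```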
Model selection: Autoencoder
The encoding process compresses the input values to get to the core layer.
The decoding process reconstructs the information to produce the
outcome.
The decoding process mirrors the encoding process.
Model selection: Autoencoder
Noise reduction: convolutional layers
The convolution creates many small pieces called feature maps or features;
These preserve the relationship between pixels in the input image.
After scanning through the original image, each feature produces a
filtered image with high scores and low scores:

perfect match → high score in that square;

low match or no match → low or zero score.
Hyperparameters:
Padding
Strides
Model selection: Autoencoder
Noise reduction: convolutional layers hyperparameters
Padding:
Controls the kernel size and the output size in an "independent" way.
Without padding, border regions of the input image receive "less
attention" than central parts.
In some cases we would like to preserve the size of the input; if we use
a convolutional filter without padding, the output of the filter will be
smaller than the input image.
One way to solve this is to add some fake pixels around the border that
contribute to the computation.
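A small PyTorch illustration of this size effect (the 28×28 input is an arbitrary toy size):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)   # a toy single-channel image

# Without padding, a 3x3 kernel shrinks the output by 2 in each dimension...
no_pad = nn.Conv2d(1, 1, kernel_size=3, padding=0)
print(no_pad(x).shape)    # (1, 1, 26, 26)

# ...while padding=1 adds a ring of "fake pixels" and preserves the size.
same_pad = nn.Conv2d(1, 1, kernel_size=3, padding=1)
print(same_pad(x).shape)  # (1, 1, 28, 28)
```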
Model selection: Autoencoder
Noise reduction: convolutional layers hyperparameters
Padding:
Model selection: Autoencoder
Noise reduction: convolutional layers hyperparameters
Strides:
The idea is to skip some positions of the kernel as it slides over the
image:
useful to reduce the computational cost;
makes the extraction of features coarser, less fine-grained.
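A short illustration of the stride effect, again with an arbitrary toy input:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)   # a toy single-channel image

# stride=2 skips every other kernel position: the output is roughly halved
# in each spatial dimension, at about a quarter of the compute.
conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
print(conv(x).shape)  # (1, 1, 14, 14)
```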
Model selection: Autoencoder
Noise reduction: ReLUs
Rectified Linear Unit (ReLU);

This step is the same as in typical neural networks;

It rectifies any negative value to zero so as to guarantee the math will
behave correctly.
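A one-line illustration of the rectification:

```python
import torch

# ReLU simply clamps negative values to zero, element-wise.
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(torch.relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000])
```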
Model selection: Autoencoder
Batch normalization
Batch normalization is important, especially with networks that have no
shortcut connections, because it helps to smooth out the geometry of the
loss function at the expense of some computation:

It makes the loss landscape significantly smoother;

It allows a larger range of learning rates;

It gives faster convergence.
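A quick illustration of the normalization effect (the tensor shapes are arbitrary):

```python
import torch
import torch.nn as nn

# Batch normalization rescales each channel to roughly zero mean and unit
# variance over the mini-batch (then applies a learnable affine map).
bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 16, 16) * 5 + 10   # badly scaled activations
y = bn(x)
print(y.mean().item(), y.std().item())   # close to 0 and 1
```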
Model selection: Autoencoder
Batch size
When we do optimization we could consider the whole dataset at every step,
but this can be very costly. Instead we can consider a portion of the data:
this is what we call a mini-batch.

Small batches promote flatness of the loss.

Flat minimizers correlate well with smaller test error.

Smaller test error is a good proxy for good generalization.
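A minimal sketch of mini-batching with PyTorch's DataLoader (sizes are arbitrary):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Mini-batching: the optimizer sees a small slice of the data per step,
# instead of the whole dataset at once.
data = TensorDataset(torch.randn(100, 64))
loader = DataLoader(data, batch_size=16, shuffle=True)

(batch,) = next(iter(loader))
print(batch.shape)  # one mini-batch of 16 samples, not all 100
print(len(loader))  # number of optimization steps per epoch: ceil(100 / 16)
```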


Model performances
We tried three flavours of autoencoder:
Vanilla Autoencoder
CNN Autoencoder
CNN Autoencoder with batch normalization and mini-batches

Considering what we said before, it is not surprising that the CNN
Autoencoder with batch normalization is the one that performed best.

Accuracy:
Autoencoder: 76%
CNN Autoencoder: 88%
CNN Autoencoder + Batch Normalization: 92%
THE TEAM

Nour Thlijani Giulio Corallo Yash Agarwalla Valentina Lonardo
