
Module–III: Auto Encoders

Overview of autoencoders, note on biases, training an autoencoder, overcomplete hidden layers, sparse autoencoders, denoising autoencoders, contractive autoencoders, stacked autoencoders, deep autoencoders, building an autoencoder, tuning and optimizing, and applications of autoencoders

https://www.simplilearn.com/tutorials/deep-learning-tutorial/what-are-autoencoders-in-deep-learning#types_of_autoencoders
What Are Autoencoders?

Autoencoders are very useful in the field of unsupervised machine learning. You can
use them to compress the data and reduce its dimensionality.

The main difference between autoencoders and Principal Component Analysis (PCA) is that while PCA finds the directions along which you can project the data with maximum variance, autoencoders reconstruct the original input given just a compressed version of it.

Anyone who needs the original data can reconstruct it from the compressed data using the autoencoder.

Architecture

An autoencoder is a type of neural network that can learn to reconstruct images, text, and other data from compressed versions of themselves.

The typical flow within an autoencoder:

1. Encoder: The encoder part of the autoencoder takes the original data as
input and processes it through a series of layers, reducing its dimensionality
and capturing essential features. It produces the encoded representation in
the latent space.
2. Latent Space: The encoded representation is a compressed version of the
original data, residing in a lower-dimensional latent space. This
representation encodes the essential information needed for reconstruction.
3. Decoder: The decoder part of the autoencoder takes the encoded
representation from the latent space and transforms it through another series
of layers to reconstruct the original data. The decoder's output should ideally
closely resemble the original input.

So, indeed, the decoder plays a crucial role in the reconstruction of the original data
from the compressed data, and together with the encoder, it forms the complete
autoencoder architecture.
Representation of Autoencoder Architecture:

L1, L2 (Encoder Layers): These are usually the first and second layers in the
encoder part of the autoencoder. They are responsible for processing the input data
and gradually reducing its dimensionality. The specific operations and number of
neurons in each of these layers can vary depending on the architecture of the
autoencoder.

L3 (Latent Space Layer): L3 typically represents the layer in the encoder that
produces the encoded or latent representation of the input data. This is where the
dimensionality of the data is significantly reduced, and the compressed
representation is formed. It's the bottleneck layer of the autoencoder.

L4, L5 (Decoder Layers): These layers are part of the decoder section of the
autoencoder. They take the encoded representation from the latent space and
progressively reconstruct the original data. L4 and L5 gradually increase the
dimensionality of the data to match the dimensions of the input data.

The specific architecture and number of neurons in each layer can vary widely
depending on the autoencoder's design and the task it's intended for. The labels "L1,"
"L2," etc., are just identifiers to distinguish different layers within the encoder and
decoder, and the actual neural network architecture can be much more complex, with
many more layers and neurons.
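To make the L1–L5 labels concrete, here is a minimal sketch of such a five-layer autoencoder in PyTorch. All layer sizes (a 784-dimensional input, such as a flattened 28x28 image, shrinking to a 32-dimensional latent code) are illustrative assumptions, not values fixed by the text.

import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),    # L1: first reduction
            nn.Linear(256, 64), nn.ReLU(),     # L2: further reduction
            nn.Linear(64, 32),                 # L3: bottleneck (latent space)
        )
        self.decoder = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(),      # L4: begin expanding back
            nn.Linear(64, 784), nn.Sigmoid(),  # L5: back to the input size
        )

    def forward(self, x):
        z = self.encoder(x)     # compressed latent representation
        return self.decoder(z)  # reconstruction of the input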

Training Autoencoders

When you're building an autoencoder, there are a few things to keep in mind.

1. First, the code or bottleneck size is the most critical hyperparameter to tune
the autoencoder. It decides how much data has to be compressed. It can also
act as a regularisation term.
2. Secondly, it's important to remember that the number of layers is critical
when tuning autoencoders. A higher depth increases model complexity, but a
lower depth is faster to process.
3. Thirdly, you should pay attention to how many nodes you use per layer. The number of nodes typically decreases with each subsequent encoder layer as the representation is progressively compressed, then grows again through the decoder.
4. Finally, it's worth noting that there are two common reconstruction losses: Mean Squared Error (MSE) loss and L1 loss (Mean Absolute Error, or MAE). Both are widely used for measuring the difference between the reconstructed output of an autoencoder and the original input data (a training sketch follows this list).
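Putting these pieces together, a typical training loop might look like the hedged sketch below. It assumes the Autoencoder class sketched earlier and a PyTorch DataLoader named train_loader that yields (image, label) batches; both names are illustrative.

import torch

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()  # L1Loss (MAE) is the other common choice

for epoch in range(20):
    for batch, _ in train_loader:              # labels are ignored: training is unsupervised
        batch = batch.view(batch.size(0), -1)  # flatten images into vectors
        recon = model(batch)
        loss = criterion(recon, batch)         # the target is the input itself
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()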

Nodes in each layer

The number of nodes (neurons) in each layer of an autoencoder, including the encoder and decoder, plays a crucial role in determining the model's capacity and its ability to learn a meaningful representation of the data.

Here's why we need these nodes and why they often decrease in number across
layers:

• Feature Extraction: The nodes in the first few layers of the encoder are
responsible for extracting relevant features from the input data. Having more
nodes allows the model to capture a broader range of features. These features
become increasingly abstract as you move deeper into the network.
For example, in an image autoencoder, the initial layers may capture
basic shapes and colors, while deeper layers might capture higher-level
patterns or objects.
• Dimensionality Reduction: As you progress through the encoder, the
number of nodes typically decreases. This reduction in dimensionality is a key
aspect of autoencoders. It forces the model to compress the information in the
data into a lower-dimensional representation (the latent space). This
dimensionality reduction is a form of data compression and feature selection.
• Bottleneck Layer: The bottleneck layer (the layer with the fewest nodes) is
often where the most critical information is concentrated. It's the compressed
representation of the data in the latent space. The reduced dimensionality in
this layer encourages the model to capture only the most salient features and
patterns in the data.
• Overfitting Prevention: Limiting the number of nodes in each layer can
help prevent overfitting, which occurs when the model learns to memorize the
training data rather than generalize from it. By reducing the capacity of the
model as you move deeper into the network, you encourage it to learn a more
compact and generalizable representation.

In summary, the number of nodes in each layer of an autoencoder is carefully chosen to strike a balance between capturing relevant features and patterns in the data and achieving dimensionality reduction. This design helps autoencoders create informative, compressed representations while preventing overfitting and maintaining the model's ability to generalize to new, unseen data.

Loss Functions:

Mean Squared Error (MSE) loss and L1 loss (also known as Mean Absolute Error or
MAE) are commonly used loss functions for measuring the difference between the
reconstructed output of an autoencoder and the original input data. They are used in
training autoencoders, but they have different characteristics:
Mean Squared Error (MSE) Loss:

Formula: MSE loss is computed as the average of the squared differences between the corresponding elements of the original input and the reconstructed output. Mathematically, it's calculated as:

MSE = (1/n) * Σ(original_i - reconstructed_i)^2

Here, "original_i" and "reconstructed_i" represent the values of the i-th data point in
the original and reconstructed data, respectively, and "n" is the total number of data
points.

Properties: MSE loss heavily penalizes large errors. Squaring the differences
magnifies the impact of outliers, making it sensitive to extreme values. This makes it
suitable for tasks where you want the autoencoder to pay strong attention to
reducing large errors, but it can sometimes be overly sensitive to outliers.

L1 Loss (Mean Absolute Error, MAE):

Formula: L1 loss is computed as the average of the absolute differences between the corresponding elements of the original input and the reconstructed output. Mathematically, it's calculated as:

L1 Loss = (1/n) * Σ|original_i - reconstructed_i|

Here, "original_i" and "reconstructed_i" represent the values of the i-th data point in
the original and reconstructed data, respectively, and "n" is the total number of data
points.

Properties: L1 loss is less sensitive to outliers compared to MSE because it doesn't magnify errors through squaring. It's more robust when dealing with data that may have extreme values or when you want to encourage the autoencoder to focus on reducing errors in a balanced way.

The choice between MSE and L1 loss depends on the specific requirements of your autoencoder task:

• Use MSE loss when you want the model to strongly penalize large errors and
when the distribution of errors is approximately Gaussian (bell-shaped). MSE
can be appropriate for tasks like image reconstruction.
• Use L1 loss when you want a more robust loss function that is less affected by
outliers and when you want a more balanced emphasis on all errors. L1 loss is
often used in scenarios where errors might have a skewed distribution.
• In practice, you can experiment with both loss functions to see which one
works better for your particular autoencoder application.
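As a small illustration of the difference, the snippet below computes both losses on a made-up pair of vectors; the numbers are arbitrary examples.

import torch
import torch.nn.functional as F

original = torch.tensor([1.0, 2.0, 3.0])
reconstructed = torch.tensor([1.5, 2.0, 5.0])

mse = F.mse_loss(reconstructed, original)  # (0.5^2 + 0^2 + 2^2) / 3 = 1.417
mae = F.l1_loss(reconstructed, original)   # (0.5 + 0 + 2) / 3 = 0.833

Note how the single large error (3 vs. 5) dominates the MSE far more than the MAE, which is exactly the outlier sensitivity described above.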

Types of Autoencoders

1. Undercomplete Autoencoders

• An undercomplete autoencoder is an unsupervised neural network that you can use to generate a compressed version of the input data.
• It is done by taking in an image and trying to predict the same image as
output, thus reconstructing the image from its compressed bottleneck region.
• The primary use for autoencoders like these is generating a latent space or
bottleneck, which forms a compressed substitute of the input data and can be
easily decompressed back with the help of the network when needed.
Use Cases:
• Undercomplete autoencoders are often used for feature extraction and data
compression tasks, where you want to reduce the dimensionality of the data
while preserving essential information.

2. Sparse Autoencoders

• Sparse autoencoders are controlled by changing the number of nodes at each hidden layer.
• Since it is impossible to design a neural network with a flexible number of
nodes at its hidden layers, sparse autoencoders work by penalizing the
activation of some neurons in hidden layers.
• It means that a penalty directly proportional to the number of neurons
activated is applied to the loss function.
• As a means of regularizing the neural network, the sparsity function prevents
more neurons from being activated.

There are two types of regularizers used:

1. The L1 loss method is a general regularizer that penalizes the absolute magnitude of the hidden activations.

2. The KL-divergence method considers the activations over a collection of samples at once rather than summing them as in the L1 loss method. We constrain the average activation of each neuron over this collection (the L1 approach is sketched below).
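A minimal sketch of the L1 method, reusing the model, batch, and criterion names from the earlier training sketch; the penalty weight 1e-4 is an illustrative assumption. The KL-divergence method would instead compare each neuron's average activation over the batch to a small target sparsity level.

z = model.encoder(batch)                 # hidden (latent) activations
recon = model.decoder(z)
sparsity_penalty = 1e-4 * z.abs().sum()  # L1 penalty on activations
loss = criterion(recon, batch) + sparsity_penalty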

3. Contractive Autoencoders

• The input is passed through a bottleneck in a contractive autoencoder and then reconstructed in the decoder. The bottleneck function is used to learn a representation of the image while passing it through.
• The contractive autoencoder also has a regularization term to prevent the
network from learning the identity function and mapping input into output.
• To train a model that works along with this constraint, we need to ensure that the derivatives of the hidden layer activations are small with respect to the input (see the sketch below).
Use Cases:
• Contractive autoencoders are useful when you want to enforce a model to
learn meaningful, stable representations and reduce overfitting.
• They are commonly employed in scenarios where feature learning is crucial,
such as denoising and image reconstruction tasks.
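A hedged sketch of that constraint in code, again reusing names from the earlier training sketch: the penalty below uses autograd on the summed hidden units, a common cheap approximation of the Jacobian's squared Frobenius norm, and the weight 1e-4 is an illustrative assumption.

batch.requires_grad_(True)  # track gradients w.r.t. the input
z = model.encoder(batch)
recon = model.decoder(z)
grads = torch.autograd.grad(z.sum(), batch, create_graph=True)[0]
contractive_penalty = 1e-4 * grads.pow(2).sum()  # keep hidden units insensitive to the input
loss = criterion(recon, batch) + contractive_penalty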

4. Denoising Autoencoders

• Have you ever wanted to remove noise from an image but didn't know where
to start? If so, then denoising autoencoders are for you!
• Denoising autoencoders are similar to regular autoencoders in that they take
an input and produce an output. However, they differ because they don't have
the input image as their ground truth. Instead, they use a noisy version.
• This is because removing image noise manually is difficult. With a denoising autoencoder, we feed the noisy image into the network and let it map the input to a lower-dimensional manifold, where filtering out the noise becomes much more manageable.
• The loss function usually used with these networks is L2 or L1 loss.
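The training change is small: corrupt the input but keep the clean input as the target. A minimal sketch, reusing names from the earlier training loop; the noise level 0.3 is an illustrative assumption.

noisy = batch + 0.3 * torch.randn_like(batch)  # add Gaussian noise to the input
recon = model(noisy)                           # the network sees only the noisy version
loss = criterion(recon, batch)                 # ...but is scored against the clean input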

5. Variational Autoencoders

• Variational autoencoders (VAEs) are models that address a specific problem with standard autoencoders. When you train an autoencoder, it learns to represent the input just in a compressed form called the latent space or the bottleneck. However, this latent space formed after training is not necessarily continuous and, in effect, might not be easy to interpolate.
• Variational autoencoders deal with this specific topic and express their latent
attributes as a probability distribution, forming a continuous latent space that
can be easily sampled and interpolated.

Example:
Standard Autoencoder (AE):
Imagine you have a collection of colorful balloons of various shapes and
sizes. You want to represent these balloons in a compressed form, so you use a
standard autoencoder. The autoencoder takes pictures of these balloons and
learns to encode them into a compact storage space where each balloon is
represented as a unique point. However, the encoded points don't have a
specific order or relationship in this compressed space. They are like balloons
randomly scattered in a room.

Variational Autoencoder (VAE):
Now, consider using a Variational Autoencoder (VAE) for the same task. Instead of compressing each balloon into a single point, the VAE represents each balloon as a cloud (a probability distribution in the latent space).
Each cloud represents the probability distribution of where the balloon
could be located in the room. This means that in the VAE's latent space, each
point is not just a single balloon but a cloud of balloons with a defined center
and spread. These clouds form a continuous and smooth distribution in the
room.

Real-Life Analogy:

• The standard autoencoder compresses the balloons into points without any
specific arrangement or smooth transitions between them.
• The Variational Autoencoder (VAE) represents the balloons as clouds,
creating a continuous and orderly arrangement of balloons in the room.
In this analogy, the VAE's continuous latent space allows for easy
interpolation between balloons. You can generate new balloon images by
sampling from these clouds in the latent space, effectively creating a variety of
balloons with attributes that smoothly transition from one to another.

Another Analogy:
For example, consider a VAE trained on images of faces. If you have
two latent representations, one corresponding to a smiling face and another
corresponding to a neutral face, interpolation in the latent space would allow
you to generate new representations that smoothly transition from a neutral
face to a smiling face. The resulting images would be faces with smiles of
varying degrees.
Interpolation in the latent space is a powerful feature of VAEs because
it enables the generation of new data points that are not present in the original
dataset but are consistent with the learned data distribution. This can be used
for creative tasks like image generation, data synthesis, and exploring the
potential variations within the data manifold.

The steps include:

• Training Phase: During the training phase of a VAE, you feed individual data points (e.g., images of faces) to the encoder, which maps each data point to a point in the shared latent space. These points are the latent representations (latent codes) for the input data.
• Shared Latent Space: All data points, regardless of whether they are
images of smiling faces, neutral faces, or any other type of data, share
the same latent space. This shared latent space is where the VAE learns
to represent the essential features of the data in a continuous and
smooth manner.
• Interpolation: To perform interpolation, you choose two latent
representations from the shared latent space—one for a neutral face
and another for a smiling face. You then interpolate between these two
latent representations in the shared latent space to generate new latent
representations that smoothly transition from one to the other.
• Generation: Finally, you use the interpolated latent representations
to generate new images by passing them through the VAE's decoder.
The resulting images are a blend of the characteristics of the original
input images.

So, to clarify, there is a single shared latent space in a VAE, and the
interpolation process occurs within this shared space to generate new data
points that are a mix of the features learned from the entire dataset.
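A minimal sketch of the two VAE-specific pieces, the reparameterization step and latent-space interpolation, is given below. All sizes and names (a 64-dimensional feature vector h, a 2-dimensional latent space, fc_mu, fc_logvar, z_neutral, z_smiling) are illustrative assumptions.

import torch
import torch.nn as nn

# The VAE encoder head outputs a mean and log-variance per latent
# dimension instead of a single point.
fc_mu = nn.Linear(64, 2)
fc_logvar = nn.Linear(64, 2)

h = torch.randn(1, 64)                # stand-in for encoder features
mu, logvar = fc_mu(h), fc_logvar(h)
std = torch.exp(0.5 * logvar)
z = mu + std * torch.randn_like(std)  # reparameterization: sample from the "cloud"

# KL term that keeps each cloud close to a standard normal, giving
# the continuous, smooth latent space described above:
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# Interpolating between two latent codes yields a smooth transition:
z_neutral, z_smiling = torch.randn(1, 2), torch.randn(1, 2)
for alpha in torch.linspace(0, 1, steps=8):
    z_mix = (1 - alpha) * z_neutral + alpha * z_smiling
    # passing z_mix through the decoder would render the blended face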

This continuous latent space is one of the key advantages of VAEs, making
them useful for generating diverse and smooth variations of data, such as
generating realistic faces with varying expressions or handwriting with
different styles.
Use Cases

Autoencoders have various use-cases like:

• Anomaly detection: Example: Imagine you're a bank security analyst. Your job is to spot unusual activity in financial transactions. Autoencoders are like detectives trained to identify suspicious behavior. They study typical transaction patterns and can flag anything that doesn't fit the norm. For instance, if someone suddenly spends a large amount in an unfamiliar location, the autoencoder might raise a red flag, helping prevent fraud (a scoring sketch follows this list).

• Data denoising (image and audio): Example: Think of autoencoders as audio editors or Photoshop for pictures. Just like you use photo-editing software to remove unwanted elements from photos, autoencoders can clean up noisy images and audio. For instance, if you have an old, scratched vinyl record or a blurred photograph, autoencoders can help restore them to their former glory.

• Image inpainting: Example: Imagine you have a jigsaw puzzle, but a few
pieces are missing. Autoencoders work like puzzle-solving experts. They
study the picture on the existing pieces and use that information to fill in
the gaps accurately. So, if you're trying to restore an old family portrait with
torn edges, autoencoders can help complete the missing parts.

• Information retrieval (Content-Based Image Search): Example: Consider you're building a visual search engine, much like Google Images. Autoencoders serve as the search engine's memory. They've learned what various images look like and can find similar images for you. For instance, if you want to find pictures of different types of dogs, the autoencoder-powered search can quickly locate relevant dog images based on their content.
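As a hedged sketch of the anomaly-detection idea mentioned above: score each sample by its reconstruction error and flag the ones the trained autoencoder cannot reconstruct well. The tensor name transactions and the 3-sigma threshold are illustrative assumptions.

import torch

def reconstruction_errors(model, x):
    # Per-sample mean squared reconstruction error.
    with torch.no_grad():
        recon = model(x)
    return ((x - recon) ** 2).mean(dim=1)

errors = reconstruction_errors(model, transactions)
threshold = errors.mean() + 3 * errors.std()  # illustrative cutoff
flagged = errors > threshold                  # True = unusual, worth reviewing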
Questions and Answers:

1. What are autoencoders used for?

Using an autoencoder, you can compress input data, encode it, and then reconstruct it from the compressed version, reducing its dimensionality. Autoencoders help you focus on only the most critical features of your data.

2. How do autoencoders work?

Autoencoders are neural networks that you can use to compress and reconstruct
data. The encoder compresses input, and the decoder attempts to recreate the
information from this compressed version.

3. Is autoencoder a CNN?

Not necessarily, but the two are often combined: convolutional autoencoders are autoencoders that use CNNs in their encoder/decoder parts. The term "convolutional" refers to convolving an image with a filter to extract information, which is what happens in a CNN.

4. Is the autoencoder supervised or unsupervised?

Autoencoders can be used to learn a compressed representation of the input. They are unsupervised in that they require no labels, yet they are trained using supervised learning methods: the input itself serves as the training target (this is sometimes called self-supervised learning).

5. When should we not use autoencoders?

An autoencoder could mishandle inputs that differ from those in the training set, or miss changes in underlying relationships that a human would notice. Another drawback is that compression may eliminate vital information from the input data.
