A Brief Overview
The target of this model is that the input should be equivalent to the
reconstructed output. To achieve this, we minimize a loss function
called the Reconstruction Loss. The reconstruction loss is simply the
error between the input and the reconstructed output. It is usually
measured by the mean squared error or the binary crossentropy between
the input and the reconstructed output; binary crossentropy is used if
the data is binary.
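As a small illustration (the numbers below are made up for this example, not taken from the article), both losses can be computed directly on an input vector and its reconstruction:

import numpy as np

# Toy input (binary) and its reconstruction
x = np.array([0.0, 1.0, 1.0, 0.0])
x_hat = np.array([0.1, 0.9, 0.8, 0.2])

# Mean squared error between input and reconstruction
mse = np.mean((x - x_hat) ** 2)

# Binary crossentropy, appropriate when the data is binary
bce = -np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

print(mse, bce)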
Undercomplete Autoencoder
Three models are built here: the full autoencoder, the encoder, and the
decoder. The bottleneck layer is the place where the encoded image is
generated.
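The article's original code block for this shallow model is not reproduced here, but a minimal sketch with a 32-unit bottleneck (the layer sizes are assumed from the dimensions discussed below) could look like this:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Shallow autoencoder: 784-dimensional input, 32-dimensional bottleneck
inp = Input(shape=(784,))
bottleneck = Dense(32, activation='relu')(inp)        # encoded representation
out = Dense(784, activation='sigmoid')(bottleneck)    # reconstruction

autoencoder = Model(inputs=inp, outputs=out)          # full autoencoder
encoder = Model(inputs=inp, outputs=bottleneck)       # encoder only

# Decoder: feed a 32-dimensional encoding through the final layer
encoded_input = Input(shape=(32,))
decoder = Model(inputs=encoded_input,
                outputs=autoencoder.layers[-1](encoded_input))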
We train the autoencoder to learn the weights, which are then shared by
the encoder and the decoder models.
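Training could be sketched as follows (x_train and x_test are placeholders for flattened images scaled to the range 0 to 1; the optimizer and epoch count are assumptions, not the article's exact settings):

# Train the autoencoder to reproduce its own input
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))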
If we send the image encodings through the decoder, we will see that
the images are reconstructed.
[Image by author] The upper row shows the original images and the lower
row shows the images the decoder created from their encodings.
Now, the images have dimensions 28x28, and we have created encodings of
dimension 32. If we represent each encoding as a 16x2 array, it will
look something like this:
[Image by author: the 32-dimensional encodings displayed as 16x2 arrays]
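A sketch of how such figures can be produced (assuming the encoder and decoder models above, a placeholder test set x_test of flattened 28x28 images, and matplotlib):

import matplotlib.pyplot as plt

encodings = encoder.predict(x_test)           # shape (n, 32)
reconstructions = decoder.predict(encodings)  # shape (n, 784)

# Show one original image, its 16x2 encoding, and its reconstruction
fig, axes = plt.subplots(1, 3)
axes[0].imshow(x_test[0].reshape(28, 28), cmap='gray')
axes[1].imshow(encodings[0].reshape(16, 2), cmap='gray')
axes[2].imshow(reconstructions[0].reshape(28, 28), cmap='gray')
plt.show()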
As we can see here, we have built a very shallow network. We can build
a deep network instead, since shallow networks may not be able to
uncover all the underlying features, but we need to be very careful
about restricting the number of hidden nodes.
# A deeper autoencoder for 784-dimensional inputs with a 32-unit bottleneck
input_l = Input(shape=(784,))
encoding_1 = Dense(256, activation='relu')(input_l)
encoding_2 = Dense(128, activation='relu')(encoding_1)
bottleneck = Dense(32, activation='relu')(encoding_2)
decoding_1 = Dense(128, activation='relu')(bottleneck)
decoding_2 = Dense(256, activation='relu')(decoding_1)
output_l = Dense(784, activation='sigmoid')(decoding_2)

# Full autoencoder and encoder share the same trained layers
autoencoder = Model(inputs=[input_l], outputs=[output_l])
encoder = Model(inputs=[input_l], outputs=[bottleneck])

# Decoder: reuse the three decoding layers on a 32-dimensional input
encoded_input = Input(shape=(32,))
decoded_layer_1 = autoencoder.layers[-3](encoded_input)
decoded_layer_2 = autoencoder.layers[-2](decoded_layer_1)
decoded = autoencoder.layers[-1](decoded_layer_2)
decoder = Model(inputs=[encoded_input], outputs=[decoded])
Sparse Autoencoders
When we were talking about undercomplete autoencoders, we said that we
restrict the number of nodes in the hidden layer to restrict the flow
of data. But this approach often creates issues, because the limitation
on the number of hidden nodes and the shallower network prevent the
neural network from uncovering complex relationships among the data
items. So, we need to use deeper networks with more hidden-layer nodes.
Then again, if we use more hidden-layer nodes, the network may simply
memorize the input and overfit, which defeats our purpose. To solve
this, we use regularizers. The regularizers prevent the network from
overfitting to the input data and avoid the memorization problem.
Now, one thing to note is that the activations depend on the input data
and will change as the input changes. So, we let the model decide the
activations and penalize their values. We usually do this in one of two
ways:
L1 Regularization: L1 regularizers restrict the activations as
discussed above. They force the network to use only those hidden-layer
nodes that carry a high amount of information and to block the rest.
It is given by adding the sum of the absolute values of the
activations, scaled by a coefficient λ, to the reconstruction loss:

Loss = L(x, x̂) + λ Σᵢ |aᵢ|
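In Keras, this kind of activity penalty can be attached directly to a layer; a sketch (the coefficient 1e-5 is an arbitrary choice, not the article's value):

from tensorflow.keras import regularizers

# Bottleneck with an L1 penalty on its activations, encouraging sparsity
bottleneck = Dense(32, activation='relu',
                   activity_regularizer=regularizers.l1(1e-5))(encoding_2)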
KL-Divergence: The second approach constrains the average activation of
each hidden node to stay close to a small target sparsity value,
penalizing any deviation with the KL divergence between the two. Now,
the question is how the KL divergence helps. For this, we will need to
know what a Bernoulli distribution is.
[Image by author: the light red nodes do not fire]
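A sketch of that penalty (the target sparsity rho of 0.05 and the function name are assumptions of mine; the activations are assumed to lie between 0 and 1):

import tensorflow as tf

def kl_sparsity_penalty(activations, rho=0.05, beta=1.0):
    # Average activation of each hidden node over the batch
    rho_hat = tf.clip_by_value(tf.reduce_mean(activations, axis=0),
                               1e-7, 1 - 1e-7)
    # KL divergence between a Bernoulli(rho) and a Bernoulli(rho_hat)
    kl = (rho * tf.math.log(rho / rho_hat)
          + (1 - rho) * tf.math.log((1 - rho) / (1 - rho_hat)))
    return beta * tf.reduce_sum(kl)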
Denoising Autoencoders
A denoising autoencoder is trained on corrupted copies of the input
while still being asked to reconstruct the original, clean input, which
forces the network to learn robust features instead of simply copying
its input.
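A minimal sketch of the idea (the Gaussian noise and the 0.3 noise level are assumptions; autoencoder and x_train are the placeholders used earlier):

import numpy as np

# Corrupt the inputs with noise, then train the autoencoder to
# reconstruct the clean inputs from the noisy ones
x_train_noisy = x_train + 0.3 * np.random.normal(size=x_train.shape)
x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0)

autoencoder.fit(x_train_noisy, x_train, epochs=50, batch_size=256)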
Contractive Autoencoders
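A contractive autoencoder keeps the same basic architecture but adds a penalty term, the squared Frobenius norm of the Jacobian of the bottleneck activations with respect to the input, so that the learned representation changes very little when the input is perturbed slightly. A sketch of that penalty (the function name and the coefficient lam are mine; encoder is the model built earlier):

import tensorflow as tf

def contractive_penalty(encoder, x, lam=1e-4):
    # Squared Frobenius norm of the Jacobian of the bottleneck w.r.t. the input
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        h = encoder(x)                        # bottleneck activations
    jacobian = tape.batch_jacobian(h, x)      # shape (batch, 32, 784)
    return lam * tf.reduce_sum(tf.square(jacobian))

This term is added to the reconstruction loss during training.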
There exists another type of autoencoder that is a bit different from
the ones stated above, called the Variational Autoencoder.
Variational Autoencoders
We have seen that, in the autoencoders above, each latent attribute is
a single, fixed value. This is where variational autoencoders are
different. Instead of passing single values, the variational
autoencoder represents each latent attribute as a probability
distribution, something like what is shown below.
[Source: images illustrating latent attributes represented as probability distributions]
The above images illustrate the situation. Now, to create a
distribution for each latent attribute, the encoder, instead of passing
a single value, passes the mean and the standard deviation of the
distribution, which are used to construct the normal distribution.
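In code, this usually means the encoder ends in two parallel Dense layers, one producing the means and one producing the log-variances; a sketch with an assumed latent dimension of 2:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

latent_dim = 2

# Encoder: map the input to the parameters of a normal distribution
inputs = Input(shape=(784,))
h = Dense(256, activation='relu')(inputs)
z_mean = Dense(latent_dim)(h)      # means of the latent distribution
z_log_var = Dense(latent_dim)(h)   # log-variances of the latent distribution

vae_encoder = Model(inputs, [z_mean, z_log_var])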
Now, as z, the latent values, are sampled randomly, they are unknown
and hence called hidden variables. Again, we know the goal is for the
reconstructed output to be equivalent to the input. So, our goal is to
find the probability of a value being in z, the latent vector, given
the observation x, that is P(z|x), because we actually need to
reconstruct x from z. In simpler words, we can see x but we need to
estimate z.
Using Bayes' theorem we obtain

p(z|x) = p(x|z) p(z) / p(x)

This method requires finding p(x), given by

p(x) = ∫ p(x|z) p(z) dz

This integral is intractable to compute directly, so we instead
approximate p(z|x) with a learned distribution q(z|x) and minimize a
loss of the form

Loss = reconstruction error + KL( q(z|x) || p(z) )
The first term is the reconstruction error and the second term is the
KL divergence between the two distributions. Minimizing the KL term as
part of the loss keeps the learned distribution close to the prior.
Finally, because sampling latent points from a distribution is not a
differentiable operation, we use a trick called the Reparameterization
Trick, writing z = μ + σ · ε with ε drawn from a standard normal, so
that back propagation can still flow through the sampling step.
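A sketch of the sampling step and the resulting loss (building on the assumed vae_encoder above; the names and the use of binary crossentropy for the reconstruction term are illustrative):

import tensorflow as tf

def sample_z(z_mean, z_log_var):
    # Reparameterization trick: z = mean + sigma * epsilon
    epsilon = tf.random.normal(shape=tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon

def vae_loss(x, x_reconstructed, z_mean, z_log_var):
    # Reconstruction error: binary crossentropy summed over the 784 pixels
    reconstruction = 784 * tf.reduce_mean(
        tf.keras.losses.binary_crossentropy(x, x_reconstructed))
    # KL divergence between N(z_mean, sigma) and the standard normal prior
    kl = -0.5 * tf.reduce_mean(
        tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var),
                      axis=1))
    return reconstruction + kl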