
Embedding – projecting large, sparse input vectors into a lower-dimensional dense space so that machine learning on them becomes tractable. It is typically used to handle large sparse inputs.
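
As a rough sketch of the idea, here is an embedding lookup using PyTorch's nn.Embedding; the vocabulary size, embedding dimension, and token indices are illustrative assumptions, not values from these notes.

```python
# A minimal sketch of an embedding lookup (PyTorch is an assumed library choice).
import torch
import torch.nn as nn

vocab_size, embedding_dim = 10_000, 64      # sparse one-hot space -> dense space
embedding = nn.Embedding(vocab_size, embedding_dim)

token_ids = torch.tensor([12, 4057, 9999])  # sparse inputs given as indices
dense_vectors = embedding(token_ids)        # shape: (3, 64) dense representation
print(dense_vectors.shape)
```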

Principal component analysis – projects the data onto the directions (principal components) that best describe the variance in the data at hand. This is a dimensionality reduction technique and helps to simplify the machine learning process. However, it is ineffective at capturing non-linear or piecewise-linear relationships in the dataset.
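
A small sketch of PCA as dimensionality reduction, using scikit-learn; the number of components and the random data are illustrative assumptions.

```python
# PCA sketch: keep only the directions of highest variance.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 50)           # 200 samples with 50 features
pca = PCA(n_components=2)             # keep the 2 highest-variance directions
X_reduced = pca.fit_transform(X)      # shape: (200, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component
```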

Here is the catch: if the number of parameters is large, the training time and complexity of the model increase, and the model tends to overfit and generalize poorly to other datasets. But if the number of parameters is very low, the model underfits and cannot capture the patterns in the data.

Regularization techniques – L2, L1, Dropout

Optimization techniques – AdaGrad, Adam, RMSProp

Normalization techniques – these are used to reduce internal covariate shift in the hidden layers. These shifts make it harder for the model to generalize. Using these techniques also allows higher learning rates, as they ensure no activation becomes very high or very low. One such technique is batch normalization.

Batch normalization – the output of the previous layer is normalized before it is fed into the next layer (subtracting the batch mean and dividing by the batch standard deviation). But wouldn't this disturb the weights the next layer has learned? To counter this, batch norm adds two trainable parameters to the normalized output: a scale it is multiplied by (playing the role of a standard deviation) and a shift that is added (playing the role of a mean). Stochastic gradient descent (SGD) can then undo the normalization where needed by optimizing just these two parameters, rather than all the weights.
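
Below is a minimal NumPy sketch of this forward pass; the epsilon term and the initial values of the scale (gamma) and shift (beta) parameters are standard assumptions, not something stated in these notes.

```python
# Batch-norm forward pass: normalize, then apply learnable scale and shift.
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (batch_size, features) activations from the previous layer."""
    mean = x.mean(axis=0)                    # batch mean per feature
    var = x.var(axis=0)                      # batch variance per feature
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize: zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale (gamma) and shift (beta)

x = np.random.randn(32, 8) * 5 + 3           # activations with arbitrary mean/scale
gamma, beta = np.ones(8), np.zeros(8)        # SGD updates only these two parameters
out = batch_norm_forward(x, gamma, beta)
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))
```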

https://miro.medium.com/max/547/1*Hiq-rLFGDpESpr8QNsJ1jg.png

Autoencoder – the intuition is similar to that of PCA. The input is reduced to a low-dimensional representation; this part of the network is called the encoder. There is also a decoder, which reconstructs the original input from the encoder's output. It is like reducing the dimensions using a neural network. The flow of data through an autoencoder is

INPUT -> ENCODER -> DECODER -> OUTPUT/RECONSTRUCTION

Batch normalization is used here as well, since it allows a higher learning rate during training. As with other models, an objective function is needed to measure performance. The objective in this case is the average of the reconstruction error below over the minibatch:

\| I - O \| = \sqrt{\sum_i (I_i - O_i)^2}
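
A minimal PyTorch sketch of this flow and objective; the layer sizes, activations, and input dimension are illustrative assumptions.

```python
# Autoencoder sketch: INPUT -> ENCODER -> DECODER -> OUTPUT/RECONSTRUCTION.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))    # reconstruction of the input

model = Autoencoder()
I = torch.rand(64, 784)                         # a minibatch of inputs
O = model(I)                                    # reconstructions
loss = ((I - O) ** 2).sum(dim=1).sqrt().mean()  # average of ||I - O|| over the minibatch
loss.backward()
```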

Denoising – used to make the model robust to noise. This is done with a denoising autoencoder. It is similar to the vanilla autoencoder, but the input pixels are corrupted and the model is forced to interpolate the corrupted pixels in order to reconstruct the original image; a short sketch of this setup follows below. How does this work?

Suppose there is a two-dimensional dataset with different labels. We select the set of points of a particular label and use this subset for training. Here we have assumed that, for a particular label, there is an underlying structure. This structure is called a manifold. This manifold is what the autoencoder implicitly learns in order to reconstruct the original data. The autoencoder must figure out which manifold a point belongs to.
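
A rough sketch of the denoising setup, assuming a simple masking corruption: the corrupted input goes into the autoencoder, but the loss is computed against the clean original. The corruption scheme and layer sizes are assumptions for illustration.

```python
# Denoising-autoencoder sketch: corrupt the input, reconstruct the clean original.
import torch
import torch.nn as nn

def corrupt(x, drop_prob=0.3):
    """Randomly zero out pixels, forcing the model to interpolate them."""
    mask = (torch.rand_like(x) > drop_prob).float()
    return x * mask

model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(),
                      nn.Linear(32, 784), nn.Sigmoid())  # vanilla autoencoder

clean = torch.rand(64, 784)                              # original minibatch
noisy = corrupt(clean)                                   # corrupted version is the input
reconstruction = model(noisy)
loss = ((clean - reconstruction) ** 2).sum(dim=1).sqrt().mean()  # target is the clean image
loss.backward()
```
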
Interpretability – one of the problems with deep learning. It is the property of a model that measures how easy it is to inspect its process and output. Since the number of parameters and hidden layers is very high, interpretability takes a hit. This affects the use cases of the model. For example, if a neural network predicts the presence of cancer cells, the doctor might want to inspect the process through which the model arrives at its probabilities.

Word2Vec Framework – two strategies: Continuous Bag Of Words, Skip-Gram Model

CBOW – takes the context as input and tries to predict the target word

Skip-Gram – takes the target word as the input and tries to predict the other words in the context
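
A small sketch of how the training pairs differ between the two strategies; the toy sentence and window size are purely illustrative.

```python
# For each position: CBOW maps context -> target, Skip-Gram maps target -> context.
sentence = "the quick brown fox jumps".split()
window = 1

for i, target in enumerate(sentence):
    context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
    # CBOW:      input = context words,  label  = target word
    # Skip-Gram: input = target word,    labels = each context word
    print(f"target={target!r}, context={context}")
```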
