
SPPU FOP B.E. Computer (2022)

DEEP LEARNING

Mr. Jameer Kotwal
Asst. Prof.
Pimpri Chinchwad College of Engg. & Research, Ravet.
UNIT 1: Foundations of Deep learning

Topics:
1) What is machine learning and deep learning?
2) Bias variance tradeoff.
3) Hyperparameters, under/overfitting, regularization.
4) History of deep learning.
5) Common Architectural Principles of Deep Network.
6) Architecture Design

What is machine learning and deep learning
•Deep learning is a subtype of machine learning. With machine learning, you manually extract the
relevant features of an image. With deep learning, you feed the raw images directly into a deep
neural network that learns the features automatically.

•Deep learning often requires hundreds of thousands or millions of images for the best results. It’s
also computationally intensive and requires a high-performance GPU.

Machine Learning:
+ Good results with small data sets
+ Quick to train a model
– Need to try different features and classifiers to achieve best results
– Accuracy plateaus

Deep Learning:
– Requires very large data sets
– Computationally intensive
+ Learns features and classifiers automatically
+ Accuracy is unlimited

History Of Deep Learning

The history of deep learning dates back to 1943 when Warren McCulloch and Walter Pitts
created a computer model based on the neural networks of the human brain. Warren
McCulloch and Walter Pitts used a combination of mathematics and algorithms they
called threshold logic to mimic the thought process.

The development of the basics of a continuous backpropagation model is credited to
Henry J. Kelley in 1960. Stuart Dreyfus came up with a simpler version based only on the
chain rule in 1962. The concept of backpropagation existed in the early 1960s but did not
become widely useful until 1985.

Deep learning
•Deep learning is a type of machine learning in which a model learns to perform classification
tasks directly from images, text, or sound. Deep learning is usually implemented using a neural
network architecture. The term “deep” refers to the number of layers in the network—the more
layers, the deeper the network. Traditional neural networks contain only 2 or 3 layers, while
deep networks can have hundreds.

•Deep learning is a subset of machine learning that uses an algorithm inspired by the human
brain. It learns from experiences. Its main motive is to simulate human-like decision making.

How deep learning works
• The inspiration for deep learning is the way that the human brain filters information. Its
main motive is to simulate human-like decision making. Neurons in the brain pass signals
to perform actions. Similarly, artificial neurons connect in a neural network to perform
tasks such as clustering, classification, or regression. The neural network sorts unlabeled data
according to the similarities of the data. That’s the idea behind a deep learning algorithm.
• Neurons are grouped into three different types of layers:
• a) Input layer
b) Hidden layer
c) Output layer

Deep learning Application

Here are just a few examples of deep learning at work:
• A self-driving vehicle slows down as it approaches a pedestrian crosswalk.
• An ATM rejects a counterfeit bank note.
• A smartphone app gives an instant translation of a foreign street sign.
• Deep learning is especially well-suited to identification applications such as face recognition,
text translation, voice recognition, and advanced driver assistance systems, including lane
classification and traffic sign recognition.

Fig: a smartphone app translating the street sign “BIBLIOTECA” to “LIBRARY”, and an ATM keypad.
In a word, accuracy. Advanced tools and techniques have dramatically improved deep learning
algorithms, to the point where they can outperform humans at classifying images, win against
the world’s best Go player, or enable a voice-controlled assistant like Amazon Echo® and
Google Home to find and download that new song you like.
UNIT 1: Foundations of Deep learning

UCLA researchers built an advanced microscope that yields a high-dimensional data set used to
train a deep learning network to identify cancer cells in tissue samples.

Three technology enablers make this degree of accuracy possible:
Easy access to massive sets of labeled data
Data sets such as ImageNet and PASCAL VOC are freely available, and are useful for training on
many different types of objects.

Increased computing power


High-performance GPUs accelerate the training of the massive amounts of data needed for
deep learning, reducing training time from weeks to hours.

Pretrained models built by experts
Models such as AlexNet can be retrained to perform new recognition tasks using a technique
called transfer learning. While AlexNet was trained on 1.3 million high-resolution images to
recognize 1000 different objects, accurate transfer learning can be achieved with much smaller
datasets.
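A minimal transfer-learning sketch in Keras, assuming a ResNet50 backbone pretrained on ImageNet as a stand-in (AlexNet itself is not bundled with Keras); num_new_classes and the commented fit call are illustrative placeholders, not part of the original slide:

# Hypothetical transfer-learning sketch (Keras), with a ResNet50 backbone
# standing in for the pretrained model discussed on the slide.
import tensorflow as tf
from tensorflow.keras import layers, models

num_new_classes = 5  # placeholder: number of categories in the new task

# Load a pretrained convolutional base and freeze its weights.
base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False

# Attach a new classifier head that is trained on the (much smaller) new dataset.
model = models.Sequential([
    base,
    layers.Dense(num_new_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(new_images, new_labels, epochs=5)  # new_images/new_labels: your small labeled dataset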

Deep Neural network
A deep neural network combines multiple nonlinear processing layers, using simple elements
operating in parallel and inspired by biological nervous systems. It consists of an input layer,
several hidden layers, and an output layer. The layers are interconnected via nodes, or neurons,
with each hidden layer using the output of the previous layer as its input.

How A Deep Neural Network Learns

Let’s say we have a set of images where each image contains one of four different categories of
object, and we want the deep learning network to automatically recognize which object is in each
image. We label the images in order to have training data for the network.

•Using this training data, the network can then start to understand the object’s specific features
and associate them with the corresponding category.

•Each layer in the network takes in data from the previous layer, transforms it, and passes it on.
The network increases the complexity and detail of what it is learning from layer to layer.

•Notice that the network learns directly from the data—we have no influence over what features
are being learned.
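A hedged sketch of this idea, using the built-in MNIST digit images as a stand-in for the labeled four-category image set described above; the network learns its own features directly from the raw pixels:

# Illustrative sketch: the network learns directly from labeled images,
# with no hand-engineered features.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))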

Fig: inputs passing through successive filters (layers) to produce the output.
Architecture of Neural Networks

Fig: BNN vs. ANN

Biological Neural Network → Artificial Neural Network
• Dendrites → Inputs
• Cell nucleus → Nodes
• Synapse → Weights
• Axon → Output
Deep Learning Architecture

Convolutional neural networks

• A CNN is a multilayer neural network that was biologically inspired by the animal visual
cortex. The architecture is particularly useful in image-processing applications. The first CNN
was created by Yann LeCun; at the time, the architecture focused on handwritten character
recognition, such as postal code interpretation. As a deep network, early layers recognize
features (such as edges), and later layers recombine these features into higher-level attributes of
the input.
• The LeNet CNN architecture is made up of several layers that implement feature extraction and
then classification (see the following image).
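A LeNet-style sketch in Keras, assuming 28×28 grayscale inputs such as handwritten digits; the layer sizes roughly follow LeNet-5 but are illustrative, not the exact original architecture:

# A LeNet-style CNN sketch: convolution + pooling layers extract features,
# dense layers at the end perform the classification.
from tensorflow.keras import layers, models

lenet = models.Sequential([
    layers.Input(shape=(28, 28, 1)),                      # e.g., handwritten digits
    layers.Conv2D(6, kernel_size=5, activation="tanh", padding="same"),
    layers.AveragePooling2D(pool_size=2),
    layers.Conv2D(16, kernel_size=5, activation="tanh"),
    layers.AveragePooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),
    layers.Dense(84, activation="tanh"),
    layers.Dense(10, activation="softmax"),               # 10 digit classes
])
lenet.summary()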

Recurrent neural networks
• The RNN is one of the foundational network architectures from which other deep learning
architectures are built. The primary difference between a typical multilayer network and a
recurrent network is that rather than completely feed-forward connections, a recurrent
network might have connections that feed back into prior layers (or into the same layer). This
feedback allows RNNs to maintain memory of past inputs and model problems in time.
• RNNs consist of a rich set of architectures (we'll look at one popular topology called LSTM
next). The key differentiator is feedback within the network, which could manifest itself from
a hidden layer, the output layer, or some combination thereof.
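A minimal NumPy sketch of the feedback described above, with illustrative sizes: the hidden state at each time step depends on the current input and on the previous hidden state.

# Sketch of the recurrence that gives an RNN its "memory": the hidden state
# h_t depends on the current input x_t and on the previous state h_{t-1}.
import numpy as np

input_size, hidden_size, seq_len = 4, 8, 5
rng = np.random.default_rng(0)

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (feedback)
b_h  = np.zeros(hidden_size)

h = np.zeros(hidden_size)                      # initial hidden state
for t in range(seq_len):
    x_t = rng.normal(size=input_size)          # stand-in for the input at time t
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)   # feedback: previous h feeds into the new h
print(h)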

LSTM networks
• The LSTM was created in 1997 by Hochreiter and Schmidhuber, but it has grown in
popularity in recent years as an RNN architecture for various applications. You'll find LSTMs
in products that you use every day, such as smartphones. IBM applied LSTMs in IBM
Watson® for milestone-setting conversational speech recognition.
• The LSTM departed from typical neuron-based neural network architectures and instead
introduced the concept of a memory cell. The memory cell can retain its value for a short or
long time as a function of its inputs, which allows the cell to remember what's important and
not just its last computed value.
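A minimal Keras sketch of an LSTM-based sequence classifier; the sequence length, feature count, and number of memory cells are illustrative assumptions:

# Illustrative sketch: an LSTM layer whose memory cells carry information
# across time steps, followed by a dense classifier.
from tensorflow.keras import layers, models

timesteps, features, num_classes = 20, 8, 3    # illustrative shapes

model = models.Sequential([
    layers.Input(shape=(timesteps, features)),
    layers.LSTM(32),                           # 32 memory cells
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()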

GRU networks
• In 2014, a simplification of the LSTM was introduced called the gated recurrent unit. This
model has two gates, getting rid of the output gate present in the LSTM model. These gates are
an update gate and a reset gate. The update gate indicates how much of the previous cell
contents to maintain. The reset gate defines how to incorporate the new input with the previous
cell contents. A GRU can model a standard RNN simply by setting the reset gate to 1 and the
update gate to 0.
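A single GRU step sketched in NumPy, under the convention that the update gate z keeps part of the previous state; with the reset gate fixed at 1 and the update gate at 0 it reduces to a plain RNN step, as noted above. Shapes and initialisation are illustrative:

# One GRU step: z is the update gate (how much of the previous state to keep),
# r is the reset gate (how much of the previous state to mix into the candidate).
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate state
    return z * h_prev + (1 - z) * h_tilde           # keep part of the old state, add part of the new

# Illustrative shapes only.
n_in, n_hid = 4, 6
rng = np.random.default_rng(1)
params = [rng.normal(scale=0.1, size=s) for s in
          [(n_hid, n_in), (n_hid, n_hid)] * 3]       # Wz, Uz, Wr, Ur, Wh, Uh
h = gru_step(rng.normal(size=n_in), np.zeros(n_hid), *params)
print(h)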

Self-organized maps
• Self-organized map (SOM) was invented by Dr. Teuvo Kohonen in 1982 and was popularly
known as the Kohonen map. SOM is an unsupervised neural network that creates clusters of
the input data set by reducing the dimensionality of the input. SOMs vary from the traditional
artificial neural network in quite a few ways.

In an SOM, no activation function is applied, and because there are no target labels to
compare against, there is no concept of calculating error or backpropagation.
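A minimal SOM sketch in NumPy, consistent with the description above (no activation function, no backpropagation): each input is matched to its best matching unit, which is then pulled toward the input along with its grid neighbours. Grid size, learning rate, and radius are illustrative:

# Minimal SOM sketch: competitive learning on a small 2-D grid.
import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 5, 5, 3                  # 5x5 map of 3-dimensional weight vectors
weights = rng.random((grid_h, grid_w, dim))
data = rng.random((100, dim))                  # stand-in for the input data set

lr, radius = 0.5, 1.0
for x in data:
    # Best matching unit: the node whose weights are closest to the input.
    dists = np.linalg.norm(weights - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(dists), dists.shape)
    # Pull the BMU and its grid neighbours toward the input.
    for i in range(grid_h):
        for j in range(grid_w):
            grid_dist = np.hypot(i - bi, j - bj)
            if grid_dist <= radius:
                weights[i, j] += lr * np.exp(-grid_dist) * (x - weights[i, j])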
Autoencoders
• Though the history of when autoencoders were invented is hazy, the first known usage of
autoencoders was found to be by LeCun in 1987. This variant of an ANN is composed of 3
layers: input, hidden, and output layers.
• First, the input layer is encoded into the hidden layer using an appropriate encoding function.
The number of nodes in the hidden layer is much less than the number of nodes in the input
layer. This hidden layer contains the compressed representation of the original input. The
output layer aims to reconstruct the input layer by using a decoder function.
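A minimal sketch of this three-layer autoencoder in Keras; the input size and bottleneck size are illustrative assumptions:

# Sketch of the 3-layer autoencoder described above: a hidden "bottleneck"
# layer much smaller than the input, and an output layer that reconstructs it.
from tensorflow.keras import layers, models

input_dim, bottleneck = 784, 32                # illustrative sizes

autoencoder = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(bottleneck, activation="relu"),     # encoder -> compressed representation
    layers.Dense(input_dim, activation="sigmoid"),   # decoder -> reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x, x, epochs=10)   # trained to reproduce its own input x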

Restricted Boltzmann Machines

• Though RBMs became popular much later, they were originally invented by Paul Smolensky
in 1986 under the name Harmonium.
• An RBM is a 2-layered neural network. The layers are input and hidden layers. As shown in
the following figure, in RBMs every node in a hidden layer is connected to every node in a
visible layer. In a traditional Boltzmann Machine, nodes within the input and hidden layer are
also connected. Due to computational complexity, nodes within a layer are not connected in
a Restricted Boltzmann Machine.

• During the training phase, RBMs calculate the probability distribution of the training set
using a stochastic approach. When the training begins, each neuron gets activated at random.
The model also contains respective hidden and visible biases. While the hidden bias is used
in the forward pass to build the activation, the visible bias helps in reconstructing the input.
•Because in an RBM the reconstructed input is
always different from the original input, they are also
known as generative models.
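A sketch of a single RBM forward pass and reconstruction in NumPy, consistent with the description above; contrastive-divergence training is omitted and all sizes are illustrative:

# One RBM forward pass and reconstruction: the hidden bias shapes the
# (stochastic) hidden activations, the visible bias helps rebuild the input.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_visible = np.zeros(n_visible)
b_hidden = np.zeros(n_hidden)

v = rng.integers(0, 2, size=n_visible).astype(float)    # a binary "input" vector

h_prob = sigmoid(v @ W + b_hidden)                       # forward pass
h = (rng.random(n_hidden) < h_prob).astype(float)        # stochastic hidden activation

v_recon = sigmoid(h @ W.T + b_visible)                   # reconstruction of the input
print(v, v_recon)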
Deep belief networks
• The DBN is a typical network architecture, but includes a novel training algorithm. The DBN
is a multilayer network (typically deep and including many hidden layers) in which each pair
of connected layers is an RBM. In this way, a DBN is represented as a stack of RBMs.
• In the DBN, the input layer represents the raw sensory inputs, and each hidden layer learns
abstract representations of this input. The output layer, which is treated somewhat differently
than the other layers, implements the network classification. Training occurs in two steps:
unsupervised pretraining and supervised fine-tuning.

Full network training is then applied by using either gradient descent learning or
back-propagation to complete the training process.

Deep stacking networks
• The final architecture is the DSN, also called a deep convex network. A DSN is different from
traditional deep learning frameworks in that although it consists of a deep network, it's
actually a deep set of individual networks, each with its own hidden layers. This architecture
is a response to one of the problems with deep learning, the complexity of training. Each layer
in a deep learning architecture exponentially increases the complexity of training, so the DSN
views training not as a single problem but as a set of individual training problems.

The DSN consists of a set of modules, each of which is a
subnetwork in the overall hierarchy of the DSN. In one
instance of this architecture, three modules are created for
the DSN. Each module consists of an input layer, a single
hidden layer, and an output layer. Modules are stacked
one on top of another, where the inputs of a module
consist of the prior layer outputs and the original input
vector. This layering allows the overall network to learn
more complex classification than would be possible given
a single module.

Bias-Variance Tradeoff
A supervised Machine Learning model aims to train itself on the input variables(X) in such a
way that the predicted values(Y) are as close to the actual values as possible. This difference
between the actual values and predicted values is the error and it is used to evaluate the model.
The error for any supervised Machine Learning algorithm comprises 3 parts:
•Bias error
•Variance error
•The noise
While the noise is the irreducible error that we cannot eliminate, the other two i.e. Bias and
Variance are reducible errors that we can attempt to minimize as much as possible.

Code
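The slide's original code is not reproduced here; the following is a sketch of the kind of experiment the conclusions below refer to, assuming a k-Nearest Neighbours classifier from scikit-learn and a synthetic dataset as a stand-in for the diabetes data mentioned on later slides:

# Train/test accuracy of a kNN classifier for a range of k values,
# plotted to show where the two scores converge.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ks = range(1, 40)
train_scores, test_scores = [], []
for k in ks:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_scores.append(knn.score(X_train, y_train))
    test_scores.append(knn.score(X_test, y_test))

plt.plot(ks, train_scores, label="training score")
plt.plot(ks, test_scores, label="testing score")
plt.xlabel("k (number of neighbours)")
plt.ylabel("accuracy")
plt.legend()
plt.show()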

We can make the following conclusions from the above plot:
• For low values of k, the training score is high, while the testing score is low.
• As the value of k increases, the testing score starts to increase and the training score starts to decrease.
• However, at some value of k, both the training score and the testing score are close to each other.
This is where Bias and Variance come into the picture.
Bias
Bias is the difference between the Predicted Value and the Expected Value.

In our model, if we use a large number of nearest neighbors, the model can totally decide that some
parameters are not important at all. For example, it can just consider that the Glucose level and the
Blood Pressure decide if the patient has diabetes. This model would make very strong assumptions
about the other parameters not affecting the outcome. You can also think of it as a model predicting
a simple relationship when the datapoints clearly indicate a more complex relationship.
Mathematically, let the input variables be X and a target
variable Y. We map the relationship between the two using a
function f.
Therefore,
Y = f(X) + e

Here ‘e’ is the error that is normally distributed. The aim of our model f'(X) is to predict
values as close to f(X) as possible. Here, the Bias of the model is:

Bias[f'(X)] = E[f'(X) – f(X)]

When there is a high bias error, it results in a very simplistic model that does not consider
the variations very well. Since it does not learn the training data very well, it is called
Underfitting.
VARIANCE
Variance is when the model takes into account the fluctuations in the data i.e. the noise as well.
So, what happens when our model has a high variance?
The model will still consider the variance as something to learn from. That is, the model learns too
much from the training data, so much so, that when confronted with new (testing) data, it is
unable to predict accurately based on it.

Mathematically, the variance error in the model is:

Variance[f'(X)] = E[f'(X)^2] − E[f'(X)]^2
Since in the case of high variance, the model learns too much from the training data, it is
called overfitting.

To summarise,
• A model with a high bias error underfits the data and makes very simplistic assumptions on it.
• A model with a high variance error overfits the data and learns too much from it.
• A good model is one where both Bias and Variance errors are balanced.
Bias-Variance Tradeoff
In our model, say, for, k = 1, the point closest to the datapoint in question will be considered.
Here, the prediction might be accurate for that particular data point so the bias error will be less.
However, the variance error will be high since only the one nearest point is considered and this
doesn’t take into account the other possible points. What scenario do you think this corresponds
to? Yes, you are thinking right, this means that our model is overfitting.

To achieve a balance between the Bias error and the Variance error, we need a value of k such
that the model neither learns from the noise (overfit on data) nor makes sweeping assumptions on
the data(underfit on data). To keep it simpler, a balanced model would look like this:

Though some points are classified incorrectly, the model generally fits most of the datapoints
accurately. The balance between the Bias error and the Variance error is the Bias-Variance
Tradeoff.

A bulls-eye diagram explains the tradeoff better:

So, what do you think is the optimum value for k?

From the above explanation, we can conclude that the k for which the testing score is the
highest, and for which the test score and the training score are close to each other, is the
optimal value of k. So, even though we are compromising on a lower training score, we still
get a high score for our testing data, which is more crucial – the test data is, after all, unknown data.
Parameter and Hyper parameters

Model parameters are configuration variables that are internal to the model, and a model learns
them on its own. Examples: the weights or coefficients of independent variables in a linear
regression or SVM model, the weights and biases of a neural network, and the cluster centroids
in clustering.

Hyperparameters in Machine learning are those parameters that are explicitly defined by the
user to control the learning process. These hyperparameters are used to improve the learning of
the model, and their values are set before starting the learning process of the model.
Some examples of Hyperparameters in Machine Learning
•The k in kNN or K-Nearest Neighbour algorithm
•Learning rate for training a neural network
•Train-test split ratio
•Batch Size
•Number of Epochs
•Branches in Decision Tree
•Number of clusters in Clustering Algorithm
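A small illustrative sketch of the distinction, assuming scikit-learn: n_neighbors is a hyperparameter set by the user before training, while a logistic regression's coefficients are parameters learned from the data.

# Hyperparameter vs. parameter, sketched with scikit-learn models.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

knn = KNeighborsClassifier(n_neighbors=7)       # hyperparameter: set before training
knn.fit(X, y)

logreg = LogisticRegression().fit(X, y)
print(logreg.coef_, logreg.intercept_)          # parameters: learned during training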

Case Study:
1) Boston Dynamics' collaboration with Hyundai
2) DeepMind (acquired by Google): AlphaFold, WaveNet
3) AlphaGo (game of Go)
