CHAPTER 1
Q1 a ) MACHINE LEARNING
Machine learning is a field of study that allows a computer to learn without
being explicitly programmed. With machine learning we do not give the
computer explicit instructions for reacting to specific situations; instead, we
train the computer so that it can find solutions to a specific problem on its
own. The game of chess is a famous example where machine learning is used
to play the game.
Machine learning algorithms are used in a wide variety of applications, such as
in medicine, email filtering, speech recognition, and computer vision, where it
is difficult or unfeasible to develop conventional algorithms to perform the
needed tasks.[3]
Need For Machine Learning
• Ever since the technical revolution, we've been generating an immeasurable
amount of data.
• With the availability of so much data, it is finally possible to build
predictive models that can study and analyse complex data to find useful
insights and deliver more accurate results.
• Top Tier companies such as Netflix and Amazon build such Machine Learning
models by using tons of data in order to identify profitable opportunities
and avoid unwanted risks.
Important Terms of Machine Learning
• Algorithm: A Machine Learning algorithm is a set of rules and statistical
techniques used to learn patterns from data and draw significant
information from it. It is the logic behind a Machine Learning model.
An example of a Machine Learning algorithm is the Linear Regression
algorithm.
• Model: A model is the main component of Machine Learning. A model is
trained by using a Machine Learning algorithm. The algorithm maps all the
decisions that the model is supposed to make based on the given input, in
order to produce the correct output.
• Predictor Variable: A feature (or features) of the data that can be used to
predict the output.
• Response Variable: It is the feature or the output variable that needs to be
predicted by using the predictor variable(s).
• Training Data: The Machine Learning model is built using the training data.
The training data helps the model to identify key trends and patterns
essential to predict the output.
• Testing Data: After the model is trained, it must be tested to evaluate
how accurately it can predict an outcome. This is done by the testing
data set.
Robotics: Robotics is one of the fields that has always drawn the interest of
researchers as well as the general public. Researchers all over the world are
still working on creating robots that mimic the human brain. They are using
neural networks, AI, ML, computer vision, and many other technologies in this
research.
Types of Regression
Linear Regression:
o Linear regression is a statistical regression method which is used for
predictive analysis.
o It is one of the simplest algorithms; it works on regression and shows
the relationship between continuous variables.
o It is used for solving the regression problem in machine learning.
o Linear regression shows the linear relationship between the
independent variable (X-axis) and the dependent variable (Y-axis), hence
called linear regression.
Linear regression can be further divided into two types of the algorithm:
o Simple Linear Regression:
If a single independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm
is called Simple Linear Regression.
o Multiple Linear Regression:
If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm
is called Multiple Linear Regression.
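As a rough illustration (not part of the original notes), simple linear regression can be sketched in a few lines of Python. The data below is made up; NumPy's `polyfit` with degree 1 performs the ordinary least squares fit of y = w*x + b:

```python
import numpy as np

# Toy data (invented for illustration): y is roughly 2*x + 1 with noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

# Simple linear regression: fit y = w*x + b by ordinary least squares.
w, b = np.polyfit(x, y, deg=1)

# Predict y for an unseen x = 6.0.
prediction = w * 6.0 + b
print(round(w, 2), round(b, 2), round(prediction, 2))
```

Multiple linear regression works the same way, except that x becomes a matrix with one column per independent variable (e.g. via `np.linalg.lstsq`).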
Logistic Regression:
o Logistic regression is another supervised learning algorithm which is
used to solve the classification problems. In classification problems, we
have dependent variables in a binary or discrete format such as 0 or 1.
o Logistic regression algorithm works with the categorical variable such as
0 or 1, Yes or No, True or False, Spam or not spam, etc.
o It is a predictive analysis algorithm which works on the concept of
probability.
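A minimal sketch of the probability idea behind logistic regression, assuming hypothetical, already-learned weights for a spam/not-spam classifier (the weight values below are invented for illustration):

```python
import math

def sigmoid(z):
    # Squashes any real number into the (0, 1) range, read as a probability.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned weights and bias for a two-feature spam classifier.
w, b = [1.2, -0.8], 0.5

def predict_spam(features):
    z = sum(wi * xi for wi, xi in zip(w, features)) + b
    p = sigmoid(z)               # probability of class 1 ("spam")
    return 1 if p >= 0.5 else 0  # discrete output: 1 or 0

print(predict_spam([2.0, 0.5]))   # z = 1.2*2 - 0.8*0.5 + 0.5 = 2.5 -> class 1
print(predict_spam([-2.0, 2.0]))  # z = -3.5 -> class 0
```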
Polynomial Regression:
o Polynomial Regression is a type of regression which models a non-linear
dataset using a linear model built on polynomial terms of the features.
o It is similar to multiple linear regression, but it fits a non-linear curve
between the value of x and corresponding conditional values of y.
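The point that polynomial regression is still a linear model, just fitted on polynomial terms, can be illustrated with NumPy on made-up data:

```python
import numpy as np

# Toy non-linear data (invented): y = x^2, which a straight line cannot fit.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Polynomial regression: a *linear* fit in the derived features x^2, x, 1.
c2, c1, c0 = np.polyfit(x, y, deg=2)

# The recovered coefficients are (up to float noise) 1, 0, 0: y = 1*x^2.
print(round(c2, 6), round(c1, 6), round(c0, 6))
```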
1. Data cleaning
2. Data integration
3. Data reduction
Data cleaning:
Data cleaning is the process to remove incorrect data, incomplete data and
inaccurate data from the datasets, and it also replaces the missing values.
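The cleaning operations described above (removing incorrect rows, replacing missing values) might look like this in pandas; the small dataset is invented for illustration:

```python
import pandas as pd
import numpy as np

# A small dataset with problems (values are made up for illustration).
df = pd.DataFrame({
    "age":    [25, np.nan, 40, -5],      # a missing value and an impossible age
    "income": [30000, 45000, np.nan, 52000],
})

# Drop clearly incorrect rows (negative age), keeping missing values for now.
df = df[df["age"].isna() | (df["age"] >= 0)].copy()

# Replace missing values: mean for age, median for income.
df["age"] = df["age"].fillna(df["age"].mean())
df["income"] = df["income"].fillna(df["income"].median())

print(len(df), df["age"].isna().sum())
```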
Data integration:
The process of combining multiple sources into a single dataset. The Data
integration process is one of the main components in data management.
Data reduction:
This process helps in the reduction of the volume of the data which makes the
analysis easier yet produces the same or almost the same result. This reduction
also helps to reduce storage space.
Data Transformation:
The change made in the format or the structure of the data is called data
transformation. This step can be simple or complex based on the
requirements.
In unsupervised learning, the models are trained with the data that is neither
classified nor labelled, and the model acts on that data without any
supervision.
So now the machine will discover patterns and differences on its own, such as
colour differences and shape differences, and predict the output when it is
tested with the test dataset.
Reinforcement Learning
In reinforcement learning, the learning process happens inside the agent, as it
learns from its own experience without any human intervention.
Example: Suppose there is an AI agent inside a maze environment, and its
goal is to find the diamond. The agent interacts with the environment by
performing some actions; based on those actions, the state of the agent
changes, and it also receives a reward or penalty as feedback.
The agent keeps doing these three things (take an action, change state or
remain in the same state, and get feedback), and by doing so it learns and
explores the environment.
The agent learns which actions lead to positive feedback or rewards and
which actions lead to negative feedback or penalties. As a reward the agent
gets a positive point, and as a penalty it gets a negative point.
In this example, a scientist claims only that UV rays are harmful to the eyes,
but we assume they may also cause blindness. This may or may not turn out
to be true. Such an assumption is called a hypothesis.
Suppose we have test data for which we have to determine the outputs or
results. The test data is shown below:
But note here that we could have divided the coordinate plane as:
The way in which the coordinate would be divided depends on the data,
algorithm and constraints.
All the legally possible ways in which we can divide the coordinate plane to
predict the outcome of the test data together compose the hypothesis space.
Each individual possible way is known as a hypothesis.
Hence, in this example the hypothesis space would be like:
UNIT 2
This step function or activation function plays a vital role in ensuring that the
output is mapped between the required values, (0,1) or (-1,1).
Based on the layers, Perceptron models are divided into two types. These are
as follows:
Threshold = 1.5
x1 * w1 = 1 * 0.7 = 0.7
x2 * w2 = 0 * 0.6 = 0
x3 * w3 = 1 * 0.5 = 0.5
x4 * w4 = 0 * 0.3 = 0
x5 * w5 = 1 * 0.4 = 0.4
Return true if the sum > 1.5 ("Yes I will go to the Concert")
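The weighted-sum-and-threshold decision worked through above can be checked with a few lines of Python:

```python
# Inputs, weights and threshold from the worked example above.
inputs  = [1, 0, 1, 0, 1]          # x1..x5
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5

# The perceptron sums each input times its weight...
weighted_sum = sum(x * w for x, w in zip(inputs, weights))

# ...and fires if the sum exceeds the threshold: 1.6 > 1.5 -> True.
go_to_concert = weighted_sum > threshold
print(round(weighted_sum, 1), go_to_concert)
```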
The main features of backpropagation are that it is an iterative, recursive and
efficient method for calculating the updated weights, improving the network
until it is able to perform the task for which it is being trained.
1. Input layer
2. Hidden layer
3. Output layer
Each layer has its own way of working and its own way of taking action, such
that we are able to get the desired results and correlate these scenarios to
our conditions. Let us discuss the other details needed to summarize this
algorithm.
5. Travel back from the output layer to the hidden layer to adjust the
weights such that the error is decreased.
This process is repeated till we get the desired output. The training phase
is done with supervision. Once the model is stable, it is used in
production.
Regression vs Classification:
• In Regression, the output variable must be of continuous nature or a real
value. In Classification, the output variable must be a discrete value.
• The task of the regression algorithm is to map the input value (x) to the
continuous output variable (y). The task of the classification algorithm is
to map the input value (x) to the discrete output variable (y).
• Regression algorithms are used with continuous data. Classification
algorithms are used with discrete data.
• In Regression, we try to find the best-fit line, which can predict the output
more accurately. In Classification, we try to find the decision boundary,
which can divide the dataset into different classes.
• Regression can be further divided into Linear and Non-linear Regression.
Classification algorithms can be divided into Binary Classifiers and
Multi-class Classifiers.
A CNN takes an image as input, which is classified and processed under a
certain category such as dog, cat, lion or tiger. The computer sees an image
as an array of pixels whose size depends on the resolution of the image.
Based on the image resolution, it sees h * w * d, where h = height, w = width
and d = dimension (number of channels).
In a CNN, each input image passes through a sequence of convolution layers
along with pooling layers, fully connected layers and filters (also known as
kernels). After that, we apply the softmax function to classify the object with
probabilistic values between 0 and 1.
Convolution Layer
The convolution layer is the first layer; it extracts features from an input
image. By learning image features using small squares of input data, the
convolutional layer preserves the relationship between pixels. It is a
mathematical operation which takes two inputs: the image matrix and a
kernel or filter.
Strides
Stride is the number of pixels by which the filter shifts over the input matrix.
When the stride equals 1, we move the filter 1 pixel at a time; similarly, when
the stride equals 2, we move the filter 2 pixels at a time. The following figure
shows how the convolution works with a stride of 2.
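As a minimal sketch (assuming a made-up 4x4 image and a 2x2 summing kernel), a strided valid convolution can be written directly in NumPy; the output size follows the usual formula (n - f) / s + 1:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    # Naive valid convolution: slide the kernel over the image with the stride.
    h, w = image.shape
    k = kernel.shape[0]
    out = (h - k) // stride + 1   # output size formula: (n - f) / s + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            result[i, j] = np.sum(patch * kernel)
    return result

image  = np.arange(16, dtype=float).reshape(4, 4)  # a made-up 4x4 "image"
kernel = np.ones((2, 2))                           # a simple summing filter

print(conv2d(image, kernel, stride=1).shape)  # (3, 3)
print(conv2d(image, kernel, stride=2).shape)  # (2, 2)
```

With stride 2 the filter skips every other position, so the 4x4 input shrinks to a 2x2 output instead of 3x3.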
Padding
Padding plays a crucial role in building a convolutional neural network.
Without padding, the image shrinks after every convolution; if we take a
neural network with hundreds of layers, we will be left with a very small
image after the final filter.
It is clear from the above picture that a pixel in the corner is covered only
once, while a middle pixel is covered more than once. This means that we
gather more information from the middle pixels, so there are two downsides:
o Shrinking outputs
o Losing information on the corner of the image.
Pooling Layer
In the multi-layer perceptron diagram above, we can see that there are three
inputs and thus three input nodes and the hidden layer has three nodes. The
output layer gives two outputs, therefore there are two output nodes. The
nodes in the input layer take input and forward it for further process, in the
diagram above the nodes in the input layer forwards their output to each of
the three nodes in the hidden layer, and in the same way, the hidden layer
processes the information and passes it to the output layer.
Every node in the multi-layer perceptron uses a sigmoid activation function.
The sigmoid activation function takes real values as input and converts them
to numbers between 0 and 1 using the sigmoid formula.
The MLP learning procedure is as follows:
Starting with the input layer, propagate data forward to the output layer.
This step is the forward propagation.
Based on the output, calculate the error (the difference between the
predicted and known outcome). The error needs to be minimized.
Backpropagate the error. Find its derivative with respect to each weight in
the network, and update the model.
Repeat the three steps given above over multiple epochs to learn the ideal
weights. Finally, the output is passed through a threshold function to obtain
the predicted class labels.
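One forward-propagation step of the procedure above can be sketched for a single neuron; the weights, bias and target below are invented for illustration:

```python
import math

def sigmoid(z):
    # Converts any real-valued input to a number between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

# A tiny 2-input neuron with assumed weights and bias.
inputs  = [0.5, -1.0]
weights = [0.8, 0.4]
bias    = 0.1

z = sum(w * x for w, x in zip(weights, inputs)) + bias   # weighted sum
activation = sigmoid(z)                                  # squashed into (0, 1)

# Error against an assumed known target of 1.0; backpropagation would
# now take derivatives of this error with respect to each weight.
error = (1.0 - activation) ** 2
print(round(activation, 3))
```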
UNIT 3
Input Layer– First is the input layer. This layer will accept the data and pass it
to the rest of the network.
Hidden Layer– The second type of layer is called the hidden layer. A neural
network has one or more hidden layers; in the above case, the number is 1.
Hidden layers are the ones that are actually responsible for the excellent
performance and complexity of neural networks. They perform multiple
functions at the same time, such as data transformation and automatic
feature creation.
Output layer– The last type of layer is the output layer. The output layer holds
the result or the output of the problem. Raw images get passed to the input
layer and we receive output in the output layer.
In this case, we are providing an image of a vehicle and this output layer will
provide an output whether it is an emergency or non-emergency vehicle, after
passing through the input and hidden layers of course.
Convolution layer :
It is the first layer to extract features from an input image. By learning image
features using a small square of input data, the convolutional layer preserves
the relationship between pixels. It is a mathematical operation which takes
two inputs such as image matrix and a kernel or filter.
Pooling layer :
The pooling layer (spatial pooling) reduces the dimensionality of each feature
map but retains the important information. There are the following types of
spatial pooling:
Max Pooling
Average Pooling
Down-scaling is performed through average pooling by dividing the input
into rectangular pooling regions and computing the average value of each
region.
Syntax (MATLAB Deep Learning Toolbox)
layer = averagePooling2dLayer(poolSize)
layer = averagePooling2dLayer(poolSize,Name,Value)
Sum Pooling
The sub-regions for sum pooling or mean pooling are set exactly the same as
for max pooling, but instead of using the max function we use the sum or the
mean.
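The three pooling variants can be sketched together in NumPy; the 4x4 feature map below is made up for illustration:

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    # Non-overlapping pooling: split the map into size x size regions
    # and reduce each region to a single number.
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    reduce_fn = {"max": np.max, "avg": np.mean, "sum": np.sum}[mode]
    for i in range(h // size):
        for j in range(w // size):
            region = feature_map[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = reduce_fn(region)
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 1., 2.],
                 [7., 2., 9., 3.],
                 [4., 8., 6., 5.]])

print(pool2d(fmap, mode="max"))  # each 2x2 region replaced by its maximum
```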
Loss Layer : The Loss Function is one of the important components of Neural
Networks. Loss is nothing but a prediction error of Neural Net. And the
method to calculate the loss is called Loss Function. In simple words, the Loss is
used to calculate the gradients. And gradients are used to update the weights
of the Neural Net.
Dense Layer: The dense layer is a neural network layer that is connected
deeply, which means each neuron in the dense layer receives input from all
neurons of its previous layer. The dense layer is found to be the most
commonly used layer in the models.
The output generated by the dense layer is an 'm'-dimensional vector. Thus,
the dense layer is basically used for changing the dimensions of the vector.
Dense layers also apply operations like rotation, scaling and translation to the
vector.
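A dense layer boils down to a matrix-vector product plus a bias; the weights below are invented to show how a 3-dimensional input becomes a 2-dimensional output:

```python
import numpy as np

# A dense layer computes y = W @ x + b (usually followed by an activation).
# Shapes: 3 inputs -> 2 outputs, so W is 2x3 (weights here are made up).
W = np.array([[0.2, -0.5, 1.0],
              [0.7,  0.1, -0.3]])
b = np.array([0.1, -0.2])
x = np.array([1.0, 2.0, 3.0])

# Every output neuron sees every input: that is what "fully connected" means.
y = W @ x + b
print(y.shape)   # the layer changed a 3-vector into a 2-vector
```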
For example, let's say you had a population of one million people, and you
used simple random sampling to get a sample of 1,000 people. You could use
simple random sampling again on the 1,000 people to get a smaller portion of
100 people.
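The two-stage sampling described above can be sketched with Python's random module; the seed is fixed only to make the sketch reproducible:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

population = range(1_000_000)             # one million "people"
sample = random.sample(population, 1000)  # stage 1: simple random sample of 1,000
subsample = random.sample(sample, 100)    # stage 2: sample the sample to get 100

# random.sample draws without replacement, so every person appears at most once.
print(len(sample), len(subsample))
```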
Keras runs on top of open source machine learning libraries like TensorFlow,
Theano or the Cognitive Toolkit (CNTK). Theano is a Python library used for
fast numerical computation tasks. TensorFlow is the most famous symbolic
math library used for creating neural networks and deep learning models. It is
very flexible, and a primary benefit is distributed computing. CNTK is a deep
learning framework developed by Microsoft, usable from Python, C# or C++,
or as a standalone machine learning toolkit. Theano and TensorFlow are very
powerful libraries, but they are difficult to use directly for creating neural
networks.
Keras is based on a minimal structure that provides a clean and easy way to
create deep learning models on top of TensorFlow or Theano. Keras is
designed to let you define deep learning models quickly, which makes it an
optimal choice for deep learning applications.
Features
Keras leverages various optimization techniques to make high level neural
network API easier and more performant. It supports the following features −
Consistent, simple and extensible API.
Minimal structure - easy to achieve the result without any frills.
It supports multiple platforms and backends.
UNIT 4
An RNN has a "memory" which remembers all information about what has
been calculated. It uses the same parameters for each input, as it performs
the same task on all the inputs or hidden layers to produce the output. This
reduces the number of parameters, unlike other neural networks.
The working of an RNN can be understood with the help of the example
below:
Example:
Suppose there is a deep network with one input layer, three hidden layers
and one output layer. Then, like other neural networks, each hidden layer will
have its own set of weights and biases: say, for hidden layer 1 the weights
and biases are (w1, b1), (w2, b2) for the second hidden layer and (w3, b3)
for the third hidden layer. This means that each of these layers is
independent of the others, i.e. they do not memorize the previous outputs.
b) Explain the Actor critic model. List down what are its advantages in
reinforcement learning.
Ans) Actor Critic Model:
1. The "Critic" estimates the value function. This could be the action-value
(the Q value) or the state-value (the V value).
2. The "Actor" updates the policy distribution in the direction suggested by
the Critic (such as with policy gradients).
And both the Critic and Actor functions are parameterized with neural
networks.
Actor-Critic methods aim to take advantage of all the good points of both
value-based and policy-based methods while eliminating their drawbacks.
How do they do this?
The principal idea is to split the model in two: one for computing an action based
on a state and another one to produce the Q values of the action.
The actor takes as input the state and outputs the best action. It essentially
controls how the agent behaves by learning the optimal policy (policy-based).
The critic, on the other hand, evaluates the action by computing the value
function (value based). Those two models participate in a game where they
both get better in their own role as the time passes. The result is that the overall
architecture will learn to play the game more efficiently than the two methods
separately.
How Actor Critic works
Imagine you play a video game with a friend who provides you some
feedback. You're the Actor and your friend is the Critic.
Figure: 4.11
At the beginning, you don't know how to play, so you try some actions
randomly. The Critic observes your actions and provides feedback.
Learning from this feedback, you'll update your policy and become better at
playing the game.
On the other hand, your friend (the Critic) will also update their own way of
providing feedback so it can be better next time.
TensorFlow offers multiple levels of abstraction so you can choose the right
one for your needs. Build and train models by using the high-level Keras API,
which makes getting started with TensorFlow and machine learning easy.
Q 8 A) What is NLP
Ans) NLP stands for Natural Language Processing, which is a part of computer
science, human language studies, and artificial intelligence. It is the
technology used by machines to understand, analyse, manipulate, and
interpret human languages. It helps developers organize knowledge to
perform tasks such as translation, automatic summarization, Named Entity
Recognition (NER), speech recognition, relationship extraction, and topic
segmentation.
1. Image Recognition:
2. Speech Recognition
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which
shows us the correct path with the shortest route and predicts the traffic
conditions.
4. Product recommendations:
5. Self-driving cars:
Stock market trading:
Machine learning is widely used in stock market trading. In the stock market
there is always a risk of ups and downs in shares, so machine learning's long
short-term memory (LSTM) neural network is used for the prediction of
stock market trends.
Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and
anomaly detection.
8 b) Explain the concept and role of the Support Vector Machine in detail and
also describe its application areas.
UNIT 5
UNIT 1
Questions QB
Q. Short Note On
1. Data Augmentation.
2. Data Normalization.
3. Data Preprocessing **
Q) Explain in detail multiple hypothesis testing. RGPV May 2019, 2018
Characteristics
For example, consider a data set containing two features, age (x1) and
income (x2), where age ranges from 0–100 while income ranges from about
20,000–500,000. Income is roughly 1,000 times larger than age, so these two
features are on very different scales. When we do further analysis, such as
multivariate linear regression, the attribute income will intrinsically influence
the result more due to its larger values, but this doesn't necessarily mean it
is a more important predictor.
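One common remedy for such scale differences, shown here as a sketch on made-up values, is min-max normalization, which rescales each feature into [0, 1]:

```python
import numpy as np

# Made-up ages and incomes on very different scales.
age    = np.array([20., 35., 50., 80.])
income = np.array([25000., 60000., 120000., 500000.])

def min_max_scale(x):
    # Rescale a feature into [0, 1] so no feature dominates by magnitude alone.
    return (x - x.min()) / (x.max() - x.min())

age_s, income_s = min_max_scale(age), min_max_scale(income)
print(age_s.min(), age_s.max(), income_s.min(), income_s.max())
```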
Statistical data analysis generally involves some form of statistical tools,
which a layman cannot use without statistical knowledge. Various software
packages exist to perform statistical data analysis, including the Statistical
Analysis System (SAS), the Statistical Package for the Social Sciences (SPSS)
and StatSoft.
The basic goal of statistical data analysis is to identify trends. For example, in
the retailing business, this method can be used to uncover patterns in
unstructured and semi-structured consumer data that can inform more
powerful decisions for enhancing the customer experience and increasing
sales.
The data in statistical data analysis is basically of 2 types, namely continuous
data and discrete data. Continuous data cannot be counted: for example, the
intensity of a light can be measured but not counted. Discrete data can be
counted: for example, the number of bulbs can be counted.
UNIT 2
Ans) QB Page No 87
Ans) Before entering into batch normalization, let's understand the term
"Normalization".
But what is the reason behind the term "Batch" in batch normalization? A
typical neural network is trained on a collected set of input data called a
batch. Similarly, the normalizing process in batch normalization takes place
in batches, not on a single input.
UNIT 3
Q) What is SubSampling?
Ans) Fully connected layers in a neural network are those layers where all
the inputs from one layer are connected to every activation unit of the next
layer. In most popular machine learning models, the last few layers are fully
connected layers, which compile the data extracted by previous layers to
form the final output. The fully connected layer is the second most
time-consuming layer after the convolution layer.
The diagram below clarifies the statement.
The first/input layer has 3 feature units and there are 4 activation units in the
next hidden layer.
The 1's in each layer are bias units.
a01, a02 and a03 are input values to the neural network. They are basically
the features of the training example.
The 4 activation units of the first hidden layer are connected to all 3
activation units of the second hidden layer; the weights/parameters connect
the two layers.
Learning Fully Connected Networks with Backpropagation
The first version of a fully connected neural network was the Perceptron.
UNIT 4
Q) What is a Recurrent Neural Network? What is the Actor Critic model? **
The aim of attention models is to reduce larger, more complicated tasks into
smaller, more manageable areas of attention that can be understood and
processed sequentially. The models work within neural networks, which are
a type of network model with a structure and processing methods similar to
those of the human brain for simplifying and processing information. Using
attention models enables the network to focus on a few particular aspects at
a time and ignore the rest. This allows for efficient and sequential data
processing, especially when the network needs to categorize entire datasets.
Attention models evaluate inputs to identify the most critical components and
assign each of them with a weight. For example, if using an attention model to
translate a sentence from one language to another, the model would select
the most important words and assign them a higher weight. Similarly, it assigns
the less significant words a lower value. This helps achieve a more accurate
output prediction.
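The weighting step described above is typically implemented with a softmax over raw relevance scores; the scores below are hypothetical values for four words in a sentence being translated:

```python
import numpy as np

def softmax(scores):
    # Turn raw importance scores into positive weights that sum to 1.
    e = np.exp(scores - np.max(scores))   # subtract max for numerical stability
    return e / e.sum()

# Hypothetical relevance scores for four words; the last word matters most.
scores  = np.array([2.0, 0.5, 0.1, 3.0])
weights = softmax(scores)

print(weights.round(2))   # the most important word gets the highest weight
```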
UNIT 5
Q) Discuss the key idea of the support vector machine. RGPV 2018, 2020
Ans) Support Vector Machine or SVM is one of the most popular Supervised
Learning algorithms, which is used for Classification as well as Regression
problems. However, primarily, it is used for Classification problems in Machine
Learning.
The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the
hyperplane. These extreme cases are called support vectors, and hence the
algorithm is termed the Support Vector Machine. Consider the diagram
below, in which two different categories are classified using a decision
boundary or hyperplane:
Example: SVM can be understood with the example that we used for the
KNN classifier. Suppose we see a strange cat that also has some features of
dogs. If we want a model that can accurately identify whether it is a cat or a
dog, such a model can be created using the SVM algorithm. We first train our
model with lots of images of cats and dogs so that it can learn the different
features of cats and dogs, and then we test it with this strange creature. The
support vector machine creates a decision boundary between the two
classes (cat and dog) and chooses the extreme cases (support vectors) of
cats and dogs to place that boundary.
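The notion of margin and support vectors can be sketched numerically; the hyperplane and the points below are made up for illustration:

```python
import numpy as np

# Distance from a point x to the hyperplane w.x + b = 0 is |w.x + b| / ||w||.
# The hyperplane and points below are invented for illustration.
w = np.array([1.0, 1.0])
b = -3.0

def margin(x):
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

cats = [np.array([1.0, 1.0]), np.array([0.0, 2.0])]   # class on one side
dogs = [np.array([3.0, 2.0]), np.array([4.0, 4.0])]   # class on the other side

# The points nearest the hyperplane (smallest margin) are the support vectors;
# SVM training chooses w and b to make this smallest margin as large as possible.
nearest = min(cats + dogs, key=margin)
print(nearest, round(margin(nearest), 3))
```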
You need to remember a thumb rule to identify the right hyperplane: "Select
the hyperplane which segregates the two classes better". In this scenario,
hyperplane "B" has performed this job excellently.
Here, maximizing the distance between the nearest data points (of either
class) and the hyperplane will help us decide the right hyperplane. This
distance is called the margin. Let's look at the figure below:
Above, you can see that the margin for hyperplane C is high compared to
both A and B. Hence, we name C as the right hyperplane. Another reason for
selecting the hyperplane with a higher margin is robustness: if we select a
hyperplane having a low margin, there is a high chance of misclassification.