
Machine Learning - question paper solved ML

Computer Science (Rajiv Gandhi Proudyogiki Vishwavidyalaya)


CHAPTER 1
Q1 a) MACHINE LEARNING
Machine learning is a field of study that allows a computer to learn without
being explicitly programmed. Using machine learning, we don't provide explicit
instructions to the computer for reacting to specific situations. Instead, we
train the computer to find real-time solutions to a specific problem. The game
of chess is a famous example where machine learning is used to play the game.
Machine learning algorithms are used in a wide variety of applications, such as
in medicine, email filtering, speech recognition, and computer vision, where it
is difficult or infeasible to develop conventional algorithms to perform the
needed tasks.
Need For Machine Learning
• Ever since the technical revolution, we've been generating an immeasurable
amount of data.
• With the availability of so much data, it is finally possible to build
predictive models that can study and analyse complex data to find useful
insights and deliver more accurate results.
• Top Tier companies such as Netflix and Amazon build such Machine Learning
models by using tons of data in order to identify profitable opportunities
and avoid unwanted risks.
Important Terms of Machine Learning
• Algorithm: A Machine Learning algorithm is a set of rules and statistical
techniques used to learn patterns from data and draw significant
information from it. It is the logic behind a Machine Learning model.
An example of a Machine Learning algorithm is the Linear Regression
algorithm.
• Model: A model is the main component of Machine Learning. A model is
trained by using a Machine Learning Algorithm. An algorithm maps all the
decisions that a model is supposed to take based on the given input, in
order to get the correct output.
• Predictor Variable: It is a feature(s) of the data that can be used to predict
the output.
• Response Variable: It is the feature or the output variable that needs to be
predicted by using the predictor variable(s).


• Training Data: The Machine Learning model is built using the training data.
The training data helps the model to identify key trends and patterns
essential to predict the output.
• Testing Data: After the model is trained, it must be tested to evaluate
how accurately it can predict an outcome. This is done by the testing
data set.

Future Scope of Machine Learning

Automotive Industry: The automotive industry is one of the areas where
Machine Learning is excelling by changing the definition of 'safe' driving.
Self-driving cars are built using Machine Learning, IoT sensors, high-definition
cameras, voice recognition systems, etc.

Robotics: Robotics is one of the fields that always gains the interest of
researchers as well as the common public. Researchers all over the world are
still working on creating robots that mimic the human brain. They are using
neural networks, AI, ML, computer vision, and many other technologies in this
research.

Safer Healthcare: We've been seeing significant growth in machine learning
being used to predict and support COVID-19 strategies. The healthcare
industry has long been using ML for a wide range of purposes, and we believe
that the future scope of machine learning will undertake more complex use
cases.


Q) What is Regression? Explain its types.


Ans) Regression analysis is a statistical method to model the relationship
between a dependent (target) variable and one or more independent
(predictor) variables. More specifically, regression analysis helps us to
understand how the value of the dependent variable changes in response to
one independent variable when the other independent variables are held
fixed. It predicts continuous/real values such as temperature, age, salary,
price, etc.
Example: Suppose there is a marketing company A, which runs various
advertisements every year and gets sales from them. The list below shows the
advertisements made by the company in the last 5 years and the corresponding
sales:

Now, the company wants to spend $200 on advertisement in the year 2019 and
wants to know the prediction about the sales for this year. To solve such
prediction problems in machine learning, we need regression analysis.

Types of Regression
Linear Regression:
o Linear regression is a statistical regression method which is used for
predictive analysis.
o It is one of the simplest and easiest algorithms; it works on
regression and shows the relationship between continuous
variables.
o It is used for solving the regression problem in machine learning.
o Linear regression shows the linear relationship between the
independent variable (X-axis) and the dependent variable (Y-axis), hence
called linear regression.
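
A minimal sketch of simple linear regression in Python, assuming scikit-learn is available; the advertisement/sales numbers below are made up for illustration only.

Example (Python):
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical advertisement spend (x) and sales (y) for 5 years
x = np.array([[90], [120], [150], [100], [130]])
y = np.array([1000, 1300, 1800, 1200, 1380])

model = LinearRegression().fit(x, y)      # learns the best-fit line y = a*x + b
print(model.coef_, model.intercept_)      # slope and intercept of the line
print(model.predict([[200]]))             # predicted sales for a $200 advertisement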


Types of Linear Regression

Linear regression can be further divided into two types of the algorithm:

o Simple Linear Regression:
If a single independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm
is called Simple Linear Regression.
o Multiple Linear Regression:
If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm
is called Multiple Linear Regression.

Logistic Regression:
o Logistic regression is another supervised learning algorithm which is
used to solve classification problems. In classification problems, we
have dependent variables in a binary or discrete format such as 0 or 1.
o The logistic regression algorithm works with categorical variables such as
0 or 1, Yes or No, True or False, Spam or not spam, etc.
o It is a predictive analysis algorithm which works on the concept of
probability.

Polynomial Regression:
o Polynomial Regression is a type of regression which models the non-
linear dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve
between the value of x and corresponding conditional values of y.


o Suppose there is a dataset whose data points are distributed in a
non-linear fashion; in such a case, linear regression will not best fit
those data points. To cover such data points, we need polynomial
regression.

Support Vector Regression:

Support Vector Machine is a supervised learning algorithm which can be used
for regression as well as classification problems. If we use it for regression
problems, then it is termed Support Vector Regression.

Support Vector Regression is a regression algorithm which works for
continuous variables.

Decision Tree Regression:


o Decision Tree is a supervised learning algorithm which can be used for
solving both classification and regression problems.
o It can solve problems for both categorical and numerical data.
o Decision Tree regression builds a tree-like structure in which each
internal node represents the "test" for an attribute, each branch
represents the result of the test, and each leaf node represents the final
decision or result.


1. b) What is the role of preprocessing of data in machine learning? Why is it
needed? Explain the unsupervised model of machine learning in detail with
an example.
Ans ) Pre-processing refers to the transformations applied to our data
before feeding it to the algorithm. Data Preprocessing is a technique that is
used to convert the raw data into a clean data set. In other words, whenever
the data is gathered from different sources it is collected in raw format which
is not feasible for the analysis.

Need of Data Preprocessing


 For achieving better results from the applied model in Machine Learning
projects, the data has to be in a proper format. Some Machine Learning
models need information in a specified format; for example, the Random
Forest algorithm does not support null values, therefore to execute the
Random Forest algorithm, null values have to be managed from the original
raw data set.
 Another aspect is that the data set should be formatted in such a way that
more than one Machine Learning and Deep Learning algorithm can be
executed on one data set, and the best out of them is chosen.

Why is Data preprocessing important?


Preprocessing of data is mainly to check the data quality. The quality can be
checked by the following

 Accuracy: To check whether the data entered is correct or not.
 Completeness: To check whether the data is available or missing.
 Consistency: To check whether the same data is kept consistently in all the
places where it appears.
 Timeliness: The data should be updated correctly.


 Believability: The data should be trustworthy.
 Interpretability: The understandability of the data.

Major Tasks in Data Preprocessing:

1. Data cleaning
2. Data integration
3. Data reduction
4. Data transformation

Data cleaning:
Data cleaning is the process to remove incorrect data, incomplete data and
inaccurate data from the datasets, and it also replaces the missing values.

Data integration:
The process of combining multiple sources into a single dataset. The Data
integration process is one of the main components in data management.

Data reduction:
This process helps in the reduction of the volume of the data which makes the
analysis easier yet produces the same or almost the same result. This reduction
also helps to reduce storage space.

Data Transformation:
The change made in the format or the structure of the data is called data
transformation. This step can be simple or complex based on the
requirements.
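
A minimal sketch of common preprocessing steps (cleaning, missing-value handling, transformation), assuming pandas and scikit-learn are available; the file and column names here are hypothetical.

Example (Python):
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("raw_data.csv")                 # hypothetical raw data set
df = df.drop_duplicates()                        # data cleaning: remove duplicate rows
df["age"] = df["age"].fillna(df["age"].mean())   # data cleaning: replace missing values
scaler = StandardScaler()                        # data transformation: standardize scale
df[["age", "salary"]] = scaler.fit_transform(df[["age", "salary"]])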


Types of Machine Learning / Classification of Machine Learning Models:


1) Supervised Machine Learning

As its name suggests, supervised machine learning is based on supervision. It
means that in the supervised learning technique, we train the machines using a
"labelled" dataset, and based on the training, the machine predicts the output.
Here, the labelled data specifies that some of the inputs are already mapped to
the output. More precisely, we can say that first we train the machine with the
input and corresponding output, and then we ask the machine to predict the
output using the test dataset.

Let's understand supervised learning with an example. Suppose we have an
input dataset of cat and dog images. So, first, we will provide the training to
the machine to understand the images, such as the shape & size of the tail of
cat and dog, Shape of eyes, colour, height (dogs are taller, cats are smaller),
etc. After completion of training, we input the picture of a cat and ask the
machine to identify the object and predict the output. Now, the machine is
well trained, so it will check all the features of the object, such as height,
shape, colour, eyes, ears, tail, etc., and find that it's a cat. So, it will put it in the
Cat category. This is the process of how the machine identifies the objects in
Supervised Learning.

2) Unsupervised Machine Learning

Unsupervised learning is different from the Supervised learning technique; as
its name suggests, there is no need for supervision. It means, in unsupervised
machine learning, the machine is trained using the unlabeled dataset, and the
machine predicts the output without any supervision.

In unsupervised learning, the models are trained with the data that is neither
classified nor labelled, and the model acts on that data without any
supervision.

The main aim of the unsupervised learning algorithm is to group or
categorize the unsorted dataset according to the similarities, patterns, and
differences. Machines are instructed to find the hidden patterns from the
input dataset.


Let's take an example to understand it more precisely; suppose there is a
basket of fruit images, and we input it into the machine learning model. The
images are totally unknown to the model, and the task of the machine is to
find the patterns and categories of the objects.

So, now the machine will discover its patterns and differences, such as colour
difference, shape difference, and predict the output when it is tested with the
test dataset.

3) Reinforcement Learning

 Reinforcement Learning is a feedback-based Machine learning technique


in which an agent learns to behave in an environment by performing the
actions and seeing the results of actions. For each good action, the agent
gets positive feedback, and for each bad action, the agent gets negative
feedback or penalty.
 In Reinforcement Learning, the agent learns automatically using
feedback without any labeled data, unlike supervised learning.
 Since there is no labeled data, the agent is bound to learn by its
experience only.
 RL solves a specific type of problem where decision making is sequential,
and the goal is long-term, such as game-playing, robotics, etc.
 The agent interacts with the environment and explores it by itself. The
primary goal of an agent in reinforcement learning is to improve the
performance by getting the maximum positive rewards.
 The agent learns through trial and error, and based on the
experience, it learns to perform the task in a better way. Hence, we can
say that "Reinforcement learning is a type of machine learning method
where an intelligent agent (computer program) interacts with the
environment and learns to act within that." How a robotic dog learns
the movement of its arms is an example of Reinforcement learning.
 It is a core part of Artificial Intelligence, and all AI agents work on the
concept of reinforcement learning. Here we do not need to pre-program
the agent, as it learns from its own experience without any human
intervention.
 Example: Suppose there is an AI agent present within a maze
environment, and its goal is to find the diamond. The agent interacts
with the environment by performing some actions, and based on those
actions, the state of the agent gets changed, and it also receives a
reward or penalty as feedback.
 The agent continues doing these three things (take action, change
state/remain in the same state, and get feedback), and by doing these
actions, it learns and explores the environment.
 The agent learns which actions lead to positive feedback or rewards
and which actions lead to negative feedback or penalties. As a positive
reward, the agent gets a positive point, and as a penalty, it gets a
negative point.

Terms used in Reinforcement Learning


o Agent(): An entity that can perceive/explore the environment and act
upon it.
o Environment(): A situation in which an agent is present or surrounded
by. In RL, we assume the stochastic environment, which means it is
random in nature.
o Action(): Actions are the moves taken by an agent within the
environment.
o State(): State is a situation returned by the environment after each
action taken by the agent.


o Reward(): A feedback returned to the agent from the environment to


evaluate the action of the agent.
o Policy(): Policy is a strategy applied by the agent for the next action
based on the current state.
o Value(): It is the expected long-term return with the discount factor, as
opposed to the short-term reward.
o Q-value(): It is mostly similar to the value, but it takes one additional
parameter as a current action (a).


2. a) Discuss linear regression with an example. Explain the role of hypothesis
function in machine learning models.

ANS) In machine learning, a hypothesis is defined as a supposition or proposed
explanation based on insufficient evidence or assumptions. It is just a guess
based on some known facts but has not yet been proven. A good hypothesis is
testable, which results in either true or false.

Example: Let's understand the hypothesis with a common example. A
scientist claims that ultraviolet (UV) light can damage the eyes, so it may also
cause blindness.

In this example, the scientist just claims that UV rays are harmful to the eyes, and
we assume they may cause blindness. However, it may or may not be possible.
Hence, these types of assumptions are called a hypothesis.

Hypothesis function and testing

Say suppose we have test data for which we have to determine the outputs
or results. The test data is as shown below:


We can predict the outcomes by dividing the coordinate plane as shown below:

So the test data would yield the following result:

But note here that we could have divided the coordinate plane as:

The way in which the coordinate plane is divided depends on the data, the
algorithm and the constraints.
All the legal possible ways in which we can divide the coordinate plane to
predict the outcome of the test data constitute the Hypothesis Space.
Each individual possible way is known as a hypothesis.
Hence, in this example the hypothesis space would be like:


UNIT 2

Gradient Descent in Machine Learning

Gradient Descent is known as one of the most commonly used optimization
algorithms to train machine learning models by minimizing the error
between actual and expected results. Further, gradient descent is also used to
train neural networks.

Gradient Descent is defined as one of the most commonly used iterative
optimization algorithms in machine learning, used to train machine learning and
deep learning models. It helps in finding the local minimum of a function.
The best way to define the local minimum or local maximum of a function
using gradient descent is as follows:

o If we move in the direction of the negative gradient of the function at
the current point, we will reach the local minimum of that
function.
o If we move in the direction of the positive gradient of the function at
the current point, we will reach the local maximum of that
function.
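
A minimal sketch of gradient descent on a simple one-variable function; the function f(x) = (x - 3)^2 and the learning rate are chosen only for demonstration.

Example (Python):
def gradient(x):
    return 2 * (x - 3)            # derivative of f(x) = (x - 3)^2

x = 0.0                           # starting point
learning_rate = 0.1
for step in range(100):
    x = x - learning_rate * gradient(x)   # move against (opposite to) the gradient
print(x)                          # converges close to 3.0, the local minimum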


2 b) Explain the concept of perceptron, back propagation and sigmoid
activation function in brief. Differentiate between classification and
regression.
Ans) Perceptron is a Machine Learning algorithm for the supervised learning of
various binary classification tasks. Further, a Perceptron is also understood as an
artificial neuron or neural network unit that helps to detect certain input
data computations in business intelligence.
The Perceptron model is also treated as one of the best and simplest types of
artificial neural networks. It is a supervised learning algorithm of
binary classifiers. Hence, we can consider it as a single-layer neural network
with four main parameters, i.e., input values, weights and bias, net sum, and
an activation function.

How does Perceptron work?

In Machine Learning, Perceptron is considered as a single-layer neural network
that consists of four main parameters named input values (Input nodes),
weights and Bias, net sum, and an activation function. The perceptron model
begins with the multiplication of all input values and their weights, then adds
these values together to create the weighted sum. Then this weighted sum is
applied to the activation function 'f' to obtain the desired output. This
activation function is also known as the step function and is represented by 'f'.

This step function or Activation function plays a vital role in ensuring that
output is mapped between required values (0,1) or (-1,1).

Types of Perceptron Models

Based on the layers, Perceptron models are divided into two types. These are
as follows:

1. Single-layer Perceptron Model


2. Multi-layer Perceptron model

The Perceptron Algorithm

1. Set a threshold value
2. Multiply all inputs by their weights
3. Sum all the results
4. Activate the output


1. Set a threshold value:

 Threshold = 1.5

2. Multiply all inputs by their weights:

 x1 * w1 = 1 * 0.7 = 0.7
 x2 * w2 = 0 * 0.6 = 0
 x3 * w3 = 1 * 0.5 = 0.5
 x4 * w4 = 0 * 0.3 = 0
 x5 * w5 = 1 * 0.4 = 0.4

3. Sum all the results:

 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (The Weighted Sum)

4. Activate the Output:

 Return true if the sum > 1.5 ("Yes I will go to the Concert")
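
The four steps above can be written directly in code; a minimal sketch using the same illustrative inputs, weights and threshold (plain Python, no extra libraries assumed):

Example (Python):
inputs = [1, 0, 1, 0, 1]
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5

weighted_sum = sum(x * w for x, w in zip(inputs, weights))   # = 1.6
output = weighted_sum > threshold                            # step activation
print(weighted_sum, output)       # 1.6 True -> "Yes I will go to the Concert"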

Basic Components of Perceptron

o Input Nodes or Input Layer: This is the primary component of the
Perceptron which accepts the initial data into the system for further
processing. Each input node contains a real numerical value.

o Weight and Bias: The weight parameter represents the strength of the
connection between units. This is another important parameter of the
Perceptron components. Further, bias can be considered as the
intercept in a linear equation.

o Activation Function: This is the final and important component
that helps to determine whether the neuron will fire or not.


Q2 b) Explain the working of a back propagation neural network with a neat
architecture and flowchart?

Ans) The algorithm is used to effectively train a neural network through a
method called the chain rule. In simple terms, after each forward pass through a
network, backpropagation performs a backward pass while adjusting the
model's parameters (weights and biases).

In simple terms, after each feed-forward pass through a network, this
algorithm does the backward pass to adjust the model's parameters based
on the weights and biases. A typical supervised learning algorithm attempts to
find a function that maps input data to the right output. Backpropagation
works with a multi-layered neural network and learns internal
representations of the input to output mapping.

For a single training example, the backpropagation algorithm calculates the
gradient of the error function. Backpropagation can be written as a function of
the neural network. Backpropagation algorithms are a set of methods used to
efficiently train artificial neural networks following a gradient descent
approach which exploits the chain rule.

The main features of backpropagation are the iterative, recursive and efficient
method through which it calculates the updated weights to improve the
network until it is able to perform the task for which it is being trained.

How Backpropagation Algorithm Works


Let us take a look at how backpropagation works. It has four layers: input
layer, hidden layer, hidden layer II and final output layer.

Downloaded by Gim Kadak (gimkadak@gmail.com)


lOMoARcPSD|29392272

18

So, the main three layers are:

1. Input layer
2. Hidden layer
3. Output layer
Each layer has its own way of working and its own way of taking action such
that we are able to get the desired results and correlate these scenarios to
our conditions. Let us discuss the other details needed to summarize this
algorithm:

1. Inputs X arrive through the preconnected path.
2. Input is modeled using real weights W. The weights are usually randomly
selected.
3. Calculate the output for every neuron from the input layer, to the
hidden layers, to the output layer.
4. Calculate the error in the outputs

Error = Actual Output - Desired Output

5. Travel back from the output layer to the hidden layer to adjust the
weights such that the error is decreased.

This process is repeated till we get the desired output. The training phase
is done with supervision. Once the model is stable, it is used in
production.
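
A minimal sketch of a one-hidden-layer network trained with forward and backward passes, assuming NumPy; the data (XOR), layer sizes and learning rate are illustrative only, not part of the original answer.

Example (Python):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # illustrative inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # desired outputs

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros((1, 3))            # input -> hidden
W2, b2 = rng.normal(size=(3, 1)), np.zeros((1, 1))            # hidden -> output
lr = 0.5

for epoch in range(5000):
    h = sigmoid(X @ W1 + b1)                                  # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)                       # backward pass (chain rule)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(3))       # outputs approach the desired values as training proceeds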


Activation functions like sigmoid, ReLU (RGPV 2009, 2008, 2016)


A neural network is comprised of layers of nodes and learns to map
examples of inputs to outputs. For a given node, the inputs are multiplied by
the weights in the node and summed together. This value is referred to as the
summed activation of the node. The summed activation is then transformed via
an activation function and defines the specific output or "activation" of the
node. It is also known as the Transfer Function.
ACTIVATION FUNCTION:
The activation function decides whether a neuron should be activated or not by
calculating the weighted sum and further adding bias to it, with the intention of
introducing non-linearity into the output of a neuron.
It is used to determine the output of a neural network, like yes or no. It maps the
resulting values to between 0 and 1 or -1 and 1, etc. (depending upon the
function).
Activation functions are an extremely important feature of artificial
neural networks. They basically decide whether a neuron should be
activated or not, i.e. whether the information that the neuron is receiving is
relevant for the given task or should be ignored.
Mathematically,
Net Input = (Weight * input) + bias
Now the value of the net input can be anything from -inf to +inf. The neuron
doesn't really know how to bound the value and thus is not able to decide the
firing pattern. Thus the activation function is an important part of an
artificial neural network. It basically decides whether a neuron should be
activated or not, and thus it bounds the value of the net input.
The activation function is a non-linear transformation that we do over the
input before sending it to the next layer of neurons or finalizing it as output.
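
A minimal sketch of the sigmoid and ReLU activation functions, assuming NumPy; it only illustrates how each function bounds or rectifies the net input.

Example (Python):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # maps any real value into (0, 1)

def relu(z):
    return np.maximum(0, z)           # keeps positive values, zeroes out negatives

net_input = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(net_input))             # values squashed between 0 and 1
print(relu(net_input))                # [0. 0. 0. 0.5 2.]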


Regression Algorithm vs Classification Algorithm

o In Regression, the output variable must be of continuous nature or a real
value. In Classification, the output variable must be a discrete value.
o The task of the regression algorithm is to map the input value (x) to a
continuous output variable (y). The task of the classification algorithm is to
map the input value (x) to a discrete output variable (y).
o Regression algorithms are used with continuous data. Classification
algorithms are used with discrete data.
o In Regression, we try to find the best-fit line, which can predict the output
more accurately. In Classification, we try to find the decision boundary,
which can divide the dataset into different classes.
o Regression algorithms can be used to solve regression problems such as
weather prediction, house price prediction, etc. Classification algorithms
can be used to solve classification problems such as identification of spam
emails, speech recognition, identification of cancer cells, etc.
o Regression can be further divided into Linear and Non-linear Regression.
Classification algorithms can be divided into binary classifiers and
multi-class classifiers.


3. a) What are the different types of Neural networks? Explain the
convolutional neural network model in detail.
ANS) Neural Networks are a subset of Machine Learning techniques which
learn the data and patterns in a different way, utilizing neurons and hidden
layers. Neural Networks are far more powerful due to their complex structure
and can be used in applications where traditional Machine Learning algorithms
just cannot suffice. Neural networks consist of various layers of
interconnected artificial neurons powered by activation functions which
help in switching them ON/OFF. Like traditional machine learning algorithms,
here too, there are certain values that neural networks learn in the training
phase.

Types of Neural Networks

1. Perceptron (see the answer to Q2 b above)


2. Feed Forward Networks
3. Multi-Layer Perceptron
4. Convolutional Neural Networks
5. Recurrent Neural Networks

Feed Forward Network


The Feed Forward (FF) networks consist of multiple neurons and hidden layers
which are connected to each other. These are called "feed-forward" because
the data flows in the forward direction only, and there is no backward
propagation. Hidden layers might not necessarily be present in the network,
depending upon the application.
The more layers there are, the more the weights can be customized, and
hence, the greater the ability of the network to learn. Weights are not
updated as there is no backpropagation. The output of the multiplication of
weights with the inputs is fed to the activation function, which acts as a
threshold value. For example: the neuron is activated if the value is above the
threshold (usually 0) and the neuron produces 1 as an output; the neuron is not
activated if the value is below the threshold (usually 0), which is considered as -1.
FF networks are used in:
 Classification, Speech recognition, Face recognition ,Pattern recognition


Convolutional Neural Networks

Convolutional Neural Network is one of the main categories used for image
classification and image recognition in neural networks. Scene labeling, object
detection, face recognition, etc., are some of the areas where
convolutional neural networks are widely used.

CNN takes an image as input, which is processed and classified under a certain
category such as dog, cat, lion, tiger, etc. The computer sees an image as an
array of pixels, and this depends on the resolution of the image. Based on the
image resolution, it will see it as h * w * d, where h = height, w = width and
d = dimension (depth).

In CNN, each input image passes through a sequence of convolution layers
along with pooling, fully connected layers and filters (also known as kernels).
After that, we apply the Softmax function to classify an object with probabilistic
values between 0 and 1.


Convolution Layer

Convolution layer is the first layer to extract features from an input image. By
learning image features using a small square of input data, the convolutional
layer preserves the relationship between pixels. It is a mathematical operation
which takes two inputs such as image matrix and a kernel or filter.

Strides

Stride is the number of pixels by which the filter is shifted over the input matrix.
When the stride is equal to 1, we move the filter 1 pixel at a time, and
similarly, if the stride is equal to 2, we move the filter 2 pixels at a
time. The following figure shows how the convolution would work with a stride
of 2.

Padding

Padding plays a crucial role in building a convolutional neural network. Without
it, the image will shrink with every convolution, and if we take a neural network
with 100s of layers, we will be left with a very small image after filtering at the
end.

If we take a three-by-three filter on top of a grayscale image and do the
convolving, then what will happen?


It is clear from the above picture that the pixel in the corner is only
covered once, but the middle pixel is covered more than once. It
means that we have more information about the middle pixels, so there are two
downsides:

o Shrinking outputs
o Losing information on the corner of the image.

To overcome this, we have introduced padding to the image. "Padding is an
additional layer which can be added to the border of an image."

Pooling Layer

Pooling layer plays an important role in pre-processing of an image. Pooling


layer reduces the number of parameters when the images are too large.
Pooling is "downscaling" of the image obtained from the previous layers. It can
be compared to shrinking an image to reduce its pixel density. Spatial pooling
is also called downsampling or subsampling, which reduces the dimensionality
of each map but retains the important information.
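
A minimal sketch of a small CNN with the layers described above (convolution, pooling, fully connected, softmax), assuming TensorFlow/Keras is installed; the input shape and number of classes are illustrative.

Example (Python):
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),  # convolution layer
    layers.MaxPooling2D((2, 2)),                                            # pooling layer
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                                    # fully connected layer
    layers.Dense(4, activation="softmax"),                                  # e.g. dog/cat/lion/tiger
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])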

3 b) Explain the multilayer perceptron model in detail with a neat diagram

Ans) A multilayer perceptron (MLP) is a feedforward artificial neural network
that generates a set of outputs from a set of inputs. An MLP is characterized by
several layers of input nodes connected as a directed graph between the input
and output layers. MLP uses backpropagation for training the network. MLP is
a deep learning method.
MLP networks are used in a supervised learning setting. A typical learning
algorithm for MLP networks is the backpropagation algorithm.
A multi-layer perceptron has one input layer, and for each input there is one
neuron (or node); it has one output layer with a single node for each output,
and it can have any number of hidden layers, and each hidden layer can have
any number of nodes. A schematic diagram of a Multi-Layer Perceptron
(MLP) is depicted below.


In the multi-layer perceptron diagram above, we can see that there are three
inputs and thus three input nodes, and the hidden layer has three nodes. The
output layer gives two outputs, therefore there are two output nodes. The
nodes in the input layer take input and forward it for further processing; in the
diagram above, the nodes in the input layer forward their output to each of
the three nodes in the hidden layer, and in the same way, the hidden layer
processes the information and passes it to the output layer.
Every node in the multi-layer perceptron uses a sigmoid activation function.
The sigmoid activation function takes real values as input and converts them
to numbers between 0 and 1 using the sigmoid formula.
The MLP learning procedure is as follows:

 Starting with the input layer, propagate data forward to the output layer.
This step is the forward propagation.
 Based on the output, calculate the error (the difference between the
predicted and known outcome). The error needs to be minimized.
 Backpropagate the error. Find its derivative with respect to each weight in
the network, and update the model.
Repeat the three steps given above over multiple epochs to learn ideal
weights. Finally, the output is taken via a threshold function to obtain the
predicted class labels.
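
A minimal sketch of the MLP described above (three inputs, a hidden layer of three nodes, two outputs, sigmoid activations), assuming TensorFlow/Keras; the layer sizes simply follow the diagram description.

Example (Python):
from tensorflow import keras
from tensorflow.keras import layers

mlp = keras.Sequential([
    layers.Input(shape=(3,)),                # three input nodes
    layers.Dense(3, activation="sigmoid"),   # hidden layer with three nodes
    layers.Dense(2, activation="sigmoid"),   # two output nodes
])
mlp.compile(optimizer="sgd", loss="mse")     # weights learned by backpropagation
mlp.summary()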


UNIT 3

4 a) Explain the concept of different layers in a Neural network. What do you
mean by the terms convolution layer, pooling layer, loss layer, dense layer?
Describe each one in brief.

Ans) A neural network is made up of vertically stacked components
called layers. Each dotted line in the image represents a layer. There are three
types of layers in a NN:

Input Layer– First is the input layer. This layer will accept the data and pass it
to the rest of the network.

Hidden Layer– The second type of layer is called the hidden layer. Hidden
layers are either one or more in number for a neural network. In the above
case, the number is 1. Hidden layers are the ones that are actually responsible
for the excellent performance and complexity of neural networks. They
perform multiple functions at the same time such as data transformation,
automatic feature creation, etc.

Output layer– The last type of layer is the output layer. The output layer holds
the result or the output of the problem. Raw images get passed to the input
layer and we receive output in the output layer.


In this case, we are providing an image of a vehicle and this output layer will
provide an output whether it is an emergency or non-emergency vehicle, after
passing through the input and hidden layers of course.

Convolution layer :

It is the first layer to extract features from an input image. By learning image
features using a small square of input data, the convolutional layer preserves
the relationship between pixels. It is a mathematical operation which takes
two inputs such as image matrix and a kernel or filter.

The convolutional layer is considered an essential building block of the CNN. In
a CNN, it is crucial to understand that the layer's parameters are comprised of a
set of learnable filters (kernels) or neurons. These filters have a small
receptive field.

Pooling layer:

It plays an important role in the pre-processing of an image. The pooling layer
reduces the number of parameters when the images are too large. Pooling is
"downscaling" of the image obtained from the previous layers. It can be
compared to shrinking an image to reduce its pixel density. Spatial pooling is
also called downsampling or subsampling, which reduces the dimensionality of
each map but retains the important information. There are the following types
of spatial pooling:

Max Pooling

Max pooling is a sample-based discretization process. Its main objective is to
downscale an input representation, reducing its dimensionality and allowing
for assumptions to be made about the features contained in the binned
sub-regions.

Max pooling is done by applying a max filter to non-overlapping sub-regions of
the initial representation.

Average Pooling

Down-scaling will perform through average pooling by dividing the input into
rectangular pooling regions and computing the average values of each region.

Syntax

layer = averagePooling2dLayer(poolSize)
layer = averagePooling2dLayer(poolSize,Name,Value)
Sum Pooling

The sub-regions for sum pooling or mean pooling are set exactly the same as
for max pooling, but instead of using the max function we use sum or mean.
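
The averagePooling2dLayer syntax above appears to be MATLAB-style; a rough Keras equivalent, given only as an assumed sketch, would be:

Example (Python):
from tensorflow.keras import layers

max_pool = layers.MaxPooling2D(pool_size=(2, 2))       # max pooling over 2x2 regions
avg_pool = layers.AveragePooling2D(pool_size=(2, 2))   # average pooling over 2x2 regions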

Loss Layer : The Loss Function is one of the important components of Neural
Networks. Loss is nothing but a prediction error of Neural Net. And the
method to calculate the loss is called Loss Function. In simple words, the Loss is

used to calculate the gradients. And gradients are used to update the weights
of the Neural Net.

Dense Layer: The dense layer is a neural network layer that is connected
deeply, which means each neuron in the dense layer receives input from all
neurons of its previous layer. The dense layer is found to be the most
commonly used layer in the models.

In the background, the dense layer performs a matrix-vector multiplication.
The values used in the matrix are actually parameters that can be trained and
updated with the help of backpropagation.

The output generated by the dense layer is an 'm'-dimensional vector. Thus, the
dense layer is basically used for changing the dimensions of the vector. Dense
layers also apply operations like rotation, scaling and translation to the vector.

b) Explain the process of sub-sampling of input data in a neural network
model. Give some of the features of the Keras framework for implementing
neural network models.

Ans) Subsampling (Fig. 1.36) is a method that reduces data size by
selecting a subset of the original data. The subset is specified by choosing a
parameter n, specifying that every nth data point is to be extracted. For
example, in structured datasets such as image data and structured grids,
selecting every nth point produces the results shown in Fig. 1.36. Subsampling
modifies the topology of a dataset. When points or cells are not selected, this
leaves a topological "hole." The dataset topology must be modified to fill the
hole. In structured data, this is simply a uniform selection across the structured
i-j-k coordinates. In unstructured data, the hole must be filled in by using
triangulation or other complex tessellation schemes. Subsampling is not
typically performed on unstructured data because of its inherent complexity.


In statistics, a subsample is a sample of a sample. In other words, a sample is
part of a population and a subsample is a part of a sample.

For example, let's say you had a population of one million people, and you
used simple random sampling to get a sample of 1,000 people. You could use
simple random sampling again on the 1,000 people to get a smaller portion of
100 people.

Keras Framework:

Keras runs on top of open source machine learning libraries like TensorFlow,
Theano or the Cognitive Toolkit (CNTK). Theano is a Python library used for fast
numerical computation tasks. TensorFlow is the most famous symbolic math
library used for creating neural networks and deep learning models. TensorFlow
is very flexible, and its primary benefit is distributed computing. CNTK is a deep
learning framework developed by Microsoft. It uses libraries such as Python,
C#, C++ or standalone machine learning toolkits. Theano and TensorFlow are
very powerful libraries but difficult to understand for creating neural
networks.
Keras is based on a minimal structure that provides a clean and easy way to
create deep learning models based on TensorFlow or Theano. Keras is
designed to quickly define deep learning models, which makes it an optimal
choice for deep learning applications.
Features
Keras leverages various optimization techniques to make high level neural
network API easier and more performant. It supports the following features −
 Consistent, simple and extensible API.
 Minimal structure - easy to achieve the result without any frills.
 It supports multiple platforms and backends.


 It is a user-friendly framework which runs on both CPU and GPU.
 High scalability of computation.
Benefits
Keras is a highly powerful and dynamic framework and comes with the
following advantages −
 Larger community support.
 Easy to test.
 Keras neural networks are written in Python, which makes things
simpler.
 Keras supports both convolutional and recurrent networks.
 Deep learning models are made of discrete components that can be
combined in many ways.
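
A minimal sketch of the typical Keras workflow (define, compile, fit), assuming TensorFlow/Keras is installed; the data shapes and values here are placeholders.

Example (Python):
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(100, 8)                  # placeholder data: 100 samples, 8 features
y = np.random.randint(0, 2, size=(100,))    # placeholder binary labels

model = keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(8,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)   # training handled by Keras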


UNIT 4

5. a) What do you mean by a Recurrent neural network? Explain with the
help of a diagram. In which cases is this model suitable?

Ans) Recurrent Neural Networks (RNN) are a type of Neural Network where
the output from the previous step is fed as input to the current step. In
traditional neural networks, all the inputs and outputs are independent of
each other, but in cases like when it is required to predict the next word of a
sentence, the previous words are required and hence there is a need to
remember the previous words. Thus RNN came into existence, which solved
this issue with the help of a hidden layer. The main and most important
feature of RNN is the hidden state, which remembers some information about
a sequence.

RNNs have a "memory" which remembers all information about what has
been calculated. An RNN uses the same parameters for each input as it performs
the same task on all the inputs or hidden layers to produce the output. This
reduces the complexity of parameters, unlike other neural networks.

How RNN works

The working of an RNN can be understood with the help of the example below:
Example:
Suppose there is a deeper network with one input layer, three hidden layers
and one output layer. Then, like other neural networks, each hidden layer will
have its own set of weights and biases; let's say, for hidden layer 1 the
weights and biases are (w1, b1), (w2, b2) for the second hidden layer and (w3,
b3) for the third hidden layer. This means that each of these layers is
independent of the others, i.e. they do not memorize the previous outputs.

Now the RNN will do the following:


 RNN converts the independent activations into dependent activations by
providing the same weights and biases to all the layers, thus reducing the
complexity of increasing parameters, and memorizes each previous
output by giving each output as input to the next hidden layer.
 Hence these three layers can be joined together such that the weights
and biases of all the hidden layers are the same, forming a single recurrent layer.
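
A minimal sketch of a recurrent layer in Keras for predicting the next value of a sequence, given as an illustration only; the sequence length, feature size and hidden units are placeholders.

Example (Python):
from tensorflow import keras
from tensorflow.keras import layers

rnn = keras.Sequential([
    layers.Input(shape=(10, 1)),     # sequences of 10 time steps, 1 feature each
    layers.SimpleRNN(16),            # hidden state carried across time steps
    layers.Dense(1),                 # predict the next value in the sequence
])
rnn.compile(optimizer="adam", loss="mse")
rnn.summary()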


b) Explain the Actor-Critic model. List down its advantages in
reinforcement learning.
Ans) Actor-Critic Model:

1. The "Critic" estimates the value function. This could be the action-value
(the Q value) or state-value (the V value).
2. The "Actor" updates the policy distribution in the direction suggested by
the Critic (such as with policy gradients).
And both the Critic and Actor functions are parameterized with neural
networks.
Actor-Critics aim to take advantage of all the good stuff from both value-
based and policy-based methods while eliminating all their drawbacks. And how
do they do this?
The principal idea is to split the model in two: one for computing an action based
on a state and another one to produce the Q values of the action.
The actor takes as input the state and outputs the best action. It essentially
controls how the agent behaves by learning the optimal policy (policy-based).
The critic, on the other hand, evaluates the action by computing the value
function (value based). Those two models participate in a game where they
both get better in their own role as the time passes. The result is that the overall
architecture will learn to play the game more efficiently than the two methods
separately.
How Actor Critic works
Imagine you play a video game with a friend that provides you some feedback.
You're the Actor and your friend is the Critic.

Figure: 4.11
At the beginning, you don't know how to play, so you try some actions randomly.
The Critic observes your actions and provides feedback.
Learning from this feedback, you'll update your policy and be better at
playing that game.
On the other hand, your friend (Critic) will also update their own way to provide
feedback so it can be better next time.


6. a) Explain the concept of Reinforcement Learning and its framework in
detail.

Ans) Reinforcement Learning (2020 Q1):

TensorFlow offers multiple levels of abstraction so you can choose the right
one for your needs. Build and train models by using the high-level Keras API,
which makes getting started with TensorFlow and machine learning easy.

TensorFlow is an open source machine learning framework for all developers.
It is used for implementing machine learning and deep learning applications.
To develop and research fascinating ideas in artificial intelligence, the Google
team created TensorFlow. TensorFlow is designed in the Python programming
language, hence it is considered an easy-to-understand framework.

b) Describe how principal component analysis is carried out to reduce the
dimensionality of data sets.

ANS. Principal Component Analysis

Principal Component Analysis is an unsupervised learning algorithm that is
used for dimensionality reduction in machine learning. It is a statistical
process that converts the observations of correlated features into a set of
linearly uncorrelated features with the help of an orthogonal transformation.
These new transformed features are called the Principal Components. It is one
of the popular tools used for exploratory data analysis and predictive
modeling. It is a technique to draw strong patterns from the given dataset by
reducing the variance.

Principal Component Analysis generally tries to find the lower-dimensional
surface onto which to project the high-dimensional data.

Principal Component Analysis works by considering the variance of each
attribute, because an attribute with high variance shows a good split between
the classes, and hence it reduces the dimensionality. Some real-world
applications of PCA are image processing, movie recommendation systems, and
optimizing the power allocation in various communication channels. It is a
feature extraction technique, so it contains the important variables and drops
the least important variables.


Steps for Principal Component Analysis algorithm


1. Getting the dataset
Firstly, we need to take the input dataset and divide it into two subparts
X and Y, where X is the training set and Y is the validation set.
2. Representing data in a structure
Now we will represent our dataset in a structure, such as a
two-dimensional matrix of the independent variable X. Here
each row corresponds to a data item, and each column corresponds to
a feature. The number of columns is the number of dimensions of the dataset.
3. Standardizing the data
In this step, we will standardize our dataset. In a particular
column, the features with high variance are more important compared
to the features with lower variance.
If the importance of features is independent of the variance of the
feature, then we will divide each data item in a column by the
standard deviation of the column. Here we will name the resulting matrix Z.
4. Calculating the covariance of Z
To calculate the covariance of Z, we will take the matrix Z and
transpose it. After transposing, we will multiply it by Z. The output matrix
will be the covariance matrix of Z.
5. Calculating the eigenvalues and eigenvectors
Now we need to calculate the eigenvalues and eigenvectors of the
resultant covariance matrix of Z. The eigenvectors of the covariance matrix
are the directions of the axes with high information, and the
corresponding eigenvalues give the amount of variance along those directions.
6. Sorting the eigenvectors
In this step, we will take all the eigenvalues and sort them in
decreasing order, which means from largest to smallest, and
simultaneously sort the eigenvectors accordingly in the matrix P of
eigenvalues. The resultant matrix will be named P*.
7. Calculating the new features or Principal Components
Here we will calculate the new features. To do this, we will multiply the
P* matrix by Z. In the resultant matrix Z*, each observation is a
linear combination of the original features. Each column of the Z* matrix is
independent of the others.
8. Remove less important features from the new dataset.
The new feature set has been obtained, so we will decide here what to keep
and what to remove. It means we will only keep the relevant or
important features in the new dataset, and the unimportant features will be
removed.
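
A minimal sketch of the steps above using NumPy (standardize, covariance, eigendecomposition, sorting, projection); the small matrix X is made up for illustration.

Example (Python):
import numpy as np

X = np.array([[2.5, 2.4, 1.0],                 # illustrative data: 5 observations, 3 features
              [0.5, 0.7, 2.1],
              [2.2, 2.9, 0.8],
              [1.9, 2.2, 1.1],
              [3.1, 3.0, 0.6]])

Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each column
cov = Z.T @ Z / (len(Z) - 1)                   # covariance matrix of Z
eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues and eigenvectors
order = np.argsort(eigvals)[::-1]              # sort from largest to smallest
P_star = eigvecs[:, order][:, :2]              # keep the top 2 principal directions
Z_star = Z @ P_star                            # new features (principal components)
print(Z_star)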

Applications of Principal Component Analysis


o PCA is mainly used as the dimensionality reduction technique in various
AI applications such as computer vision, image compression, etc.
o It can also be used for finding hidden patterns if data has high
dimensions. Some fields where PCA is used are Finance, data mining,
Psychology, etc.

Q7. a) Describe Q-learning in brief. What is the SARSA algorithm? Explain this.

Ans) Q-Learning is a basic form of Reinforcement Learning which uses Q-
values (also called action values) to iteratively improve the behavior of the
learning agent.

Q-learning is a value-based method of supplying information to inform which
action an agent should take.

Let's understand this method by the following example:

 There are five rooms in a building which are connected by doors.


 Each room is numbered 0 to 4
 The outside of the building can be one big outside area (5)
 Doors number 1 and 4 lead into the building from room 5


Next, you need to associate a reward value with each door:

 Doors which lead directly to the goal have a reward of 100
 Doors which are not directly connected to the target room give zero
reward
 As doors are two-way, two arrows are assigned for each room
 Every arrow in the above image contains an instant reward value

Explanation:

In this image, you can see that each room represents a state.

The agent's movement from one room to another represents an action.

In the image below, a state is described as a node, while the arrows
show the action.

For example, an agent traverses from room number 2 to 5:

 Initial state = state 2


 State 2-> state 3
 State 3 -> state (2,1,4)
 State 4-> state (0,5,3)
 State 1-> state (5,3)
 State 0-> state 4
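
A minimal sketch of the Q-learning update for the rooms example above, assuming NumPy; the reward matrix is an assumption built from the connections just described (100 for doors leading to room 5, 0 for other doors, -1 where there is no door), and gamma is an illustrative discount factor.

Example (Python):
import numpy as np

R = np.array([[-1, -1, -1, -1,  0, -1],    # rows = current room, columns = next room
              [-1, -1, -1,  0, -1, 100],
              [-1, -1, -1,  0, -1, -1],
              [-1,  0,  0, -1,  0, -1],
              [ 0, -1, -1,  0, -1, 100],
              [-1,  0, -1, -1,  0, 100]])
Q = np.zeros((6, 6))
gamma = 0.8

for episode in range(1000):
    state = np.random.randint(0, 6)                  # start in a random room
    while state != 5:                                # until the goal (outside) is reached
        actions = np.where(R[state] >= 0)[0]         # rooms reachable through a door
        action = np.random.choice(actions)
        Q[state, action] = R[state, action] + gamma * Q[action].max()   # Q-learning update
        state = action

print((Q / Q.max() * 100).round())                   # normalized Q-values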


b) Explain the difference between value iteration and policy iteration.
What is a Markov decision process?

Ans) Markov decision processes (MDPs) model decision making in discrete,
stochastic, sequential environments. The essence of the model is that a
decision maker, or agent, inhabits an environment which changes state
randomly in response to action choices made by the decision maker. The
state of the environment affects the immediate reward obtained by the
agent, as well as the probabilities of future state transitions. The agent's
objective is to select actions to maximize a long-term measure of total
reward. Efficient algorithms for MDPs based on dynamic programming and
linear programming, and more recently on compact representations, enable
large planning problems from artificial intelligence, operations research,
economics, robotics, and the behavioral sciences to be modeled and solved.
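
As a rough illustration of the dynamic-programming algorithms mentioned above, here is a minimal value iteration sketch for a tiny, made-up MDP (value iteration updates state values directly with the Bellman optimality equation, whereas policy iteration alternates between evaluating a fixed policy and improving it); all transition probabilities and rewards below are assumptions for demonstration.

Example (Python):
# P[s][a] = list of (probability, next_state, reward) for a 2-state, 2-action MDP
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 5.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 1.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9
V = {0: 0.0, 1: 0.0}

for _ in range(100):
    for s in P:
        # Bellman optimality update: best expected return over all actions
        V[s] = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s])

print(V)   # converged state values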

Q 8 A) What is NLP?

Ans) NLP stands for Natural Language Processing, which is a part of Computer
Science, Human Language, and Artificial Intelligence. It is the technology that
is used by machines to understand, analyse, manipulate, and interpret
human languages. It helps developers to organize knowledge for performing
tasks such as translation, automatic summarization, Named Entity
Recognition (NER), speech recognition, relationship extraction, and topic
segmentation.

How does natural language processing work?


NLP enables computers to understand natural language as humans do.
Whether the language is spoken or written, natural language processing uses
artificial intelligence to take real-world input, process it, and make sense of it
in a way a computer can understand. Just as humans have different sensors --
such as ears to hear and eyes to see -- computers have programs to read and
microphones to collect audio. And just as humans have a brain to process that
input, computers have a program to process their respective inputs. At some
point in processing, the input is converted to code that the computer can
understand.

There are two main phases to natural language processing: data
preprocessing and algorithm development.

b) Applications of Machine learning in computer vision?

Ans) Machine learning is a buzzword in today's technology, and it is growing
very rapidly day by day. We are using machine learning in our daily life even
without knowing it, for example in Google Maps, Google Assistant, Alexa, etc.
Below are some of the most trending real-world applications of Machine
Learning:

1. Image Recognition:

Image recognition is one of the most common applications of machine
learning. It is used to identify objects, persons, places, digital images, etc.

2. Speech Recognition

While using Google, we get an option of "Search by voice"; this comes under
speech recognition, and it's a popular application of machine learning.

3. Traffic prediction:

If we want to visit a new place, we take help of Google Maps, which shows us
the correct path with the shortest route and predicts the traffic conditions.

4. Product recommendations:

Machine learning is widely used by various e-commerce and entertainment
companies such as Amazon, Netflix, etc., for product recommendations to the
user.


5. Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars.
Tesla, the most popular car manufacturing company, is working on self-driving
cars.

6. Virtual Personal Assistant:

We have various virtual personal assistants such as Google
Assistant, Alexa, Cortana, and Siri.

7. Stock Market trading:

Machine learning is widely used in stock market trading. In the stock market,
there is always a risk of ups and downs in shares, so for this, machine
learning's long short-term memory (LSTM) neural network is used for the
prediction of stock market trends.

iii) Bayesian networks

Ans) "A Bayesian network is a probabilistic graphical model which represents a
set of variables and their conditional dependencies using a directed acyclic
graph."

It is also called a Bayes network, belief network, decision network,
or Bayesian model.

Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and they also use probability theory for prediction
and anomaly detection.

Real-world applications are probabilistic in nature, and to represent the
relationship between multiple events, we need a Bayesian network. It can also
be used in various tasks including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time series prediction, and decision making
under uncertainty.

A Bayesian network graph is made up of nodes and arcs (directed links), where:


o Each node corresponds to a random variable, which can be continuous or discrete.
o Arcs (directed arrows) represent the causal relationships or conditional
probabilities between random variables. These directed links connect pairs of
nodes in the graph: a link means that one node directly influences the other,
and if there is no directed link between two nodes, they are independent of
each other.
o In the example diagram, A, B, C, and D are random variables represented by
the nodes of the network graph.
o If node B is connected to node A by a directed arrow, then node A is called
the parent of node B.
o Node C is independent of node A.

A small joint-probability calculation for such a network is sketched below.
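The sketch below builds a minimal, made-up Bayesian network over binary variables A, B, C, and D with structure A -> B and C -> D (so C is independent of A, matching the notes); all probability numbers are invented purely for illustration.

# Minimal Bayesian network sketch with made-up conditional probability tables.
P_A = {True: 0.3, False: 0.7}                      # P(A)
P_B_given_A = {True: {True: 0.9, False: 0.1},      # P(B | A)
               False: {True: 0.2, False: 0.8}}
P_C = {True: 0.5, False: 0.5}                      # P(C)
P_D_given_C = {True: {True: 0.6, False: 0.4},      # P(D | C)
               False: {True: 0.1, False: 0.9}}

def joint(a, b, c, d):
    """Joint probability factorized along the DAG:
    P(A, B, C, D) = P(A) * P(B | A) * P(C) * P(D | C)."""
    return P_A[a] * P_B_given_A[a][b] * P_C[c] * P_D_given_C[c][d]

print(joint(True, True, False, False))   # 0.3 * 0.9 * 0.5 * 0.9 = 0.1215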

8 b) Explain the concept and role of the Support Vector Machine in detail and
also describe its application areas.

Ans) The answer is given further down, under UNIT 5 a.


UNIT 1

Questions QB

Q. What Is Machine Learning? Explain Applications Of ML & Tools.

Q. Explain Scope of Machine Learning?

Q. Explain Regression and Types of Regression?

Q) What is a hypothesis function?

Q. Short Note On

1. Data Augmentation.

2. Data Normalization.

3. Data Preprocessing **

Q) Explain in detail multiple hypothesis testing. RGPV May 2019, 2018

Ans) What is Hypothesis Testing?


Any data science project starts with exploring the data. When we perform an
analysis on a sample through exploratory data analysis and inferential statistics,
we get information about the sample. Now, we want to use this information to
predict values for the entire population.

Through hypothesis testing, we can determine whether we have enough
statistical evidence to conclude whether the hypothesis about the population is
true or not.

Multiple Hypotheses: The multiple hypothesis testing problem occurs when a
number of individual hypothesis tests are considered simultaneously. In this
case, the significance or the error rate of individual tests no longer represents
the error rate of the combined set of tests. Multiple hypothesis testing
methods correct error rates for this issue.


Characteristics

In conventional hypothesis testing, the level of significance or type I error
rate (the probability of wrongly rejecting the null hypothesis) for a single test
is less than the probability of making an error on at least one test in a
multiple hypothesis testing situation. While this is typically not an issue when
testing a small number of preplanned hypotheses, the likelihood of making
false discoveries is greatly increased when there are large numbers of
unplanned or exploratory tests conducted based on the significance level or
type I error rate from a single test.
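One common correction is the Bonferroni method, which compares each p-value against alpha divided by the number of tests so that the family-wise error rate stays controlled; the p-values in the sketch below are invented for illustration.

# Bonferroni correction sketch: with m tests, compare each p-value
# against alpha / m instead of alpha.
alpha = 0.05
p_values = [0.001, 0.020, 0.049, 0.300]   # made-up p-values from 4 simultaneous tests
m = len(p_values)

for i, p in enumerate(p_values):
    reject = p < alpha / m                # corrected threshold: 0.05 / 4 = 0.0125
    print(f"test {i}: p = {p:.3f} -> reject H0: {reject}")
# Only the first test (p = 0.001) survives the corrected threshold.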

Q) What are the different types of machine learning?

Ans) Supervised, unsupervised, and reinforcement learning.

Q) What do you mean by the term normal distribution? RGPV May 2019

Ans) A normal distribution is an arrangement of a data set in which most
values cluster in the middle of the range and the rest taper off symmetrically
toward either extreme. Height is one simple example of something that
follows a normal distribution pattern: most people are of average height, the
numbers of people that are taller and shorter than average are fairly equal,
and a very small (and still roughly equivalent) number of people are either
extremely tall or extremely short. Plotting such data produces the familiar
bell-shaped curve.

A graphical representation of a normal distribution is sometimes called a bell
curve because of its flared shape. The precise shape can vary according to the
distribution of the population, but the peak is always in the middle and the
curve is always symmetrical. In a normal distribution, the mean, mode, and
median are all the same.
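For reference, the probability density of a normal distribution with mean \mu and standard deviation \sigma is given by the standard formula:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)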


The normal distribution is the most common type of distribution assumed in
technical stock market analysis and in other types of statistical analyses. The
assumption of a normal distribution is applied to asset prices as well as price
action. Traders may plot price points over time to fit recent price action into a
normal distribution. The graphs are commonly used in mathematics, statistics,
and corporate data analytics.

Q) Define the following: RGPV June 2016

i) Probability Function: Probability Density Function (defined below).

ii) Probability Mass Function.

Q) What is Data Normalization?

Ans) Normalization is one of the most frequently used data preparation
techniques, which helps us to change the values of numeric columns in the
dataset to use a common scale.

Normalization is a technique often applied as part of data preparation for
machine learning. The goal of normalization is to change the values of numeric
columns in the dataset to a common scale, without distorting differences in the
ranges of values. Not every dataset requires normalization for machine learning;
it is required only when features have very different ranges.

For example, consider a data set containing two features, age (x1) and
income (x2), where age ranges from 0 to 100 while income ranges from roughly
20,000 to 500,000. Income is about 1,000 times larger than age, so these two
features are on very different scales. When we do further analysis, such as
multivariate linear regression, the income attribute will intrinsically influence
the result more due to its larger magnitude, but this does not necessarily mean
it is a more important predictor.
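Below is a minimal min-max normalization sketch in Python/NumPy; the age and income values are made up for illustration.

import numpy as np

# Min-max normalization: rescale each column (feature) to the range [0, 1].
data = np.array([
    [25,  30_000.0],   # [age, income], made-up values
    [40, 120_000.0],
    [60, 500_000.0],
])

col_min = data.min(axis=0)
col_max = data.max(axis=0)
normalized = (data - col_min) / (col_max - col_min)   # each feature now in [0, 1]

print(normalized)
# After scaling, age and income contribute on a comparable scale,
# e.g. to a distance-based model or a regression.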


Q) Explain the term statistical analysis of data. RGPV May 2019

Ans) Statistics is basically a science that involves data collection, data
interpretation and, finally, data validation. Statistical data analysis is a
procedure of performing various statistical operations. It is a kind of
quantitative research, which seeks to quantify the data and typically applies
some form of statistical analysis. Quantitative data basically involves
descriptive data, such as survey data and observational data.

Statistical data analysis generally involves some form of statistical tools, which
a layman cannot use without statistical knowledge. There are various software
packages to perform statistical data analysis, including Statistical Analysis
System (SAS), Statistical Package for the Social Sciences (SPSS), StatSoft, etc.

Data in statistical data analysis consists of variable(s). Sometimes the data is
univariate or multivariate. Depending upon the number of variables, the
researcher performs different statistical techniques.

In the context of business applications, it is a very crucial technique for
business intelligence organizations that need to operate with large data
volumes.

The basic goal of statistical data analysis is to identify trends; for example, in
the retail business, this method can be applied to uncover patterns in
unstructured and semi-structured consumer data that can be used for making
more powerful decisions for enhancing customer experience and increasing
sales.

The data in statistical data analysis is basically of 2 types, namely, continuous
data and discrete data. Continuous data is data that cannot be counted; for
example, the intensity of a light can be measured but cannot be counted.
Discrete data is data that can be counted; for example, the number of bulbs
can be counted.


The continuous data in statistical data analysis is distributed under a continuous
distribution function, which can also be called the probability density function,
or simply pdf.

The discrete data in statistical data analysis is distributed under a discrete
distribution function, which can also be called the probability mass function,
or simply pmf.

Q) What do you mean by probability density function? RGPV May 2019

Ans) In probability theory, a probability density function (PDF) is used to
define the probability of a random variable falling within a particular range of
values, as opposed to taking on any one exact value. A common example is the
probability density function of the normal distribution, which is fully described
by its mean and standard deviation; the normal distribution is often used in
science to model real-valued variables whose underlying distribution is unknown.

What is the Probability Density Function?

The Probability Density Function (PDF) is the function representing the density
of a continuous random variable lying within a specific range of values. In
other words, the probability density function gives the likelihood of values of
the continuous random variable. Sometimes it is also called a probability
distribution function or just a probability function. Some sources loosely use
the terms cumulative distribution function or probability mass function (PMF)
for related concepts, but strictly speaking the PDF is defined for continuous
random variables, whereas the PMF is defined for discrete random variables.
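In symbols, a PDF f(x) of a continuous random variable X satisfies the standard properties:

f(x) \ge 0, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad P(a \le X \le b) = \int_{a}^{b} f(x)\,dx

whereas a PMF of a discrete random variable assigns P(X = x_i) = p_i with \sum_i p_i = 1.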


UNIT 2

Q) Explain the neural network architecture. RGPV June 2014

Ans) QB Page No 87

Q) What is an Activation Function? **

Q) Explain In Detail Back Propagation. **

Q) Explain About Multilayer Neural Networks.

Q) What Do You Understand By Batch Normalization?

Ans) Before getting into batch normalization, let's understand the term
"Normalization".

Normalization is a data pre-processing tool used to bring numerical data to
a common scale without distorting its shape.

Generally, when we input data to a machine learning or deep learning algorithm,
we tend to change the values to a balanced scale. The reason we normalize is
partly to ensure that our model can generalize appropriately.

Coming back to batch normalization, it is a process that makes neural
networks faster and more stable by adding extra layers in a deep neural
network. The new layer performs the standardizing and normalizing operations
on the input of a layer coming from a previous layer.

But what is the reason behind the term "Batch" in batch normalization? A
typical neural network is trained using a collected set of input data
called a batch. Similarly, the normalizing process in batch normalization takes
place in batches, not on a single input.
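A minimal NumPy sketch of the batch-normalization computation for one layer's inputs; the batch values and the learnable scale/shift parameters (gamma, beta) below are placeholders chosen only for illustration.

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Standardize each feature over the batch, then rescale and shift."""
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardized activations
    return gamma * x_hat + beta              # learnable scale and shift

x = np.array([[1.0, 200.0],                  # batch of 3 examples, 2 features
              [2.0, 220.0],
              [3.0, 260.0]])
gamma = np.ones(2)
beta = np.zeros(2)
print(batch_norm(x, gamma, beta))
# Each column now has (approximately) zero mean and unit variance.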


UNIT 3

Q) Explain Convolutional Neural Networks. **

Q) What are the different types of neural networks? **

Q) What is a Multilayer Perceptron? **

Q) What is Subsampling?

Q) What Is a Fully Connected Layer? Discuss Its Limitations.

Ans) Fully connected layers in a neural network are those layers where all the
inputs from one layer are connected to every activation unit of the next layer.
In most popular machine learning models, the last few layers are fully
connected layers, which compile the data extracted by previous layers to form
the final output. It is the second most time-consuming layer, after the
convolution layer.
The following example clarifies the statement.

In this model:

• The first/input layer has 3 feature units and there are 4 activation units in the
next hidden layer.
• The 1's in each layer are bias units.
• a01, a02 and a03 are input values to the neural network. They are basically
the features of the training example.
• The 4 activation units of the first hidden layer are connected to all 3 activation
units of the second hidden layer; the weights/parameters connect the two layers.

Learning Fully Connected Networks with Backpropagation: The first version of a
fully connected neural network was the Perceptron. (A forward-pass sketch of a
fully connected layer is given below.)
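Below is a minimal forward-pass sketch of a single fully connected layer in NumPy (3 inputs feeding 4 hidden units); the weights, bias, and input values are random placeholders for illustration.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # weights: 4 output units x 3 input units
b = np.zeros(4)                  # bias units
x = np.array([0.5, -1.2, 3.0])   # one training example with 3 features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a = sigmoid(W @ x + b)           # every input feeds every output unit
print(a.shape)                   # (4,) -- activations of the next layer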


UNIT 4
Q) What is a Recurrent Neural Network? What is the Actor-Critic model? **

Q. Explain The Term BLEU Score. – QB page no 129

Q. Write a Brief Note On Attention-Based Models.

Q) What are Q-learning and the SARSA algorithm?

Q. Discuss The Attention Model.

Ans) What are Attention Models?

Attention models, or attention mechanisms, are input processing techniques
for neural networks that allow the network to focus on specific aspects of a
complex input, one at a time, until the entire dataset is categorized. The goal is
to break down complicated tasks into smaller areas of attention that are
processed sequentially, similar to how the human mind solves a new problem
by dividing it into simpler tasks and solving them one by one.

Attention models require continuous reinforcement or backpropagation
training to be effective.

The aim of attention models is to reduce larger, more complicated tasks into
smaller, more manageable areas of attention that can be understood and
processed sequentially. The models work within neural networks, which are a
type of network model with a structure and processing method similar to the
human brain for simplifying and processing information. Using attention models
enables the network to focus on a few particular aspects at a time while
ignoring the rest. This allows for efficient and sequential data processing,
especially when the network needs to categorize entire datasets.

How do Attention Models work?

Attention models involve focusing on the most important components while
perceiving some of the additional information. This is similar to the visual
attention mechanism that the human brain uses. For example, the human
brain may initially focus on a particular part of an image at a higher resolution
while viewing the surrounding areas at a lower resolution. However, as the
brain begins to understand the image, it adjusts the focal point to understand
all aspects thoroughly.


Attention models evaluate inputs to identify the most critical components and
assign each of them a weight. For example, if using an attention model to
translate a sentence from one language to another, the model would select
the most important words and assign them a higher weight. Similarly, it assigns
the less significant words a lower value. This helps achieve a more accurate
output prediction.
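As a hedged illustration of this weighting idea, here is a minimal NumPy sketch of scaled dot-product attention (the mechanism popularized by Transformer models); the query, key, and value matrices are random placeholders.

import numpy as np

# Scaled dot-product attention: scores = softmax(Q K^T / sqrt(d)), output = scores @ V.
rng = np.random.default_rng(1)
d = 4                                   # feature dimension
Q = rng.normal(size=(3, d))             # 3 query positions
K = rng.normal(size=(5, d))             # 5 key positions
V = rng.normal(size=(5, d))             # 5 value vectors

scores = Q @ K.T / np.sqrt(d)           # similarity of each query to each key
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax: attention weights per query
output = weights @ V                    # weighted sum of values

print(weights.round(2))                 # higher weight = more "attention"
print(output.shape)                     # (3, 4)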

Applications of Attention Models – QB page no 134


UNIT 5

Q) Discuss the key idea of the Support Vector Machine. RGPV 2018, 2020

Ans) Support Vector Machine, or SVM, is one of the most popular supervised
learning algorithms, used for classification as well as regression problems.
However, it is primarily used for classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
a new data point in the correct category in the future. This best decision
boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed a Support Vector Machine. Consider a diagram in which two different
categories are classified using a decision boundary or hyperplane.

Example: SVM can be understood with the example that we have used in the
KNN classifier. Suppose we see a strange cat that also has some features of
dogs; if we want a model that can accurately identify whether it is a cat or a
dog, such a model can be created using the SVM algorithm. We first train our
model with lots of images of cats and dogs so that it can learn the different
features of cats and dogs, and then we test it with this strange creature. The
support vectors define a decision boundary between the two classes (cat and
dog): the algorithm chooses the extreme cases (support vectors) of cats and
dogs and classifies the new example accordingly.


How does it work?

• Identify the right hyper-plane (Scenario 1): Here, we have three hyper-planes
(A, B, and C). Now, identify the right hyper-plane to classify stars and circles.

A useful thumb rule for identifying the right hyper-plane: "Select the
hyper-plane which segregates the two classes better". In this scenario,
hyper-plane B does this job excellently.

• Identify the right hyper-plane (Scenario 2): Here, we have three hyper-planes
(A, B, and C) and all of them segregate the classes well. Now, how can we
identify the right hyper-plane?

Here, maximizing the distance between the nearest data points (of either class)
and the hyper-plane helps us decide the right hyper-plane. This distance is
called the margin.

The margin for hyper-plane C is higher than for both A and B; hence, we pick
hyper-plane C as the right one. Another reason for selecting the hyper-plane
with the higher margin is robustness: if we select a hyper-plane with a low
margin, then there is a high chance of mis-classification.
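A minimal, hedged sketch of training a maximum-margin (linear-kernel) SVM with scikit-learn's SVC; the toy 2-D points and labels are invented for illustration.

import numpy as np
from sklearn.svm import SVC

# Toy 2-D data (made up): class 0 clustered near the origin, class 1 further out.
X = np.array([[1, 1], [2, 1], [1, 2],      # class 0
              [6, 6], [7, 5], [6, 7]])     # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)          # maximum-margin linear separator
clf.fit(X, y)

print(clf.support_vectors_)                # the extreme points that define the hyperplane
print(clf.predict([[3, 3], [5, 6]]))       # classify new points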


Q) Explain the Bayes classifier with an example. RGPV 2019

Ans)
