Machine Learning

1. Define machine learning. What are the different applications of ML? What
is the difference between traditional programming and ML?
💡 Machine learning is an application of AI that enables systems to learn
and improve from experience without being explicitly programmed.
Machine learning focuses on developing computer programs that can
access data and use it to learn for themselves.
applications of ML
https://www.javatpoint.com/applications-of-machine-learning
difference between traditional programming and ML
https://www.enjoyalgorithms.com/blog/introduction-to-machine-learning
2. What are the different methods of learning? Give one example of each
method.
https://www.javatpoint.com/machine-learning-techniques
3. Compare classification and regression.
https://www.javatpoint.com/regression-vs-classification-in-machine-learning
4. Define the terms variance and bias. Explain the trade-off between variance
and bias.
bias
difference between the average prediction of our model and the correct
value which we are trying to predict
high bias ⇒ oversimplified model that underfits the data

Variance
variability of the model's prediction for a given data point, i.e. how much
the predictions spread when the model is trained on different data
high variance ⇒ model fits the training data closely but does not generalize

mathematically
💡 formula
[Figure: bias and variance using bulls-eye diagram]
Bias Variance Tradeoff
high bias and low variance ⇒ simple model with few parameters (underfitting)
high variance and low bias ⇒ overfitting model with a large number of
parameters
tradeoff ⇒ choose a model complexity that balances the two cases
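The tradeoff is usually written as a decomposition of the expected squared error. This is a sketch of that standard result, not something taken from these notes: it assumes y = f(x) + ε with zero-mean noise of variance σ², and the symbols f, f̂, ε and σ are introduced here only for illustration.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Lowering one of the first two terms by changing model complexity typically raises the other, while the last term cannot be reduced by any model.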

5. Describe linear regression and non-linear regression.
linear regression
https://www.javatpoint.com/linear-regression-in-machine-learning
non-linear regression
What is Nonlinear Regression
💡 a regression analysis where the regression model portrays a
nonlinear relationship between a dependent variable and
independent variables.
experimental data are mapped to a model
a mathematical function representing the variables in a nonlinear
relationship is formed and optimized
flexible
no assumption of data linearity
accommodates diverse types of curves
parametric or non-parametric
can accommodate multiple response variables
Example
nonlinear relationship between the gold price, US CPI inflation, and
currency depreciation in many countries
gold price ⇒ dependent variable
inflation ⇒ independent variable
result ⇒ inflation impacts the gold price
gold prices are affected the most by inflation
gold prices can, in turn, help control inflation instability
Application
forestry research ⇒ power function relating tree volume or weight to
its diameter or height
chemistry ⇒ wide-range colorless gas
research & development ⇒ formulation of the problem and deriving
statistical solutions
insurance ⇒ computation of IBNR (incurred but not reported) reserves
agriculture ⇒ crops and soil processes
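The forestry application can be sketched as fitting a power function volume = a · diameter^b. This is a minimal sketch under assumptions: it uses least squares on the log-log transform (one common way to fit a power law; the notes do not name a method), and the function name and the diameter/volume values are made up for illustration.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope in log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log space = log(a)
    return a, b

# Hypothetical, noise-free data generated from volume = 0.5 * diameter**2
diameters = [10.0, 20.0, 30.0, 40.0]
volumes = [0.5 * d ** 2 for d in diameters]

a, b = fit_power_law(diameters, volumes)
print(f"a = {a:.3f}, b = {b:.3f}")
```

The relationship is nonlinear in the original variables but becomes a straight line after taking logarithms, which is why a simple linear fit is enough for this particular curve.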
6. Explain multivariate regression. Write down the steps of multivariate
regression. What are the advantages and disadvantages of multivariate
regression?
multivariate regression
💡 a technique that estimates a single regression model with more than
one outcome variable. When there is more than one predictor variable
in a multivariate regression model, the model is a multivariate multiple
regression.
steps of multivariate regression
Step 1: Select the features
select the features most responsible for the change in your dependent
variable
Step 2: Normalize the features
scale them to a common range
Step 3: Select a loss function and formulate a hypothesis
hypothesis ⇒ the predicted value of the response variable
loss function ⇒ loss incurred when a wrong value is predicted
cost function ⇒ total cost of these wrong predictions over the data
Step 4: Minimize the cost and loss functions
they depend on each other
use minimization algorithms such as gradient descent
Step 5: Test the hypothesis
a test set is used to check accuracy and correctness
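The five steps can be sketched in plain Python. This is only an illustration under assumptions: it uses a single response variable with two predictors for brevity (a model with several outcomes would repeat the fit per outcome), min-max scaling, mean squared error as the cost, batch gradient descent as the minimizer, and made-up data. None of these specific choices come from the notes.

```python
def normalize(rows):
    """Step 2: min-max scale each feature column to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(r, lows, spans)] for r in rows]

def predict(weights, bias, row):
    """Step 3: hypothesis, a linear combination of the features."""
    return bias + sum(w * x for w, x in zip(weights, row))

def cost(weights, bias, rows, targets):
    """Step 3: mean squared error over a data set."""
    errors = [predict(weights, bias, r) - t for r, t in zip(rows, targets)]
    return sum(e * e for e in errors) / len(errors)

def fit(rows, targets, lr=0.1, steps=5000):
    """Step 4: minimize the cost with batch gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(steps):
        errors = [predict(weights, bias, r) - t for r, t in zip(rows, targets)]
        grad_w = [2 / n * sum(e * r[j] for e, r in zip(errors, rows))
                  for j in range(len(weights))]
        grad_b = 2 / n * sum(errors)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

# Step 1: two hypothetical features assumed to drive the target
raw = [[1, 4], [2, 1], [3, 5], [4, 2], [5, 3], [6, 6]]
X = normalize(raw)
y = [3 * a + 2 * b + 1 for a, b in X]   # made-up target: 3*x1 + 2*x2 + 1

train_X, train_y = X[:4], y[:4]
test_X, test_y = X[4:], y[4:]

weights, bias = fit(train_X, train_y)
# Step 5: check the hypothesis on held-out data
print("weights:", [round(w, 3) for w in weights], "bias:", round(bias, 3))
print("test cost:", cost(weights, bias, test_X, test_y))
```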
Advantages
helps you find a relationship between multiple variables
defines the correlation between variables
Disadvantages
requires high-level mathematical calculations
complex
output is difficult to interpret because the loss and error outputs are
not identical
not well suited to small datasets; results are better for large datasets
7. What are MSE and RMSE?

https://www.i2tutorials.com/differences-between-mse-and-rmse/
8. Compare linear regression and logistic regression.
https://www.geeksforgeeks.org/ml-linear-regression-vs-logistic-regression/
9. What is VIF? How do you calculate it?
https://www.investopedia.com/terms/v/variance-inflation-factor.asp
10. What is Gradient descent?
What is gradient descent?
💡 an optimization algorithm commonly used to train machine
learning models and neural networks
How does gradient descent work?
assumes a convex cost function
starting point = arbitrary point
find the slope (derivative) at the starting point
steepness measure ⇒ slope of the tangent line
slope ⇒ informs the updates to the parameters (weights and bias)
goal ⇒ minimize the cost function
Learning rate
size of the steps that are taken to reach the minimum
high learning rate ⇒ larger steps, with a risk of overshooting the minimum
low learning rate ⇒ small steps; more precise but needs more iterations
cost function
measures the error between the predicted and the actual value
gives feedback so the model can adjust its parameters to reduce the error
iterate until the cost function is close to 0
calculates the average error over the entire training set
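The effect of the learning rate can be seen on a one-parameter convex cost. This is a minimal sketch, assuming the toy cost (w − 3)² and three arbitrary learning rates; the function names are illustrative and not from the notes.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly step against the slope, returning every visited point."""
    w = start
    path = [w]
    for _ in range(steps):
        w = w - learning_rate * grad(w)
        path.append(w)
    return path

def cost(w):
    """Toy convex cost with its minimum at w = 3."""
    return (w - 3.0) ** 2

def slope(w):
    """Derivative of the toy cost."""
    return 2.0 * (w - 3.0)

for lr in (0.01, 0.1, 1.1):
    final = gradient_descent(slope, start=0.0, learning_rate=lr, steps=50)[-1]
    print(f"learning rate {lr}: w = {final:.4f}, cost = {cost(final):.4f}")
```

With the smallest rate the run is still far from the minimum after 50 steps, the middle rate lands very close to w = 3, and the largest rate overshoots further on every step and diverges.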
Types of Gradient Descent
Batch gradient descent
sums the error for each point in a training set, updating the model only
after all training examples have been evaluated
long processing time on large training sets
usually a stable error gradient and convergence
Stochastic gradient descent
runs a training epoch over the dataset one example at a time and updates
the parameters after each training example
frequent updates ⇒ more detail and speed
loss in computational efficiency compared with batch gradient descent
may result in noisy gradients
Mini-batch gradient descent
combination of both
splits the training dataset into small batches and performs an update
on each of those batches
balance between the computational efficiency of batch gradient descent
and the speed of stochastic gradient descent
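The three variants differ only in how many examples feed each update, which a single batch_size parameter can express. This is a sketch under assumptions: a one-weight model y = w · x, squared error, and made-up noise-free data; the function and parameter names are illustrative.

```python
import random

def train(xs, ys, batch_size, learning_rate=0.05, epochs=200, seed=0):
    """Fit y = w * x with gradient descent on squared error.

    batch_size == len(xs) -> batch gradient descent
    batch_size == 1       -> stochastic gradient descent
    anything in between   -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    w = 0.0
    updates = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of (w*x - y)**2 over the current batch
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad
            updates += 1
    return w, updates

# Hypothetical noise-free data with true slope 2
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [2 * x for x in xs]

for name, size in (("batch", len(xs)), ("stochastic", 1), ("mini-batch", 4)):
    w, updates = train(xs, ys, batch_size=size)
    print(f"{name}: w = {w:.4f} after {updates} updates")
```

Batch makes one update per epoch, stochastic makes one per example, and mini-batch sits in between, which is the efficiency versus speed balance described above.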
Challenges with gradient descent

1. Local minima and saddle points
a. nonconvex problems ⇒ gradient descent can struggle to find the global minimum
b. Local minima mimic the shape of a global minimum where the slope
of the cost function increases on either side of the current point.
c. with saddle points, the negative gradient only exists on one side of
the point, reaching a local maximum on one side and a local
minimum on the other.
2. Vanishing and Exploding Gradients
a. Vanishing gradients:
i. occurs when the gradient is too small
ii. gradient continues to become smaller as backpropagation moves
backwards through the layers
iii. results in slow learning in the earlier layers
b. Exploding gradients:
i. gradient is too large
ii. model weights will grow too large
iii. unstable model
iv. solution ⇒ leverage a dimensionality reduction technique to reduce
model complexity

11. What are the disadvantages of linear regression?
https://www.geeksforgeeks.org/ml-advantages-and-disadvantages-of-linear-regression/
12. What is overfitting? What is the use of regularization?
💡 common link for both - https://www.geeksforgeeks.org/underfitting-and-overfitting-in-machine-learning/
💡 Overfitting
is a modeling error that occurs when a function or model fits the training
set too closely and then fits the test set drastically worse.
If our model does much better on the training set than on the test set,
then we’re likely overfitting.
How to prevent Overfitting?
1. Training with more data
2. Data Augmentation
3. Cross-Validation
4. Feature Selection
5. Regularization
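The train versus test gap that signals overfitting can be reproduced with a model that memorizes its training data. This is a minimal sketch, assuming a 1-nearest-neighbour classifier on made-up one-dimensional data with 25% label noise; the model, the data and all names are illustrative choices, not from the notes.

```python
import random

def nearest_neighbor_predict(train, x):
    """Return the label of the closest training point (1-NN memorizes the data)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def accuracy(train, data):
    """Fraction of points in `data` whose label the 1-NN model gets right."""
    hits = sum(nearest_neighbor_predict(train, x) == label for x, label in data)
    return hits / len(data)

def make_data(n, rng, noise=0.25):
    """Label is 1 when x > 0.5, then flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

rng = random.Random(0)
train_set = make_data(200, rng)
test_set = make_data(200, rng)

train_acc = accuracy(train_set, train_set)
test_acc = accuracy(train_set, test_set)
print(f"train accuracy = {train_acc:.2f}, test accuracy = {test_acc:.2f}")
print("gap:", round(train_acc - test_acc, 2))
```

The model scores perfectly on the data it memorized, including the noisy labels, and noticeably worse on unseen data; the prevention techniques listed above aim to close that gap.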
regularization
https://www.javatpoint.com/regularization-in-machine-learning
13. Explain SVM algorithm for classification.
https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm
14. What is Linear discriminant analysis and PCA?
💡 youtube link https://youtu.be/azXCzI57Yfc
LDA
focuses on maximizing separability among known categories
e.g., gene analysis for cancer drug.
creation of new axis ⇒ maximize the distance between the two means for 2
categories
while minimizing the variation (scatter) within each category
more than 2 dimensions ⇒ same procedure for creating the new axis
💡 https://www.geeksforgeeks.org/ml-linear-discriminant-analysis/
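The new axis for two categories can be computed directly in two dimensions. This is a sketch under assumptions: it uses Fisher's closed-form direction w = Sw⁻¹(mean_b − mean_a), which maximizes the distance between the projected means relative to the within-category scatter, on made-up points; the helper names are illustrative.

```python
def mean(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    """2x2 within-category scatter matrix: sum of (p - m)(p - m)^T."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return [[sxx, sxy], [sxy, syy]]

def lda_axis(class_a, class_b):
    """Fisher direction w = Sw^-1 (mean_b - mean_a) for two categories in 2-D."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Multiply the inverse of the 2x2 matrix sw by the mean difference
    return [(sw[1][1] * dx - sw[0][1] * dy) / det,
            (-sw[1][0] * dx + sw[0][0] * dy) / det]

def project(points, w):
    """Position of each point along the new axis."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]

# Hypothetical two-category data
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 7.0)]

w = lda_axis(class_a, class_b)
print("class a on new axis:", [round(v, 3) for v in project(class_a, w)])
print("class b on new axis:", [round(v, 3) for v in project(class_b, w)])
```

On this data every projected point of one category lies below every projected point of the other, so the single new axis is enough to separate them.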
PCA [principal component analysis]
💡 youtube - https://youtu.be/83x5X66uWK0
helps resolve overfitting problems
purpose ⇒ reducing dimensionality
find principal components ⇒ find new views (axes) of the data
no. of principal components <= no. of attributes
highest priority ⇒ PC1 (captures the most variance)
orthogonal property ⇒ PCs must be uncorrelated with each other
https://www.simplilearn.com/tutorials/machine-learning-tutorial/principal-component-analysis#:~:text=The%20Principal%20Component%20Analysis%20is,plotting%20in%202D%20and%203D.
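For two attributes the principal components can be found in closed form. This is a minimal sketch, assuming made-up correlated points and a nonzero covariance between the attributes; it diagonalizes the 2x2 covariance matrix by hand, and the function name is illustrative.

```python
import math

def pca_2d(points):
    """Return (variance, unit direction) pairs of the 2x2 covariance matrix,
    ordered so that PC1 (largest variance) comes first."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    components = []
    for lam in (half_trace + spread, half_trace - spread):
        # (sxy, lam - sxx) is an eigenvector for lam as long as sxy != 0
        vx, vy = sxy, lam - sxx
        norm = math.hypot(vx, vy)
        components.append((lam, (vx / norm, vy / norm)))
    return components

# Hypothetical correlated data with two attributes
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

(var1, pc1), (var2, pc2) = pca_2d(points)
print("PC1 direction:", [round(v, 3) for v in pc1])
print("PC2 direction:", [round(v, 3) for v in pc2])
print("share of variance kept by PC1:", round(var1 / (var1 + var2), 3))
```

Keeping only the coordinate along PC1 reduces the data from two dimensions to one while retaining the share of variance printed in the last line.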
15. Why is LDA important?
💡 find the answer on youtube
16. Write the names of different dimensionality reduction methods. Explain
any one method.
https://www.upgrad.com/blog/top-dimensionality-reduction-techniques-for-machine-learning/
17. Compare single layer perceptron and multi layer perceptron.
https://www.i2tutorials.com/what-is-single-layer-perceptron-and-difference-between-single-layer-vs-multilayer-perceptron/
18. How does Gradient descent help in minimizing the cost function?
https://towardsdatascience.com/minimizing-the-cost-function-gradient-descent-a5dd6b5350e1
19. Write the back propagation algorithm.
https://towardsdatascience.com/understanding-backpropagation-algorithm-7bb3aa2f95fd
20. Describe MLE and MAP.
MLE (maximum likelihood estimation)
https://analyticsindiamag.com/how-is-maximum-likelihood-estimation-used-in-machine-learning/#:~:text=By%20Sourabh%20Mehta-,Maximum%20Likelihood%20Estimation%20(MLE)%20is%20a%20probabilistic%20based%20approach%20to,panel%20data%20and%20discrete%20data.
MAP (maximum a posteriori estimation)
💡 https://youtu.be/TSMJ-QRnk54
https://towardsdatascience.com/what-is-map-understanding-the-statistic-of-choice-for-comparing-object-detection-models-1ea4f67a9dbd
(this article covers mean Average Precision, mAP, for object detection,
not maximum a posteriori estimation)
21. Write down the applications of ANN.
https://www.geeksforgeeks.org/artificial-neural-networks-and-its-applications/
22. Define learning rate in neural network. How to choose learning rate for
optimization problem?
💡 The learning rate, denoted by the symbol α, is a hyper-parameter
used to govern the pace at which an algorithm updates or learns the
values of a parameter estimate.
https://towardsdatascience.com/learning-rate-a6e7b84f1658
23. Define the terms Training, Activation function, Weights and loss
function in ANN.
Training
💡 Training a machine learning model is a process in which a machine
learning (ML) algorithm is fed with sufficient training data to learn from.
Activation function
💡 https://www.geeksforgeeks.org/activation-functions-neural-networks/
Weights
💡 Weight is the parameter within a neural network that transforms input
data within the network's hidden layers. A neural network is a series of
nodes, or neurons. Within each node is a set of inputs, weights, and a
bias value.
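How a node combines its inputs, weights and bias can be sketched in a few lines. This assumes the usual weighted-sum-then-activation form of a neuron; the input values, weights and function names are made up for illustration.

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical values chosen so the weighted sum is exactly 0
out_sigmoid = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
out_tanh = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0, activation=math.tanh)
print(out_sigmoid)  # sigmoid(0) = 0.5
print(out_tanh)     # tanh(0) = 0.0
```

Training adjusts the weights and the bias; the activation function stays fixed and only shapes how the weighted sum is turned into the node's output.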
Loss function
💡 https://www.geeksforgeeks.org/ml-common-loss-functions/
24. Explain feed forward neural network.
https://www.turing.com/kb/mathematical-formulation-of-feed-forward-neural-network
25. What is activation function in ANN? Describe the sigmoid activation
function and Tanh activation function used in ANN.
💡 https://www.geeksforgeeks.org/activation-functions-neural-networks/
26. How does gradient descent help in minimizing the cost function?
https://towardsdatascience.com/machine-leaning-cost-function-and-gradient-descend-75821535b2ef
27. How does the decision tree algorithm work? Give one example.
https://www.geeksforgeeks.org/decision-tree-introduction-example/
28. Which are the attribute selection measures in decision tree? Explain.
https://www.kdnuggets.com/2020/01/decision-tree-algorithm-explained.html
29. What is meant by pruning? Which are different techniques used for
pruning?
https://www.kdnuggets.com/2022/09/decision-tree-pruning-hows-whys.html
https://analyticsindiamag.com/what-is-pruning-in-tree-based-ml-models-and-why-is-it-done/
30. Which are the advantages and disadvantages of decision tree?
https://www.jigsawacademy.com/blogs/data-science/decision-tree-in-machine-learning/
31. Define the terms overfitting, underfitting, regularization.
overfitting & underfitting:
https://www.javatpoint.com/overfitting-and-underfitting-in-machine-learning#:~:text=Overfitting%20occurs%20when%20our%20machine,and%20accuracy%20of%20the%20model.
regularization
https://www.javatpoint.com/regularization-in-machine-learning
32. Which are different cross validation methods? Explain two cross
validation methods.
https://www.geeksforgeeks.org/cross-validation-machine-learning/
33. What is Bootstrapping? Which steps are used in bootstrapping?
Explain parametric and non parametric bootstrapping with example.
https://analyticssteps.com/blogs/bootstrapping-method-types-working-and-applications
parametric bootstrap example -
model the uncertainty about the population mean using parametric
bootstrapping.
https://www.vosesoftware.com/riskwiki/TheparametricBootstrap.php
non parametric bootstrap example -
estimate the uncertainty about the population standard deviation
using non-parametric bootstrap.
https://www.vosesoftware.com/riskwiki/ThenonparametricBootstrap.php
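Both examples can be sketched with the standard library. This is a minimal sketch under assumptions: the parametric version assumes the data are normally distributed and resamples from the fitted normal, the non-parametric version resamples the observed values with replacement, and the observations and function names are made up.

```python
import random
import statistics

def nonparametric_bootstrap(sample, statistic, n_resamples, rng):
    """Resample the observed data with replacement and recompute the statistic."""
    return [statistic(rng.choices(sample, k=len(sample)))
            for _ in range(n_resamples)]

def parametric_bootstrap(sample, statistic, n_resamples, rng):
    """Fit a normal distribution (an assumption), then resample from the fit."""
    mu, sigma = statistics.mean(sample), statistics.stdev(sample)
    return [statistic([rng.gauss(mu, sigma) for _ in sample])
            for _ in range(n_resamples)]

rng = random.Random(42)
# Hypothetical observations
sample = [4.9, 5.6, 5.1, 6.3, 4.4, 5.8, 5.0, 6.1, 5.4, 4.7]

# Uncertainty about the population mean (parametric)
means = parametric_bootstrap(sample, statistics.mean, 1000, rng)
# Uncertainty about the population standard deviation (non-parametric)
stdevs = nonparametric_bootstrap(sample, statistics.stdev, 1000, rng)

print("spread of bootstrap means:", round(statistics.stdev(means), 3))
print("spread of bootstrap standard deviations:", round(statistics.stdev(stdevs), 3))
```

The spread of each list of bootstrap statistics is the estimate of uncertainty; percentiles of the same lists would give bootstrap confidence intervals.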
34. Explain different ensemble learning techniques.
https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/
35. What are the advantages and disadvantages of random forest learning
algorithm?
💡 read advantages and disadvantages only
https://www.mygreatlearning.com/blog/random-forest-algorithm/
36. Write an algorithm for partition clustering and hierarchical clustering.
Mention example of each method.
k means clustering
💡 K-means clustering can be used in almost every domain, ranging
from banking to recommendation engines, cyber security,
document clustering to image segmentation. It is typically
applied to data that has a smaller number of dimensions, is
numeric, and is continuous.
💡 YouTube - https://youtu.be/CLKW6uWJtTc
💡 for algorithm - https://www.tutorialspoint.com/what-are-the-types-of-the-partitional-algorithm
💡 for flowchart and info - https://www.geeksforgeeks.org/partitioning-method-k-mean-in-data-mining/
hierarchical clustering
https://www.geeksforgeeks.org/hierarchical-clustering-in-data-mining/
💡 YouTube - https://youtu.be/7enWesSofhg
37. Write following algorithms
Birch algorithm
https://www.javatpoint.com/birch-in-data-mining
HMM algorithm
https://www.jigsawacademy.com/blogs/data-science/hidden-markov-model
CURE algorithm
https://www.geeksforgeeks.org/basic-understanding-of-cure-algorithm/
38. Let’s say you are building a model that detects whether a person
has diabetes or not. After the train-test split, you got a test set of length 100,
out of which 70 data points are labelled positive (1), and 30 data points are
labelled negative (0). Draw confusion matrix based on the given data.
Calculate True positive rate, True negative rate, False positive rate and False
negative rate.
https://www.kdnuggets.com/2020/09/performance-machine-learning-model.html
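The four rates follow directly from the cells of the confusion matrix. The question gives only the actual class totals (70 positive, 30 negative), so the split into correct and incorrect predictions below is hypothetical and used only to show the arithmetic; the function name is illustrative.

```python
def rates(tp, fn, fp, tn):
    """True/false positive and negative rates from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),   # sensitivity / recall
        "FNR": fn / (tp + fn),
        "TNR": tn / (tn + fp),   # specificity
        "FPR": fp / (tn + fp),
    }

# 70 actual positives and 30 actual negatives come from the question;
# how many of each the model gets right is an assumption.
tp, fn = 63, 7    # actual positives: predicted 1 / predicted 0
tn, fp = 24, 6    # actual negatives: predicted 0 / predicted 1

print("              predicted 1   predicted 0")
print(f"actual 1      TP = {tp}       FN = {fn}")
print(f"actual 0      FP = {fp}        TN = {tn}")

result = rates(tp, fn, fp, tn)
for name, value in result.items():
    print(f"{name} = {value:.2f}")
```

TPR and FNR always sum to 1, as do TNR and FPR, because each pair splits the same row of actual labels.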
39. Design a system for human activity recognition.
https://www.geeksforgeeks.org/human-activity-recognition-using-deep-
learning-model/
40. What is reinforcement learning? Explain working of reinforcement
learning. Write an algorithm for reinforcement learning.
what is reinforcement learning
💡 Reinforcement learning is an area of Machine Learning. It is about
taking suitable action to maximize reward in a particular situation. It is
employed by various software and machines to find the best possible
behavior or path it should take in a specific situation.
working of reinforcement learning
https://www.synopsys.com/ai/what-is-reinforcement-learning.html#:~:text=How%20Does%20Reinforcement%20Learning%20Work,maximization%20of%20expected%20cumulative%20reward.
algorithm for reinforcement learning
https://www.guru99.com/reinforcement-learning-tutorial.html#reinforcement-learning-algorithms
💡 check this out on youtube
41. Write working of expectation maximization (EM) algorithm. What is
convergence in the EM algorithm? What are advantages and disadvantages
of EM?
algorithm - https://www.geeksforgeeks.org/ml-expectation-maximization-algorithm
convergence - https://arxiv.org/pdf/1611.00519.pdf
42. Write an algorithm for GMM.
https://towardsdatascience.com/gaussian-mixture-modelling-gmm-833c88587c7f
43. What are ensemble methods? Which are different types of ensemble
methods?
https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-they-and-why-use-them-68ec3f9fef5f
44. Design a neural network to solve XOR problem.
https://towardsdatascience.com/how-neural-networks-solve-the-xor-problem-59763136bdd7